# Scrape Netflix Catalog: Titles, Genres & Regional Availability (2026)
Netflix doesn't offer a public API for catalog data anymore — they killed the public API back in 2014. But the data is still there if you know where to look: third-party databases track catalog availability, and the Netflix web app itself loads data through internal endpoints you can intercept.

This guide covers three approaches — querying UNOGS (the most complete third-party catalog database), parsing Netflix's public sitemap, and intercepting the internal Shakti API — then enriching everything with TMDb for full metadata. It includes working Python code, SQLite storage, and error handling.
## Why Netflix Catalog Data Is Useful
Netflix's catalog varies dramatically by country — sometimes 40-60% of titles available in the US aren't available in other regions. This regional variation creates data value for:
- Content analytics — which titles Netflix has licensed globally vs. regionally
- Streaming service comparison — building comparison tools across Netflix, Disney+, Prime, etc.
- Recommendation systems — building content discovery tools that surface what's available in a user's country
- Licensing pattern research — tracking which studios are making exclusive deals with which platforms
- Travel content planning — finding what will be available when traveling to specific countries
The challenge: Netflix has no interest in making this easy. Their catalog data is commercially sensitive, and they actively rate-limit and block automated access.
## Approach 1: UNOGS API
UNOGS (Unofficial Netflix Online Global Search) tracks Netflix catalogs across 50+ countries and updates daily. They offer an API through RapidAPI that gives you structured catalog data without directly scraping Netflix.
```python
# unogs_client.py
import time

import httpx

RAPIDAPI_KEY = "your_rapidapi_key_from_rapidapi.com"
UNOGS_HOST = "unogsng.p.rapidapi.com"

def make_unogs_client() -> httpx.Client:
    """Create authenticated UNOGS API client."""
    return httpx.Client(
        headers={
            "X-RapidAPI-Key": RAPIDAPI_KEY,
            "X-RapidAPI-Host": UNOGS_HOST,
        },
        timeout=30,
    )

# Country IDs for common Netflix regions. These are UNOGS-internal IDs,
# not ISO codes — verify the current values against the API's /countries
# endpoint before relying on them.
COUNTRY_IDS = {
    "us": 78,
    "uk": 46,
    "de": 39,
    "fr": 45,
    "jp": 267,
    "ca": 33,
    "au": 23,
    "br": 29,
    "in": 246,
    "mx": 484,
    "pl": 391,
}
```
```python
def search_netflix(
    query: str | None = None,
    country_id: int = 78,
    offset: int = 0,
    limit: int = 100,
    order_by: str = "date",
    vtype: str | None = None,  # "movie" or "series"
) -> tuple[list, int]:
    """
    Search Netflix catalog via UNOGS API.
    Returns (results_list, total_count).
    """
    with make_unogs_client() as client:
        url = "https://unogsng.p.rapidapi.com/search"
        params = {
            "country_list": str(country_id),
            "offset": str(offset),
            "limit": str(limit),
            "orderby": order_by,
        }
        if query:
            params["query"] = query
        if vtype:
            params["type"] = vtype
        response = client.get(url, params=params)
        response.raise_for_status()
        data = response.json()
    results = []
    for item in data.get("results", []):
        results.append({
            "netflix_id": item.get("nfid"),
            "title": item.get("title"),
            "year": item.get("year"),
            "type": item.get("vtype"),  # movie or show
            "imdb_id": item.get("imdbid"),
            "imdb_rating": item.get("imdbrating"),
            "synopsis": item.get("synopsis"),
            "image_url": item.get("img"),
            "titledate": item.get("titledate"),
        })
    return results, data.get("total", 0)
```
```python
def get_all_titles_for_country(
    country_id: int = 78,
    vtype: str | None = None,
    max_results: int = 5000,
) -> list:
    """
    Fetch a country's complete catalog, one page at a time.
    This can take several minutes for large catalogs.
    """
    all_results = []
    offset = 0
    limit = 100
    print(f"Fetching catalog for country_id={country_id}...")
    while len(all_results) < max_results:
        results, total = search_netflix(
            country_id=country_id,
            offset=offset,
            limit=limit,
            vtype=vtype,
        )
        if not results:
            break
        all_results.extend(results)
        print(f"  Fetched {len(all_results)}/{total} titles...")
        if len(all_results) >= total:
            break
        offset += limit
        time.sleep(0.5)  # UNOGS rate limit is generous but not unlimited
    print(f"Total fetched: {len(all_results)}")
    return all_results
```
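The offset-based pagination loop above can be exercised offline by swapping a stub in for the API call — `fake_page` below is purely illustrative and not part of the UNOGS client:

```python
def fake_page(offset: int, limit: int, total: int = 250) -> tuple[list, int]:
    """Stand-in for search_netflix: returns (page_of_ids, total_count)."""
    end = min(offset + limit, total)
    return list(range(offset, end)), total

def fetch_all(limit: int = 100, max_results: int = 5000) -> list:
    # Same loop shape as get_all_titles_for_country, minus the rate limiting
    results, offset = [], 0
    while len(results) < max_results:
        page, total = fake_page(offset, limit)
        if not page:
            break
        results.extend(page)
        if len(results) >= total:
            break
        offset += limit
    return results

print(len(fetch_all()))  # 250 — three pages: 100 + 100 + 50
```

The two exit conditions matter: an empty page guards against the server lying about `total`, and the `>= total` check avoids one wasted request at the end.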
### Getting Regional Availability
The real value in Netflix data is knowing which titles are available where:
```python
def get_title_countries(netflix_id: int) -> list:
    """Get all countries where a Netflix title is available."""
    with make_unogs_client() as client:
        url = "https://unogsng.p.rapidapi.com/title"
        params = {"netflixid": str(netflix_id)}
        response = client.get(url, params=params)
        response.raise_for_status()
        data = response.json()
    results_list = data.get("results", [])
    if not results_list:
        return []
    countries = []
    for country in results_list[0].get("country_availability", []):
        countries.append({
            "country": country.get("country"),
            "country_code": country.get("cc"),
            "audio_languages": country.get("audio"),
            "subtitle_languages": country.get("subtitle"),
            "available_since": country.get("new_date"),
            "expiring_date": country.get("expire_date"),
        })
    return countries
```
```python
def find_exclusive_titles(
    country_a_id: int,
    country_b_id: int,
    limit: int = 100,
) -> dict:
    """
    Compare the first `limit` results for two countries and return titles
    present in one but not the other. (For true exclusivity, compare full
    catalogs fetched via get_all_titles_for_country instead.)
    """
    titles_a, _ = search_netflix(country_id=country_a_id, limit=limit)
    titles_b, _ = search_netflix(country_id=country_b_id, limit=limit)
    ids_a = {t["netflix_id"] for t in titles_a}
    ids_b = {t["netflix_id"] for t in titles_b}
    titles_a_dict = {t["netflix_id"]: t for t in titles_a}
    titles_b_dict = {t["netflix_id"]: t for t in titles_b}
    return {
        "exclusive_to_a": [titles_a_dict[nid] for nid in ids_a - ids_b],
        "exclusive_to_b": [titles_b_dict[nid] for nid in ids_b - ids_a],
        "in_both": len(ids_a & ids_b),
    }
```
```python
def get_genre_titles(genre_id: int, country_id: int = 78, limit: int = 100) -> list:
    """Get titles in a specific Netflix genre for a country."""
    with make_unogs_client() as client:
        url = "https://unogsng.p.rapidapi.com/search"
        params = {
            "genrelist": str(genre_id),
            "country_list": str(country_id),
            "limit": str(limit),
            "orderby": "rating",
        }
        response = client.get(url, params=params)
        response.raise_for_status()
        return response.json().get("results", [])

# Popular Netflix genre IDs
NETFLIX_GENRES = {
    "action": 1365,
    "anime": 7424,
    "comedies": 6548,
    "documentaries": 6839,
    "horror": 8711,
    "sci_fi": 108533,
    "thrillers": 8933,
    "true_crime": 81237,
    "drama": 5763,
    "romance": 8883,
    "kids": 6796,
    "stand_up": 11559,
}
```
## Approach 2: Netflix Sitemap Data
Netflix publishes XML sitemaps that list every title page. This won't give you metadata, but it gives you a complete list of Netflix IDs for the current catalog — useful for knowing what exists before deciding what to fetch:
```python
# netflix_sitemap.py
import re
import xml.etree.ElementTree as ET

import httpx

def get_netflix_sitemap_titles() -> list:
    """Extract Netflix title IDs from their public sitemap."""
    sitemap_url = "https://www.netflix.com/sitemap/title.xml"
    try:
        response = httpx.get(
            sitemap_url,
            timeout=30,
            headers={
                "User-Agent": "Mozilla/5.0 (compatible; SitemapBot/1.0)",
                "Accept": "application/xml, text/xml, */*",
            },
            follow_redirects=True,
        )
        response.raise_for_status()
    except httpx.HTTPError as e:
        print(f"Failed to fetch sitemap: {e}")
        return []
    try:
        root = ET.fromstring(response.text)
    except ET.ParseError as e:
        print(f"Failed to parse sitemap XML: {e}")
        return []
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    titles = []
    for url_el in root.findall(".//sm:url", ns):
        loc = url_el.find("sm:loc", ns)
        lastmod = url_el.find("sm:lastmod", ns)
        if loc is not None and loc.text:
            url = loc.text
            match = re.search(r"/title/(\d+)", url)
            if match:
                titles.append({
                    "netflix_id": int(match.group(1)),
                    "url": url,
                    "lastmod": lastmod.text if lastmod is not None else None,
                })
    print(f"Found {len(titles)} titles in Netflix sitemap")
    return titles
```
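To sanity-check the namespace-aware parsing without hitting Netflix, the same extraction can be run against a small inline sitemap fragment (the IDs below are arbitrary examples):

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.netflix.com/title/80192098</loc><lastmod>2026-01-10</lastmod></url>
  <url><loc>https://www.netflix.com/title/81040344</loc></url>
</urlset>"""

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SAMPLE)
titles = []
for url_el in root.findall(".//sm:url", ns):
    loc = url_el.find("sm:loc", ns)
    lastmod = url_el.find("sm:lastmod", ns)
    m = re.search(r"/title/(\d+)", loc.text or "")
    if m:
        titles.append({
            "netflix_id": int(m.group(1)),
            "lastmod": lastmod.text if lastmod is not None else None,
        })

print(titles)
# [{'netflix_id': 80192098, 'lastmod': '2026-01-10'}, {'netflix_id': 81040344, 'lastmod': None}]
```

Note the namespace dict: sitemap elements live in the `sitemaps.org` default namespace, so a bare `.//url` XPath would find nothing.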
## Approach 3: Intercepting Netflix's Shakti API
If you need data that UNOGS doesn't cover — like detailed cast info, episode-level metadata, or recommendation tags — you need to intercept Netflix's internal Shakti API. This is harder because Netflix uses heavy anti-bot protections, but the data is much richer.
```python
# netflix_shakti.py
import time

from playwright.sync_api import sync_playwright

def scrape_netflix_title(
    netflix_id: int,
    proxy_url: str | None = None,
) -> dict:
    """
    Scrape title details from Netflix by intercepting the Shakti API.
    Requires a valid Netflix account for full metadata.
    For catalog enumeration without an account, use the sitemap + UNOGS approach.
    """
    intercepted_data = {}

    def handle_response(response):
        url = response.url
        # Shakti API paths
        if any(p in url for p in ["/pathEvaluator", "shakti", "/metadata", "/browse"]):
            try:
                if "json" in response.headers.get("content-type", ""):
                    intercepted_data[url.split("?")[0]] = response.json()
            except Exception:
                pass

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--no-first-run",
                "--disable-dev-shm-usage",
            ],
        )
        context_kwargs = {
            "viewport": {"width": 1920, "height": 1080},
            "user_agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/126.0.0.0 Safari/537.36"
            ),
            "locale": "en-US",
            "timezone_id": "America/New_York",
        }
        if proxy_url:
            context_kwargs["proxy"] = {"server": proxy_url}
        context = browser.new_context(**context_kwargs)
        # Remove automation flags
        context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
            window.chrome = { runtime: {} };
        """)
        page = context.new_page()
        page.on("response", handle_response)
        # Netflix title URL
        page.goto(f"https://www.netflix.com/title/{netflix_id}",
                  wait_until="networkidle", timeout=30000)
        time.sleep(3)
        # Extract visible DOM metadata
        title_data = {}
        # Title
        title_el = page.query_selector(
            "[data-uia='hero-title-text'], .title-title, h1.title-title")
        if title_el:
            title_data["title"] = title_el.inner_text().strip()
        # Synopsis
        synopsis_el = page.query_selector(
            "[data-uia='hero-synopsis'], .title-info-synopsis")
        if synopsis_el:
            title_data["synopsis"] = synopsis_el.inner_text().strip()
        # Metadata items (year, rating, duration, etc.)
        meta_items = page.query_selector_all(
            "[data-uia='hero-metadata-item'], .maturity-number, .duration")
        title_data["metadata"] = [m.inner_text().strip() for m in meta_items
                                  if m.inner_text().strip()]
        # Genre tags
        genre_tags = page.query_selector_all(".genreTag, [data-uia='moreLikeThis-tag']")
        title_data["genres"] = [t.inner_text().strip() for t in genre_tags]
        # Cast from "More Info" section
        cast_el = page.query_selector("[data-uia='cast-member-list'], .cast-container")
        if cast_el:
            title_data["cast_text"] = cast_el.inner_text().strip()
        # Merge with intercepted API data
        title_data["netflix_id"] = netflix_id
        title_data["shakti_data"] = intercepted_data
        browser.close()
    return title_data
```
## Proxy Setup for Netflix
Netflix is aggressive about blocking datacenter IPs. If you're not using a residential IP, you'll hit the login wall or get empty responses.
For regional catalog checking, residential proxies from the target country let you see exactly what's in that region's catalog. ThorData's residential proxy network supports country-specific routing:
```python
def get_country_proxy(country_code: str) -> dict:
    """
    Get ThorData proxy configuration for a specific country.
    Used to check region-specific Netflix catalogs.
    """
    return {
        "server": "http://proxy.thordata.com:9000",
        "username": f"user-country-{country_code.lower()}",
        "password": "YOUR_THORDATA_PASSWORD",
    }

# Compare the US, UK, JP, and DE catalogs through country-routed IPs
for country in ["us", "uk", "jp", "de"]:
    proxy_config = get_country_proxy(country)
    # Pass this to a Playwright context for region-specific catalog data
    print(f"Proxy for {country.upper()}: {proxy_config['server']}")
```
## Enriching with TMDb
Once you have Netflix IDs, combine with TMDb (The Movie Database) for rich metadata — cast, crew, genres, trailers, and more:
```python
# tmdb_enricher.py
import time

import httpx

TMDB_API_KEY = "your_tmdb_api_key"  # Free registration at themoviedb.org
TMDB_BASE = "https://api.themoviedb.org/3"

tmdb_client = httpx.Client(timeout=15)

def get_tmdb_by_title(title: str, year: int = None, media_type: str = None) -> dict:
    """Search TMDb for a title and return structured metadata."""
    search_url = f"{TMDB_BASE}/search/multi"
    params = {
        "api_key": TMDB_API_KEY,
        "query": title,
        "language": "en-US",
    }
    if year:
        # Note: /search/multi doesn't document a year filter; for strict year
        # matching, use /search/movie (primary_release_year) or /search/tv
        # (first_air_date_year) instead.
        params["year"] = year
    response = tmdb_client.get(search_url, params=params)
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        return {}
    # Filter by media_type if specified
    if media_type:
        filtered = [r for r in results if r.get("media_type") == media_type]
        if filtered:
            results = filtered
    top = results[0]
    return get_tmdb_details(top["id"], media_type=top.get("media_type", "movie"))
```
```python
def get_tmdb_details(tmdb_id: int, media_type: str = "movie") -> dict:
    """Fetch detailed TMDb info including credits."""
    detail_url = f"{TMDB_BASE}/{media_type}/{tmdb_id}"
    detail_params = {
        "api_key": TMDB_API_KEY,
        "append_to_response": "credits,keywords",
        "language": "en-US",
    }
    try:
        detail = tmdb_client.get(detail_url, params=detail_params).json()
    except Exception:
        return {}
    credits = detail.get("credits", {})
    keywords = detail.get("keywords", {})
    # Movies nest keywords under "keywords"; TV shows use "results"
    kw_list = keywords.get("keywords", keywords.get("results", []))
    return {
        "tmdb_id": tmdb_id,
        "media_type": media_type,
        "title": detail.get("title") or detail.get("name"),
        "original_title": detail.get("original_title") or detail.get("original_name"),
        "overview": detail.get("overview"),
        "tagline": detail.get("tagline"),
        "genres": [g["name"] for g in detail.get("genres", [])],
        "keywords": [k["name"] for k in kw_list[:20]],
        "cast": [
            {"name": c["name"], "character": c.get("character", ""), "order": c.get("order", 0)}
            for c in credits.get("cast", [])[:15]
        ],
        "directors": [
            c["name"]
            for c in credits.get("crew", [])
            if c.get("job") == "Director"
        ],
        "creators": [
            c["name"]
            for c in credits.get("crew", [])
            if c.get("job") in ("Creator", "Writer")
        ],
        "tmdb_rating": detail.get("vote_average"),
        "tmdb_vote_count": detail.get("vote_count"),
        "popularity": detail.get("popularity"),
        "release_date": detail.get("release_date") or detail.get("first_air_date"),
        "runtime": detail.get("runtime"),
        "number_of_seasons": detail.get("number_of_seasons"),
        "status": detail.get("status"),
        "original_language": detail.get("original_language"),
        "production_countries": [c["name"] for c in detail.get("production_countries", [])],
    }
```
## SQLite Storage Schema
```python
import json
import sqlite3

def init_netflix_db(db_path: str = "netflix_catalog.db") -> sqlite3.Connection:
    """Initialize database for Netflix catalog data."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS titles (
            netflix_id INTEGER PRIMARY KEY,
            title TEXT,
            original_title TEXT,
            year INTEGER,
            type TEXT,
            imdb_id TEXT,
            imdb_rating REAL,
            tmdb_id INTEGER,
            tmdb_rating REAL,
            genres TEXT,
            cast_data TEXT,
            directors TEXT,
            keywords TEXT,
            overview TEXT,
            runtime INTEGER,
            number_seasons INTEGER,
            original_language TEXT,
            added_at TEXT DEFAULT CURRENT_TIMESTAMP,
            last_updated TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS country_availability (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            netflix_id INTEGER NOT NULL,
            country_code TEXT,
            country_name TEXT,
            available_since TEXT,
            expiring_date TEXT,
            audio_languages TEXT,
            subtitle_languages TEXT,
            checked_at TEXT DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (netflix_id) REFERENCES titles(netflix_id)
        );
        CREATE TABLE IF NOT EXISTS catalog_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            country_code TEXT,
            netflix_id INTEGER,
            title TEXT,
            type TEXT,
            snapshot_date TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_titles_type ON titles (type);
        CREATE INDEX IF NOT EXISTS idx_availability_netflix ON country_availability (netflix_id);
        CREATE INDEX IF NOT EXISTS idx_availability_country ON country_availability (country_code);
        CREATE INDEX IF NOT EXISTS idx_snapshots_country ON catalog_snapshots (country_code);
    """)
    conn.commit()
    return conn
```
```python
def save_title(conn: sqlite3.Connection, title: dict, tmdb_data: dict = None):
    """Save a Netflix title with optional TMDb enrichment."""
    merged = {**title}
    if tmdb_data:
        merged.update({
            "tmdb_id": tmdb_data.get("tmdb_id"),
            "tmdb_rating": tmdb_data.get("tmdb_rating"),
            "genres": json.dumps(tmdb_data.get("genres", [])),
            "cast_data": json.dumps(tmdb_data.get("cast", [])),
            "directors": json.dumps(tmdb_data.get("directors", [])),
            "keywords": json.dumps(tmdb_data.get("keywords", [])),
            "overview": tmdb_data.get("overview"),
            "runtime": tmdb_data.get("runtime"),
            "number_seasons": tmdb_data.get("number_of_seasons"),
            "original_language": tmdb_data.get("original_language"),
        })
    # Note: INSERT OR REPLACE rewrites the whole row, so columns not listed
    # below (original_title, added_at, last_updated) reset to their defaults
    # whenever a title is re-saved.
    conn.execute(
        """
        INSERT OR REPLACE INTO titles
            (netflix_id, title, year, type, imdb_id, imdb_rating,
             tmdb_id, tmdb_rating, genres, cast_data, directors, keywords,
             overview, runtime, number_seasons, original_language)
        VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
        """,
        (
            merged.get("netflix_id"),
            merged.get("title"),
            merged.get("year"),
            merged.get("type"),
            merged.get("imdb_id"),
            merged.get("imdb_rating"),
            merged.get("tmdb_id"),
            merged.get("tmdb_rating"),
            merged.get("genres"),
            merged.get("cast_data"),
            merged.get("directors"),
            merged.get("keywords"),
            merged.get("overview"),
            merged.get("runtime"),
            merged.get("number_seasons"),
            merged.get("original_language"),
        ),
    )
    conn.commit()
```
```python
def save_country_availability(conn: sqlite3.Connection, netflix_id: int, countries: list):
    """Save country availability data for a title."""
    # Delete old availability data first (it changes over time)
    conn.execute(
        "DELETE FROM country_availability WHERE netflix_id=?",
        (netflix_id,),
    )
    for c in countries:
        conn.execute(
            """INSERT INTO country_availability
               (netflix_id, country_code, country_name, available_since, expiring_date,
                audio_languages, subtitle_languages)
               VALUES (?,?,?,?,?,?,?)""",
            (
                netflix_id,
                c.get("country_code"),
                c.get("country"),
                c.get("available_since"),
                c.get("expiring_date"),
                c.get("audio_languages"),
                c.get("subtitle_languages"),
            ),
        )
    conn.commit()
```
## Complete Catalog Pipeline
```python
# pipeline.py — assumes the UNOGS, TMDb, and SQLite helpers above are importable
import time

def build_catalog_database(
    countries: list = None,
    db_path: str = "netflix_catalog.db",
    enrich_with_tmdb: bool = True,
    max_per_country: int = 1000,
):
    """
    Build a comprehensive Netflix catalog database.
    countries: list of country IDs (defaults to major Netflix regions)
    """
    if countries is None:
        countries = [78, 46, 39, 267, 23]  # US, UK, DE, JP, AU
    conn = init_netflix_db(db_path)
    seen_ids = set()
    for country_id in countries:
        country_code = {v: k for k, v in COUNTRY_IDS.items()}.get(country_id, str(country_id))
        print(f"\nFetching catalog for {country_code.upper()} (id={country_id})...")
        # Paginate through the full catalog — a single search_netflix call
        # would cap us at one 100-result page.
        titles = get_all_titles_for_country(country_id, max_results=max_per_country)
        for i, title in enumerate(titles):
            netflix_id = title.get("netflix_id")
            if not netflix_id:
                continue
            # Save snapshot record
            conn.execute(
                "INSERT INTO catalog_snapshots (country_code, netflix_id, title, type) "
                "VALUES (?, ?, ?, ?)",
                (country_code, netflix_id, title.get("title"), title.get("type")),
            )
            # Only enrich each unique title once
            if netflix_id not in seen_ids:
                seen_ids.add(netflix_id)
                tmdb_data = None
                if enrich_with_tmdb and title.get("title"):
                    try:
                        tmdb_data = get_tmdb_by_title(
                            title["title"],
                            year=title.get("year"),
                            media_type="movie" if title.get("type") == "movie" else "tv",
                        )
                        time.sleep(0.25)  # TMDb rate limit: ~40 req / 10 sec
                    except Exception as e:
                        print(f"  TMDb error for {title['title']}: {e}")
                save_title(conn, title, tmdb_data)
            if i % 50 == 0:
                print(f"  Processed {i + 1}/{len(titles)} titles...")
        conn.commit()
    unique_count = conn.execute("SELECT COUNT(*) FROM titles").fetchone()[0]
    print(f"\nCatalog database complete. {unique_count:,} unique titles.")
    conn.close()
```
```python
# Analytical queries
import datetime
import sqlite3

def most_available_titles(conn: sqlite3.Connection, min_countries: int = 10) -> list:
    """Find titles available in the most countries."""
    rows = conn.execute(
        """
        SELECT t.title, t.year, t.type, COUNT(ca.country_code) AS country_count
        FROM titles t
        JOIN country_availability ca ON ca.netflix_id = t.netflix_id
        GROUP BY t.netflix_id
        HAVING country_count >= ?
        ORDER BY country_count DESC
        LIMIT 20
        """,
        (min_countries,),
    ).fetchall()
    return [{"title": r[0], "year": r[1], "type": r[2], "countries": r[3]} for r in rows]

def expiring_soon(conn: sqlite3.Connection, country_code: str, days: int = 30) -> list:
    """Find titles expiring soon in a country."""
    cutoff = (datetime.date.today() + datetime.timedelta(days=days)).isoformat()
    rows = conn.execute(
        """
        SELECT t.title, t.year, t.type, ca.expiring_date
        FROM titles t
        JOIN country_availability ca ON ca.netflix_id = t.netflix_id
        WHERE ca.country_code = ?
          AND ca.expiring_date IS NOT NULL
          AND ca.expiring_date <= ?
        ORDER BY ca.expiring_date ASC
        LIMIT 50
        """,
        (country_code, cutoff),
    ).fetchall()
    return [{"title": r[0], "year": r[1], "type": r[2], "expires": r[3]} for r in rows]
```
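To see the shape of what the availability-count query returns, here's a self-contained run against synthetic rows (trimmed-down tables that mirror the schema above; the titles are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (netflix_id INTEGER PRIMARY KEY, title TEXT, year INTEGER, type TEXT);
    CREATE TABLE country_availability (netflix_id INTEGER, country_code TEXT);
""")
conn.executemany("INSERT INTO titles VALUES (?,?,?,?)", [
    (1, "Global Hit", 2024, "movie"),
    (2, "Local Only", 2023, "series"),
])
conn.executemany("INSERT INTO country_availability VALUES (?,?)", [
    (1, "us"), (1, "gb"), (1, "jp"), (2, "us"),
])

# Same join/group/having shape as most_available_titles
rows = conn.execute("""
    SELECT t.title, COUNT(ca.country_code) AS n
    FROM titles t
    JOIN country_availability ca USING (netflix_id)
    GROUP BY t.netflix_id
    HAVING n >= 2
    ORDER BY n DESC
""").fetchall()

print(rows)  # [('Global Hit', 3)]
```

"Local Only" is filtered out by the `HAVING` clause because it appears in just one country.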
## Practical Tips
**UNOGS is the right first choice.** For most catalog data needs, UNOGS covers the hard parts — regional availability tracking across 50+ countries, genre categorization, and daily updates. Direct Netflix scraping is only necessary when you need cast details, episode metadata, or data that UNOGS doesn't expose.
**Cache regional availability separately.** Catalog availability changes frequently (Netflix acquires and loses licenses monthly). Store availability with a `checked_at` timestamp and refresh only the titles that are oldest in your database.
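One way to implement that refresh-oldest-first policy — a sketch against an in-memory copy of the `country_availability` table, with hypothetical timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE country_availability (
    netflix_id INTEGER, country_code TEXT, checked_at TEXT)""")
conn.executemany(
    "INSERT INTO country_availability VALUES (?,?,?)",
    [(1, "us", "2026-01-01"), (2, "us", "2025-11-15"), (3, "us", "2025-12-20")],
)

def stalest_titles(conn: sqlite3.Connection, batch_size: int = 2) -> list:
    """Return the netflix_ids whose availability was checked longest ago."""
    rows = conn.execute(
        """SELECT netflix_id, MIN(checked_at) AS oldest
           FROM country_availability
           GROUP BY netflix_id
           ORDER BY oldest ASC
           LIMIT ?""",
        (batch_size,),
    ).fetchall()
    return [r[0] for r in rows]

print(stalest_titles(conn))  # [2, 3] — refresh these first
```

Feed each returned ID back through `get_title_countries` + `save_country_availability`, and the default `checked_at` timestamp advances automatically on re-insert.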
**TMDb enrichment is free and powerful.** The TMDb API has generous rate limits (around 40 requests per 10 seconds) and a free tier that covers most use cases. Combining Netflix IDs with TMDb metadata gives you cast, crew, genres, keywords, trailers, and ratings without any scraping.
**Country-specific proxies for direct Netflix access.** If you need to verify what's actually available in a region, use residential proxies from that country. ThorData supports per-country routing, which lets you fetch the same URL through different country IP pools and compare results.
**Sitemaps for ID discovery.** Netflix's sitemap is the most reliable way to discover new Netflix IDs without scraping their catalog pages. Run it weekly and diff against your database to find new additions.
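The weekly diff amounts to set arithmetic over Netflix IDs (the IDs below are arbitrary examples, not real titles):

```python
# This week's sitemap pull vs. IDs already stored in SQLite
sitemap_ids = {80192098, 81040344, 80057281}
known_ids = {80057281, 70143836}

new_titles = sitemap_ids - known_ids      # appeared since the last run
removed_titles = known_ids - sitemap_ids  # no longer in the sitemap

print(sorted(new_titles))      # [80192098, 81040344]
print(sorted(removed_titles))  # [70143836]
```

New IDs go into the enrichment queue; removed IDs are candidates for marking as expired rather than deleting, since sitemap omissions can be transient.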
## Legal Notes
Netflix's Terms of Use prohibit scraping: they explicitly bar automated access (robots, spiders, scrapers) and circumventing technical protection measures, and Netflix has pursued legal action against scrapers operating at commercial scale.
For production applications, the appropriate approaches are:
- JustWatch API — has licensing arrangements with Netflix for catalog data and offers a legitimate API
- UNOGS — aggregates Netflix data through their own processes; using their RapidAPI is accessing the data through a legitimate provider
- TMDb — fully licensed and free; doesn't have catalog availability data but has all the metadata
- Netflix Partner APIs — available to licensed technology partners, app developers, and content creators through their official program
The code in this post is provided for educational purposes. Understand the legal implications before using direct Netflix scraping in any commercial context.