# Scrape Netflix Catalog: Titles, Genres & Regional Availability (2026)
Netflix doesn't offer a public API for catalog data anymore — they killed the public API back in 2014. But the data is still there if you know where to look: third-party databases track catalog availability, and the Netflix web app itself loads data through internal endpoints you can intercept.

This guide covers three approaches — querying UNOGS (the most complete third-party catalog database), parsing Netflix's public sitemap, and intercepting the internal Shakti API — then enriching everything with TMDb for full metadata. It includes working Python code, SQLite storage, and error handling.
## Why Netflix Catalog Data Is Useful
Netflix's catalog varies dramatically by country — sometimes 40-60% of titles available in the US aren't available in other regions. This regional variation creates data value for:
- Content analytics — which titles Netflix has licensed globally vs. regionally
- Streaming service comparison — building comparison tools across Netflix, Disney+, Prime, etc.
- Recommendation systems — building content discovery tools that surface what's available in a user's country
- Licensing pattern research — tracking which studios are making exclusive deals with which platforms
- Travel content planning — finding what will be available when traveling to specific countries
The challenge: Netflix has no interest in making this easy. Their catalog data is commercially sensitive, and they actively rate-limit and block automated access.
## Approach 1: UNOGS API
UNOGS (Unofficial Netflix Online Global Search) tracks Netflix catalogs across 50+ countries and updates daily. They offer an API through RapidAPI that gives you structured catalog data without directly scraping Netflix.
```python
# unogs_client.py
import time

import httpx

RAPIDAPI_KEY = "your_rapidapi_key_from_rapidapi.com"
UNOGS_HOST = "unogsng.p.rapidapi.com"

def make_unogs_client() -> httpx.Client:
    """Create authenticated UNOGS API client."""
    return httpx.Client(
        headers={
            "X-RapidAPI-Key": RAPIDAPI_KEY,
            "X-RapidAPI-Host": UNOGS_HOST,
        },
        timeout=30,
    )

# Country IDs for common Netflix regions. These are UNOGS-internal IDs,
# not ISO codes — verify the current values against the API's /countries
# endpoint before relying on them.
COUNTRY_IDS = {
    "us": 78,
    "uk": 46,
    "de": 39,
    "fr": 45,
    "jp": 267,
    "ca": 33,
    "au": 23,
    "br": 29,
    "in": 246,
    "mx": 484,
    "pl": 391,
}
```
```python
def search_netflix(
    query: str | None = None,
    country_id: int = 78,
    offset: int = 0,
    limit: int = 100,
    order_by: str = "date",
    vtype: str | None = None,  # "movie" or "series"
) -> tuple[list, int]:
    """
    Search Netflix catalog via UNOGS API.
    Returns (results_list, total_count).
    """
    with make_unogs_client() as client:
        url = "https://unogsng.p.rapidapi.com/search"
        params = {
            "country_list": str(country_id),
            "offset": str(offset),
            "limit": str(limit),
            "orderby": order_by,
        }
        if query:
            params["query"] = query
        if vtype:
            params["type"] = vtype
        response = client.get(url, params=params)
        response.raise_for_status()
        data = response.json()
    results = []
    for item in data.get("results", []):
        results.append({
            "netflix_id": item.get("nfid"),
            "title": item.get("title"),
            "year": item.get("year"),
            "type": item.get("vtype"),  # movie or show
            "imdb_id": item.get("imdbid"),
            "imdb_rating": item.get("imdbrating"),
            "synopsis": item.get("synopsis"),
            "image_url": item.get("img"),
            "titledate": item.get("titledate"),
        })
    return results, data.get("total", 0)
```
```python
def get_all_titles_for_country(
    country_id: int = 78,
    vtype: str | None = None,
    max_results: int = 5000,
) -> list:
    """
    Fetch a country's complete catalog, one page at a time.
    This can take several minutes for large catalogs.
    """
    all_results = []
    offset = 0
    limit = 100
    print(f"Fetching catalog for country_id={country_id}...")
    while len(all_results) < max_results:
        results, total = search_netflix(
            country_id=country_id,
            offset=offset,
            limit=limit,
            vtype=vtype,
        )
        if not results:
            break
        all_results.extend(results)
        print(f"  Fetched {len(all_results)}/{total} titles...")
        if len(all_results) >= total:
            break
        offset += limit
        time.sleep(0.5)  # UNOGS rate limit is generous but not unlimited
    print(f"Total fetched: {len(all_results)}")
    return all_results
```
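The offset-based pagination loop above can be exercised offline by swapping a stub in for the API call — `fake_page` below is purely illustrative and not part of the UNOGS client:

```python
def fake_page(offset: int, limit: int, total: int = 250) -> tuple[list, int]:
    """Stand-in for search_netflix: returns (page_of_ids, total_count)."""
    end = min(offset + limit, total)
    return list(range(offset, end)), total

def fetch_all(limit: int = 100, max_results: int = 5000) -> list:
    # Same loop shape as get_all_titles_for_country, minus the rate limiting
    results, offset = [], 0
    while len(results) < max_results:
        page, total = fake_page(offset, limit)
        if not page:
            break
        results.extend(page)
        if len(results) >= total:
            break
        offset += limit
    return results

print(len(fetch_all()))  # 250 — three pages: 100 + 100 + 50
```

The two exit conditions matter: an empty page guards against the server lying about `total`, and the `>= total` check avoids one wasted request at the end.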
### Getting Regional Availability
The real value in Netflix data is knowing which titles are available where:
```python
def get_title_countries(netflix_id: int) -> list:
    """Get all countries where a Netflix title is available."""
    with make_unogs_client() as client:
        url = "https://unogsng.p.rapidapi.com/title"
        params = {"netflixid": str(netflix_id)}
        response = client.get(url, params=params)
        response.raise_for_status()
        data = response.json()
    results_list = data.get("results", [])
    if not results_list:
        return []
    countries = []
    for country in results_list[0].get("country_availability", []):
        countries.append({
            "country": country.get("country"),
            "country_code": country.get("cc"),
            "audio_languages": country.get("audio"),
            "subtitle_languages": country.get("subtitle"),
            "available_since": country.get("new_date"),
            "expiring_date": country.get("expire_date"),
        })
    return countries
```
```python
def find_exclusive_titles(
    country_a_id: int,
    country_b_id: int,
    limit: int = 100,
) -> dict:
    """
    Compare the first `limit` results for two countries and return titles
    present in one but not the other. (For true exclusivity, compare full
    catalogs fetched via get_all_titles_for_country instead.)
    """
    titles_a, _ = search_netflix(country_id=country_a_id, limit=limit)
    titles_b, _ = search_netflix(country_id=country_b_id, limit=limit)
    ids_a = {t["netflix_id"] for t in titles_a}
    ids_b = {t["netflix_id"] for t in titles_b}
    titles_a_dict = {t["netflix_id"]: t for t in titles_a}
    titles_b_dict = {t["netflix_id"]: t for t in titles_b}
    return {
        "exclusive_to_a": [titles_a_dict[nid] for nid in ids_a - ids_b],
        "exclusive_to_b": [titles_b_dict[nid] for nid in ids_b - ids_a],
        "in_both": len(ids_a & ids_b),
    }
```
```python
def get_genre_titles(genre_id: int, country_id: int = 78, limit: int = 100) -> list:
    """Get titles in a specific Netflix genre for a country."""
    with make_unogs_client() as client:
        url = "https://unogsng.p.rapidapi.com/search"
        params = {
            "genrelist": str(genre_id),
            "country_list": str(country_id),
            "limit": str(limit),
            "orderby": "rating",
        }
        response = client.get(url, params=params)
        response.raise_for_status()
        return response.json().get("results", [])

# Popular Netflix genre IDs
NETFLIX_GENRES = {
    "action": 1365,
    "anime": 7424,
    "comedies": 6548,
    "documentaries": 6839,
    "horror": 8711,
    "sci_fi": 108533,
    "thrillers": 8933,
    "true_crime": 81237,
    "drama": 5763,
    "romance": 8883,
    "kids": 6796,
    "stand_up": 11559,
}
```
## Approach 2: Netflix Sitemap Data
Netflix publishes XML sitemaps that list every title page. This won't give you metadata, but it gives you a complete list of Netflix IDs for the current catalog — useful for knowing what exists before deciding what to fetch:
```python
# netflix_sitemap.py
import re
import xml.etree.ElementTree as ET

import httpx

def get_netflix_sitemap_titles() -> list:
    """Extract Netflix title IDs from their public sitemap."""
    sitemap_url = "https://www.netflix.com/sitemap/title.xml"
    try:
        response = httpx.get(
            sitemap_url,
            timeout=30,
            headers={
                "User-Agent": "Mozilla/5.0 (compatible; SitemapBot/1.0)",
                "Accept": "application/xml, text/xml, */*",
            },
            follow_redirects=True,
        )
        response.raise_for_status()
    except httpx.HTTPError as e:
        print(f"Failed to fetch sitemap: {e}")
        return []
    try:
        root = ET.fromstring(response.text)
    except ET.ParseError as e:
        print(f"Failed to parse sitemap XML: {e}")
        return []
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    titles = []
    for url_el in root.findall(".//sm:url", ns):
        loc = url_el.find("sm:loc", ns)
        lastmod = url_el.find("sm:lastmod", ns)
        if loc is not None and loc.text:
            url = loc.text
            match = re.search(r"/title/(\d+)", url)
            if match:
                titles.append({
                    "netflix_id": int(match.group(1)),
                    "url": url,
                    "lastmod": lastmod.text if lastmod is not None else None,
                })
    print(f"Found {len(titles)} titles in Netflix sitemap")
    return titles
```
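To sanity-check the namespace-aware parsing without hitting Netflix, the same extraction can be run against a small inline sitemap fragment (the IDs below are arbitrary examples):

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.netflix.com/title/80192098</loc><lastmod>2026-01-10</lastmod></url>
  <url><loc>https://www.netflix.com/title/81040344</loc></url>
</urlset>"""

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SAMPLE)
titles = []
for url_el in root.findall(".//sm:url", ns):
    loc = url_el.find("sm:loc", ns)
    lastmod = url_el.find("sm:lastmod", ns)
    m = re.search(r"/title/(\d+)", loc.text or "")
    if m:
        titles.append({
            "netflix_id": int(m.group(1)),
            "lastmod": lastmod.text if lastmod is not None else None,
        })

print(titles)
# [{'netflix_id': 80192098, 'lastmod': '2026-01-10'}, {'netflix_id': 81040344, 'lastmod': None}]
```

Note the namespace dict: sitemap elements live in the `sitemaps.org` default namespace, so a bare `.//url` XPath would find nothing.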
## Approach 3: Intercepting Netflix's Shakti API
If you need data that UNOGS doesn't cover — like detailed cast info, episode-level metadata, or recommendation tags — you need to intercept Netflix's internal Shakti API. This is harder because Netflix uses heavy anti-bot protections, but the data is much richer.
```python
# netflix_shakti.py
import time

from playwright.sync_api import sync_playwright

def scrape_netflix_title(
    netflix_id: int,
    proxy_url: str | None = None,
) -> dict:
    """
    Scrape title details from Netflix by intercepting the Shakti API.
    Requires a valid Netflix account for full metadata.
    For catalog enumeration without an account, use the sitemap + UNOGS approach.
    """
    intercepted_data = {}

    def handle_response(response):
        url = response.url
        # Shakti API paths
        if any(p in url for p in ["/pathEvaluator", "shakti", "/metadata", "/browse"]):
            try:
                if "json" in response.headers.get("content-type", ""):
                    intercepted_data[url.split("?")[0]] = response.json()
            except Exception:
                pass

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--no-first-run",
                "--disable-dev-shm-usage",
            ],
        )
        context_kwargs = {
            "viewport": {"width": 1920, "height": 1080},
            "user_agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/126.0.0.0 Safari/537.36"
            ),
            "locale": "en-US",
            "timezone_id": "America/New_York",
        }
        if proxy_url:
            context_kwargs["proxy"] = {"server": proxy_url}
        context = browser.new_context(**context_kwargs)
        # Remove automation flags
        context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
            window.chrome = { runtime: {} };
        """)
        page = context.new_page()
        page.on("response", handle_response)
        # Netflix title URL
        page.goto(f"https://www.netflix.com/title/{netflix_id}",
                  wait_until="networkidle", timeout=30000)
        time.sleep(3)
        # Extract visible DOM metadata
        title_data = {}
        # Title
        title_el = page.query_selector(
            "[data-uia='hero-title-text'], .title-title, h1.title-title")
        if title_el:
            title_data["title"] = title_el.inner_text().strip()
        # Synopsis
        synopsis_el = page.query_selector(
            "[data-uia='hero-synopsis'], .title-info-synopsis")
        if synopsis_el:
            title_data["synopsis"] = synopsis_el.inner_text().strip()
        # Metadata items (year, rating, duration, etc.)
        meta_items = page.query_selector_all(
            "[data-uia='hero-metadata-item'], .maturity-number, .duration")
        title_data["metadata"] = [m.inner_text().strip() for m in meta_items
                                  if m.inner_text().strip()]
        # Genre tags
        genre_tags = page.query_selector_all(".genreTag, [data-uia='moreLikeThis-tag']")
        title_data["genres"] = [t.inner_text().strip() for t in genre_tags]
        # Cast from "More Info" section
        cast_el = page.query_selector("[data-uia='cast-member-list'], .cast-container")
        if cast_el:
            title_data["cast_text"] = cast_el.inner_text().strip()
        # Merge with intercepted API data
        title_data["netflix_id"] = netflix_id
        title_data["shakti_data"] = intercepted_data
        browser.close()
    return title_data
```
## Proxy Setup for Netflix
Netflix is aggressive about blocking datacenter IPs. If you're not using a residential IP, you'll hit the login wall or get empty responses.
For regional catalog checking, residential proxies from the target country let you see exactly what's in that region's catalog. ThorData's residential proxy network supports country-specific routing:
```python
def get_country_proxy(country_code: str) -> dict:
    """
    Get ThorData proxy configuration for a specific country.
    Used to check region-specific Netflix catalogs.
    """
    return {
        "server": "http://proxy.thordata.com:9000",
        "username": f"user-country-{country_code.lower()}",
        "password": "YOUR_THORDATA_PASSWORD",
    }

# Compare the US, UK, JP, and DE catalogs through country-routed IPs
for country in ["us", "uk", "jp", "de"]:
    proxy_config = get_country_proxy(country)
    # Pass this to a Playwright context for region-specific catalog data
    print(f"Proxy for {country.upper()}: {proxy_config['server']}")
```
## Enriching with TMDb
Once you have Netflix IDs, combine with TMDb (The Movie Database) for rich metadata — cast, crew, genres, trailers, and more:
```python
# tmdb_enricher.py
import time

import httpx

TMDB_API_KEY = "your_tmdb_api_key"  # Free registration at themoviedb.org
TMDB_BASE = "https://api.themoviedb.org/3"

tmdb_client = httpx.Client(timeout=15)

def get_tmdb_by_title(title: str, year: int = None, media_type: str = None) -> dict:
    """Search TMDb for a title and return structured metadata."""
    search_url = f"{TMDB_BASE}/search/multi"
    params = {
        "api_key": TMDB_API_KEY,
        "query": title,
        "language": "en-US",
    }
    if year:
        # Note: /search/multi doesn't document a year filter; for strict year
        # matching, use /search/movie (primary_release_year) or /search/tv
        # (first_air_date_year) instead.
        params["year"] = year
    response = tmdb_client.get(search_url, params=params)
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        return {}
    # Filter by media_type if specified
    if media_type:
        filtered = [r for r in results if r.get("media_type") == media_type]
        if filtered:
            results = filtered
    top = results[0]
    return get_tmdb_details(top["id"], media_type=top.get("media_type", "movie"))
```
```python
def get_tmdb_details(tmdb_id: int, media_type: str = "movie") -> dict:
    """Fetch detailed TMDb info including credits."""
    detail_url = f"{TMDB_BASE}/{media_type}/{tmdb_id}"
    detail_params = {
        "api_key": TMDB_API_KEY,
        "append_to_response": "credits,keywords",
        "language": "en-US",
    }
    try:
        detail = tmdb_client.get(detail_url, params=detail_params).json()
    except Exception:
        return {}
    credits = detail.get("credits", {})
    keywords = detail.get("keywords", {})
    # Movies nest keywords under "keywords"; TV shows use "results"
    kw_list = keywords.get("keywords", keywords.get("results", []))
    return {
        "tmdb_id": tmdb_id,
        "media_type": media_type,
        "title": detail.get("title") or detail.get("name"),
        "original_title": detail.get("original_title") or detail.get("original_name"),
        "overview": detail.get("overview"),
        "tagline": detail.get("tagline"),
        "genres": [g["name"] for g in detail.get("genres", [])],
        "keywords": [k["name"] for k in kw_list[:20]],
        "cast": [
            {"name": c["name"], "character": c.get("character", ""), "order": c.get("order", 0)}
            for c in credits.get("cast", [])[:15]
        ],
        "directors": [
            c["name"]
            for c in credits.get("crew", [])
            if c.get("job") == "Director"
        ],
        "creators": [
            c["name"]
            for c in credits.get("crew", [])
            if c.get("job") in ("Creator", "Writer")
        ],
        "tmdb_rating": detail.get("vote_average"),
        "tmdb_vote_count": detail.get("vote_count"),
        "popularity": detail.get("popularity"),
        "release_date": detail.get("release_date") or detail.get("first_air_date"),
        "runtime": detail.get("runtime"),
        "number_of_seasons": detail.get("number_of_seasons"),
        "status": detail.get("status"),
        "original_language": detail.get("original_language"),
        "production_countries": [c["name"] for c in detail.get("production_countries", [])],
    }
```
## SQLite Storage Schema
```python
import json
import sqlite3

def init_netflix_db(db_path: str = "netflix_catalog.db") -> sqlite3.Connection:
    """Initialize database for Netflix catalog data."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS titles (
            netflix_id INTEGER PRIMARY KEY,
            title TEXT,
            original_title TEXT,
            year INTEGER,
            type TEXT,
            imdb_id TEXT,
            imdb_rating REAL,
            tmdb_id INTEGER,
            tmdb_rating REAL,
            genres TEXT,
            cast_data TEXT,
            directors TEXT,
            keywords TEXT,
            overview TEXT,
            runtime INTEGER,
            number_seasons INTEGER,
            original_language TEXT,
            added_at TEXT DEFAULT CURRENT_TIMESTAMP,
            last_updated TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS country_availability (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            netflix_id INTEGER NOT NULL,
            country_code TEXT,
            country_name TEXT,
            available_since TEXT,
            expiring_date TEXT,
            audio_languages TEXT,
            subtitle_languages TEXT,
            checked_at TEXT DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (netflix_id) REFERENCES titles(netflix_id)
        );
        CREATE TABLE IF NOT EXISTS catalog_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            country_code TEXT,
            netflix_id INTEGER,
            title TEXT,
            type TEXT,
            snapshot_date TEXT DEFAULT CURRENT_TIMESTAMP
        );
        CREATE INDEX IF NOT EXISTS idx_titles_type ON titles (type);
        CREATE INDEX IF NOT EXISTS idx_availability_netflix ON country_availability (netflix_id);
        CREATE INDEX IF NOT EXISTS idx_availability_country ON country_availability (country_code);
        CREATE INDEX IF NOT EXISTS idx_snapshots_country ON catalog_snapshots (country_code);
    """)
    conn.commit()
    return conn
```
```python
def save_title(conn: sqlite3.Connection, title: dict, tmdb_data: dict = None):
    """Save a Netflix title with optional TMDb enrichment."""
    merged = {**title}
    if tmdb_data:
        merged.update({
            "tmdb_id": tmdb_data.get("tmdb_id"),
            "tmdb_rating": tmdb_data.get("tmdb_rating"),
            "genres": json.dumps(tmdb_data.get("genres", [])),
            "cast_data": json.dumps(tmdb_data.get("cast", [])),
            "directors": json.dumps(tmdb_data.get("directors", [])),
            "keywords": json.dumps(tmdb_data.get("keywords", [])),
            "overview": tmdb_data.get("overview"),
            "runtime": tmdb_data.get("runtime"),
            "number_seasons": tmdb_data.get("number_of_seasons"),
            "original_language": tmdb_data.get("original_language"),
        })
    # Note: INSERT OR REPLACE rewrites the whole row, so columns not listed
    # below (original_title, added_at, last_updated) reset to their defaults
    # whenever a title is re-saved.
    conn.execute(
        """
        INSERT OR REPLACE INTO titles
            (netflix_id, title, year, type, imdb_id, imdb_rating,
             tmdb_id, tmdb_rating, genres, cast_data, directors, keywords,
             overview, runtime, number_seasons, original_language)
        VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
        """,
        (
            merged.get("netflix_id"),
            merged.get("title"),
            merged.get("year"),
            merged.get("type"),
            merged.get("imdb_id"),
            merged.get("imdb_rating"),
            merged.get("tmdb_id"),
            merged.get("tmdb_rating"),
            merged.get("genres"),
            merged.get("cast_data"),
            merged.get("directors"),
            merged.get("keywords"),
            merged.get("overview"),
            merged.get("runtime"),
            merged.get("number_seasons"),
            merged.get("original_language"),
        ),
    )
    conn.commit()
```
```python
def save_country_availability(conn: sqlite3.Connection, netflix_id: int, countries: list):
    """Save country availability data for a title."""
    # Delete old availability data first (it changes over time)
    conn.execute(
        "DELETE FROM country_availability WHERE netflix_id=?",
        (netflix_id,),
    )
    for c in countries:
        conn.execute(
            """INSERT INTO country_availability
               (netflix_id, country_code, country_name, available_since, expiring_date,
                audio_languages, subtitle_languages)
               VALUES (?,?,?,?,?,?,?)""",
            (
                netflix_id,
                c.get("country_code"),
                c.get("country"),
                c.get("available_since"),
                c.get("expiring_date"),
                c.get("audio_languages"),
                c.get("subtitle_languages"),
            ),
        )
    conn.commit()
```
## Complete Catalog Pipeline
```python
# pipeline.py — assumes the UNOGS, TMDb, and SQLite helpers above are importable
import time

def build_catalog_database(
    countries: list = None,
    db_path: str = "netflix_catalog.db",
    enrich_with_tmdb: bool = True,
    max_per_country: int = 1000,
):
    """
    Build a comprehensive Netflix catalog database.
    countries: list of country IDs (defaults to major Netflix regions)
    """
    if countries is None:
        countries = [78, 46, 39, 267, 23]  # US, UK, DE, JP, AU
    conn = init_netflix_db(db_path)
    seen_ids = set()
    for country_id in countries:
        country_code = {v: k for k, v in COUNTRY_IDS.items()}.get(country_id, str(country_id))
        print(f"\nFetching catalog for {country_code.upper()} (id={country_id})...")
        # Paginate through the full catalog — a single search_netflix call
        # would cap us at one 100-result page.
        titles = get_all_titles_for_country(country_id, max_results=max_per_country)
        for i, title in enumerate(titles):
            netflix_id = title.get("netflix_id")
            if not netflix_id:
                continue
            # Save snapshot record
            conn.execute(
                "INSERT INTO catalog_snapshots (country_code, netflix_id, title, type) "
                "VALUES (?, ?, ?, ?)",
                (country_code, netflix_id, title.get("title"), title.get("type")),
            )
            # Only enrich each unique title once
            if netflix_id not in seen_ids:
                seen_ids.add(netflix_id)
                tmdb_data = None
                if enrich_with_tmdb and title.get("title"):
                    try:
                        tmdb_data = get_tmdb_by_title(
                            title["title"],
                            year=title.get("year"),
                            media_type="movie" if title.get("type") == "movie" else "tv",
                        )
                        time.sleep(0.25)  # TMDb rate limit: ~40 req / 10 sec
                    except Exception as e:
                        print(f"  TMDb error for {title['title']}: {e}")
                save_title(conn, title, tmdb_data)
            if i % 50 == 0:
                print(f"  Processed {i + 1}/{len(titles)} titles...")
        conn.commit()
    unique_count = conn.execute("SELECT COUNT(*) FROM titles").fetchone()[0]
    print(f"\nCatalog database complete. {unique_count:,} unique titles.")
    conn.close()
```
```python
# Analytical queries
import datetime
import sqlite3

def most_available_titles(conn: sqlite3.Connection, min_countries: int = 10) -> list:
    """Find titles available in the most countries."""
    rows = conn.execute(
        """
        SELECT t.title, t.year, t.type, COUNT(ca.country_code) AS country_count
        FROM titles t
        JOIN country_availability ca ON ca.netflix_id = t.netflix_id
        GROUP BY t.netflix_id
        HAVING country_count >= ?
        ORDER BY country_count DESC
        LIMIT 20
        """,
        (min_countries,),
    ).fetchall()
    return [{"title": r[0], "year": r[1], "type": r[2], "countries": r[3]} for r in rows]

def expiring_soon(conn: sqlite3.Connection, country_code: str, days: int = 30) -> list:
    """Find titles expiring soon in a country."""
    cutoff = (datetime.date.today() + datetime.timedelta(days=days)).isoformat()
    rows = conn.execute(
        """
        SELECT t.title, t.year, t.type, ca.expiring_date
        FROM titles t
        JOIN country_availability ca ON ca.netflix_id = t.netflix_id
        WHERE ca.country_code = ?
          AND ca.expiring_date IS NOT NULL
          AND ca.expiring_date <= ?
        ORDER BY ca.expiring_date ASC
        LIMIT 50
        """,
        (country_code, cutoff),
    ).fetchall()
    return [{"title": r[0], "year": r[1], "type": r[2], "expires": r[3]} for r in rows]
```
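To see the shape of what the availability-count query returns, here's a self-contained run against synthetic rows (trimmed-down tables that mirror the schema above; the titles are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (netflix_id INTEGER PRIMARY KEY, title TEXT, year INTEGER, type TEXT);
    CREATE TABLE country_availability (netflix_id INTEGER, country_code TEXT);
""")
conn.executemany("INSERT INTO titles VALUES (?,?,?,?)", [
    (1, "Global Hit", 2024, "movie"),
    (2, "Local Only", 2023, "series"),
])
conn.executemany("INSERT INTO country_availability VALUES (?,?)", [
    (1, "us"), (1, "gb"), (1, "jp"), (2, "us"),
])

# Same join/group/having shape as most_available_titles
rows = conn.execute("""
    SELECT t.title, COUNT(ca.country_code) AS n
    FROM titles t
    JOIN country_availability ca USING (netflix_id)
    GROUP BY t.netflix_id
    HAVING n >= 2
    ORDER BY n DESC
""").fetchall()

print(rows)  # [('Global Hit', 3)]
```

"Local Only" is filtered out by the `HAVING` clause because it appears in just one country.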
## Practical Tips
**UNOGS is the right first choice.** For most catalog data needs, UNOGS covers the hard parts — regional availability tracking across 50+ countries, genre categorization, and daily updates. Direct Netflix scraping is only necessary when you need cast details, episode metadata, or data that UNOGS doesn't expose.
**Cache regional availability separately.** Catalog availability changes frequently (Netflix acquires and loses licenses monthly). Store availability with a `checked_at` timestamp and refresh only the titles that are oldest in your database.
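One way to implement that refresh-oldest-first policy — a sketch against an in-memory copy of the `country_availability` table, with hypothetical timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE country_availability (
    netflix_id INTEGER, country_code TEXT, checked_at TEXT)""")
conn.executemany(
    "INSERT INTO country_availability VALUES (?,?,?)",
    [(1, "us", "2026-01-01"), (2, "us", "2025-11-15"), (3, "us", "2025-12-20")],
)

def stalest_titles(conn: sqlite3.Connection, batch_size: int = 2) -> list:
    """Return the netflix_ids whose availability was checked longest ago."""
    rows = conn.execute(
        """SELECT netflix_id, MIN(checked_at) AS oldest
           FROM country_availability
           GROUP BY netflix_id
           ORDER BY oldest ASC
           LIMIT ?""",
        (batch_size,),
    ).fetchall()
    return [r[0] for r in rows]

print(stalest_titles(conn))  # [2, 3] — refresh these first
```

Feed each returned ID back through `get_title_countries` + `save_country_availability`, and the default `checked_at` timestamp advances automatically on re-insert.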
**TMDb enrichment is free and powerful.** The TMDb API has generous rate limits (around 40 requests per 10 seconds) and a free tier that covers most use cases. Combining Netflix IDs with TMDb metadata gives you cast, crew, genres, keywords, trailers, and ratings without any scraping.
**Country-specific proxies for direct Netflix access.** If you need to verify what's actually available in a region, use residential proxies from that country. ThorData supports per-country routing, which lets you fetch the same URL through different country IP pools and compare results.
**Sitemaps for ID discovery.** Netflix's sitemap is the most reliable way to discover new Netflix IDs without scraping their catalog pages. Run it weekly and diff against your database to find new additions.
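The weekly diff amounts to set arithmetic over Netflix IDs (the IDs below are arbitrary examples, not real titles):

```python
# This week's sitemap pull vs. IDs already stored in SQLite
sitemap_ids = {80192098, 81040344, 80057281}
known_ids = {80057281, 70143836}

new_titles = sitemap_ids - known_ids      # appeared since the last run
removed_titles = known_ids - sitemap_ids  # no longer in the sitemap

print(sorted(new_titles))      # [80192098, 81040344]
print(sorted(removed_titles))  # [70143836]
```

New IDs go into the enrichment queue; removed IDs are candidates for marking as expired rather than deleting, since sitemap omissions can be transient.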
## Legal Notes
Netflix's Terms of Use prohibit scraping: they explicitly bar automated access (robots, spiders, scrapers) and circumventing technical protection measures, and Netflix has pursued legal action against scrapers operating at commercial scale.
For production applications, the appropriate approaches are:
- JustWatch API — has licensing arrangements with Netflix for catalog data and offers a legitimate API
- UNOGS — aggregates Netflix data through their own processes; using their RapidAPI is accessing the data through a legitimate provider
- TMDb — fully licensed and free; doesn't have catalog availability data but has all the metadata
- Netflix Partner APIs — available to licensed technology partners, app developers, and content creators through their official program
The code in this post is provided for educational purposes. Understand the legal implications before using direct Netflix scraping in any commercial context.