YouTube Video Stats Without the API Key (innertube approach)
The official YouTube Data API v3 requires a Google Cloud account, OAuth credentials, and a quota system that caps you at 10,000 units per day. For many tasks — checking view counts, monitoring a playlist, building a lightweight dashboard — that overhead is not worth it.
YouTube's own web client uses an internal JSON API called innertube. It is not documented publicly, but it has been stable enough to use with care for several years. This post walks through every approach from simple oEmbed to full innertube /player calls with rotating residential proxies, SQLite-backed storage, and production-grade error handling.
Why Skip the Official API?
The official YouTube Data API v3 imposes real constraints on what you can do:
- You need a Google Cloud project with a billing account attached — even for free-tier quota
- OAuth credentials must be set up for any user-specific data (subscriptions, playlists, watch history)
- The quota system is opaque: a single /videos request with the statistics part costs 1 unit, but a /search call costs 100 units — you can burn through 10,000 units in 100 search requests
- API credentials can be revoked by Google at any time for Terms of Service violations, leaving your application broken
For public video statistics (view counts, duration, channel information, tags), none of this overhead is necessary. The same data is available through YouTube's internal endpoints used by their own web player. Billions of requests hit these endpoints every day from legitimate browser sessions.
The tradeoff is clear: innertube has no SLA, no official support, and the response schema can change without warning. YouTube has broken unofficial clients before when rolling out changes — the clientVersion string sometimes needs to be updated to keep responses working. For production systems handling business-critical data, the official API is still the right choice. For scripts, internal tools, research, and moderate-scale monitoring, innertube is the fastest path.
What Data Is Actually Available
Before writing a single line of code, it helps to know what data you can actually get from each endpoint.
oEmbed endpoint (no auth, no key):
- Video title
- Author/channel name
- Thumbnail URL (multiple resolutions)
- Embed HTML
- Width and height hints
innertube /player endpoint (no auth, no key):
- View count (total plays)
- Video duration in seconds
- Full description text
- Video title
- Channel name and channel ID
- Keywords/tags array
- Publication date
- Category
- Family safe flag
- Live stream flag
- Whether the video allows embedding
- Whether the video is private or unlisted
Not available from these keyless endpoints:
- Like count (not included in the /player response; the official API still exposes it)
- Dislike count (removed from all public surfaces in late 2021)
- Comment count (requires the official API with an API key)
- Subscriber count (not in the /player response; requires a separate channel lookup)
Approach 1: oEmbed Endpoint
YouTube exposes an oEmbed endpoint that returns basic metadata about any public video. No API key, no auth, no setup.
https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=VIDEO_ID&format=json
The response includes the title, author, thumbnail URL, and embed HTML — but not view count, likes, or description. Good for link previews and thumbnails, not for statistics.
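All of the snippets in this post take the bare 11-character video ID rather than a full URL. If you are starting from URLs, a small helper can extract the ID. A sketch (it covers the common watch, youtu.be, embed, shorts, and live formats, not every variant YouTube has ever used):

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> Optional[str]:
    """Pull the 11-character video ID out of common YouTube URL formats.
    Returns None for URLs that do not look like a YouTube video link."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # /watch?v=VIDEO_ID — the ID lives in the query string
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    return None
```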
# youtube_oembed.py
import httpx

def get_oembed(video_id: str) -> dict:
    """
    Fetch basic metadata via YouTube's oEmbed endpoint.
    No API key required. Works for any public video.
    """
    url = "https://www.youtube.com/oembed"
    params = {
        "url": f"https://www.youtube.com/watch?v={video_id}",
        "format": "json",
    }
    resp = httpx.get(url, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

info = get_oembed("dQw4w9WgXcQ")
print(info["title"])          # video title
print(info["author_name"])    # channel name
print(info["thumbnail_url"])  # high-res thumbnail URL
print(info["width"])          # recommended embed width
print(info["height"])         # recommended embed height
This endpoint is effectively official — it follows the oEmbed spec, and YouTube advertises it via oEmbed discovery links on watch pages. Rate limits are lenient for reasonable usage. The thumbnail URL returned is the standard hqdefault resolution; for higher resolutions, construct the URL directly:
def get_thumbnail_urls(video_id: str) -> dict:
    """Return all thumbnail resolution URLs for a video."""
    base = f"https://i.ytimg.com/vi/{video_id}"
    return {
        "default": f"{base}/default.jpg",       # 120x90
        "medium": f"{base}/mqdefault.jpg",      # 320x180
        "high": f"{base}/hqdefault.jpg",        # 480x360
        "standard": f"{base}/sddefault.jpg",    # 640x480
        "maxres": f"{base}/maxresdefault.jpg",  # 1280x720 (not always available)
    }
The maxresdefault.jpg URL exists only for videos uploaded at 720p or higher. Test with a HEAD request before downloading to avoid 404 errors.
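One way to wrap that HEAD probe, using only the standard library (the helper name and the injectable exists parameter are my own, added so the probe can be faked in tests rather than hitting the network):

```python
import urllib.request
from typing import Callable, Optional

def _head_ok(url: str) -> bool:
    """Probe a URL with an HTTP HEAD request; True only on a 200 response."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except Exception:
        return False

def best_thumbnail(video_id: str,
                   exists: Optional[Callable[[str], bool]] = None) -> str:
    """Return maxresdefault.jpg if it exists for this video, else fall
    back to hqdefault.jpg, which is always present for public videos."""
    check = exists or _head_ok
    maxres = f"https://i.ytimg.com/vi/{video_id}/maxresdefault.jpg"
    if check(maxres):
        return maxres
    return f"https://i.ytimg.com/vi/{video_id}/hqdefault.jpg"
```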
Approach 2: innertube /player Endpoint
The innertube API is what the YouTube web player uses internally to fetch video metadata and the streaming manifest. The endpoint accepts a POST request with a JSON body describing the client context. No API key is required for public videos.
The key endpoint is:
POST https://www.youtube.com/youtubei/v1/player
The request body needs a videoId and a context block identifying the client. Using the WEB client returns the full player response including statistics:
# youtube_innertube_basic.py
import httpx

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def get_video_stats(video_id: str) -> dict:
    """
    Fetch full video metadata via YouTube's innertube /player endpoint.
    No API key required. Returns views, duration, channel, and more.
    """
    payload = {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
    }
    resp = httpx.post(INNERTUBE_URL, json=payload, headers=headers, timeout=15)
    resp.raise_for_status()
    return resp.json()

data = get_video_stats("dQw4w9WgXcQ")
print(data.get("videoDetails", {}).get("title"))
print(data.get("videoDetails", {}).get("viewCount"))
Parsing the innertube Response
The innertube response is a large JSON object with many nested layers. The fields you most likely want are nested under videoDetails and microformat. Always check that a key exists before accessing it — YouTube A/B tests cause some fields to be absent in certain response variants.
# youtube_innertube_parse.py
import httpx
from typing import Optional

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def get_video_stats(video_id: str, proxy_url: Optional[str] = None) -> dict:
    """Fetch video stats with optional proxy support."""
    payload = {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
                "hl": "en",
                "gl": "US",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
        "X-Youtube-Client-Name": "1",
        "X-Youtube-Client-Version": "2.20240101.00.00",
    }
    client_kwargs: dict = {"timeout": 15}
    if proxy_url:
        client_kwargs["proxy"] = proxy_url
    with httpx.Client(**client_kwargs) as client:
        resp = client.post(INNERTUBE_URL, json=payload, headers=headers)
        resp.raise_for_status()
        return resp.json()

def parse_stats(data: dict) -> dict:
    """
    Parse innertube /player response into a flat statistics dict.
    All fields use .get() with safe defaults to handle A/B response variants.
    """
    details = data.get("videoDetails", {})
    microformat = (
        data.get("microformat", {})
        .get("playerMicroformatRenderer", {})
    )
    streaming = data.get("streamingData", {})

    # Duration parsing
    length_sec = int(details.get("lengthSeconds", 0))
    hours = length_sec // 3600
    minutes = (length_sec % 3600) // 60
    seconds = length_sec % 60
    duration_str = f"{hours}h {minutes}m {seconds}s" if hours else f"{minutes}m {seconds}s"

    # Available stream quality levels
    formats = streaming.get("formats", [])
    qualities = sorted(
        set(f.get("qualityLabel", "") for f in formats if f.get("qualityLabel")),
        reverse=True,
    )

    return {
        "video_id": details.get("videoId"),
        "title": details.get("title"),
        "channel": details.get("author"),
        "channel_id": details.get("channelId"),
        "view_count": int(details.get("viewCount", 0)),
        "length_sec": length_sec,
        "duration": duration_str,
        "description": details.get("shortDescription", ""),
        "is_live": details.get("isLiveContent", False),
        "is_private": details.get("isPrivate", False),
        "is_unlisted": details.get("isUnlisted", False),
        "keywords": details.get("keywords", []),
        "thumbnail_url": (
            details.get("thumbnail", {}).get("thumbnails", [{}])[-1].get("url", "")
        ),
        "published": microformat.get("publishDate"),
        "upload_date": microformat.get("uploadDate"),
        "category": microformat.get("category"),
        "family_safe": microformat.get("isFamilySafe"),
        "allow_embed": microformat.get("allowEmbed"),
        "available_qualities": qualities,
    }

# Usage example
raw = get_video_stats("dQw4w9WgXcQ")
stats = parse_stats(raw)
print(f"Title: {stats['title']}")
print(f"Channel: {stats['channel']}")
print(f"Views: {stats['view_count']:,}")
print(f"Published: {stats['published']}")
print(f"Duration: {stats['duration']}")
print(f"Category: {stats['category']}")
print(f"Keywords: {', '.join(stats['keywords'][:5])}")
Note on likes: The innertube /player endpoint does not return like counts. The like count shown on the watch page comes from a separate innertube call (/next), but parsing it requires handling additional response layers and the value is no longer guaranteed to be accurate. (Public dislike counts were removed entirely in late 2021.) For most analytics use cases, view count and engagement signals derived from comments or the description are sufficient.
Anti-Detection Techniques
YouTube's bot detection on innertube is primarily IP-based and timing-based. Here are the techniques that extend how long a single session stays functional.
Request Headers
The two most important headers are Origin and Referer. Without them, YouTube treats the request as coming from a non-browser context and is more likely to rate-limit aggressively:
headers = {
    "Content-Type": "application/json",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/121.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Origin": "https://www.youtube.com",
    "Referer": "https://www.youtube.com/",
    "X-Youtube-Client-Name": "1",
    "X-Youtube-Client-Version": "2.20240101.00.00",
    "Sec-Ch-Ua": '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
}
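The client-hint headers must stay internally consistent: a Windows User-Agent with a macOS Sec-Ch-Ua-Platform is itself a fingerprint. One way to enforce that is to pick a whole browser profile per session rather than mixing values. A sketch (the profile list and helper are illustrative, not an official client list):

```python
import random

# Two internally consistent desktop Chrome profiles. The platform
# client hint must match the OS claimed in the User-Agent string.
PROFILES = [
    {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Sec-Ch-Ua-Platform": '"Windows"',
    },
    {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Sec-Ch-Ua-Platform": '"macOS"',
    },
]

def session_headers() -> dict:
    """Pick one profile for the whole session and merge it with the fixed
    innertube headers. Rotate per session, not per request: a User-Agent
    that changes between requests from one IP is itself a bot signal."""
    profile = random.choice(PROFILES)
    return {
        "Content-Type": "application/json",
        "Accept-Language": "en-US,en;q=0.9",
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
        **profile,
    }
```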
Timing and Jitter
Never make requests in tight loops. The most effective anti-detection technique is simply waiting:
import time
import random

def jittered_sleep(base_seconds: float = 1.5, variance: float = 0.5):
    """Sleep for base_seconds plus or minus variance to mimic human timing."""
    delay = base_seconds + random.uniform(-variance, variance)
    time.sleep(max(0.1, delay))  # never sleep less than 100ms
Between consecutive video fetches, a jittered_sleep(2.0, 1.0) call gives you 1-3 seconds of delay with random variance that defeats simple rate-limit detection patterns.
Session Warming
Starting each session with a request to the YouTube homepage collects cookies that subsequent innertube calls benefit from:
import httpx
import time
import random
from typing import Optional

def create_warmed_session(proxy_url: Optional[str] = None) -> httpx.Client:
    """
    Create an HTTP client that has collected YouTube session cookies.
    Improves success rate on subsequent innertube calls.
    """
    client_kwargs: dict = {
        "follow_redirects": True,
        "timeout": 20,
    }
    if proxy_url:
        client_kwargs["proxy"] = proxy_url
    client = httpx.Client(**client_kwargs)
    # Visit homepage to collect VISITOR_INFO1_LIVE, YSC, and consent cookies
    try:
        client.get(
            "https://www.youtube.com/",
            headers={
                "User-Agent": (
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/121.0.0.0 Safari/537.36"
                ),
                "Accept-Language": "en-US,en;q=0.9",
            },
        )
        time.sleep(random.uniform(1.0, 2.0))
    except httpx.RequestError:
        pass
    return client
Rate Limits and Terms of Service
YouTube does not publish rate limits for innertube. From practical testing:
- Requests from a single residential IP: roughly 100-300 requests before you see 429 responses
- Datacenter IPs (AWS, GCP, etc.) get blocked much faster — sometimes within 10-20 requests
- Adding a 1-2 second delay between requests significantly extends how long a single IP remains functional
- Responses appear to be served from YouTube's edge caches; repeatedly fetching the same video still counts toward whatever per-IP limit you are hitting without giving you meaningfully fresher data, so cache results locally instead
Terms of Service: YouTube's ToS (section 5B) prohibits circumventing technical measures and automated access to the service outside of the official API. Using innertube directly is a gray area — it is the same endpoint the official web client uses, but you are not the intended consumer. For personal use and research it is widely practiced. For commercial products that resell YouTube data, seriously consider the official API or a compliant managed service. Never cache data publicly in a way that reproduces YouTube's content, and always attribute data sources.
Proxy Rotation for Higher Volume
If you are fetching stats for hundreds or thousands of videos, you will need proxy rotation to avoid IP-based throttling.
# youtube_with_proxy.py
import httpx
import time
import random
from typing import Optional

from youtube_innertube_parse import parse_stats  # parser defined earlier in this post

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def get_video_stats_proxied(
    video_id: str,
    proxy_url: Optional[str] = None,
) -> dict:
    """
    Fetch video stats with optional rotating proxy support.
    For high-volume scraping, pass a new proxy_url for each request
    or small batch of requests.
    """
    payload = {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
                "hl": "en",
                "gl": "US",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/121.0.0.0 Safari/537.36"
        ),
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
        "Accept-Language": "en-US,en;q=0.9",
    }
    client_kwargs: dict = {"timeout": 20, "follow_redirects": True}
    if proxy_url:
        client_kwargs["proxy"] = proxy_url
    with httpx.Client(**client_kwargs) as client:
        resp = client.post(INNERTUBE_URL, json=payload, headers=headers)
    if resp.status_code == 429:
        raise RuntimeError(f"Rate limited (429) for video {video_id}. Rotate proxy.")
    if resp.status_code == 403:
        raise RuntimeError(f"Forbidden (403) for video {video_id}. IP may be blocked.")
    resp.raise_for_status()
    return resp.json()

def fetch_batch(
    video_ids: list,
    proxy_url: Optional[str] = None,
    delay_range: tuple = (1.5, 3.0),
) -> dict:
    """
    Fetch stats for a list of video IDs with jitter between requests.
    Returns a dict mapping video_id to parsed stats or an error dict.
    """
    results = {}
    for i, vid in enumerate(video_ids):
        try:
            raw = get_video_stats_proxied(vid, proxy_url=proxy_url)
            results[vid] = parse_stats(raw)
            print(f"[{i+1}/{len(video_ids)}] {vid}: {results[vid]['view_count']:,} views")
        except RuntimeError as e:
            print(f"[{i+1}/{len(video_ids)}] {vid}: ERROR - {e}")
            results[vid] = {"error": str(e)}
        except Exception as e:
            print(f"[{i+1}/{len(video_ids)}] {vid}: UNEXPECTED - {e}")
            results[vid] = {"error": str(e)}
        if i < len(video_ids) - 1:
            time.sleep(random.uniform(*delay_range))
    return results

# Example: fetch a batch of videos through a rotating proxy
proxy = "http://user:[email protected]:9000"
video_ids = [
    "dQw4w9WgXcQ",
    "jNQXAC9IVRw",
    "kJQP7kiw5Fk",
    "9bZkp7q19f0",
    "OPf0YbXqDm0",
]
batch_results = fetch_batch(video_ids, proxy_url=proxy, delay_range=(2.0, 4.0))
Residential proxies work significantly better than datacenter proxies for YouTube. For proxy providers, ThorData has a rotating residential pool that handles YouTube well — their per-GB pricing is competitive, and the rotating gateway means you do not have to manage proxy lists yourself.
A few practical notes when running at volume:
- Rotate proxies per request, not per session — reusing an IP across many requests defeats the purpose
- Add jitter to your request timing (random 0.5-2 second delays)
- Monitor 429 response rates; if they exceed 5-10%, your proxy pool is being detected
- Cache responses locally — if you need the same video stats multiple times in a day, do not re-fetch
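The last point is worth automating. A minimal in-memory cache with a time-to-live is enough for a single long-running process (a sketch; for anything multi-process, use the SQLite storage described in the next section):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry, so the same video's
    stats are not re-fetched within one monitoring window."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired; drop it
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Wrap the fetch with `cache.get(video_id)` before going to the network, and `cache.set(video_id, stats)` after a successful parse.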
SQLite Storage Schema
For any serious monitoring pipeline, store results in SQLite rather than flat JSON files. This schema handles both raw responses and parsed stats:
# youtube_storage.py
import sqlite3
import json
from datetime import datetime

def init_db(db_path: str = "youtube_stats.db") -> sqlite3.Connection:
    """Initialize SQLite database with schema for YouTube video stats."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # better write concurrency
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS videos (
            video_id TEXT PRIMARY KEY,
            title TEXT,
            channel TEXT,
            channel_id TEXT,
            category TEXT,
            published TEXT,
            upload_date TEXT,
            duration_sec INTEGER,
            description TEXT,
            keywords TEXT,
            is_live INTEGER DEFAULT 0,
            is_private INTEGER DEFAULT 0,
            family_safe INTEGER DEFAULT 1,
            allow_embed INTEGER DEFAULT 1,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS view_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            video_id TEXT NOT NULL,
            view_count INTEGER,
            snapshot_at TEXT DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (video_id) REFERENCES videos(video_id)
        );

        CREATE TABLE IF NOT EXISTS fetch_errors (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            video_id TEXT,
            error_type TEXT,
            error_msg TEXT,
            proxy_used TEXT,
            occurred_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE INDEX IF NOT EXISTS idx_snapshots_video_id
            ON view_snapshots (video_id);
        CREATE INDEX IF NOT EXISTS idx_snapshots_at
            ON view_snapshots (snapshot_at);
    """)
    conn.commit()
    return conn
def upsert_video(conn: sqlite3.Connection, stats: dict):
    """Insert or update video metadata. Does NOT overwrite view count."""
    conn.execute(
        """
        INSERT INTO videos
            (video_id, title, channel, channel_id, category,
             published, upload_date, duration_sec, description,
             keywords, is_live, is_private, family_safe, allow_embed)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(video_id) DO UPDATE SET
            title = excluded.title,
            channel = excluded.channel,
            category = excluded.category,
            description = excluded.description,
            keywords = excluded.keywords
        """,
        (
            stats.get("video_id"),
            stats.get("title"),
            stats.get("channel"),
            stats.get("channel_id"),
            stats.get("category"),
            stats.get("published"),
            stats.get("upload_date"),
            stats.get("length_sec"),
            stats.get("description", "")[:2000],
            json.dumps(stats.get("keywords", [])),
            int(stats.get("is_live") or 0),
            int(stats.get("is_private") or 0),
            # microformat fields can be None in some response variants;
            # fall back to the schema defaults instead of crashing on int(None)
            1 if stats.get("family_safe") in (None, True) else 0,
            1 if stats.get("allow_embed") in (None, True) else 0,
        ),
    )
    conn.commit()
def record_views(conn: sqlite3.Connection, video_id: str, view_count: int):
    """Record a view count snapshot."""
    conn.execute(
        "INSERT INTO view_snapshots (video_id, view_count) VALUES (?, ?)",
        (video_id, view_count),
    )
    conn.commit()

def get_view_history(conn: sqlite3.Connection, video_id: str, limit: int = 30) -> list:
    """Retrieve recent view count history for a video."""
    rows = conn.execute(
        """
        SELECT view_count, snapshot_at
        FROM view_snapshots
        WHERE video_id = ?
        ORDER BY snapshot_at DESC
        LIMIT ?
        """,
        (video_id, limit),
    ).fetchall()
    return [{"views": r[0], "at": r[1]} for r in rows]
def compute_growth_rate(history: list) -> "float | None":
    """
    Compute daily view growth rate from snapshot history.
    Returns views/day as a float, or None if there are fewer than
    two snapshots or the timestamps cannot be parsed.
    """
    if len(history) < 2:
        return None
    oldest = history[-1]
    newest = history[0]
    delta_views = newest["views"] - oldest["views"]
    fmt = "%Y-%m-%d %H:%M:%S"
    try:
        t_old = datetime.strptime(oldest["at"][:19], fmt)
        t_new = datetime.strptime(newest["at"][:19], fmt)
        delta_days = (t_new - t_old).total_seconds() / 86400
        if delta_days <= 0:
            return None
        return delta_views / delta_days
    except (ValueError, TypeError):
        return None
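Since the snapshots already live in SQLite, the same growth rate can also be computed in a single query using julianday() instead of parsing timestamps in Python. A sketch against the view_snapshots table above (the function name is my own):

```python
import sqlite3
from typing import Optional

def growth_rate_sql(conn: sqlite3.Connection, video_id: str) -> Optional[float]:
    """Views/day between the oldest and newest snapshot, computed in SQL.
    Assumes the view_snapshots schema defined in init_db."""
    row = conn.execute(
        """
        SELECT
            (SELECT view_count FROM view_snapshots
             WHERE video_id = :vid ORDER BY snapshot_at DESC, id DESC LIMIT 1)
          - (SELECT view_count FROM view_snapshots
             WHERE video_id = :vid ORDER BY snapshot_at ASC, id ASC LIMIT 1)
                AS delta_views,
            julianday(MAX(snapshot_at)) - julianday(MIN(snapshot_at))
                AS delta_days
        FROM view_snapshots
        WHERE video_id = :vid
        """,
        {"vid": video_id},
    ).fetchone()
    # No snapshots at all, or a single snapshot: no rate to compute
    if row is None or row[1] is None or row[1] <= 0:
        return None
    return row[0] / row[1]
```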
Error Handling Patterns
Production innertube scraping needs robust error handling because failures are common and varied:
import httpx
import time
import random
from typing import Optional

from youtube_innertube_parse import parse_stats  # parser defined earlier in this post

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def safe_get_video_stats(
    video_id: str,
    proxy_url: Optional[str] = None,
    max_retries: int = 3,
) -> tuple:
    """
    Fetch video stats with full error handling and retry logic.
    Returns (stats_dict, status_string).
    On failure, stats_dict is None and status_string describes the problem.
    """
    for attempt in range(1, max_retries + 1):
        try:
            payload = {
                "videoId": video_id,
                "context": {"client": {"clientName": "WEB", "clientVersion": "2.20240101.00.00"}},
            }
            headers = {
                "Content-Type": "application/json",
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                "Origin": "https://www.youtube.com",
            }
            client_kwargs: dict = {"timeout": 20}
            if proxy_url:
                client_kwargs["proxy"] = proxy_url
            with httpx.Client(**client_kwargs) as client:
                resp = client.post(INNERTUBE_URL, json=payload, headers=headers)
            if resp.status_code == 429:
                # Give up immediately on the last attempt instead of
                # sleeping a backoff we will never use
                if attempt == max_retries:
                    return None, "rate_limited"
                backoff = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited for {video_id}, retrying in {backoff:.1f}s (attempt {attempt})")
                time.sleep(backoff)
                continue
            if resp.status_code == 403:
                return None, "blocked"
            resp.raise_for_status()
            raw = resp.json()
            # Check playability status before parsing
            playability = raw.get("playabilityStatus", {})
            status = playability.get("status", "")
            if status in ("ERROR", "LOGIN_REQUIRED", "UNPLAYABLE"):
                reason = playability.get("reason", "unknown reason")
                return None, f"video_unavailable:{reason}"
            if "videoDetails" not in raw:
                return None, "parse_error:missing videoDetails"
            return parse_stats(raw), "success"
        except httpx.RequestError as e:
            if attempt < max_retries:
                time.sleep(2 ** attempt)
            else:
                return None, f"network_error:{e}"
        except Exception as e:
            return None, f"unexpected:{e}"
    return None, "max_retries_exceeded"
Complete Monitoring Pipeline
Putting it all together: a pipeline that monitors a list of videos and tracks view count over time.
# youtube_monitor.py
import time
import random
from typing import Optional

# Helpers defined earlier in this post. The module names assume you saved
# the storage snippet as youtube_storage.py and the error-handling snippet
# as youtube_errors.py; adjust to wherever you put them.
from youtube_storage import (
    init_db,
    upsert_video,
    record_views,
    get_view_history,
    compute_growth_rate,
)
from youtube_errors import safe_get_video_stats

def monitor_videos(
    video_ids: list,
    db_path: str = "youtube_stats.db",
    proxy_url: Optional[str] = None,
    delay_range: tuple = (2.0, 5.0),
):
    """
    Run one monitoring cycle: fetch stats for all video IDs,
    store results, and report on growth.
    """
    conn = init_db(db_path)
    success_count = 0
    error_count = 0
    print(f"Starting monitoring run for {len(video_ids)} videos...")
    for i, video_id in enumerate(video_ids):
        stats, status = safe_get_video_stats(video_id, proxy_url=proxy_url)
        if stats and status == "success":
            upsert_video(conn, stats)
            record_views(conn, video_id, stats["view_count"])
            history = get_view_history(conn, video_id, limit=10)
            growth = compute_growth_rate(history)
            growth_str = f"{growth:,.0f} views/day" if growth is not None else "first snapshot"
            print(
                f"[{i+1}/{len(video_ids)}] {stats['title'][:50]}... "
                f"| {stats['view_count']:,} views | {growth_str}"
            )
            success_count += 1
        else:
            error_count += 1
            conn.execute(
                "INSERT INTO fetch_errors (video_id, error_type, error_msg, proxy_used) "
                "VALUES (?, ?, ?, ?)",
                (video_id, "fetch_failed", status, proxy_url),
            )
            conn.commit()
            print(f"[{i+1}/{len(video_ids)}] {video_id}: FAILED - {status}")
        if i < len(video_ids) - 1:
            time.sleep(random.uniform(*delay_range))
    conn.close()
    print(f"\nDone: {success_count} ok, {error_count} errors")

# Run it
PROXY = "http://user:[email protected]:9000"
VIDEO_IDS = [
    "dQw4w9WgXcQ",
    "jNQXAC9IVRw",
    "kJQP7kiw5Fk",
    "9bZkp7q19f0",
    "OPf0YbXqDm0",
]
monitor_videos(VIDEO_IDS, proxy_url=PROXY, delay_range=(2.5, 5.0))
Channel Video Discovery
The innertube API also supports channel video lists via the /browse endpoint. This lets you discover all videos on a channel without the official API:
import httpx

def get_channel_videos(
    channel_id: str,
    max_results: int = 50,
) -> list:
    """
    Fetch recent video IDs from a YouTube channel using innertube /browse.
    channel_id: UCxxxxxxxxxxxxxxxxxx format.
    Returns a list of video ID strings.
    """
    url = "https://www.youtube.com/youtubei/v1/browse"
    payload = {
        "browseId": channel_id,
        # URL-encoded base64 protobuf that selects the channel's Videos tab
        "params": "EgZ2aWRlb3M%3D",
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/121.0.0.0 Safari/537.36"
        ),
        "Origin": "https://www.youtube.com",
    }
    resp = httpx.post(url, json=payload, headers=headers, timeout=20)
    resp.raise_for_status()
    data = resp.json()
    # Walk the deeply nested response to extract video IDs
    video_ids: list = []
    _extract_video_ids(data, video_ids)
    return video_ids[:max_results]

def _extract_video_ids(obj, results: list):
    """Recursively search an innertube browse response for videoId fields."""
    if isinstance(obj, dict):
        if "videoId" in obj and isinstance(obj["videoId"], str):
            vid = obj["videoId"]
            if vid not in results:
                results.append(vid)
        for v in obj.values():
            _extract_video_ids(v, results)
    elif isinstance(obj, list):
        for item in obj:
            _extract_video_ids(item, results)
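To see the recursive walk behave as expected without hitting the network, run the same logic against a mock response fragment. The renderer key names below mimic the shape of real browse responses, but the fragment itself is contrived for illustration (the walker is reproduced here so the snippet runs standalone):

```python
def extract_video_ids(obj, results: list):
    """Same recursive walk as _extract_video_ids above: collect every
    string-valued videoId, preserving order and skipping duplicates."""
    if isinstance(obj, dict):
        vid = obj.get("videoId")
        if isinstance(vid, str) and vid not in results:
            results.append(vid)
        for v in obj.values():
            extract_video_ids(v, results)
    elif isinstance(obj, list):
        for item in obj:
            extract_video_ids(item, results)

# Mock fragment: videoIds nested at different depths, one duplicate
mock = {
    "contents": [
        {"richItemRenderer": {"content": {"videoRenderer": {"videoId": "dQw4w9WgXcQ"}}}},
        {"richItemRenderer": {"content": {"videoRenderer": {"videoId": "jNQXAC9IVRw"}}}},
        {"compactVideoRenderer": {"videoId": "dQw4w9WgXcQ"}},  # duplicate, skipped
    ]
}
found: list = []
extract_video_ids(mock, found)
```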
When to Use Each Approach
| Approach | Data available | Volume | Complexity |
|---|---|---|---|
| oEmbed | Title, thumbnail, author | High (lenient limits) | Minimal |
| innertube /player (no proxy) | Views, duration, description, channel | Low (~100-300/IP/day) | Low |
| innertube /player + rotating proxies | Views, duration, description, channel | Medium (1k-10k/day) | Medium |
| Official YouTube Data API v3 | Views, likes, comments, full metadata | 10k units/day free | Medium (auth setup) |
| Managed scraper (Apify) | Full stats + comments | Unlimited | Low (pay per run) |
For one-off scripts and internal tools, innertube direct is the fastest path. The oEmbed endpoint is the right choice when you only need titles and thumbnails. When you hit volume limits or need a stable production pipeline, the official API or a managed service is worth the setup time.
Legal Notes
YouTube's Terms of Service prohibit automated access outside the official API. The innertube approach described here operates in a gray zone — you are accessing YouTube's own internal endpoint, but as an unintended client.
In practice: - Personal use and research: Generally tolerated. No known enforcement against individuals running small scripts. - Commercial products reselling YouTube data: High legal risk. YouTube has pursued enforcement against larger-scale scrapers. Use the official API or licensed data providers. - Building a product that competes with YouTube: The ToS explicitly prohibits this use case.
The right approach for any production system with commercial use is to start with the official API. Innertube is appropriate for tooling, research pipelines, and monitoring workflows where you are the end user of the data.
Summary
The innertube approach has been working reliably for several years, but build your integration defensively: validate that expected keys exist, log raw responses when parsing fails, and pin the clientVersion string rather than auto-generating it — YouTube occasionally returns different response shapes for newer client versions.
The oEmbed endpoint is the safest choice for basic metadata. Innertube /player gives you full statistics without any API key or Google Cloud account. For volume above a few hundred videos per day, rotating residential proxies via ThorData are the difference between a functional pipeline and a constant IP-blocking battle. Pair everything with SQLite for local caching and you have a solid, self-contained YouTube monitoring system.