YouTube Video Stats Without the API Key (innertube approach)
The official YouTube Data API v3 requires a Google Cloud account, OAuth credentials, and a quota system that caps you at 10,000 units per day. For many tasks — checking view counts, monitoring a playlist, building a lightweight dashboard — that overhead is not worth it.
YouTube's own web client uses an internal JSON API called innertube. It is not documented publicly, but it has been stable enough to use with care for several years. This post walks through every approach from simple oEmbed to full innertube /player calls with rotating residential proxies, SQLite-backed storage, and production-grade error handling.
Why Skip the Official API?
The official YouTube Data API v3 imposes real constraints on what you can do:
- You need a Google Cloud project with a billing account attached — even for free-tier quota
- OAuth credentials must be set up for any user-specific data (subscriptions, playlists, watch history)
- The quota system is opaque: a single /videos request with the statistics part costs 1 unit, but a /search call costs 100 units — you can burn through 10,000 units in 100 search requests
- API credentials can be revoked by Google at any time for Terms of Service violations, leaving your application broken
For public video statistics (view counts, duration, channel information, tags), none of this overhead is necessary. The same data is available through YouTube's internal endpoints used by their own web player. Billions of requests hit these endpoints every day from legitimate browser sessions.
The tradeoff is clear: innertube has no SLA, no official support, and the response schema can change without warning. YouTube has broken unofficial clients before when rolling out changes — the clientVersion string sometimes needs to be updated to keep responses working. For production systems handling business-critical data, the official API is still the right choice. For scripts, internal tools, research, and moderate-scale monitoring, innertube is the fastest path.
What Data Is Actually Available
Before writing a single line of code, it helps to know what data you can actually get from each endpoint.
oEmbed endpoint (no auth, no key):
- Video title
- Author/channel name
- Thumbnail URL (multiple resolutions)
- Embed HTML
- Width and height hints
innertube /player endpoint (no auth, no key):
- View count (total plays)
- Video duration in seconds
- Full description text
- Video title
- Channel name and channel ID
- Keywords/tags array
- Publication date
- Category
- Family safe flag
- Live stream flag
- Whether the video allows embedding
- Whether the video is private or unlisted
Not available from these keyless endpoints:
- Like count (not included in the /player response; the official API still exposes it)
- Dislike count (removed from all public surfaces in late 2021)
- Comment count (requires the official API with an API key)
- Subscriber count (not in the /player response; requires a separate channel lookup)
Approach 1: oEmbed Endpoint
YouTube exposes an oEmbed endpoint that returns basic metadata about any public video. No API key, no auth, no setup.
https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=VIDEO_ID&format=json
The response includes the title, author, thumbnail URL, and embed HTML — but not view count, likes, or description. Good for link previews and thumbnails, not for statistics.
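All of the snippets in this post take the bare 11-character video ID rather than a full URL. If you are starting from URLs, a small helper can extract the ID. A sketch (it covers the common watch, youtu.be, embed, shorts, and live formats, not every variant YouTube has ever used):

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> Optional[str]:
    """Pull the 11-character video ID out of common YouTube URL formats.
    Returns None for URLs that do not look like a YouTube video link."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # /watch?v=VIDEO_ID — the ID lives in the query string
            return parse_qs(parsed.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/", "/live/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        return parsed.path.lstrip("/").split("/")[0] or None
    return None
```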
# youtube_oembed.py
import httpx

def get_oembed(video_id: str) -> dict:
    """
    Fetch basic metadata via YouTube's oEmbed endpoint.
    No API key required. Works for any public video.
    """
    url = "https://www.youtube.com/oembed"
    params = {
        "url": f"https://www.youtube.com/watch?v={video_id}",
        "format": "json",
    }
    resp = httpx.get(url, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

info = get_oembed("dQw4w9WgXcQ")
print(info["title"])          # video title
print(info["author_name"])    # channel name
print(info["thumbnail_url"])  # high-res thumbnail URL
print(info["width"])          # recommended embed width
print(info["height"])         # recommended embed height
This endpoint is effectively official — it follows the oEmbed spec, and YouTube advertises it via oEmbed discovery links on watch pages. Rate limits are lenient for reasonable usage. The thumbnail URL returned is the standard hqdefault resolution; for higher resolutions, construct the URL directly:
def get_thumbnail_urls(video_id: str) -> dict:
    """Return all thumbnail resolution URLs for a video."""
    base = f"https://i.ytimg.com/vi/{video_id}"
    return {
        "default": f"{base}/default.jpg",       # 120x90
        "medium": f"{base}/mqdefault.jpg",      # 320x180
        "high": f"{base}/hqdefault.jpg",        # 480x360
        "standard": f"{base}/sddefault.jpg",    # 640x480
        "maxres": f"{base}/maxresdefault.jpg",  # 1280x720 (not always available)
    }
The maxresdefault.jpg URL exists only for videos uploaded at 720p or higher. Test with a HEAD request before downloading to avoid 404 errors.
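One way to wrap that HEAD probe, using only the standard library (the helper name and the injectable exists parameter are my own, added so the probe can be faked in tests rather than hitting the network):

```python
import urllib.request
from typing import Callable, Optional

def _head_ok(url: str) -> bool:
    """Probe a URL with an HTTP HEAD request; True only on a 200 response."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except Exception:
        return False

def best_thumbnail(video_id: str,
                   exists: Optional[Callable[[str], bool]] = None) -> str:
    """Return maxresdefault.jpg if it exists for this video, else fall
    back to hqdefault.jpg, which is always present for public videos."""
    check = exists or _head_ok
    maxres = f"https://i.ytimg.com/vi/{video_id}/maxresdefault.jpg"
    if check(maxres):
        return maxres
    return f"https://i.ytimg.com/vi/{video_id}/hqdefault.jpg"
```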
Approach 2: innertube /player Endpoint
The innertube API is what the YouTube web player uses internally to fetch video metadata and the streaming manifest. The endpoint accepts a POST request with a JSON body describing the client context. No API key is required for public videos.
The key endpoint is:
POST https://www.youtube.com/youtubei/v1/player
The request body needs a videoId and a context block identifying the client. Using the WEB client returns the full player response including statistics:
# youtube_innertube_basic.py
import httpx

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def get_video_stats(video_id: str) -> dict:
    """
    Fetch full video metadata via YouTube's innertube /player endpoint.
    No API key required. Returns views, duration, channel, and more.
    """
    payload = {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
    }
    resp = httpx.post(INNERTUBE_URL, json=payload, headers=headers, timeout=15)
    resp.raise_for_status()
    return resp.json()

data = get_video_stats("dQw4w9WgXcQ")
print(data.get("videoDetails", {}).get("title"))
print(data.get("videoDetails", {}).get("viewCount"))
Parsing the innertube Response
The innertube response is a large JSON object with many nested layers. The fields you most likely want are nested under videoDetails and microformat. Always check that a key exists before accessing it — YouTube A/B tests cause some fields to be absent in certain response variants.
# youtube_innertube_parse.py
import httpx
from typing import Optional

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def get_video_stats(video_id: str, proxy_url: Optional[str] = None) -> dict:
    """Fetch video stats with optional proxy support."""
    payload = {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
                "hl": "en",
                "gl": "US",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
        "X-Youtube-Client-Name": "1",
        "X-Youtube-Client-Version": "2.20240101.00.00",
    }
    client_kwargs: dict = {"timeout": 15}
    if proxy_url:
        client_kwargs["proxy"] = proxy_url
    with httpx.Client(**client_kwargs) as client:
        resp = client.post(INNERTUBE_URL, json=payload, headers=headers)
        resp.raise_for_status()
        return resp.json()

def parse_stats(data: dict) -> dict:
    """
    Parse innertube /player response into a flat statistics dict.
    All fields use .get() with safe defaults to handle A/B response variants.
    """
    details = data.get("videoDetails", {})
    microformat = (
        data.get("microformat", {})
        .get("playerMicroformatRenderer", {})
    )
    streaming = data.get("streamingData", {})

    # Duration parsing
    length_sec = int(details.get("lengthSeconds", 0))
    hours = length_sec // 3600
    minutes = (length_sec % 3600) // 60
    seconds = length_sec % 60
    duration_str = f"{hours}h {minutes}m {seconds}s" if hours else f"{minutes}m {seconds}s"

    # Available stream quality levels
    formats = streaming.get("formats", [])
    qualities = sorted(
        set(f.get("qualityLabel", "") for f in formats if f.get("qualityLabel")),
        reverse=True,
    )

    return {
        "video_id": details.get("videoId"),
        "title": details.get("title"),
        "channel": details.get("author"),
        "channel_id": details.get("channelId"),
        "view_count": int(details.get("viewCount", 0)),
        "length_sec": length_sec,
        "duration": duration_str,
        "description": details.get("shortDescription", ""),
        "is_live": details.get("isLiveContent", False),
        "is_private": details.get("isPrivate", False),
        "is_unlisted": details.get("isUnlisted", False),
        "keywords": details.get("keywords", []),
        "thumbnail_url": (
            details.get("thumbnail", {}).get("thumbnails", [{}])[-1].get("url", "")
        ),
        "published": microformat.get("publishDate"),
        "upload_date": microformat.get("uploadDate"),
        "category": microformat.get("category"),
        "family_safe": microformat.get("isFamilySafe"),
        "allow_embed": microformat.get("allowEmbed"),
        "available_qualities": qualities,
    }

# Usage example
raw = get_video_stats("dQw4w9WgXcQ")
stats = parse_stats(raw)
print(f"Title: {stats['title']}")
print(f"Channel: {stats['channel']}")
print(f"Views: {stats['view_count']:,}")
print(f"Published: {stats['published']}")
print(f"Duration: {stats['duration']}")
print(f"Category: {stats['category']}")
print(f"Keywords: {', '.join(stats['keywords'][:5])}")
Note on likes: The innertube /player endpoint does not return like counts. The like count shown on the watch page comes from a separate innertube call (/next), but parsing it requires handling additional response layers and the value is no longer guaranteed to be accurate. (Public dislike counts were removed entirely in late 2021.) For most analytics use cases, view count and engagement signals derived from comments or the description are sufficient.
Anti-Detection Techniques
YouTube's bot detection on innertube is primarily IP-based and timing-based. Here are the techniques that extend how long a single session stays functional.
Request Headers
The two most important headers are Origin and Referer. Without them, YouTube treats the request as coming from a non-browser context and is more likely to rate-limit aggressively:
headers = {
    "Content-Type": "application/json",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/121.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Origin": "https://www.youtube.com",
    "Referer": "https://www.youtube.com/",
    "X-Youtube-Client-Name": "1",
    "X-Youtube-Client-Version": "2.20240101.00.00",
    "Sec-Ch-Ua": '"Not A(Brand";v="99", "Google Chrome";v="121", "Chromium";v="121"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
}
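The client-hint headers must stay internally consistent: a Windows User-Agent with a macOS Sec-Ch-Ua-Platform is itself a fingerprint. One way to enforce that is to pick a whole browser profile per session rather than mixing values. A sketch (the profile list and helper are illustrative, not an official client list):

```python
import random

# Two internally consistent desktop Chrome profiles. The platform
# client hint must match the OS claimed in the User-Agent string.
PROFILES = [
    {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Sec-Ch-Ua-Platform": '"Windows"',
    },
    {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/121.0.0.0 Safari/537.36"
        ),
        "Sec-Ch-Ua-Platform": '"macOS"',
    },
]

def session_headers() -> dict:
    """Pick one profile for the whole session and merge it with the fixed
    innertube headers. Rotate per session, not per request: a User-Agent
    that changes between requests from one IP is itself a bot signal."""
    profile = random.choice(PROFILES)
    return {
        "Content-Type": "application/json",
        "Accept-Language": "en-US,en;q=0.9",
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
        **profile,
    }
```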
Timing and Jitter
Never make requests in tight loops. The most effective anti-detection technique is simply waiting:
import time
import random

def jittered_sleep(base_seconds: float = 1.5, variance: float = 0.5):
    """Sleep for base_seconds plus or minus variance to mimic human timing."""
    delay = base_seconds + random.uniform(-variance, variance)
    time.sleep(max(0.1, delay))  # never sleep less than 100ms
Between consecutive video fetches, a jittered_sleep(2.0, 1.0) call gives you 1-3 seconds of delay with random variance that defeats simple rate-limit detection patterns.
Session Warming
Starting each session with a request to the YouTube homepage collects cookies that subsequent innertube calls benefit from:
import httpx
import time
import random
from typing import Optional

def create_warmed_session(proxy_url: Optional[str] = None) -> httpx.Client:
    """
    Create an HTTP client that has collected YouTube session cookies.
    Improves success rate on subsequent innertube calls.
    """
    client_kwargs: dict = {
        "follow_redirects": True,
        "timeout": 20,
    }
    if proxy_url:
        client_kwargs["proxy"] = proxy_url
    client = httpx.Client(**client_kwargs)
    # Visit homepage to collect VISITOR_INFO1_LIVE, YSC, and consent cookies
    try:
        client.get(
            "https://www.youtube.com/",
            headers={
                "User-Agent": (
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/121.0.0.0 Safari/537.36"
                ),
                "Accept-Language": "en-US,en;q=0.9",
            },
        )
        time.sleep(random.uniform(1.0, 2.0))
    except httpx.RequestError:
        pass
    return client
Rate Limits and Terms of Service
YouTube does not publish rate limits for innertube. From practical testing:
- Requests from a single residential IP: roughly 100-300 requests before you see 429 responses
- Datacenter IPs (AWS, GCP, etc.) get blocked much faster — sometimes within 10-20 requests
- Adding a 1-2 second delay between requests significantly extends how long a single IP remains functional
- Responses appear to be served from YouTube's edge caches; repeatedly fetching the same video still counts toward whatever per-IP limit you are hitting without giving you meaningfully fresher data, so cache results locally instead
Terms of Service: YouTube's ToS (section 5B) prohibits circumventing technical measures and automated access to the service outside of the official API. Using innertube directly is a gray area — it is the same endpoint the official web client uses, but you are not the intended consumer. For personal use and research it is widely practiced. For commercial products that resell YouTube data, seriously consider the official API or a compliant managed service. Never cache data publicly in a way that reproduces YouTube's content, and always attribute data sources.
Proxy Rotation for Higher Volume
If you are fetching stats for hundreds or thousands of videos, you will need proxy rotation to avoid IP-based throttling.
# youtube_with_proxy.py
import httpx
import time
import random
from typing import Optional

from youtube_innertube_parse import parse_stats  # parser defined earlier in this post

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def get_video_stats_proxied(
    video_id: str,
    proxy_url: Optional[str] = None,
) -> dict:
    """
    Fetch video stats with optional rotating proxy support.
    For high-volume scraping, pass a new proxy_url for each request
    or small batch of requests.
    """
    payload = {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
                "hl": "en",
                "gl": "US",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/121.0.0.0 Safari/537.36"
        ),
        "Origin": "https://www.youtube.com",
        "Referer": "https://www.youtube.com/",
        "Accept-Language": "en-US,en;q=0.9",
    }
    client_kwargs: dict = {"timeout": 20, "follow_redirects": True}
    if proxy_url:
        client_kwargs["proxy"] = proxy_url
    with httpx.Client(**client_kwargs) as client:
        resp = client.post(INNERTUBE_URL, json=payload, headers=headers)
    if resp.status_code == 429:
        raise RuntimeError(f"Rate limited (429) for video {video_id}. Rotate proxy.")
    if resp.status_code == 403:
        raise RuntimeError(f"Forbidden (403) for video {video_id}. IP may be blocked.")
    resp.raise_for_status()
    return resp.json()

def fetch_batch(
    video_ids: list,
    proxy_url: Optional[str] = None,
    delay_range: tuple = (1.5, 3.0),
) -> dict:
    """
    Fetch stats for a list of video IDs with jitter between requests.
    Returns a dict mapping video_id to parsed stats or an error dict.
    """
    results = {}
    for i, vid in enumerate(video_ids):
        try:
            raw = get_video_stats_proxied(vid, proxy_url=proxy_url)
            results[vid] = parse_stats(raw)
            print(f"[{i+1}/{len(video_ids)}] {vid}: {results[vid]['view_count']:,} views")
        except RuntimeError as e:
            print(f"[{i+1}/{len(video_ids)}] {vid}: ERROR - {e}")
            results[vid] = {"error": str(e)}
        except Exception as e:
            print(f"[{i+1}/{len(video_ids)}] {vid}: UNEXPECTED - {e}")
            results[vid] = {"error": str(e)}
        if i < len(video_ids) - 1:
            time.sleep(random.uniform(*delay_range))
    return results

# Example: fetch a batch of videos through a rotating proxy
proxy = "http://user:[email protected]:9000"
video_ids = [
    "dQw4w9WgXcQ",
    "jNQXAC9IVRw",
    "kJQP7kiw5Fk",
    "9bZkp7q19f0",
    "OPf0YbXqDm0",
]
batch_results = fetch_batch(video_ids, proxy_url=proxy, delay_range=(2.0, 4.0))
Residential proxies work significantly better than datacenter proxies for YouTube. For proxy providers, ThorData has a rotating residential pool that handles YouTube well — their per-GB pricing is competitive, and the rotating gateway means you do not have to manage proxy lists yourself.
A few practical notes when running at volume:
- Rotate proxies per request, not per session — reusing an IP across many requests defeats the purpose
- Add jitter to your request timing (random 0.5-2 second delays)
- Monitor 429 response rates; if they exceed 5-10%, your proxy pool is being detected
- Cache responses locally — if you need the same video stats multiple times in a day, do not re-fetch
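The last point is worth automating. A minimal in-memory cache with a time-to-live is enough for a single long-running process (a sketch; for anything multi-process, use the SQLite storage described in the next section):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-entry expiry, so the same video's
    stats are not re-fetched within one monitoring window."""

    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired; drop it
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Wrap the fetch with `cache.get(video_id)` before going to the network, and `cache.set(video_id, stats)` after a successful parse.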
SQLite Storage Schema
For any serious monitoring pipeline, store results in SQLite rather than flat JSON files. This schema handles both raw responses and parsed stats:
# youtube_storage.py
import sqlite3
import json
from datetime import datetime

def init_db(db_path: str = "youtube_stats.db") -> sqlite3.Connection:
    """Initialize SQLite database with schema for YouTube video stats."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # better write concurrency
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS videos (
            video_id TEXT PRIMARY KEY,
            title TEXT,
            channel TEXT,
            channel_id TEXT,
            category TEXT,
            published TEXT,
            upload_date TEXT,
            duration_sec INTEGER,
            description TEXT,
            keywords TEXT,
            is_live INTEGER DEFAULT 0,
            is_private INTEGER DEFAULT 0,
            family_safe INTEGER DEFAULT 1,
            allow_embed INTEGER DEFAULT 1,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS view_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            video_id TEXT NOT NULL,
            view_count INTEGER,
            snapshot_at TEXT DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (video_id) REFERENCES videos(video_id)
        );

        CREATE TABLE IF NOT EXISTS fetch_errors (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            video_id TEXT,
            error_type TEXT,
            error_msg TEXT,
            proxy_used TEXT,
            occurred_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE INDEX IF NOT EXISTS idx_snapshots_video_id
            ON view_snapshots (video_id);
        CREATE INDEX IF NOT EXISTS idx_snapshots_at
            ON view_snapshots (snapshot_at);
    """)
    conn.commit()
    return conn
def upsert_video(conn: sqlite3.Connection, stats: dict):
    """Insert or update video metadata. Does NOT overwrite view count."""
    conn.execute(
        """
        INSERT INTO videos
            (video_id, title, channel, channel_id, category,
             published, upload_date, duration_sec, description,
             keywords, is_live, is_private, family_safe, allow_embed)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(video_id) DO UPDATE SET
            title = excluded.title,
            channel = excluded.channel,
            category = excluded.category,
            description = excluded.description,
            keywords = excluded.keywords
        """,
        (
            stats.get("video_id"),
            stats.get("title"),
            stats.get("channel"),
            stats.get("channel_id"),
            stats.get("category"),
            stats.get("published"),
            stats.get("upload_date"),
            stats.get("length_sec"),
            stats.get("description", "")[:2000],
            json.dumps(stats.get("keywords", [])),
            int(stats.get("is_live") or 0),
            int(stats.get("is_private") or 0),
            # microformat fields can be None in some response variants;
            # fall back to the schema defaults instead of crashing on int(None)
            1 if stats.get("family_safe") in (None, True) else 0,
            1 if stats.get("allow_embed") in (None, True) else 0,
        ),
    )
    conn.commit()
def record_views(conn: sqlite3.Connection, video_id: str, view_count: int):
    """Record a view count snapshot."""
    conn.execute(
        "INSERT INTO view_snapshots (video_id, view_count) VALUES (?, ?)",
        (video_id, view_count),
    )
    conn.commit()

def get_view_history(conn: sqlite3.Connection, video_id: str, limit: int = 30) -> list:
    """Retrieve recent view count history for a video."""
    rows = conn.execute(
        """
        SELECT view_count, snapshot_at
        FROM view_snapshots
        WHERE video_id = ?
        ORDER BY snapshot_at DESC
        LIMIT ?
        """,
        (video_id, limit),
    ).fetchall()
    return [{"views": r[0], "at": r[1]} for r in rows]
def compute_growth_rate(history: list) -> "float | None":
    """
    Compute daily view growth rate from snapshot history.
    Returns views/day as a float, or None if there are fewer than
    two snapshots or the timestamps cannot be parsed.
    """
    if len(history) < 2:
        return None
    oldest = history[-1]
    newest = history[0]
    delta_views = newest["views"] - oldest["views"]
    fmt = "%Y-%m-%d %H:%M:%S"
    try:
        t_old = datetime.strptime(oldest["at"][:19], fmt)
        t_new = datetime.strptime(newest["at"][:19], fmt)
        delta_days = (t_new - t_old).total_seconds() / 86400
        if delta_days <= 0:
            return None
        return delta_views / delta_days
    except (ValueError, TypeError):
        return None
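Since the snapshots already live in SQLite, the same growth rate can also be computed in a single query using julianday() instead of parsing timestamps in Python. A sketch against the view_snapshots table above (the function name is my own):

```python
import sqlite3
from typing import Optional

def growth_rate_sql(conn: sqlite3.Connection, video_id: str) -> Optional[float]:
    """Views/day between the oldest and newest snapshot, computed in SQL.
    Assumes the view_snapshots schema defined in init_db."""
    row = conn.execute(
        """
        SELECT
            (SELECT view_count FROM view_snapshots
             WHERE video_id = :vid ORDER BY snapshot_at DESC, id DESC LIMIT 1)
          - (SELECT view_count FROM view_snapshots
             WHERE video_id = :vid ORDER BY snapshot_at ASC, id ASC LIMIT 1)
                AS delta_views,
            julianday(MAX(snapshot_at)) - julianday(MIN(snapshot_at))
                AS delta_days
        FROM view_snapshots
        WHERE video_id = :vid
        """,
        {"vid": video_id},
    ).fetchone()
    # No snapshots at all, or a single snapshot: no rate to compute
    if row is None or row[1] is None or row[1] <= 0:
        return None
    return row[0] / row[1]
```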
Error Handling Patterns
Production innertube scraping needs robust error handling because failures are common and varied:
import httpx
import time
import random
from typing import Optional

from youtube_innertube_parse import parse_stats  # parser defined earlier in this post

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/player"

def safe_get_video_stats(
    video_id: str,
    proxy_url: Optional[str] = None,
    max_retries: int = 3,
) -> tuple:
    """
    Fetch video stats with full error handling and retry logic.
    Returns (stats_dict, status_string).
    On failure, stats_dict is None and status_string describes the problem.
    """
    for attempt in range(1, max_retries + 1):
        try:
            payload = {
                "videoId": video_id,
                "context": {"client": {"clientName": "WEB", "clientVersion": "2.20240101.00.00"}},
            }
            headers = {
                "Content-Type": "application/json",
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                "Origin": "https://www.youtube.com",
            }
            client_kwargs: dict = {"timeout": 20}
            if proxy_url:
                client_kwargs["proxy"] = proxy_url
            with httpx.Client(**client_kwargs) as client:
                resp = client.post(INNERTUBE_URL, json=payload, headers=headers)
            if resp.status_code == 429:
                # Give up immediately on the last attempt instead of
                # sleeping a backoff we will never use
                if attempt == max_retries:
                    return None, "rate_limited"
                backoff = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited for {video_id}, retrying in {backoff:.1f}s (attempt {attempt})")
                time.sleep(backoff)
                continue
            if resp.status_code == 403:
                return None, "blocked"
            resp.raise_for_status()
            raw = resp.json()
            # Check playability status before parsing
            playability = raw.get("playabilityStatus", {})
            status = playability.get("status", "")
            if status in ("ERROR", "LOGIN_REQUIRED", "UNPLAYABLE"):
                reason = playability.get("reason", "unknown reason")
                return None, f"video_unavailable:{reason}"
            if "videoDetails" not in raw:
                return None, "parse_error:missing videoDetails"
            return parse_stats(raw), "success"
        except httpx.RequestError as e:
            if attempt < max_retries:
                time.sleep(2 ** attempt)
            else:
                return None, f"network_error:{e}"
        except Exception as e:
            return None, f"unexpected:{e}"
    return None, "max_retries_exceeded"
Complete Monitoring Pipeline
Putting it all together: a pipeline that monitors a list of videos and tracks view count over time.
# youtube_monitor.py
import time
import random
from typing import Optional

# Helpers defined earlier in this post. The module names assume you saved
# the storage snippet as youtube_storage.py and the error-handling snippet
# as youtube_errors.py; adjust to wherever you put them.
from youtube_storage import (
    init_db,
    upsert_video,
    record_views,
    get_view_history,
    compute_growth_rate,
)
from youtube_errors import safe_get_video_stats

def monitor_videos(
    video_ids: list,
    db_path: str = "youtube_stats.db",
    proxy_url: Optional[str] = None,
    delay_range: tuple = (2.0, 5.0),
):
    """
    Run one monitoring cycle: fetch stats for all video IDs,
    store results, and report on growth.
    """
    conn = init_db(db_path)
    success_count = 0
    error_count = 0
    print(f"Starting monitoring run for {len(video_ids)} videos...")
    for i, video_id in enumerate(video_ids):
        stats, status = safe_get_video_stats(video_id, proxy_url=proxy_url)
        if stats and status == "success":
            upsert_video(conn, stats)
            record_views(conn, video_id, stats["view_count"])
            history = get_view_history(conn, video_id, limit=10)
            growth = compute_growth_rate(history)
            growth_str = f"{growth:,.0f} views/day" if growth is not None else "first snapshot"
            print(
                f"[{i+1}/{len(video_ids)}] {stats['title'][:50]}... "
                f"| {stats['view_count']:,} views | {growth_str}"
            )
            success_count += 1
        else:
            error_count += 1
            conn.execute(
                "INSERT INTO fetch_errors (video_id, error_type, error_msg, proxy_used) "
                "VALUES (?, ?, ?, ?)",
                (video_id, "fetch_failed", status, proxy_url),
            )
            conn.commit()
            print(f"[{i+1}/{len(video_ids)}] {video_id}: FAILED - {status}")
        if i < len(video_ids) - 1:
            time.sleep(random.uniform(*delay_range))
    conn.close()
    print(f"\nDone: {success_count} ok, {error_count} errors")

# Run it
PROXY = "http://user:[email protected]:9000"
VIDEO_IDS = [
    "dQw4w9WgXcQ",
    "jNQXAC9IVRw",
    "kJQP7kiw5Fk",
    "9bZkp7q19f0",
    "OPf0YbXqDm0",
]
monitor_videos(VIDEO_IDS, proxy_url=PROXY, delay_range=(2.5, 5.0))
Channel Video Discovery
The innertube API also supports channel video lists via the /browse endpoint. This lets you discover all videos on a channel without the official API:
import httpx

def get_channel_videos(
    channel_id: str,
    max_results: int = 50,
) -> list:
    """
    Fetch recent video IDs from a YouTube channel using innertube /browse.
    channel_id: UCxxxxxxxxxxxxxxxxxx format.
    Returns a list of video ID strings.
    """
    url = "https://www.youtube.com/youtubei/v1/browse"
    payload = {
        "browseId": channel_id,
        # URL-encoded base64 protobuf that selects the channel's Videos tab
        "params": "EgZ2aWRlb3M%3D",
        "context": {
            "client": {
                "clientName": "WEB",
                "clientVersion": "2.20240101.00.00",
            }
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 Chrome/121.0.0.0 Safari/537.36"
        ),
        "Origin": "https://www.youtube.com",
    }
    resp = httpx.post(url, json=payload, headers=headers, timeout=20)
    resp.raise_for_status()
    data = resp.json()
    # Walk the deeply nested response to extract video IDs
    video_ids: list = []
    _extract_video_ids(data, video_ids)
    return video_ids[:max_results]

def _extract_video_ids(obj, results: list):
    """Recursively search an innertube browse response for videoId fields."""
    if isinstance(obj, dict):
        if "videoId" in obj and isinstance(obj["videoId"], str):
            vid = obj["videoId"]
            if vid not in results:
                results.append(vid)
        for v in obj.values():
            _extract_video_ids(v, results)
    elif isinstance(obj, list):
        for item in obj:
            _extract_video_ids(item, results)
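To see the recursive walk behave as expected without hitting the network, run the same logic against a mock response fragment. The renderer key names below mimic the shape of real browse responses, but the fragment itself is contrived for illustration (the walker is reproduced here so the snippet runs standalone):

```python
def extract_video_ids(obj, results: list):
    """Same recursive walk as _extract_video_ids above: collect every
    string-valued videoId, preserving order and skipping duplicates."""
    if isinstance(obj, dict):
        vid = obj.get("videoId")
        if isinstance(vid, str) and vid not in results:
            results.append(vid)
        for v in obj.values():
            extract_video_ids(v, results)
    elif isinstance(obj, list):
        for item in obj:
            extract_video_ids(item, results)

# Mock fragment: videoIds nested at different depths, one duplicate
mock = {
    "contents": [
        {"richItemRenderer": {"content": {"videoRenderer": {"videoId": "dQw4w9WgXcQ"}}}},
        {"richItemRenderer": {"content": {"videoRenderer": {"videoId": "jNQXAC9IVRw"}}}},
        {"compactVideoRenderer": {"videoId": "dQw4w9WgXcQ"}},  # duplicate, skipped
    ]
}
found: list = []
extract_video_ids(mock, found)
```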
When to Use Each Approach
| Approach | Data available | Volume | Complexity |
|---|---|---|---|
| oEmbed | Title, thumbnail, author | High (lenient limits) | Minimal |
| innertube /player (no proxy) | Views, duration, description, channel | Low (~100-300/IP/day) | Low |
| innertube /player + rotating proxies | Views, duration, description, channel | Medium (1k-10k/day) | Medium |
| Official YouTube Data API v3 | Views, likes, comments, full metadata | 10k units/day free | Medium (auth setup) |
| Managed scraper (Apify) | Full stats + comments | Unlimited | Low (pay per run) |
For one-off scripts and internal tools, innertube direct is the fastest path. The oEmbed endpoint is the right choice when you only need titles and thumbnails. When you hit volume limits or need a stable production pipeline, the official API or a managed service is worth the setup time.
Legal Notes
YouTube's Terms of Service prohibit automated access outside the official API. The innertube approach described here operates in a gray zone — you are accessing YouTube's own internal endpoint, but as an unintended client.
In practice: - Personal use and research: Generally tolerated. No known enforcement against individuals running small scripts. - Commercial products reselling YouTube data: High legal risk. YouTube has pursued enforcement against larger-scale scrapers. Use the official API or licensed data providers. - Building a product that competes with YouTube: The ToS explicitly prohibits this use case.
The right approach for any production system with commercial use is to start with the official API. Innertube is appropriate for tooling, research pipelines, and monitoring workflows where you are the end user of the data.
Summary
The innertube approach has been working reliably for several years, but build your integration defensively: validate that expected keys exist, log raw responses when parsing fails, and pin the clientVersion string rather than auto-generating it — YouTube occasionally returns different response shapes for newer client versions.
The oEmbed endpoint is the safest choice for basic metadata. Innertube /player gives you full statistics without any API key or Google Cloud account. For volume above a few hundred videos per day, rotating residential proxies via ThorData are the difference between a functional pipeline and a constant IP-blocking battle. Pair everything with SQLite for local caching and you have a solid, self-contained YouTube monitoring system.