How to Scrape YouTube Comments Without the API Key
If you've ever tried to pull YouTube comments at scale using the official Data API v3, you've hit the quota wall. The default is 10,000 units per day — which sounds generous until you realize that listing videos from a channel costs 100 units per call, and you're burning through your budget before you've even started pulling comments.
The good news: there's a better way. YouTube's own apps use an internal endpoint called InnerTube, and it doesn't require an API key.
The Problem With YouTube Data API v3
YouTube Data API v3 is the official route. You register a project in Google Cloud Console, generate an API key, and start making requests. But the quota system is brutal:
- Video list requests: 100 units each
- Comment thread requests: 1 unit each — but you can only fetch 100 comments per page
- Search queries: 100 units each
- Default daily quota: 10,000 units
If you're monitoring a popular channel with thousands of videos and need fresh comment data, you'll exhaust your daily quota within minutes. Requesting a quota increase requires a formal review process that can take weeks and often gets rejected for scraping use cases.
For context: scraping all comments from a single viral video with 50,000 comments would cost 500 quota units just for the comment pages (100 comments per request), plus another 100 for the initial video lookup. Across 20 such videos you're at roughly 12,000 units, past the entire daily limit.
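That arithmetic is easy to sanity-check. A few lines using the per-call costs listed above (the helper name is just for illustration):

```python
# Quota cost estimate for pulling comments through Data API v3.
COMMENTS_PER_PAGE = 100        # max results per comment-thread page
COST_PER_COMMENT_PAGE = 1      # quota units per comment page
COST_PER_VIDEO_LOOKUP = 100    # quota units for the initial video lookup
DAILY_QUOTA = 10_000

def quota_cost(total_comments: int, videos: int = 1) -> int:
    """Units needed to scrape `total_comments` comments from each of `videos` videos."""
    pages = -(-total_comments // COMMENTS_PER_PAGE)  # ceiling division
    return (pages * COST_PER_COMMENT_PAGE + COST_PER_VIDEO_LOOKUP) * videos

print(quota_cost(50_000))             # 600 units for one 50k-comment video
print(quota_cost(50_000, videos=20))  # 12000 units, past the 10,000 daily quota
```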
Enter InnerTube: YouTube's Internal API
InnerTube is the API that YouTube's own web app, Android app, and iOS app all use under the hood. It's not publicly documented, but it's also not secret — you can observe it in browser devtools on any YouTube page.
The key endpoint for fetching comments is:
POST https://www.youtube.com/youtubei/v1/next
No API key. No OAuth. Just a client context in the request body and the right headers.
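Here's a minimal sketch of that request in Python. The payload shape mirrors what the web client sends; `build_payload` and `fetch_next` are illustrative names, and the clientVersion string is an example that drifts over time:

```python
import requests

INNERTUBE_NEXT = "https://www.youtube.com/youtubei/v1/next"

def build_payload(video_id: str) -> dict:
    """Minimal body the web client posts to /youtubei/v1/next."""
    return {
        "context": {
            "client": {
                "clientName": "WEB",
                # Example version string; lift a current one from
                # your browser's network tab if requests start failing.
                "clientVersion": "2.20260101.00.00",
                "hl": "en",
                "gl": "US",
            }
        },
        "videoId": video_id,
    }

def fetch_next(video_id: str) -> dict:
    """POST the payload. No API key, no OAuth, no special auth header."""
    resp = requests.post(INNERTUBE_NEXT, json=build_payload(video_id), timeout=30)
    resp.raise_for_status()
    return resp.json()
```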
How to find it yourself
Open any YouTube video in Chrome, then:
- Open DevTools (F12) and go to the Network tab
- Filter requests by "next" or "youtubei"
- Scroll down to the comments section on the video page
- Watch the network requests — you'll see the "next" endpoint fire
- Right-click the request, then Copy as cURL to get the exact payload
This is how YouTube loads comments lazily as you scroll. The first request fetches the initial comment batch. Each subsequent scroll triggers another next call with a continuation token.
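That lazy-loading pattern reduces to a generic loop: fetch a page, collect its items, follow the continuation token, stop when there isn't one. Sketched here against a stand-in fetch function so the shape is clear before the real implementation below:

```python
def paginate(fetch, first_token):
    """Drain a continuation-token API: `fetch` maps a token to
    (items, next_token), with next_token=None on the last page."""
    collected, token = [], first_token
    while token:
        items, token = fetch(token)
        collected.extend(items)
    return collected

# Stand-in fetch simulating three pages of comments:
PAGES = {
    "tok1": (["c1", "c2"], "tok2"),
    "tok2": (["c3"], "tok3"),
    "tok3": (["c4", "c5"], None),
}

print(paginate(PAGES.get, "tok1"))  # ['c1', 'c2', 'c3', 'c4', 'c5']
```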
Complete YouTube Comment Scraper
Here's a full working scraper that handles first-page loading, pagination, reply threads, and CSV/JSON export:
#!/usr/bin/env python3
"""
YouTube Comment Scraper via InnerTube API (no API key required)

Fetches all comments from a YouTube video, including:
- Comment text, author, likes, timestamps
- Reply threads (nested comments)
- Pagination via continuation tokens
- Export to JSON or CSV

Usage:
    python yt_comments.py VIDEO_ID [--format json|csv] [--output filename]
    python yt_comments.py dQw4w9WgXcQ --format csv --output comments.csv
    python yt_comments.py "https://youtube.com/watch?v=dQw4w9WgXcQ" --max 500
"""
import requests
import json
import csv
import time
import random
import re
import argparse

CLIENT_VERSION = "2.20260101.00.00"

INNERTUBE_CONTEXT = {
    "client": {
        "clientName": "WEB",
        "clientVersion": CLIENT_VERSION,
        "hl": "en",
        "gl": "US",
        "originalUrl": "https://www.youtube.com",
        "platform": "DESKTOP",
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    }
}

HEADERS = {
    "Content-Type": "application/json",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "X-YouTube-Client-Name": "1",
    "X-YouTube-Client-Version": CLIENT_VERSION,
    "Origin": "https://www.youtube.com",
    "Referer": "https://www.youtube.com/",
    "Accept-Language": "en-US,en;q=0.9",
}

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/next"


def extract_video_id(input_str: str) -> str:
    """Extract video ID from URL or return as-is if already an ID."""
    patterns = [
        r'(?:v=|/v/|youtu\.be/|/embed/|/shorts/)([a-zA-Z0-9_-]{11})',
        r'^([a-zA-Z0-9_-]{11})$',
    ]
    for pattern in patterns:
        match = re.search(pattern, input_str)
        if match:
            return match.group(1)
    raise ValueError(f"Could not extract video ID from: {input_str}")


def parse_count(text: str) -> int:
    """Parse '1.2K', '5M', '320' into integers."""
    text = text.strip().replace(",", "")
    if not text or text == "0":
        return 0
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    for suffix, mult in multipliers.items():
        if text.upper().endswith(suffix):
            return int(float(text[:-1]) * mult)
    try:
        return int(text)
    except ValueError:
        return 0


def parse_comment_renderer(renderer: dict) -> dict:
    """Extract structured data from a commentRenderer node."""
    text_runs = renderer.get("contentText", {}).get("runs", [])
    text = "".join(run.get("text", "") for run in text_runs)
    author = renderer.get("authorText", {}).get("simpleText", "Unknown")
    author_channel_id = (
        renderer.get("authorEndpoint", {})
        .get("browseEndpoint", {})
        .get("browseId", "")
    )
    likes_text = renderer.get("voteCount", {}).get("simpleText", "0")
    likes = parse_count(likes_text)
    published = renderer.get("publishedTimeText", {}).get("runs", [{}])
    published_text = published[0].get("text", "") if published else ""
    comment_id = renderer.get("commentId", "")
    is_pinned = "pinnedCommentBadge" in renderer
    is_hearted = bool(
        renderer.get("actionButtons", {})
        .get("commentActionButtonsRenderer", {})
        .get("creatorHeart", {})
    )
    return {
        "comment_id": comment_id,
        "author": author,
        "author_channel_id": author_channel_id,
        "text": text,
        "likes": likes,
        "likes_display": likes_text,
        "published": published_text,
        "is_pinned": is_pinned,
        "is_hearted": is_hearted,
        "is_reply": False,
        "parent_id": "",
    }


def fetch_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    """Make InnerTube request with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                INNERTUBE_URL, json=payload, headers=HEADERS, timeout=30
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                wait = (2 ** attempt) + random.uniform(0.5, 2.0)
                print(f"  Rate limited (429). Waiting {wait:.1f}s... (attempt {attempt+1})")
                time.sleep(wait)
            else:
                print(f"  HTTP {response.status_code} on attempt {attempt+1}")
                time.sleep(2)
        except requests.exceptions.RequestException as e:
            print(f"  Request error: {e}")
            time.sleep(3)
    raise RuntimeError("Max retries exceeded - YouTube may be blocking this IP")


def is_soft_blocked(data: dict) -> bool:
    """Detect soft blocks where YouTube returns 200 but empty data."""
    endpoints = data.get("onResponseReceivedEndpoints", [])
    if not endpoints:
        return True
    alerts = data.get("alerts", [])
    for alert in alerts:
        alert_text = (
            alert.get("alertWithButtonRenderer", {})
            .get("text", {}).get("simpleText", "")
        )
        if "error" in alert_text.lower():
            return True
    return False


def get_initial_comments(video_id: str) -> tuple[list[dict], str | None]:
    """Fetch the first page of comments for a video."""
    payload = {
        "context": INNERTUBE_CONTEXT,
        "videoId": video_id,
    }
    data = fetch_with_backoff(payload)
    comments = []
    next_token = None
    endpoints = data.get("onResponseReceivedEndpoints", [])
    for endpoint in endpoints:
        # First page uses reloadContinuationItemsCommand
        reload = endpoint.get("reloadContinuationItemsCommand", {})
        items = reload.get("continuationItems", [])
        # Sometimes appendContinuationItemsAction instead
        if not items:
            append = endpoint.get("appendContinuationItemsAction", {})
            items = append.get("continuationItems", [])
        for item in items:
            thread = item.get("commentThreadRenderer", {})
            if thread:
                comment_renderer = (
                    thread.get("comment", {}).get("commentRenderer", {})
                )
                if comment_renderer:
                    comment = parse_comment_renderer(comment_renderer)
                    comments.append(comment)
            # Continuation token for next page
            cont = item.get("continuationItemRenderer", {})
            token = (
                cont.get("continuationEndpoint", {})
                .get("continuationCommand", {})
                .get("token")
            )
            if token:
                next_token = token
    return comments, next_token


def get_comment_page(continuation_token: str) -> tuple[list[dict], str | None]:
    """Fetch a subsequent page of comments using a continuation token."""
    payload = {
        "context": INNERTUBE_CONTEXT,
        "continuation": continuation_token,
    }
    data = fetch_with_backoff(payload)
    comments = []
    next_token = None
    for endpoint in data.get("onResponseReceivedEndpoints", []):
        items = (
            endpoint.get("appendContinuationItemsAction", {})
            .get("continuationItems", [])
        )
        for item in items:
            thread = item.get("commentThreadRenderer", {})
            if thread:
                cr = thread.get("comment", {}).get("commentRenderer", {})
                if cr:
                    comments.append(parse_comment_renderer(cr))
            # Direct commentRenderer (in reply threads)
            direct = item.get("commentRenderer", {})
            if direct and "commentId" in direct:
                reply = parse_comment_renderer(direct)
                reply["is_reply"] = True
                comments.append(reply)
            cont = item.get("continuationItemRenderer", {})
            token = (
                cont.get("continuationEndpoint", {})
                .get("continuationCommand", {})
                .get("token")
            )
            if not token:
                token = (
                    cont.get("button", {}).get("buttonRenderer", {})
                    .get("command", {}).get("continuationCommand", {})
                    .get("token")
                )
            if token:
                next_token = token
    return comments, next_token


def scrape_all_comments(
    video_id: str, max_comments: int = 0, delay_range: tuple = (1.0, 2.5)
) -> list[dict]:
    """
    Scrape all comments from a video with pagination.

    Args:
        video_id: YouTube video ID or URL
        max_comments: Stop after N comments (0 = unlimited)
        delay_range: Random delay between requests in seconds
    """
    video_id = extract_video_id(video_id)
    print(f"Fetching comments for video: {video_id}")
    all_comments = []
    page = 0

    # First page
    comments, next_token = get_initial_comments(video_id)
    all_comments.extend(comments)
    page += 1
    print(f"  Page {page}: {len(comments)} comments (total: {len(all_comments)})")

    # Paginate through remaining pages
    while next_token:
        if max_comments and len(all_comments) >= max_comments:
            all_comments = all_comments[:max_comments]
            print(f"  Reached max_comments limit ({max_comments})")
            break
        time.sleep(random.uniform(*delay_range))
        comments, next_token = get_comment_page(next_token)
        if not comments:
            break
        all_comments.extend(comments)
        page += 1
        print(f"  Page {page}: {len(comments)} comments (total: {len(all_comments)})")

    print(f"\nDone. Collected {len(all_comments)} comments across {page} pages.")
    return all_comments


def export_json(comments: list[dict], filename: str):
    """Export comments to JSON file."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(comments, f, indent=2, ensure_ascii=False)
    print(f"Exported {len(comments)} comments to {filename}")


def export_csv(comments: list[dict], filename: str):
    """Export comments to CSV file."""
    if not comments:
        print("No comments to export")
        return
    fieldnames = [
        "comment_id", "author", "author_channel_id", "text",
        "likes", "likes_display", "published", "is_pinned",
        "is_hearted", "is_reply", "parent_id",
    ]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(comments)
    print(f"Exported {len(comments)} comments to {filename}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Scrape YouTube comments via InnerTube")
    parser.add_argument("video", help="Video ID or full YouTube URL")
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    parser.add_argument("--output", "-o", help="Output filename")
    parser.add_argument("--max", type=int, default=0, help="Max comments (0=all)")
    args = parser.parse_args()

    vid = extract_video_id(args.video)
    results = scrape_all_comments(vid, max_comments=args.max)
    out_file = args.output or f"comments_{vid}.{args.format}"
    if args.format == "csv":
        export_csv(results, out_file)
    else:
        export_json(results, out_file)
Expected output
Running python yt_comments.py dQw4w9WgXcQ --max 100 --format json produces:
[
  {
    "comment_id": "UgxKz...",
    "author": "MusicFan2024",
    "author_channel_id": "UCa1b2c...",
    "text": "I can't believe this is still getting views in 2026. Legend.",
    "likes": 45000,
    "likes_display": "45K",
    "published": "2 months ago",
    "is_pinned": false,
    "is_hearted": false,
    "is_reply": false,
    "parent_id": ""
  },
  {
    "comment_id": "UgyBm...",
    "author": "RickAstleyOfficial",
    "author_channel_id": "UCuA...",
    "text": "Thank you all for the love",
    "likes": 312000,
    "likes_display": "312K",
    "published": "1 year ago",
    "is_pinned": true,
    "is_hearted": true,
    "is_reply": false,
    "parent_id": ""
  }
]
CSV output looks like:
comment_id,author,author_channel_id,text,likes,likes_display,published,is_pinned,is_hearted,is_reply,parent_id
UgxKz...,MusicFan2024,UCa1b2c...,"I can't believe this...",45000,45K,2 months ago,False,False,False,
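One gotcha with the CSV export: everything round-trips as strings, including the boolean flags, so cast on read. A quick demonstration with the standard library (sample values only):

```python
import csv
import io

rows = [{"comment_id": "UgxKz...", "author": "MusicFan2024",
         "likes": 45000, "is_pinned": False}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["comment_id", "author", "likes", "is_pinned"])
writer.writeheader()
writer.writerows(rows)

buf.seek(0)
back = list(csv.DictReader(buf))
print(back[0]["likes"])       # value is the string '45000', not an int
print(back[0]["is_pinned"])   # value is the string 'False', not a bool
```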
Use Case 1: Sentiment Analysis Pipeline
Feed YouTube comments into a sentiment classifier — useful for brand monitoring, product feedback analysis, or gauging audience reaction to content:
"""
YouTube Comment Sentiment Analyzer
Classifies comments as positive/negative/neutral using keyword scoring.
No ML dependencies required — works with just the standard library.
"""
from collections import Counter
POSITIVE_SIGNALS = [
"love", "great", "amazing", "awesome", "best", "perfect",
"beautiful", "excellent", "fantastic", "brilliant", "helpful",
"thank", "thanks", "masterpiece", "goat", "fire", "legendary",
]
NEGATIVE_SIGNALS = [
"hate", "worst", "terrible", "awful", "trash", "garbage",
"boring", "cringe", "bad", "horrible", "disgusting", "waste",
"clickbait", "scam", "fake", "stolen", "copied",
]
def classify_sentiment(text: str) -> str:
text_lower = text.lower()
pos = sum(1 for w in POSITIVE_SIGNALS if w in text_lower)
neg = sum(1 for w in NEGATIVE_SIGNALS if w in text_lower)
if pos > neg:
return "positive"
elif neg > pos:
return "negative"
return "neutral"
def analyze_video_sentiment(video_id: str, max_comments: int = 500):
comments = scrape_all_comments(video_id, max_comments=max_comments)
sentiments = Counter()
for comment in comments:
sentiment = classify_sentiment(comment["text"])
comment["sentiment"] = sentiment
sentiments[sentiment] += 1
total = len(comments)
print(f"\n--- Sentiment Analysis for {video_id} ---")
print(f"Total comments analyzed: {total}")
for sentiment, count in sentiments.most_common():
pct = (count / total) * 100
print(f" {sentiment}: {count} ({pct:.1f}%)")
# Top positive comments by engagement
positive = sorted(
[c for c in comments if c.get("sentiment") == "positive"],
key=lambda x: x["likes"], reverse=True
)
print(f"\nTop 5 positive comments:")
for c in positive[:5]:
print(f" [{c['likes_display']} likes] {c['text'][:100]}")
return comments
Expected output:
--- Sentiment Analysis for dQw4w9WgXcQ ---
Total comments analyzed: 500
positive: 287 (57.4%)
neutral: 156 (31.2%)
negative: 57 (11.4%)
Top 5 positive comments:
[312K likes] Thank you all for the love
[45K likes] I can't believe this is still getting views in 2026. Legend.
[12K likes] This song is genuinely amazing. No irony.
[8.2K likes] Best music video of all time, no debate.
[5.1K likes] Love how this became a universal internet moment.
Use Case 2: Channel Comment Monitor
Track comments across all recent videos from a channel. Useful for brand monitoring, community management, or competitive analysis:
"""
Channel Comment Monitor
Fetches recent videos from a YouTube channel and collects
comments from each, producing a per-video summary report.
"""
def get_channel_video_ids(channel_handle: str, max_videos: int = 10) -> list[str]:
"""
Get recent video IDs from a channel using the InnerTube browse endpoint.
Pass the channel handle like '@mkbhd' or a channel ID like 'UCBJycsmduvYEL83R_U4JriQ'.
"""
payload = {
"context": INNERTUBE_CONTEXT,
"browseId": channel_handle,
"params": "EgZ2aWRlb3PyBgQKAjoA", # Videos tab
}
url = "https://www.youtube.com/youtubei/v1/browse"
resp = requests.post(url, json=payload, headers=HEADERS, timeout=30)
data = resp.json()
video_ids = []
tabs = (
data.get("contents", {})
.get("twoColumnBrowseResultsRenderer", {})
.get("tabs", [])
)
for tab in tabs:
contents = (
tab.get("tabRenderer", {})
.get("content", {})
.get("richGridRenderer", {})
.get("contents", [])
)
for item in contents:
video = (
item.get("richItemRenderer", {})
.get("content", {})
.get("videoRenderer", {})
)
vid_id = video.get("videoId")
if vid_id:
video_ids.append(vid_id)
if len(video_ids) >= max_videos:
return video_ids
return video_ids
def monitor_channel(channel_handle: str, max_videos: int = 5,
comments_per_video: int = 100):
"""Fetch and summarize comments from a channel's recent videos."""
print(f"Fetching recent videos from {channel_handle}...")
video_ids = get_channel_video_ids(channel_handle, max_videos)
print(f"Found {len(video_ids)} videos\n")
results = []
for i, vid in enumerate(video_ids):
print(f"[{i+1}/{len(video_ids)}] Video: https://youtube.com/watch?v={vid}")
comments = scrape_all_comments(vid, max_comments=comments_per_video)
total_likes = sum(c["likes"] for c in comments)
results.append({
"video_id": vid,
"url": f"https://youtube.com/watch?v={vid}",
"comment_count": len(comments),
"total_engagement": total_likes,
"comments": comments,
})
if i < len(video_ids) - 1:
time.sleep(random.uniform(3, 6))
# Print summary
print("\n" + "=" * 60)
print("CHANNEL COMMENT SUMMARY")
print("=" * 60)
for entry in results:
print(f"\n Video: {entry['url']}")
print(f" Comments: {entry['comment_count']}")
print(f" Total engagement: {entry['total_engagement']:,} likes")
if entry["comments"]:
top = max(entry["comments"], key=lambda c: c["likes"])
print(f" Top comment: [{top['likes_display']}] {top['text'][:80]}")
# Export full results
export_json(
[{"video": r["url"], "comments": r["comments"]} for r in results],
f"channel_comments_{channel_handle.strip('@')}.json"
)
return results
Use Case 3: Keyword Alert System
Monitor YouTube comments for specific keywords — brand mentions, competitor names, or trending topics:
"""
YouTube Comment Keyword Monitor
Watches for specific keywords in comments and logs matches.
Useful for brand monitoring or tracking discussions.
"""
def keyword_scan(video_id: str, keywords: list[str],
max_comments: int = 1000) -> list[dict]:
"""Scan video comments for keyword matches."""
comments = scrape_all_comments(video_id, max_comments=max_comments)
matches = []
keywords_lower = [k.lower() for k in keywords]
for comment in comments:
text_lower = comment["text"].lower()
matched_keywords = [k for k in keywords_lower if k in text_lower]
if matched_keywords:
matches.append({
**comment,
"matched_keywords": matched_keywords,
})
print(f"\nKeyword scan results for {video_id}:")
print(f" Total comments scanned: {len(comments)}")
print(f" Keyword matches found: {len(matches)}")
for kw in keywords_lower:
count = sum(1 for m in matches if kw in m["matched_keywords"])
print(f" '{kw}': {count} mentions")
print(f"\nTop keyword mentions by engagement:")
matches.sort(key=lambda x: x["likes"], reverse=True)
for m in matches[:10]:
print(f" [{m['likes_display']} likes] [{', '.join(m['matched_keywords'])}] "
f"{m['text'][:80]}")
return matches
# Example: scan a tech review video for brand mentions
# matches = keyword_scan("VIDEO_ID", ["iphone", "samsung", "pixel", "oneplus"])
Rate Limiting and Anti-Detection
InnerTube is more permissive than the Data API, but it's not unlimited. Here are practical rules based on real-world testing:
Request timing guidelines
| Scenario | Recommended delay | Notes |
|---|---|---|
| Paginating same video | 1-2 seconds | Random jitter via random.uniform() |
| Switching between videos | 3-6 seconds | Avoids pattern detection |
| After 50 consecutive requests | 15-30 second pause | Prevents soft throttling |
| After hitting a 429 | Exponential backoff | Double the wait on each retry, with jitter (see fetch_with_backoff) |
Never use fixed intervals — that's a bot signature. Always add randomization.
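Those timing rules translate into two small helpers, sketched here with the bounds from the table (the function names are illustrative):

```python
import random

def jittered_delay(low: float = 1.0, high: float = 2.5) -> float:
    """Per-request delay with random jitter, never a fixed interval."""
    return random.uniform(low, high)

def backoff_waits(max_retries: int = 5, base: float = 2.0) -> list[float]:
    """Exponential backoff schedule for 429s: base, 2*base, 4*base, ...
    with jitter added so retries from parallel workers don't align."""
    return [base * (2 ** i) + random.uniform(0.5, 2.0) for i in range(max_retries)]

for wait in backoff_waits():
    print(round(wait, 1))  # five strictly increasing waits
```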
IP management for scale
- A single residential IP handles ~200-300 requests before soft throttling
- Datacenter IPs work for low volume but get flagged faster
- For 10,000+ comments/day, rotate IPs per video
- ThorData residential proxies work well for YouTube — their rotating pool avoids the pattern detection that sticky IPs trigger
Adding proxy support
PROXY_URL = "http://user:[email protected]:9000"
session = requests.Session()
session.proxies = {"http": PROXY_URL, "https": PROXY_URL}
# Replace requests.post() with session.post() throughout the scraper
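To rotate IPs per video, cycle fresh sessions through a proxy pool. The pool entries here are placeholders for whatever URLs your provider gives you:

```python
import itertools
import requests

# Placeholder proxy endpoints; substitute your provider's gateway URLs.
PROXY_POOL = [
    "http://user:pass@proxy-a.example.com:9000",
    "http://user:pass@proxy-b.example.com:9000",
    "http://user:pass@proxy-c.example.com:9000",
]
_next_proxy = itertools.cycle(PROXY_POOL)

def session_for_next_video() -> requests.Session:
    """Fresh session on the next proxy in the pool: one proxy per video."""
    proxy = next(_next_proxy)
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    return session
```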
Detection signals to avoid
- Same clientVersion for months — update it from YouTube's page source periodically
- Requesting the same video repeatedly from one IP within an hour
- Perfectly regular timing between requests (real users don't scroll at fixed intervals)
- Missing browser headers — include Accept-Language, Origin, and Referer
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Empty comment list on first page | Comments disabled on video | Check commentDisabled field in response |
| 429 errors after ~50 requests | IP throttled by YouTube | Increase delays, rotate proxies |
| KeyError on response fields | YouTube updated their schema | Check browser DevTools for current field names |
| Comments load but no continuation | Reached the end | Normal — all comments have been fetched |
| Garbled text in output | Emoji/unicode encoding | Use ensure_ascii=False in JSON export |
| Only 20 comments returned | Didn't follow pagination | Use the continuation token loop |
| 200 OK but empty endpoints | Soft block / IP flagged | Switch IP, wait 10+ minutes |
Skip the Setup: Ready-Made Scraper
If you'd rather not maintain this yourself, there's a free YouTube Comments Scraper on Apify that handles pagination, rate limiting, and comment threading out of the box. You specify a video URL or channel, and it returns structured JSON. Good starting point if you want results fast without debugging InnerTube's evolving response schema.
Wrapping Up
The InnerTube approach gives you YouTube comment access without API quotas. It's the same data path the official YouTube clients use, which keeps it relatively stable: YouTube has little incentive to break its own apps. The response schema does shift occasionally when YouTube rolls out frontend changes, but the changes tend to be incremental and easy to re-derive in DevTools.
The complete scraper above handles pagination, rate limiting, soft-block detection, CSV/JSON export, and reply threads. For lightweight use, run it against a single video. For production workloads, pair it with rotating residential proxies and the channel monitoring or keyword alert use cases to build a real comment intelligence pipeline.