How to Scrape YouTube Comments Without the API Key
If you've ever tried to pull YouTube comments at scale using the official Data API v3, you've hit the quota wall. The default is 10,000 units per day — which sounds generous until you realize that listing videos from a channel costs 100 units per call, and you're burning through your budget before you've even started pulling comments.
The good news: there's a better way. YouTube's own apps use an internal endpoint called InnerTube, and it doesn't require an API key.
The Problem With YouTube Data API v3
YouTube Data API v3 is the official route. You register a project in Google Cloud Console, generate an API key, and start making requests. But the quota system is brutal:
- Video list requests: 100 units each
- Comment thread requests: 1 unit each — but you can only fetch 100 comments per page
- Search queries: 100 units each
- Default daily quota: 10,000 units
If you're monitoring a popular channel with thousands of videos and need fresh comment data, you'll exhaust your daily quota within minutes. Requesting a quota increase requires a formal review process that can take weeks and often gets rejected for scraping use cases.
For context: scraping all comments from a single viral video with 50,000 comments would cost 500 quota units just for the comment pages (100 comments per request), plus another 100 for the initial video lookup. Across 20 such videos you're at roughly 12,000 units, past the entire daily limit.
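That arithmetic is easy to sanity-check. A few lines using the per-call costs listed above (the helper name is just for illustration):

```python
# Quota cost estimate for pulling comments through Data API v3.
COMMENTS_PER_PAGE = 100        # max results per comment-thread page
COST_PER_COMMENT_PAGE = 1      # quota units per comment page
COST_PER_VIDEO_LOOKUP = 100    # quota units for the initial video lookup
DAILY_QUOTA = 10_000

def quota_cost(total_comments: int, videos: int = 1) -> int:
    """Units needed to scrape `total_comments` comments from each of `videos` videos."""
    pages = -(-total_comments // COMMENTS_PER_PAGE)  # ceiling division
    return (pages * COST_PER_COMMENT_PAGE + COST_PER_VIDEO_LOOKUP) * videos

print(quota_cost(50_000))             # 600 units for one 50k-comment video
print(quota_cost(50_000, videos=20))  # 12000 units, past the 10,000 daily quota
```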
Enter InnerTube: YouTube's Internal API
InnerTube is the API that YouTube's own web app, Android app, and iOS app all use under the hood. It's not publicly documented, but it's also not secret — you can observe it in browser devtools on any YouTube page.
The key endpoint for fetching comments is:
POST https://www.youtube.com/youtubei/v1/next
No API key. No OAuth. Just a client context in the request body and the right headers.
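Here's a minimal sketch of that request in Python. The payload shape mirrors what the web client sends; `build_payload` and `fetch_next` are illustrative names, and the clientVersion string is an example that drifts over time:

```python
import requests

INNERTUBE_NEXT = "https://www.youtube.com/youtubei/v1/next"

def build_payload(video_id: str) -> dict:
    """Minimal body the web client posts to /youtubei/v1/next."""
    return {
        "context": {
            "client": {
                "clientName": "WEB",
                # Example version string; lift a current one from
                # your browser's network tab if requests start failing.
                "clientVersion": "2.20260101.00.00",
                "hl": "en",
                "gl": "US",
            }
        },
        "videoId": video_id,
    }

def fetch_next(video_id: str) -> dict:
    """POST the payload. No API key, no OAuth, no special auth header."""
    resp = requests.post(INNERTUBE_NEXT, json=build_payload(video_id), timeout=30)
    resp.raise_for_status()
    return resp.json()
```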
How to find it yourself
Open any YouTube video in Chrome, then:
- Open DevTools (F12) and go to the Network tab
- Filter requests by "next" or "youtubei"
- Scroll down to the comments section on the video page
- Watch the network requests — you'll see the "next" endpoint fire
- Right-click the request, then Copy as cURL to get the exact payload
This is how YouTube loads comments lazily as you scroll. The first request fetches the initial comment batch. Each subsequent scroll triggers another next call with a continuation token.
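That lazy-loading pattern reduces to a generic loop: fetch a page, collect its items, follow the continuation token, stop when there isn't one. Sketched here against a stand-in fetch function so the shape is clear before the real implementation below:

```python
def paginate(fetch, first_token):
    """Drain a continuation-token API: `fetch` maps a token to
    (items, next_token), with next_token=None on the last page."""
    collected, token = [], first_token
    while token:
        items, token = fetch(token)
        collected.extend(items)
    return collected

# Stand-in fetch simulating three pages of comments:
PAGES = {
    "tok1": (["c1", "c2"], "tok2"),
    "tok2": (["c3"], "tok3"),
    "tok3": (["c4", "c5"], None),
}

print(paginate(PAGES.get, "tok1"))  # ['c1', 'c2', 'c3', 'c4', 'c5']
```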
Complete YouTube Comment Scraper
Here's a full working scraper that handles first-page loading, pagination, reply threads, and CSV/JSON export:
#!/usr/bin/env python3
"""
YouTube Comment Scraper via InnerTube API (no API key required)

Fetches all comments from a YouTube video, including:
- Comment text, author, likes, timestamps
- Reply threads (nested comments)
- Pagination via continuation tokens
- Export to JSON or CSV

Usage:
    python yt_comments.py VIDEO_ID [--format json|csv] [--output filename]
    python yt_comments.py dQw4w9WgXcQ --format csv --output comments.csv
    python yt_comments.py "https://youtube.com/watch?v=dQw4w9WgXcQ" --max 500
"""
import requests
import json
import csv
import time
import random
import re
import argparse

CLIENT_VERSION = "2.20260101.00.00"

INNERTUBE_CONTEXT = {
    "client": {
        "clientName": "WEB",
        "clientVersion": CLIENT_VERSION,
        "hl": "en",
        "gl": "US",
        "originalUrl": "https://www.youtube.com",
        "platform": "DESKTOP",
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    }
}

HEADERS = {
    "Content-Type": "application/json",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "X-YouTube-Client-Name": "1",
    "X-YouTube-Client-Version": CLIENT_VERSION,
    "Origin": "https://www.youtube.com",
    "Referer": "https://www.youtube.com/",
    "Accept-Language": "en-US,en;q=0.9",
}

INNERTUBE_URL = "https://www.youtube.com/youtubei/v1/next"


def extract_video_id(input_str: str) -> str:
    """Extract video ID from URL or return as-is if already an ID."""
    patterns = [
        r'(?:v=|/v/|youtu\.be/|/embed/|/shorts/)([a-zA-Z0-9_-]{11})',
        r'^([a-zA-Z0-9_-]{11})$',
    ]
    for pattern in patterns:
        match = re.search(pattern, input_str)
        if match:
            return match.group(1)
    raise ValueError(f"Could not extract video ID from: {input_str}")


def parse_count(text: str) -> int:
    """Parse '1.2K', '5M', '320' into integers."""
    text = text.strip().replace(",", "")
    if not text or text == "0":
        return 0
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    for suffix, mult in multipliers.items():
        if text.upper().endswith(suffix):
            return int(float(text[:-1]) * mult)
    try:
        return int(text)
    except ValueError:
        return 0


def parse_comment_renderer(renderer: dict) -> dict:
    """Extract structured data from a commentRenderer node."""
    text_runs = renderer.get("contentText", {}).get("runs", [])
    text = "".join(run.get("text", "") for run in text_runs)
    author = renderer.get("authorText", {}).get("simpleText", "Unknown")
    author_channel_id = (
        renderer.get("authorEndpoint", {})
        .get("browseEndpoint", {})
        .get("browseId", "")
    )
    likes_text = renderer.get("voteCount", {}).get("simpleText", "0")
    likes = parse_count(likes_text)
    published = renderer.get("publishedTimeText", {}).get("runs", [{}])
    published_text = published[0].get("text", "") if published else ""
    comment_id = renderer.get("commentId", "")
    is_pinned = "pinnedCommentBadge" in renderer
    is_hearted = bool(
        renderer.get("actionButtons", {})
        .get("commentActionButtonsRenderer", {})
        .get("creatorHeart", {})
    )
    return {
        "comment_id": comment_id,
        "author": author,
        "author_channel_id": author_channel_id,
        "text": text,
        "likes": likes,
        "likes_display": likes_text,
        "published": published_text,
        "is_pinned": is_pinned,
        "is_hearted": is_hearted,
        "is_reply": False,
        "parent_id": "",
    }


def fetch_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    """Make InnerTube request with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                INNERTUBE_URL, json=payload, headers=HEADERS, timeout=30
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 429:
                wait = (2 ** attempt) + random.uniform(0.5, 2.0)
                print(f"  Rate limited (429). Waiting {wait:.1f}s... (attempt {attempt+1})")
                time.sleep(wait)
            else:
                print(f"  HTTP {response.status_code} on attempt {attempt+1}")
                time.sleep(2)
        except requests.exceptions.RequestException as e:
            print(f"  Request error: {e}")
            time.sleep(3)
    raise RuntimeError("Max retries exceeded - YouTube may be blocking this IP")


def is_soft_blocked(data: dict) -> bool:
    """Detect soft blocks where YouTube returns 200 but empty data."""
    endpoints = data.get("onResponseReceivedEndpoints", [])
    if not endpoints:
        return True
    alerts = data.get("alerts", [])
    for alert in alerts:
        alert_text = (
            alert.get("alertWithButtonRenderer", {})
            .get("text", {}).get("simpleText", "")
        )
        if "error" in alert_text.lower():
            return True
    return False


def get_initial_comments(video_id: str) -> tuple[list[dict], str | None]:
    """Fetch the first page of comments for a video."""
    payload = {
        "context": INNERTUBE_CONTEXT,
        "videoId": video_id,
    }
    data = fetch_with_backoff(payload)
    comments = []
    next_token = None
    endpoints = data.get("onResponseReceivedEndpoints", [])
    for endpoint in endpoints:
        # First page uses reloadContinuationItemsCommand
        reload = endpoint.get("reloadContinuationItemsCommand", {})
        items = reload.get("continuationItems", [])
        # Sometimes appendContinuationItemsAction instead
        if not items:
            append = endpoint.get("appendContinuationItemsAction", {})
            items = append.get("continuationItems", [])
        for item in items:
            thread = item.get("commentThreadRenderer", {})
            if thread:
                comment_renderer = (
                    thread.get("comment", {}).get("commentRenderer", {})
                )
                if comment_renderer:
                    comment = parse_comment_renderer(comment_renderer)
                    comments.append(comment)
            # Continuation token for next page
            cont = item.get("continuationItemRenderer", {})
            token = (
                cont.get("continuationEndpoint", {})
                .get("continuationCommand", {})
                .get("token")
            )
            if token:
                next_token = token
    return comments, next_token


def get_comment_page(continuation_token: str) -> tuple[list[dict], str | None]:
    """Fetch a subsequent page of comments using a continuation token."""
    payload = {
        "context": INNERTUBE_CONTEXT,
        "continuation": continuation_token,
    }
    data = fetch_with_backoff(payload)
    comments = []
    next_token = None
    for endpoint in data.get("onResponseReceivedEndpoints", []):
        items = (
            endpoint.get("appendContinuationItemsAction", {})
            .get("continuationItems", [])
        )
        for item in items:
            thread = item.get("commentThreadRenderer", {})
            if thread:
                cr = thread.get("comment", {}).get("commentRenderer", {})
                if cr:
                    comments.append(parse_comment_renderer(cr))
            # Direct commentRenderer (in reply threads)
            direct = item.get("commentRenderer", {})
            if direct and "commentId" in direct:
                reply = parse_comment_renderer(direct)
                reply["is_reply"] = True
                comments.append(reply)
            cont = item.get("continuationItemRenderer", {})
            token = (
                cont.get("continuationEndpoint", {})
                .get("continuationCommand", {})
                .get("token")
            )
            if not token:
                token = (
                    cont.get("button", {}).get("buttonRenderer", {})
                    .get("command", {}).get("continuationCommand", {})
                    .get("token")
                )
            if token:
                next_token = token
    return comments, next_token


def scrape_all_comments(
    video_id: str, max_comments: int = 0, delay_range: tuple = (1.0, 2.5)
) -> list[dict]:
    """
    Scrape all comments from a video with pagination.

    Args:
        video_id: YouTube video ID or URL
        max_comments: Stop after N comments (0 = unlimited)
        delay_range: Random delay between requests in seconds
    """
    video_id = extract_video_id(video_id)
    print(f"Fetching comments for video: {video_id}")
    all_comments = []
    page = 0

    # First page
    comments, next_token = get_initial_comments(video_id)
    all_comments.extend(comments)
    page += 1
    print(f"  Page {page}: {len(comments)} comments (total: {len(all_comments)})")

    # Paginate through remaining pages
    while next_token:
        if max_comments and len(all_comments) >= max_comments:
            all_comments = all_comments[:max_comments]
            print(f"  Reached max_comments limit ({max_comments})")
            break
        time.sleep(random.uniform(*delay_range))
        comments, next_token = get_comment_page(next_token)
        if not comments:
            break
        all_comments.extend(comments)
        page += 1
        print(f"  Page {page}: {len(comments)} comments (total: {len(all_comments)})")

    print(f"\nDone. Collected {len(all_comments)} comments across {page} pages.")
    return all_comments


def export_json(comments: list[dict], filename: str):
    """Export comments to JSON file."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(comments, f, indent=2, ensure_ascii=False)
    print(f"Exported {len(comments)} comments to {filename}")


def export_csv(comments: list[dict], filename: str):
    """Export comments to CSV file."""
    if not comments:
        print("No comments to export")
        return
    fieldnames = [
        "comment_id", "author", "author_channel_id", "text",
        "likes", "likes_display", "published", "is_pinned",
        "is_hearted", "is_reply", "parent_id",
    ]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(comments)
    print(f"Exported {len(comments)} comments to {filename}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Scrape YouTube comments via InnerTube")
    parser.add_argument("video", help="Video ID or full YouTube URL")
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    parser.add_argument("--output", "-o", help="Output filename")
    parser.add_argument("--max", type=int, default=0, help="Max comments (0=all)")
    args = parser.parse_args()

    vid = extract_video_id(args.video)
    results = scrape_all_comments(vid, max_comments=args.max)
    out_file = args.output or f"comments_{vid}.{args.format}"
    if args.format == "csv":
        export_csv(results, out_file)
    else:
        export_json(results, out_file)
Expected output
Running python yt_comments.py dQw4w9WgXcQ --max 100 --format json produces:
[
  {
    "comment_id": "UgxKz...",
    "author": "MusicFan2024",
    "author_channel_id": "UCa1b2c...",
    "text": "I can't believe this is still getting views in 2026. Legend.",
    "likes": 45000,
    "likes_display": "45K",
    "published": "2 months ago",
    "is_pinned": false,
    "is_hearted": false,
    "is_reply": false,
    "parent_id": ""
  },
  {
    "comment_id": "UgyBm...",
    "author": "RickAstleyOfficial",
    "author_channel_id": "UCuA...",
    "text": "Thank you all for the love",
    "likes": 312000,
    "likes_display": "312K",
    "published": "1 year ago",
    "is_pinned": true,
    "is_hearted": true,
    "is_reply": false,
    "parent_id": ""
  }
]
CSV output looks like:
comment_id,author,author_channel_id,text,likes,likes_display,published,is_pinned,is_hearted,is_reply,parent_id
UgxKz...,MusicFan2024,UCa1b2c...,"I can't believe this...",45000,45K,2 months ago,False,False,False,
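One gotcha with the CSV export: everything round-trips as strings, including the boolean flags, so cast on read. A quick demonstration with the standard library (sample values only):

```python
import csv
import io

rows = [{"comment_id": "UgxKz...", "author": "MusicFan2024",
         "likes": 45000, "is_pinned": False}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["comment_id", "author", "likes", "is_pinned"])
writer.writeheader()
writer.writerows(rows)

buf.seek(0)
back = list(csv.DictReader(buf))
print(back[0]["likes"])       # value is the string '45000', not an int
print(back[0]["is_pinned"])   # value is the string 'False', not a bool
```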
Use Case 1: Sentiment Analysis Pipeline
Feed YouTube comments into a sentiment classifier — useful for brand monitoring, product feedback analysis, or gauging audience reaction to content:
"""
YouTube Comment Sentiment Analyzer
Classifies comments as positive/negative/neutral using keyword scoring.
No ML dependencies required — works with just the standard library.
"""
from collections import Counter
POSITIVE_SIGNALS = [
"love", "great", "amazing", "awesome", "best", "perfect",
"beautiful", "excellent", "fantastic", "brilliant", "helpful",
"thank", "thanks", "masterpiece", "goat", "fire", "legendary",
]
NEGATIVE_SIGNALS = [
"hate", "worst", "terrible", "awful", "trash", "garbage",
"boring", "cringe", "bad", "horrible", "disgusting", "waste",
"clickbait", "scam", "fake", "stolen", "copied",
]
def classify_sentiment(text: str) -> str:
text_lower = text.lower()
pos = sum(1 for w in POSITIVE_SIGNALS if w in text_lower)
neg = sum(1 for w in NEGATIVE_SIGNALS if w in text_lower)
if pos > neg:
return "positive"
elif neg > pos:
return "negative"
return "neutral"
def analyze_video_sentiment(video_id: str, max_comments: int = 500):
comments = scrape_all_comments(video_id, max_comments=max_comments)
sentiments = Counter()
for comment in comments:
sentiment = classify_sentiment(comment["text"])
comment["sentiment"] = sentiment
sentiments[sentiment] += 1
total = len(comments)
print(f"\n--- Sentiment Analysis for {video_id} ---")
print(f"Total comments analyzed: {total}")
for sentiment, count in sentiments.most_common():
pct = (count / total) * 100
print(f" {sentiment}: {count} ({pct:.1f}%)")
# Top positive comments by engagement
positive = sorted(
[c for c in comments if c.get("sentiment") == "positive"],
key=lambda x: x["likes"], reverse=True
)
print(f"\nTop 5 positive comments:")
for c in positive[:5]:
print(f" [{c['likes_display']} likes] {c['text'][:100]}")
return comments
Expected output:
--- Sentiment Analysis for dQw4w9WgXcQ ---
Total comments analyzed: 500
positive: 287 (57.4%)
neutral: 156 (31.2%)
negative: 57 (11.4%)
Top 5 positive comments:
[312K likes] Thank you all for the love
[45K likes] I can't believe this is still getting views in 2026. Legend.
[12K likes] This song is genuinely amazing. No irony.
[8.2K likes] Best music video of all time, no debate.
[5.1K likes] Love how this became a universal internet moment.
Use Case 2: Channel Comment Monitor
Track comments across all recent videos from a channel. Useful for brand monitoring, community management, or competitive analysis:
"""
Channel Comment Monitor
Fetches recent videos from a YouTube channel and collects
comments from each, producing a per-video summary report.
"""
def get_channel_video_ids(channel_handle: str, max_videos: int = 10) -> list[str]:
"""
Get recent video IDs from a channel using the InnerTube browse endpoint.
Pass the channel handle like '@mkbhd' or a channel ID like 'UCBJycsmduvYEL83R_U4JriQ'.
"""
payload = {
"context": INNERTUBE_CONTEXT,
"browseId": channel_handle,
"params": "EgZ2aWRlb3PyBgQKAjoA", # Videos tab
}
url = "https://www.youtube.com/youtubei/v1/browse"
resp = requests.post(url, json=payload, headers=HEADERS, timeout=30)
data = resp.json()
video_ids = []
tabs = (
data.get("contents", {})
.get("twoColumnBrowseResultsRenderer", {})
.get("tabs", [])
)
for tab in tabs:
contents = (
tab.get("tabRenderer", {})
.get("content", {})
.get("richGridRenderer", {})
.get("contents", [])
)
for item in contents:
video = (
item.get("richItemRenderer", {})
.get("content", {})
.get("videoRenderer", {})
)
vid_id = video.get("videoId")
if vid_id:
video_ids.append(vid_id)
if len(video_ids) >= max_videos:
return video_ids
return video_ids
def monitor_channel(channel_handle: str, max_videos: int = 5,
comments_per_video: int = 100):
"""Fetch and summarize comments from a channel's recent videos."""
print(f"Fetching recent videos from {channel_handle}...")
video_ids = get_channel_video_ids(channel_handle, max_videos)
print(f"Found {len(video_ids)} videos\n")
results = []
for i, vid in enumerate(video_ids):
print(f"[{i+1}/{len(video_ids)}] Video: https://youtube.com/watch?v={vid}")
comments = scrape_all_comments(vid, max_comments=comments_per_video)
total_likes = sum(c["likes"] for c in comments)
results.append({
"video_id": vid,
"url": f"https://youtube.com/watch?v={vid}",
"comment_count": len(comments),
"total_engagement": total_likes,
"comments": comments,
})
if i < len(video_ids) - 1:
time.sleep(random.uniform(3, 6))
# Print summary
print("\n" + "=" * 60)
print("CHANNEL COMMENT SUMMARY")
print("=" * 60)
for entry in results:
print(f"\n Video: {entry['url']}")
print(f" Comments: {entry['comment_count']}")
print(f" Total engagement: {entry['total_engagement']:,} likes")
if entry["comments"]:
top = max(entry["comments"], key=lambda c: c["likes"])
print(f" Top comment: [{top['likes_display']}] {top['text'][:80]}")
# Export full results
export_json(
[{"video": r["url"], "comments": r["comments"]} for r in results],
f"channel_comments_{channel_handle.strip('@')}.json"
)
return results
Use Case 3: Keyword Alert System
Monitor YouTube comments for specific keywords — brand mentions, competitor names, or trending topics:
"""
YouTube Comment Keyword Monitor
Watches for specific keywords in comments and logs matches.
Useful for brand monitoring or tracking discussions.
"""
def keyword_scan(video_id: str, keywords: list[str],
max_comments: int = 1000) -> list[dict]:
"""Scan video comments for keyword matches."""
comments = scrape_all_comments(video_id, max_comments=max_comments)
matches = []
keywords_lower = [k.lower() for k in keywords]
for comment in comments:
text_lower = comment["text"].lower()
matched_keywords = [k for k in keywords_lower if k in text_lower]
if matched_keywords:
matches.append({
**comment,
"matched_keywords": matched_keywords,
})
print(f"\nKeyword scan results for {video_id}:")
print(f" Total comments scanned: {len(comments)}")
print(f" Keyword matches found: {len(matches)}")
for kw in keywords_lower:
count = sum(1 for m in matches if kw in m["matched_keywords"])
print(f" '{kw}': {count} mentions")
print(f"\nTop keyword mentions by engagement:")
matches.sort(key=lambda x: x["likes"], reverse=True)
for m in matches[:10]:
print(f" [{m['likes_display']} likes] [{', '.join(m['matched_keywords'])}] "
f"{m['text'][:80]}")
return matches
# Example: scan a tech review video for brand mentions
# matches = keyword_scan("VIDEO_ID", ["iphone", "samsung", "pixel", "oneplus"])
Rate Limiting and Anti-Detection
InnerTube is more permissive than the Data API, but it's not unlimited. Here are practical rules based on real-world testing:
Request timing guidelines
| Scenario | Recommended delay | Notes |
|---|---|---|
| Paginating same video | 1-2 seconds | Random jitter via random.uniform() |
| Switching between videos | 3-6 seconds | Avoids pattern detection |
| After 50 consecutive requests | 15-30 second pause | Prevents soft throttling |
| After hitting a 429 | Exponential backoff | Double the wait on each retry, with jitter (see fetch_with_backoff) |
Never use fixed intervals — that's a bot signature. Always add randomization.
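Those timing rules translate into two small helpers, sketched here with the bounds from the table (the function names are illustrative):

```python
import random

def jittered_delay(low: float = 1.0, high: float = 2.5) -> float:
    """Per-request delay with random jitter, never a fixed interval."""
    return random.uniform(low, high)

def backoff_waits(max_retries: int = 5, base: float = 2.0) -> list[float]:
    """Exponential backoff schedule for 429s: base, 2*base, 4*base, ...
    with jitter added so retries from parallel workers don't align."""
    return [base * (2 ** i) + random.uniform(0.5, 2.0) for i in range(max_retries)]

for wait in backoff_waits():
    print(round(wait, 1))  # five strictly increasing waits
```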
IP management for scale
- A single residential IP handles ~200-300 requests before soft throttling
- Datacenter IPs work for low volume but get flagged faster
- For 10,000+ comments/day, rotate IPs per video
- ThorData residential proxies work well for YouTube — their rotating pool avoids the pattern detection that sticky IPs trigger
Adding proxy support
PROXY_URL = "http://user:[email protected]:9000"
session = requests.Session()
session.proxies = {"http": PROXY_URL, "https": PROXY_URL}
# Replace requests.post() with session.post() throughout the scraper
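To rotate IPs per video, cycle fresh sessions through a proxy pool. The pool entries here are placeholders for whatever URLs your provider gives you:

```python
import itertools
import requests

# Placeholder proxy endpoints; substitute your provider's gateway URLs.
PROXY_POOL = [
    "http://user:pass@proxy-a.example.com:9000",
    "http://user:pass@proxy-b.example.com:9000",
    "http://user:pass@proxy-c.example.com:9000",
]
_next_proxy = itertools.cycle(PROXY_POOL)

def session_for_next_video() -> requests.Session:
    """Fresh session on the next proxy in the pool: one proxy per video."""
    proxy = next(_next_proxy)
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    return session
```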
Detection signals to avoid
- Same clientVersion for months — update it from YouTube's page source periodically
- Requesting the same video repeatedly from one IP within an hour
- Perfectly regular timing between requests (real users don't scroll at fixed intervals)
- Missing browser headers — include Accept-Language, Origin, and Referer
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Empty comment list on first page | Comments disabled on video | Check commentDisabled field in response |
| 429 errors after ~50 requests | IP throttled by YouTube | Increase delays, rotate proxies |
| KeyError on response fields | YouTube updated their schema | Check browser DevTools for current field names |
| Comments load but no continuation | Reached the end | Normal — all comments have been fetched |
| Garbled text in output | Emoji/unicode encoding | Use ensure_ascii=False in JSON export |
| Only 20 comments returned | Didn't follow pagination | Use the continuation token loop |
| 200 OK but empty endpoints | Soft block / IP flagged | Switch IP, wait 10+ minutes |
Skip the Setup: Ready-Made Scraper
If you'd rather not maintain this yourself, there's a free YouTube Comments Scraper on Apify that handles pagination, rate limiting, and comment threading out of the box. You specify a video URL or channel, and it returns structured JSON. Good starting point if you want results fast without debugging InnerTube's evolving response schema.
Wrapping Up
The InnerTube approach gives you YouTube comment access without API quotas. It's the same data path the official YouTube clients use, which keeps it relatively stable: YouTube has little incentive to break its own apps. The response schema does shift occasionally when YouTube rolls out frontend changes, but the changes tend to be incremental and easy to re-derive in DevTools.
The complete scraper above handles pagination, rate limiting, soft-block detection, CSV/JSON export, and reply threads. For lightweight use, run it against a single video. For production workloads, pair it with rotating residential proxies and the channel monitoring or keyword alert use cases to build a real comment intelligence pipeline.