How to Scrape TikTok Data in 2026 (Videos, Comments, Profiles)
TikTok has made it notoriously difficult to access data programmatically. The official Research API exists, but unless you're affiliated with an academic institution or a verified business, you're not getting in. Even if you qualify, the application review process takes weeks and approval is far from guaranteed.
Here's how developers actually get TikTok data in 2026 — covering video metadata, comments, user profiles, and trending sounds.
Why Scrape TikTok? Real Use Cases
Before diving into code, here's what people actually build with TikTok data:
- Brand monitoring — Track mentions and sentiment across thousands of videos without paying $2k/month for social listening tools
- Competitor analysis — Compare posting frequency, engagement rates, and content themes across accounts in your niche
- Trend detection — Identify rising sounds, hashtags, and content formats before they peak (useful for content creators and marketers)
- Academic research — Study misinformation spread, content recommendation patterns, or cultural trends at scale
- Influencer vetting — Verify engagement metrics before signing sponsorship deals (fake followers are rampant)
- Market research — Analyze product review videos and comments to understand consumer sentiment
The Official Route and Why It Fails Most Developers
TikTok launched a Research API for "qualified researchers" in 2023. In practice this means:
- Academic affiliation required: You need a university email or institutional backing
- Business API: Available to "eligible businesses" but requires verification and a formal use-case review
- Rate limits: Even approved users get throttled aggressively
- Review timeline: 2–6 weeks from application to first token
- Data restrictions: The Research API only returns metadata — no video downloads, no comment text in many cases
For most developers — hobbyists, indie hackers, small analytics tools — this route is completely closed. So let's look at what actually works.
TikTok's Public Web Endpoints
TikTok's web app loads data from internal endpoints you can observe in browser devtools. The most useful ones don't require login for public content:
```
GET https://www.tiktok.com/api/post/item_list/?aid=1988&secUid={USER_SEC_UID}&count=30
GET https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id={VIDEO_ID}&count=20
```
These work for public profiles and videos. The catch is the secUid — it's a long opaque identifier for each user that you need to resolve first from the username. You can get it from the user's profile page response.
However, TikTok rotates these endpoints and adds signature parameters (_signature, X-Bogus) that are generated client-side using obfuscated JavaScript. These signatures change with app versions and are intentionally hard to reverse-engineer.
The Mobile API Approach
TikTok's Android app communicates with a separate endpoint base (api16-normal-c-useast1a.tiktokv.com) that uses device fingerprints for auth. This approach involves:
- Capturing the device registration flow from a real or emulated Android device
- Replaying the device token with requests
- Parsing the protobuf or JSON responses
It works, but it's fragile. TikTok pushes app updates frequently and device tokens get flagged if request patterns look robotic. Maintaining this takes ongoing effort.
Playwright Browser Automation: The Practical Approach
For most use cases, Playwright automation against the TikTok web app is the most reliable option in 2026. It runs a real browser, so TikTok sees legitimate browser signals.
Critical note on page loading: Use domcontentloaded instead of networkidle as your wait condition. TikTok's pages never fully reach "network idle" — they keep making background requests for recommendations, ads, and analytics. Waiting for networkidle will cause your scraper to time out every single time.
Complete Working Script: Profile Scraper
Save this as tiktok_scraper.py and run with python3 tiktok_scraper.py <username>:
```python
#!/usr/bin/env python3
"""TikTok profile + video scraper using Playwright.

Usage:
    pip install playwright
    playwright install chromium
    python3 tiktok_scraper.py charlidamelio
"""
import asyncio
import json
import random
import sys
from datetime import datetime

from playwright.async_api import async_playwright


async def scrape_tiktok_profile(username: str, proxy: dict = None) -> dict:
    """Scrape a TikTok profile for video metadata."""
    async with async_playwright() as p:
        launch_args = {"headless": True}
        context_args = {
            "user_agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            ),
            "viewport": {"width": 1280, "height": 800},
            "locale": "en-US",
        }
        if proxy:
            context_args["proxy"] = proxy

        browser = await p.chromium.launch(**launch_args)
        context = await browser.new_context(**context_args)
        page = await context.new_page()

        videos = []
        profile_info = {}

        async def handle_response(response):
            url = response.url
            if response.status != 200:
                return
            try:
                if "item_list" in url or "/api/post" in url:
                    data = await response.json()
                    for item in data.get("itemList", []):
                        stats = item.get("stats", {})
                        author = item.get("author", {})
                        music = item.get("music", {})
                        videos.append({
                            "id": item.get("id"),
                            "description": item.get("desc", ""),
                            "likes": stats.get("diggCount", 0),
                            "comments": stats.get("commentCount", 0),
                            "shares": stats.get("shareCount", 0),
                            "views": stats.get("playCount", 0),
                            "saves": stats.get("collectCount", 0),
                            "created": datetime.fromtimestamp(
                                item.get("createTime", 0)
                            ).isoformat(),
                            "duration": item.get("video", {}).get("duration", 0),
                            "sound": music.get("title", ""),
                            "sound_author": music.get("authorName", ""),
                            "hashtags": [
                                t.get("hashtagName", "")
                                for t in item.get("textExtra", [])
                                if t.get("hashtagName")
                            ],
                            "url": (
                                f"https://www.tiktok.com/"
                                f"@{author.get('uniqueId', username)}/"
                                f"video/{item.get('id')}"
                            ),
                        })
                elif "/user/detail" in url or "uniqueId" in url:
                    data = await response.json()
                    user = data.get("userInfo", {})
                    if user:
                        u = user.get("user", {})
                        s = user.get("stats", {})
                        profile_info.update({
                            "username": u.get("uniqueId"),
                            "nickname": u.get("nickname"),
                            "bio": u.get("signature", ""),
                            "verified": u.get("verified", False),
                            "followers": s.get("followerCount", 0),
                            "following": s.get("followingCount", 0),
                            "total_likes": s.get("heartCount", 0),
                            "total_videos": s.get("videoCount", 0),
                        })
            except Exception:
                pass

        page.on("response", handle_response)

        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(3)

        # Scroll with human-like variation to trigger lazy loads
        for i in range(5):
            scroll_amount = random.randint(600, 1200)
            await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
            await asyncio.sleep(random.uniform(1.2, 3.0))

        await browser.close()

        return {
            "profile": profile_info,
            "videos": sorted(videos, key=lambda v: v["views"], reverse=True),
            "scraped_at": datetime.now().isoformat(),
        }


async def main():
    username = sys.argv[1] if len(sys.argv) > 1 else "tiktok"
    print(f"Scraping @{username}...")
    data = await scrape_tiktok_profile(username)

    # Print profile summary
    p = data["profile"]
    if p:
        print(f"\n{'='*60}")
        print(f"@{p.get('username', username)}")
        print(f"  {p.get('nickname', '')} "
              f"{'✓' if p.get('verified') else ''}")
        print(f"  Followers: {p.get('followers', 0):,}")
        print(f"  Total likes: {p.get('total_likes', 0):,}")
        print(f"  Videos: {p.get('total_videos', 0):,}")
        print(f"{'='*60}")

    # Print top videos
    print(f"\nFound {len(data['videos'])} videos:\n")
    for i, v in enumerate(data["videos"][:10], 1):
        print(f"  {i}. {v['views']:>12,} views | "
              f"{v['likes']:>8,} likes | "
              f"{v['comments']:>6,} comments")
        desc = v["description"][:80]
        if len(v["description"]) > 80:
            desc += "..."
        print(f"     {desc}")
        if v["hashtags"]:
            print(f"     Tags: {', '.join(v['hashtags'][:5])}")
        print()

    # Save to JSON
    outfile = f"tiktok_{username}.json"
    with open(outfile, "w") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved full data to {outfile}")


if __name__ == "__main__":
    asyncio.run(main())
```
Example Output
```
Scraping @charlidamelio...

============================================================
@charlidamelio
  Charli D'Amelio ✓
  Followers: 155,200,000
  Total likes: 11,800,000,000
  Videos: 2,847
============================================================

Found 30 videos:

  1.  342,100,000 views | 28,400,000 likes | 185,000 comments
      New dance with @landonbarker #couple #dance
      Tags: couple, dance, fyp
  2.  289,000,000 views | 22,100,000 likes | 142,000 comments
      POV: your mom walks in while you're filming #relatable
      Tags: relatable, fyp, comedy
  3.  198,500,000 views | 15,800,000 likes |  98,400 comments
      Trying the viral pasta recipe everyone's talking about
      Tags: cooking, pasta, viral, foodtok

Saved full data to tiktok_charlidamelio.json
```
Scraping Comments
Comments require navigating to the individual video page. The same response-intercept approach works:
```python
async def scrape_video_comments(
    video_url: str, max_scroll: int = 5
) -> list[dict]:
    """Scrape comments from a single TikTok video."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            )
        )
        page = await context.new_page()
        comments = []

        async def handle_response(response):
            if "/api/comment/list" not in response.url:
                return
            if response.status != 200:
                return
            try:
                data = await response.json()
                for c in data.get("comments", []):
                    user = c.get("user", {})
                    comments.append({
                        "text": c.get("text", ""),
                        "likes": c.get("digg_count", 0),
                        "author": user.get("unique_id", ""),
                        "author_verified": (
                            user.get("custom_verify", "") != ""
                        ),
                        "reply_count": c.get("reply_comment_total", 0),
                        "created": datetime.fromtimestamp(
                            c.get("create_time", 0)
                        ).isoformat(),
                    })
            except Exception:
                pass

        page.on("response", handle_response)
        await page.goto(
            video_url, wait_until="domcontentloaded", timeout=30000
        )
        await asyncio.sleep(4)

        # Scroll the comment section to load more
        for _ in range(max_scroll):
            await page.evaluate("""
                const el = document.querySelector(
                    '[class*="CommentListContainer"]'
                );
                if (el) el.scrollTop += 500;
            """)
            await asyncio.sleep(2)

        await browser.close()
        return sorted(comments, key=lambda c: c["likes"], reverse=True)
```
Comment Output Example
```json
[
  {
    "text": "This is the best tutorial I've ever seen",
    "likes": 45200,
    "author": "codingfan99",
    "author_verified": false,
    "reply_count": 23,
    "created": "2026-03-28T14:22:00"
  },
  {
    "text": "Can you do a part 2?",
    "likes": 12800,
    "author": "webdev_sarah",
    "author_verified": true,
    "reply_count": 5,
    "created": "2026-03-28T15:45:00"
  }
]
```
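Scraped comments feed directly into the market-research use case from the start of this guide. As a minimal sketch, a keyword tally gives a rough sentiment read over the comment dicts produced above (the word lists here are placeholders you would tune for your niche, not a real sentiment model):

```python
def keyword_sentiment(
    comments: list[dict],
    positive: tuple = ("love", "best", "amazing", "obsessed"),
    negative: tuple = ("hate", "worst", "scam"),
) -> dict:
    """Rough keyword-based sentiment tally over scraped comments."""
    tally = {"positive": 0, "negative": 0, "neutral": 0}
    for c in comments:
        text = c.get("text", "").lower()
        if any(word in text for word in positive):
            tally["positive"] += 1
        elif any(word in text for word in negative):
            tally["negative"] += 1
        else:
            tally["neutral"] += 1
    return tally
```

For anything beyond a first pass, swap the keyword lists for a proper sentiment library; the point here is that the scraper's output is plain dicts you can analyze immediately.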
Scraping Trending Sounds and Hashtags
Trending data is valuable for content strategy. TikTok's discover page loads trending hashtags and sounds through interceptable API calls:
```python
async def scrape_trending(proxy: dict = None) -> dict:
    """Scrape trending hashtags and sounds from TikTok's discover page."""
    async with async_playwright() as p:
        context_args = {
            "user_agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            ),
            "viewport": {"width": 1280, "height": 800},
            "locale": "en-US",
        }
        if proxy:
            context_args["proxy"] = proxy

        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(**context_args)
        page = await context.new_page()

        trending_hashtags = []
        trending_sounds = []

        async def handle_response(response):
            if response.status != 200:
                return
            try:
                url = response.url
                if "trending" in url or "discover" in url:
                    data = await response.json()
                    for item in data.get("challengeInfoList", []):
                        ch = item.get("challengeInfo", {})
                        trending_hashtags.append({
                            "name": ch.get("challengeName", ""),
                            "views": ch.get("stats", {}).get("videoCount", 0),
                            "desc": ch.get("desc", ""),
                        })
                    for item in data.get("musicInfoList", []):
                        mu = item.get("musicInfo", {})
                        trending_sounds.append({
                            "title": mu.get("title", ""),
                            "author": mu.get("authorName", ""),
                            "video_count": mu.get("stats", {}).get(
                                "videoCount", 0
                            ),
                        })
            except Exception:
                pass

        page.on("response", handle_response)
        await page.goto(
            "https://www.tiktok.com/discover",
            wait_until="domcontentloaded",
            timeout=30000,
        )
        await asyncio.sleep(4)
        await browser.close()

        return {
            "hashtags": trending_hashtags,
            "sounds": trending_sounds,
        }
```
Trending Output Example
```json
{
  "hashtags": [
    {"name": "BookTok", "views": 218000000, "desc": "Book recommendations and reviews"},
    {"name": "CleanTok", "views": 95000000, "desc": "Cleaning tips and satisfying content"},
    {"name": "FitTok", "views": 78000000, "desc": "Fitness routines and transformations"}
  ],
  "sounds": [
    {"title": "original sound - trending", "author": "creator_xyz", "video_count": 4200000},
    {"title": "Espresso", "author": "Sabrina Carpenter", "video_count": 3800000}
  ]
}
```
Anti-Bot Detection: What TikTok Actually Checks
TikTok's bot detection in 2026 is among the most sophisticated of any social platform. Here's exactly what they look for and how to handle each vector:
Browser Fingerprint Checks
| Signal | What TikTok Checks | How to Handle |
|---|---|---|
| Canvas fingerprint | Consistent canvas rendering | Use full Chromium (not headless-shell) |
| WebGL renderer | Headless browsers report "SwiftShader" | playwright-extra with stealth plugin |
| Navigator properties | navigator.webdriver is true in automation | Stealth plugin patches this |
| Screen dimensions | Must match viewport | Set realistic viewport (1280x800, 1920x1080) |
| Timezone | Must match IP geolocation | Set timezone_id in browser context |
| Language | Must match Accept-Language header | Set locale in browser context |
Behavioral Analysis
TikTok tracks mouse movement patterns, scroll velocity, and time between actions. A scraper that instantly scrolls at perfect intervals looks robotic.
```python
# Bad: fixed timing (easily detected)
for _ in range(5):
    await page.evaluate("window.scrollBy(0, 800)")
    await asyncio.sleep(1.5)  # too regular

# Good: human-like variation
for i in range(5):
    scroll_amount = random.randint(400, 1200)
    await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
    await asyncio.sleep(random.uniform(1.0, 3.5))
    # Occasionally pause longer (humans get distracted)
    if random.random() < 0.2:
        await asyncio.sleep(random.uniform(3.0, 8.0))
```
IP Reputation
Datacenter IPs get flagged immediately. For production use, residential proxies are essential. ThorData provides residential proxies with rotating IPs that work well for TikTok — their pool covers consumer ISP addresses that TikTok doesn't block.
```python
# Using a proxy with Playwright
context = await browser.new_context(
    proxy={
        "server": "http://proxy.thordata.com:9000",
        "username": "your_username",
        "password": "your_password",
    },
    # Match the browser fingerprint to the proxy location
    timezone_id="America/New_York",
    locale="en-US",
)
```
Pro tip: Match the browser's timezone and locale to the proxy's geographic location. A request from a New York residential IP with Asia/Tokyo timezone is an obvious red flag.
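One way to keep fingerprint and proxy location in sync is a small lookup table keyed by the proxy's exit country. The mapping below is an illustrative sketch, not an exhaustive or authoritative list; extend it to cover whatever countries your proxy pool actually uses:

```python
# Illustrative mapping from proxy exit country to matching
# browser-context settings (extend per your proxy pool).
FINGERPRINT_BY_COUNTRY = {
    "US": {"timezone_id": "America/New_York", "locale": "en-US"},
    "GB": {"timezone_id": "Europe/London", "locale": "en-GB"},
    "DE": {"timezone_id": "Europe/Berlin", "locale": "de-DE"},
    "JP": {"timezone_id": "Asia/Tokyo", "locale": "ja-JP"},
}


def context_args_for(country: str) -> dict:
    """Return timezone/locale kwargs matching the proxy's exit country,
    falling back to US settings for unknown codes."""
    return dict(FINGERPRINT_BY_COUNTRY.get(country, FINGERPRINT_BY_COUNTRY["US"]))
```

Then pass the result straight into the context: `await browser.new_context(proxy=..., **context_args_for("GB"))`.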
Rate Limiting: The Hard Numbers
TikTok rate limits are aggressive. Based on community observations in 2026:
| Action | Safe Rate | Danger Zone | Hard Block |
|---|---|---|---|
| Profile views | 1 per 3 seconds | > 1/sec | > 5/sec |
| Video page loads | 1 per 2 seconds | > 2/sec | > 10/sec |
| Comment fetching | 1 per 4 seconds | > 1/sec | > 3/sec |
| Same profile repeat | Max 50 req/session | > 100/session | > 200/session |
| Session length | Under 10 min per IP | 10-30 min | > 30 min |
Exceeding these doesn't immediately block you — TikTok often returns empty results or serves a CAPTCHA page instead of a hard 429. Always check that your responses actually contain data, not just HTTP 200s with empty arrays.
```python
# Defensive check — TikTok returns 200 with empty data when rate-limited
data = await response.json()
items = data.get("itemList", [])
if not items and data.get("statusCode") == 0:
    print("WARNING: Empty response — likely rate-limited")
    await asyncio.sleep(30)  # back off significantly
```
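To stay inside the safe-rate column above without scattering sleep calls through your code, a minimal pacing helper can enforce the gap centrally. This is a sketch: tune `min_interval` per action type (3s for profiles, 4s for comments, and so on), and keep the jitter so intervals never look perfectly regular:

```python
import asyncio
import random
import time


class Pacer:
    """Enforces a minimum (jittered) gap between consecutive requests."""

    def __init__(self, min_interval: float = 3.0, jitter: float = 1.0):
        self.min_interval = min_interval
        self.jitter = jitter
        self._last = 0.0

    async def wait(self):
        # Sleep just long enough that consecutive calls are at least
        # min_interval (plus random jitter) apart.
        target_gap = self.min_interval + random.uniform(0, self.jitter)
        elapsed = time.monotonic() - self._last
        if elapsed < target_gap:
            await asyncio.sleep(target_gap - elapsed)
        self._last = time.monotonic()
```

Usage: create one `Pacer` per action type and `await pacer.wait()` immediately before each `page.goto(...)`.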
Exporting Data for Analysis
Once you have the data, here's how to export it for common use cases:
```python
import csv
import json


def export_to_csv(
    videos: list[dict], filename: str = "tiktok_data.csv"
):
    """Export video data to CSV for spreadsheet analysis."""
    if not videos:
        return
    fieldnames = [
        "id", "description", "views", "likes", "comments",
        "shares", "saves", "duration", "sound", "hashtags",
        "created", "url",
    ]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for v in videos:
            row = {
                **v,
                "hashtags": ", ".join(v.get("hashtags", [])),
            }
            writer.writerow({k: row.get(k, "") for k in fieldnames})
    print(f"Exported {len(videos)} videos to {filename}")


def engagement_summary(videos: list[dict]):
    """Print engagement rate analysis."""
    if not videos:
        return
    total_views = sum(v["views"] for v in videos)
    total_likes = sum(v["likes"] for v in videos)
    total_comments = sum(v["comments"] for v in videos)
    avg_engagement = (
        (total_likes + total_comments) / total_views * 100
        if total_views > 0
        else 0
    )
    print(f"\nEngagement Summary ({len(videos)} videos):")
    print(f"  Total views:    {total_views:>15,}")
    print(f"  Total likes:    {total_likes:>15,}")
    print(f"  Total comments: {total_comments:>15,}")
    print(f"  Avg engagement: {avg_engagement:>14.2f}%")
    print(f"  Avg views/video:{total_views // len(videos):>15,}")
```
Example Engagement Output
```
Engagement Summary (30 videos):
  Total views:      2,847,300,000
  Total likes:        198,500,000
  Total comments:      12,400,000
  Avg engagement:            7.40%
  Avg views/video:     94,910,000
```
Skip the Setup: Ready-Made Scraper
If you need TikTok data without maintaining infrastructure, there's a free TikTok Scraper on Apify that handles anti-bot measures, IP rotation, and pagination. You provide usernames or video URLs and get back structured JSON with video stats, comments, and profile data. Useful as a baseline before deciding whether to build your own pipeline.
Summary
| Method | Effort | Reliability | Cost | Best For |
|---|---|---|---|---|
| Official Research API | High (application) | High | Free (if approved) | Academic research |
| Web endpoint scraping | Medium | Medium | Proxy cost | Quick prototypes |
| Playwright automation | Medium | High | Proxy + compute | Production scraping |
| Mobile API replay | High | Low (fragile) | Dev time | Specific data points |
| Apify/third-party | Low | Medium | Usage-based | One-off data pulls |
For most projects, Playwright with residential proxies hits the right balance. Use domcontentloaded, intercept API responses rather than parsing the DOM, keep your request rate under 1 per 3 seconds, and rotate IPs regularly. Match your browser fingerprint to your proxy location, add human-like timing variation, and always validate that responses contain actual data rather than empty 200s.