How to Scrape TikTok Data in 2026 (Videos, Comments, Profiles)
TikTok has made it notoriously difficult to access data programmatically. The official Research API exists, but unless you're affiliated with an academic institution or a verified business, you're not getting in. Even if you qualify, the application review process takes weeks and approval is far from guaranteed.
Here's how developers actually get TikTok data in 2026 — covering video metadata, comments, user profiles, and trending sounds.
Why Scrape TikTok? Real Use Cases
Before diving into code, here's what people actually build with TikTok data:
- Brand monitoring — Track mentions and sentiment across thousands of videos without paying $2k/month for social listening tools
- Competitor analysis — Compare posting frequency, engagement rates, and content themes across accounts in your niche
- Trend detection — Identify rising sounds, hashtags, and content formats before they peak (useful for content creators and marketers)
- Academic research — Study misinformation spread, content recommendation patterns, or cultural trends at scale
- Influencer vetting — Verify engagement metrics before signing sponsorship deals (fake followers are rampant)
- Market research — Analyze product review videos and comments to understand consumer sentiment
The Official Route and Why It Fails Most Developers
TikTok launched a Research API for "qualified researchers" in 2023. In practice this means:
- Academic affiliation required: You need a university email or institutional backing
- Business API: Available to "eligible businesses" but requires verification and a formal use-case review
- Rate limits: Even approved users get throttled aggressively
- Review timeline: 2–6 weeks from application to first token
- Data restrictions: The Research API only returns metadata — no video downloads, no comment text in many cases
For most developers — hobbyists, indie hackers, small analytics tools — this route is completely closed. So let's look at what actually works.
TikTok's Public Web Endpoints
TikTok's web app loads data from internal endpoints you can observe in browser devtools. The most useful ones don't require login for public content:
```
GET https://www.tiktok.com/api/post/item_list/?aid=1988&secUid={USER_SEC_UID}&count=30
GET https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id={VIDEO_ID}&count=20
```
These work for public profiles and videos. The catch is the secUid — it's a long opaque identifier for each user that you need to resolve first from the username. You can get it from the user's profile page response.
However, TikTok rotates these endpoints and adds signature parameters (_signature, X-Bogus) that are generated client-side using obfuscated JavaScript. These signatures change with app versions and are intentionally hard to reverse-engineer.
The Mobile API Approach
TikTok's Android app communicates with a separate endpoint base (api16-normal-c-useast1a.tiktokv.com) that uses device fingerprints for auth. This approach involves:
- Capturing the device registration flow from a real or emulated Android device
- Replaying the device token with requests
- Parsing the protobuf or JSON responses
It works, but it's fragile. TikTok pushes app updates frequently and device tokens get flagged if request patterns look robotic. Maintaining this takes ongoing effort.
Playwright Browser Automation: The Practical Approach
For most use cases, Playwright automation against the TikTok web app is the most reliable option in 2026. It runs a real browser, so TikTok sees legitimate browser signals.
Critical note on page loading: Use domcontentloaded instead of networkidle as your wait condition. TikTok's pages never fully reach "network idle" — they keep making background requests for recommendations, ads, and analytics. Waiting for networkidle will cause your scraper to time out every single time.
Complete Working Script: Profile Scraper
Save this as tiktok_scraper.py and run with python3 tiktok_scraper.py <username>:
```python
#!/usr/bin/env python3
"""TikTok profile + video scraper using Playwright.

Usage:
    pip install playwright
    playwright install chromium
    python3 tiktok_scraper.py charlidamelio
"""
import asyncio
import json
import random
import sys
from datetime import datetime

from playwright.async_api import async_playwright


async def scrape_tiktok_profile(username: str, proxy: dict = None) -> dict:
    """Scrape a TikTok profile for video metadata."""
    async with async_playwright() as p:
        launch_args = {"headless": True}
        context_args = {
            "user_agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            ),
            "viewport": {"width": 1280, "height": 800},
            "locale": "en-US",
        }
        if proxy:
            context_args["proxy"] = proxy

        browser = await p.chromium.launch(**launch_args)
        context = await browser.new_context(**context_args)
        page = await context.new_page()

        videos = []
        profile_info = {}

        async def handle_response(response):
            url = response.url
            if response.status != 200:
                return
            try:
                if "item_list" in url or "/api/post" in url:
                    data = await response.json()
                    for item in data.get("itemList", []):
                        stats = item.get("stats", {})
                        author = item.get("author", {})
                        music = item.get("music", {})
                        videos.append({
                            "id": item.get("id"),
                            "description": item.get("desc", ""),
                            "likes": stats.get("diggCount", 0),
                            "comments": stats.get("commentCount", 0),
                            "shares": stats.get("shareCount", 0),
                            "views": stats.get("playCount", 0),
                            "saves": stats.get("collectCount", 0),
                            "created": datetime.fromtimestamp(
                                item.get("createTime", 0)
                            ).isoformat(),
                            "duration": item.get("video", {}).get("duration", 0),
                            "sound": music.get("title", ""),
                            "sound_author": music.get("authorName", ""),
                            "hashtags": [
                                t.get("hashtagName", "")
                                for t in item.get("textExtra", [])
                                if t.get("hashtagName")
                            ],
                            "url": (
                                f"https://www.tiktok.com/"
                                f"@{author.get('uniqueId', username)}/"
                                f"video/{item.get('id')}"
                            ),
                        })
                elif "/user/detail" in url or "uniqueId" in url:
                    data = await response.json()
                    user = data.get("userInfo", {})
                    if user:
                        u = user.get("user", {})
                        s = user.get("stats", {})
                        profile_info.update({
                            "username": u.get("uniqueId"),
                            "nickname": u.get("nickname"),
                            "bio": u.get("signature", ""),
                            "verified": u.get("verified", False),
                            "followers": s.get("followerCount", 0),
                            "following": s.get("followingCount", 0),
                            "total_likes": s.get("heartCount", 0),
                            "total_videos": s.get("videoCount", 0),
                        })
            except Exception:
                pass

        page.on("response", handle_response)

        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(3)

        # Scroll with human-like variation to trigger lazy loads
        for i in range(5):
            scroll_amount = random.randint(600, 1200)
            await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
            await asyncio.sleep(random.uniform(1.2, 3.0))

        await browser.close()

        return {
            "profile": profile_info,
            "videos": sorted(videos, key=lambda v: v["views"], reverse=True),
            "scraped_at": datetime.now().isoformat(),
        }


async def main():
    username = sys.argv[1] if len(sys.argv) > 1 else "tiktok"
    print(f"Scraping @{username}...")
    data = await scrape_tiktok_profile(username)

    # Print profile summary
    p = data["profile"]
    if p:
        print(f"\n{'='*60}")
        print(f"@{p.get('username', username)}")
        print(f"  {p.get('nickname', '')} "
              f"{'✓' if p.get('verified') else ''}")
        print(f"  Followers: {p.get('followers', 0):,}")
        print(f"  Total likes: {p.get('total_likes', 0):,}")
        print(f"  Videos: {p.get('total_videos', 0):,}")
        print(f"{'='*60}")

    # Print top videos
    print(f"\nFound {len(data['videos'])} videos:\n")
    for i, v in enumerate(data["videos"][:10], 1):
        print(f"  {i}. {v['views']:>12,} views | "
              f"{v['likes']:>8,} likes | "
              f"{v['comments']:>6,} comments")
        desc = v["description"][:80]
        if len(v["description"]) > 80:
            desc += "..."
        print(f"     {desc}")
        if v["hashtags"]:
            print(f"     Tags: {', '.join(v['hashtags'][:5])}")
        print()

    # Save to JSON
    outfile = f"tiktok_{username}.json"
    with open(outfile, "w") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved full data to {outfile}")


if __name__ == "__main__":
    asyncio.run(main())
```
Example Output
```
Scraping @charlidamelio...

============================================================
@charlidamelio
  Charli D'Amelio ✓
  Followers: 155,200,000
  Total likes: 11,800,000,000
  Videos: 2,847
============================================================

Found 30 videos:

  1.  342,100,000 views | 28,400,000 likes | 185,000 comments
      New dance with @landonbarker #couple #dance
      Tags: couple, dance, fyp
  2.  289,000,000 views | 22,100,000 likes | 142,000 comments
      POV: your mom walks in while you're filming #relatable
      Tags: relatable, fyp, comedy
  3.  198,500,000 views | 15,800,000 likes |  98,400 comments
      Trying the viral pasta recipe everyone's talking about
      Tags: cooking, pasta, viral, foodtok

Saved full data to tiktok_charlidamelio.json
```
Scraping Comments
Comments require navigating to the individual video page. The same response-intercept approach works:
```python
async def scrape_video_comments(
    video_url: str, max_scroll: int = 5
) -> list[dict]:
    """Scrape comments from a single TikTok video."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            )
        )
        page = await context.new_page()
        comments = []

        async def handle_response(response):
            if "/api/comment/list" not in response.url:
                return
            if response.status != 200:
                return
            try:
                data = await response.json()
                for c in data.get("comments", []):
                    user = c.get("user", {})
                    comments.append({
                        "text": c.get("text", ""),
                        "likes": c.get("digg_count", 0),
                        "author": user.get("unique_id", ""),
                        "author_verified": (
                            user.get("custom_verify", "") != ""
                        ),
                        "reply_count": c.get("reply_comment_total", 0),
                        "created": datetime.fromtimestamp(
                            c.get("create_time", 0)
                        ).isoformat(),
                    })
            except Exception:
                pass

        page.on("response", handle_response)
        await page.goto(
            video_url, wait_until="domcontentloaded", timeout=30000
        )
        await asyncio.sleep(4)

        # Scroll the comment section to load more
        for _ in range(max_scroll):
            await page.evaluate("""
                const el = document.querySelector(
                    '[class*="CommentListContainer"]'
                );
                if (el) el.scrollTop += 500;
            """)
            await asyncio.sleep(2)

        await browser.close()
        return sorted(comments, key=lambda c: c["likes"], reverse=True)
```
Comment Output Example
```json
[
  {
    "text": "This is the best tutorial I've ever seen",
    "likes": 45200,
    "author": "codingfan99",
    "author_verified": false,
    "reply_count": 23,
    "created": "2026-03-28T14:22:00"
  },
  {
    "text": "Can you do a part 2?",
    "likes": 12800,
    "author": "webdev_sarah",
    "author_verified": true,
    "reply_count": 5,
    "created": "2026-03-28T15:45:00"
  }
]
```
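Scraped comments feed directly into the market-research use case from the start of this guide. As a minimal sketch, a keyword tally gives a rough sentiment read over the comment dicts produced above (the word lists here are placeholders you would tune for your niche, not a real sentiment model):

```python
def keyword_sentiment(
    comments: list[dict],
    positive: tuple = ("love", "best", "amazing", "obsessed"),
    negative: tuple = ("hate", "worst", "scam"),
) -> dict:
    """Rough keyword-based sentiment tally over scraped comments."""
    tally = {"positive": 0, "negative": 0, "neutral": 0}
    for c in comments:
        text = c.get("text", "").lower()
        if any(word in text for word in positive):
            tally["positive"] += 1
        elif any(word in text for word in negative):
            tally["negative"] += 1
        else:
            tally["neutral"] += 1
    return tally
```

For anything beyond a first pass, swap the keyword lists for a proper sentiment library; the point here is that the scraper's output is plain dicts you can analyze immediately.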
Scraping Trending Sounds and Hashtags
Trending data is valuable for content strategy. TikTok's discover page loads trending hashtags and sounds through interceptable API calls:
```python
async def scrape_trending(proxy: dict = None) -> dict:
    """Scrape trending hashtags and sounds from TikTok's discover page."""
    async with async_playwright() as p:
        context_args = {
            "user_agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            ),
            "viewport": {"width": 1280, "height": 800},
            "locale": "en-US",
        }
        if proxy:
            context_args["proxy"] = proxy

        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(**context_args)
        page = await context.new_page()

        trending_hashtags = []
        trending_sounds = []

        async def handle_response(response):
            if response.status != 200:
                return
            try:
                url = response.url
                if "trending" in url or "discover" in url:
                    data = await response.json()
                    for item in data.get("challengeInfoList", []):
                        ch = item.get("challengeInfo", {})
                        trending_hashtags.append({
                            "name": ch.get("challengeName", ""),
                            "views": ch.get("stats", {}).get("videoCount", 0),
                            "desc": ch.get("desc", ""),
                        })
                    for item in data.get("musicInfoList", []):
                        mu = item.get("musicInfo", {})
                        trending_sounds.append({
                            "title": mu.get("title", ""),
                            "author": mu.get("authorName", ""),
                            "video_count": mu.get("stats", {}).get(
                                "videoCount", 0
                            ),
                        })
            except Exception:
                pass

        page.on("response", handle_response)
        await page.goto(
            "https://www.tiktok.com/discover",
            wait_until="domcontentloaded",
            timeout=30000,
        )
        await asyncio.sleep(4)
        await browser.close()

        return {
            "hashtags": trending_hashtags,
            "sounds": trending_sounds,
        }
```
Trending Output Example
```json
{
  "hashtags": [
    {"name": "BookTok", "views": 218000000, "desc": "Book recommendations and reviews"},
    {"name": "CleanTok", "views": 95000000, "desc": "Cleaning tips and satisfying content"},
    {"name": "FitTok", "views": 78000000, "desc": "Fitness routines and transformations"}
  ],
  "sounds": [
    {"title": "original sound - trending", "author": "creator_xyz", "video_count": 4200000},
    {"title": "Espresso", "author": "Sabrina Carpenter", "video_count": 3800000}
  ]
}
```
Anti-Bot Detection: What TikTok Actually Checks
TikTok's bot detection in 2026 is among the most sophisticated of any social platform. Here's exactly what they look for and how to handle each vector:
Browser Fingerprint Checks
| Signal | What TikTok Checks | How to Handle |
|---|---|---|
| Canvas fingerprint | Consistent canvas rendering | Use full Chromium (not headless-shell) |
| WebGL renderer | Headless browsers report "SwiftShader" | playwright-extra with stealth plugin |
| Navigator properties | navigator.webdriver is true in automation | Stealth plugin patches this |
| Screen dimensions | Must match viewport | Set realistic viewport (1280x800, 1920x1080) |
| Timezone | Must match IP geolocation | Set timezone_id in browser context |
| Language | Must match Accept-Language header | Set locale in browser context |
Behavioral Analysis
TikTok tracks mouse movement patterns, scroll velocity, and time between actions. A scraper that instantly scrolls at perfect intervals looks robotic.
```python
# Bad: fixed timing (easily detected)
for _ in range(5):
    await page.evaluate("window.scrollBy(0, 800)")
    await asyncio.sleep(1.5)  # too regular

# Good: human-like variation
for i in range(5):
    scroll_amount = random.randint(400, 1200)
    await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
    await asyncio.sleep(random.uniform(1.0, 3.5))
    # Occasionally pause longer (humans get distracted)
    if random.random() < 0.2:
        await asyncio.sleep(random.uniform(3.0, 8.0))
```
IP Reputation
Datacenter IPs get flagged immediately. For production use, residential proxies are essential. ThorData provides residential proxies with rotating IPs that work well for TikTok — their pool covers consumer ISP addresses that TikTok doesn't block.
```python
# Using a proxy with Playwright
context = await browser.new_context(
    proxy={
        "server": "http://proxy.thordata.com:9000",
        "username": "your_username",
        "password": "your_password",
    },
    # Match the browser fingerprint to the proxy location
    timezone_id="America/New_York",
    locale="en-US",
)
```
Pro tip: Match the browser's timezone and locale to the proxy's geographic location. A request from a New York residential IP with Asia/Tokyo timezone is an obvious red flag.
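One way to keep fingerprint and proxy location in sync is a small lookup table keyed by the proxy's exit country. The mapping below is an illustrative sketch, not an exhaustive or authoritative list; extend it to cover whatever countries your proxy pool actually uses:

```python
# Illustrative mapping from proxy exit country to matching
# browser-context settings (extend per your proxy pool).
FINGERPRINT_BY_COUNTRY = {
    "US": {"timezone_id": "America/New_York", "locale": "en-US"},
    "GB": {"timezone_id": "Europe/London", "locale": "en-GB"},
    "DE": {"timezone_id": "Europe/Berlin", "locale": "de-DE"},
    "JP": {"timezone_id": "Asia/Tokyo", "locale": "ja-JP"},
}


def context_args_for(country: str) -> dict:
    """Return timezone/locale kwargs matching the proxy's exit country,
    falling back to US settings for unknown codes."""
    return dict(FINGERPRINT_BY_COUNTRY.get(country, FINGERPRINT_BY_COUNTRY["US"]))
```

Then pass the result straight into the context: `await browser.new_context(proxy=..., **context_args_for("GB"))`.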
Rate Limiting: The Hard Numbers
TikTok rate limits are aggressive. Based on community observations in 2026:
| Action | Safe Rate | Danger Zone | Hard Block |
|---|---|---|---|
| Profile views | 1 per 3 seconds | > 1/sec | > 5/sec |
| Video page loads | 1 per 2 seconds | > 2/sec | > 10/sec |
| Comment fetching | 1 per 4 seconds | > 1/sec | > 3/sec |
| Same profile repeat | Max 50 req/session | > 100/session | > 200/session |
| Session length | Under 10 min per IP | 10-30 min | > 30 min |
Exceeding these doesn't immediately block you — TikTok often returns empty results or serves a CAPTCHA page instead of a hard 429. Always check that your responses actually contain data, not just HTTP 200s with empty arrays.
```python
# Defensive check — TikTok returns 200 with empty data when rate-limited
data = await response.json()
items = data.get("itemList", [])
if not items and data.get("statusCode") == 0:
    print("WARNING: Empty response — likely rate-limited")
    await asyncio.sleep(30)  # back off significantly
```
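To stay inside the safe-rate column above without scattering sleep calls through your code, a minimal pacing helper can enforce the gap centrally. This is a sketch: tune `min_interval` per action type (3s for profiles, 4s for comments, and so on), and keep the jitter so intervals never look perfectly regular:

```python
import asyncio
import random
import time


class Pacer:
    """Enforces a minimum (jittered) gap between consecutive requests."""

    def __init__(self, min_interval: float = 3.0, jitter: float = 1.0):
        self.min_interval = min_interval
        self.jitter = jitter
        self._last = 0.0

    async def wait(self):
        # Sleep just long enough that consecutive calls are at least
        # min_interval (plus random jitter) apart.
        target_gap = self.min_interval + random.uniform(0, self.jitter)
        elapsed = time.monotonic() - self._last
        if elapsed < target_gap:
            await asyncio.sleep(target_gap - elapsed)
        self._last = time.monotonic()
```

Usage: create one `Pacer` per action type and `await pacer.wait()` immediately before each `page.goto(...)`.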
Exporting Data for Analysis
Once you have the data, here's how to export it for common use cases:
```python
import csv
import json


def export_to_csv(
    videos: list[dict], filename: str = "tiktok_data.csv"
):
    """Export video data to CSV for spreadsheet analysis."""
    if not videos:
        return
    fieldnames = [
        "id", "description", "views", "likes", "comments",
        "shares", "saves", "duration", "sound", "hashtags",
        "created", "url",
    ]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for v in videos:
            row = {
                **v,
                "hashtags": ", ".join(v.get("hashtags", [])),
            }
            writer.writerow({k: row.get(k, "") for k in fieldnames})
    print(f"Exported {len(videos)} videos to {filename}")


def engagement_summary(videos: list[dict]):
    """Print engagement rate analysis."""
    if not videos:
        return
    total_views = sum(v["views"] for v in videos)
    total_likes = sum(v["likes"] for v in videos)
    total_comments = sum(v["comments"] for v in videos)
    avg_engagement = (
        (total_likes + total_comments) / total_views * 100
        if total_views > 0
        else 0
    )
    print(f"\nEngagement Summary ({len(videos)} videos):")
    print(f"  Total views:    {total_views:>15,}")
    print(f"  Total likes:    {total_likes:>15,}")
    print(f"  Total comments: {total_comments:>15,}")
    print(f"  Avg engagement: {avg_engagement:>14.2f}%")
    print(f"  Avg views/video:{total_views // len(videos):>15,}")
```
Example Engagement Output
```
Engagement Summary (30 videos):
  Total views:      2,847,300,000
  Total likes:        198,500,000
  Total comments:      12,400,000
  Avg engagement:            7.40%
  Avg views/video:     94,910,000
```
Skip the Setup: Ready-Made Scraper
If you need TikTok data without maintaining infrastructure, there's a free TikTok Scraper on Apify that handles anti-bot measures, IP rotation, and pagination. You provide usernames or video URLs and get back structured JSON with video stats, comments, and profile data. Useful as a baseline before deciding whether to build your own pipeline.
Summary
| Method | Effort | Reliability | Cost | Best For |
|---|---|---|---|---|
| Official Research API | High (application) | High | Free (if approved) | Academic research |
| Web endpoint scraping | Medium | Medium | Proxy cost | Quick prototypes |
| Playwright automation | Medium | High | Proxy + compute | Production scraping |
| Mobile API replay | High | Low (fragile) | Dev time | Specific data points |
| Apify/third-party | Low | Medium | Usage-based | One-off data pulls |
For most projects, Playwright with residential proxies hits the right balance. Use domcontentloaded, intercept API responses rather than parsing the DOM, keep your request rate under 1 per 3 seconds, and rotate IPs regularly. Match your browser fingerprint to your proxy location, add human-like timing variation, and always validate that responses contain actual data rather than empty 200s.