
How to Scrape TikTok Data in 2026 (Videos, Comments, Profiles)

TikTok has made it notoriously difficult to access data programmatically. The official Research API exists, but unless you're affiliated with an academic institution or a verified business, you're not getting in. Even if you qualify, the application review process takes weeks and approval is far from guaranteed.

Here's how developers actually get TikTok data in 2026 — covering video metadata, comments, user profiles, and trending sounds.

Why Scrape TikTok? Real Use Cases

Before diving into code, here's what people actually build with TikTok data: trend and hashtag tracking for content strategy, influencer analytics and vetting, sentiment analysis of comment sections, and competitive monitoring of creator accounts.

The Official Route and Why It Fails Most Developers

TikTok launched a Research API for "qualified researchers" in 2023. In practice, that means academic affiliation or verified-business status, a weeks-long application review, and no guarantee of approval.

For most developers — hobbyists, indie hackers, small analytics tools — this route is completely closed. So let's look at what actually works.

TikTok's Public Web Endpoints

TikTok's web app loads data from internal endpoints you can observe in browser devtools. The most useful ones don't require login for public content:

GET https://www.tiktok.com/api/post/item_list/?aid=1988&secUid={USER_SEC_UID}&count=30
GET https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id={VIDEO_ID}&count=20

These work for public profiles and videos. The catch is the secUid — it's a long opaque identifier for each user that you need to resolve first from the username. You can get it from the user's profile page response.

However, TikTok rotates these endpoints and adds signature parameters (_signature, X-Bogus) that are generated client-side using obfuscated JavaScript. These signatures change with app versions and are intentionally hard to reverse-engineer.

The Mobile API Approach

TikTok's Android app communicates with a separate endpoint base (api16-normal-c-useast1a.tiktokv.com) that uses device fingerprints for auth. This approach involves:

  1. Capturing the device registration flow from a real or emulated Android device
  2. Replaying the device token with requests
  3. Parsing the protobuf or JSON responses

It works, but it's fragile. TikTok pushes app updates frequently and device tokens get flagged if request patterns look robotic. Maintaining this takes ongoing effort.

Playwright Browser Automation: The Practical Approach

For most use cases, Playwright automation against the TikTok web app is the most reliable option in 2026. It runs a real browser, so TikTok sees legitimate browser signals.

Critical note on page loading: Use domcontentloaded instead of networkidle as your wait condition. TikTok's pages never fully reach "network idle" — they keep making background requests for recommendations, ads, and analytics. Waiting for networkidle will cause your scraper to time out every single time.

Complete Working Script: Profile Scraper

Save this as tiktok_scraper.py and run with python3 tiktok_scraper.py <username>:

#!/usr/bin/env python3
"""TikTok profile + video scraper using Playwright.

Usage:
    pip install playwright
    playwright install chromium
    python3 tiktok_scraper.py charlidamelio
"""

import asyncio
import json
import random
import sys
from datetime import datetime

from playwright.async_api import async_playwright


async def scrape_tiktok_profile(username: str, proxy: dict | None = None) -> dict:
    """Scrape a TikTok profile for video metadata."""
    async with async_playwright() as p:
        launch_args = {"headless": True}
        context_args = {
            "user_agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            ),
            "viewport": {"width": 1280, "height": 800},
            "locale": "en-US",
        }
        if proxy:
            context_args["proxy"] = proxy

        browser = await p.chromium.launch(**launch_args)
        context = await browser.new_context(**context_args)
        page = await context.new_page()

        videos = []
        profile_info = {}

        async def handle_response(response):
            url = response.url
            if response.status != 200:
                return
            try:
                if "item_list" in url or "/api/post" in url:
                    data = await response.json()
                    for item in data.get("itemList", []):
                        stats = item.get("stats", {})
                        author = item.get("author", {})
                        music = item.get("music", {})
                        videos.append({
                            "id": item.get("id"),
                            "description": item.get("desc", ""),
                            "likes": stats.get("diggCount", 0),
                            "comments": stats.get("commentCount", 0),
                            "shares": stats.get("shareCount", 0),
                            "views": stats.get("playCount", 0),
                            "saves": stats.get("collectCount", 0),
                            "created": datetime.fromtimestamp(
                                item.get("createTime", 0)
                            ).isoformat(),
                            "duration": item.get("video", {}).get(
                                "duration", 0
                            ),
                            "sound": music.get("title", ""),
                            "sound_author": music.get("authorName", ""),
                            "hashtags": [
                                t.get("hashtagName", "")
                                for t in item.get("textExtra", [])
                                if t.get("hashtagName")
                            ],
                            "url": (
                                f"https://www.tiktok.com/"
                                f"@{author.get('uniqueId', username)}/"
                                f"video/{item.get('id')}"
                            ),
                        })
                elif "/user/detail" in url or "uniqueId" in url:
                    data = await response.json()
                    user = data.get("userInfo", {})
                    if user:
                        u = user.get("user", {})
                        s = user.get("stats", {})
                        profile_info.update({
                            "username": u.get("uniqueId"),
                            "nickname": u.get("nickname"),
                            "bio": u.get("signature", ""),
                            "verified": u.get("verified", False),
                            "followers": s.get("followerCount", 0),
                            "following": s.get("followingCount", 0),
                            "total_likes": s.get("heartCount", 0),
                            "total_videos": s.get("videoCount", 0),
                        })
            except Exception:
                pass

        page.on("response", handle_response)

        url = f"https://www.tiktok.com/@{username}"
        await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(3)

        # Scroll with human-like variation to trigger lazy loads
        for i in range(5):
            scroll_amount = random.randint(600, 1200)
            await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
            await asyncio.sleep(random.uniform(1.2, 3.0))

        await browser.close()

        return {
            "profile": profile_info,
            "videos": sorted(
                videos, key=lambda v: v["views"], reverse=True
            ),
            "scraped_at": datetime.now().isoformat(),
        }


async def main():
    username = sys.argv[1] if len(sys.argv) > 1 else "tiktok"

    print(f"Scraping @{username}...")
    data = await scrape_tiktok_profile(username)

    # Print profile summary
    p = data["profile"]
    if p:
        print(f"\n{'='*60}")
        print(f"@{p.get('username', username)}")
        print(f"  {p.get('nickname', '')} "
              f"{'✓' if p.get('verified') else ''}")
        print(f"  Followers: {p.get('followers', 0):,}")
        print(f"  Total likes: {p.get('total_likes', 0):,}")
        print(f"  Videos: {p.get('total_videos', 0):,}")
        print(f"{'='*60}")

    # Print top videos
    print(f"\nFound {len(data['videos'])} videos:\n")
    for i, v in enumerate(data["videos"][:10], 1):
        print(f"  {i}. {v['views']:>12,} views | "
              f"{v['likes']:>8,} likes | "
              f"{v['comments']:>6,} comments")
        desc = v["description"][:80]
        if len(v["description"]) > 80:
            desc += "..."
        print(f"     {desc}")
        if v["hashtags"]:
            print(f"     Tags: {', '.join(v['hashtags'][:5])}")
        print()

    # Save to JSON
    outfile = f"tiktok_{username}.json"
    with open(outfile, "w") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved full data to {outfile}")


if __name__ == "__main__":
    asyncio.run(main())

Example Output

Scraping @charlidamelio...

============================================================
@charlidamelio
  Charli D'Amelio ✓
  Followers: 155,200,000
  Total likes: 11,800,000,000
  Videos: 2,847
============================================================

Found 30 videos:

  1.  342,100,000 views | 28,400,000 likes | 185,000 comments
     New dance with @landonbarker #couple #dance
     Tags: couple, dance, fyp

  2.  289,000,000 views | 22,100,000 likes | 142,000 comments
     POV: your mom walks in while you're filming #relatable
     Tags: relatable, fyp, comedy

  3.  198,500,000 views | 15,800,000 likes |  98,400 comments
     Trying the viral pasta recipe everyone's talking about
     Tags: cooking, pasta, viral, foodtok

Saved full data to tiktok_charlidamelio.json

Scraping Comments

Comments require navigating to the individual video page. The same response-intercept approach works:

async def scrape_video_comments(
    video_url: str, max_scroll: int = 5
) -> list[dict]:
    """Scrape comments from a single TikTok video."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            )
        )
        page = await context.new_page()
        comments = []

        async def handle_response(response):
            if "/api/comment/list" not in response.url:
                return
            if response.status != 200:
                return
            try:
                data = await response.json()
                for c in data.get("comments", []):
                    user = c.get("user", {})
                    comments.append({
                        "text": c.get("text", ""),
                        "likes": c.get("digg_count", 0),
                        "author": user.get("unique_id", ""),
                        "author_verified": (
                            user.get("custom_verify", "") != ""
                        ),
                        "reply_count": c.get(
                            "reply_comment_total", 0
                        ),
                        "created": datetime.fromtimestamp(
                            c.get("create_time", 0)
                        ).isoformat(),
                    })
            except Exception:
                pass

        page.on("response", handle_response)
        await page.goto(
            video_url, wait_until="domcontentloaded", timeout=30000
        )
        await asyncio.sleep(4)

        # Scroll comment section to load more
        for _ in range(max_scroll):
            await page.evaluate("""
                () => {
                    const el = document.querySelector(
                        '[class*="CommentListContainer"]'
                    );
                    if (el) el.scrollTop += 500;
                }
            """)
            await asyncio.sleep(2)

        await browser.close()
        return sorted(
            comments, key=lambda c: c["likes"], reverse=True
        )

Comment Output Example

[
    {
        "text": "This is the best tutorial I've ever seen",
        "likes": 45200,
        "author": "codingfan99",
        "author_verified": false,
        "reply_count": 23,
        "created": "2026-03-28T14:22:00"
    },
    {
        "text": "Can you do a part 2?",
        "likes": 12800,
        "author": "webdev_sarah",
        "author_verified": true,
        "reply_count": 5,
        "created": "2026-03-28T15:45:00"
    }
]
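Once comments are in this shape, simple post-processing goes a long way — for example, surfacing the most active commenters or how often viewers ask questions. A small helper (the field names match the scraper output above; the "question" heuristic is deliberately crude):

```python
from collections import Counter


def comment_stats(comments: list[dict]) -> dict:
    """Summarize a scraped comment list: totals, top authors,
    and the share of comments that look like questions."""
    if not comments:
        return {"count": 0, "total_likes": 0,
                "question_ratio": 0.0, "top_authors": []}
    questions = sum(1 for c in comments if "?" in c.get("text", ""))
    by_author = Counter(c.get("author", "") for c in comments)
    return {
        "count": len(comments),
        "total_likes": sum(c.get("likes", 0) for c in comments),
        "question_ratio": questions / len(comments),
        "top_authors": by_author.most_common(3),
    }
```

A high question ratio is a useful signal for creators (unanswered demand for follow-up content) and for anyone building comment-analytics tooling on top of the scraper.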

Scraping Trending Hashtags and Sounds

Trending data is valuable for content strategy. TikTok's discover page loads trending hashtags and sounds through interceptable API calls:

async def scrape_trending(proxy: dict | None = None) -> dict:
    """Scrape trending hashtags and sounds from TikTok's
    discover page."""
    async with async_playwright() as p:
        context_args = {
            "user_agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
            ),
            "viewport": {"width": 1280, "height": 800},
            "locale": "en-US",
        }
        if proxy:
            context_args["proxy"] = proxy

        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(**context_args)
        page = await context.new_page()

        trending_hashtags = []
        trending_sounds = []

        async def handle_response(response):
            if response.status != 200:
                return
            try:
                url = response.url
                if "trending" in url or "discover" in url:
                    data = await response.json()
                    for item in data.get("challengeInfoList", []):
                        ch = item.get("challengeInfo", {})
                        trending_hashtags.append({
                            "name": ch.get("challengeName", ""),
                            "views": ch.get("stats", {}).get(
                                "videoCount", 0
                            ),
                            "desc": ch.get("desc", ""),
                        })
                    for item in data.get("musicInfoList", []):
                        mu = item.get("musicInfo", {})
                        trending_sounds.append({
                            "title": mu.get("title", ""),
                            "author": mu.get("authorName", ""),
                            "video_count": mu.get("stats", {}).get(
                                "videoCount", 0
                            ),
                        })
            except Exception:
                pass

        page.on("response", handle_response)
        await page.goto(
            "https://www.tiktok.com/discover",
            wait_until="domcontentloaded",
            timeout=30000,
        )
        await asyncio.sleep(4)

        await browser.close()
        return {
            "hashtags": trending_hashtags,
            "sounds": trending_sounds,
        }
Trending Output Example

{
    "hashtags": [
        {"name": "BookTok", "views": 218000000, "desc": "Book recommendations and reviews"},
        {"name": "CleanTok", "views": 95000000, "desc": "Cleaning tips and satisfying content"},
        {"name": "FitTok", "views": 78000000, "desc": "Fitness routines and transformations"}
    ],
    "sounds": [
        {"title": "original sound - trending", "author": "creator_xyz", "video_count": 4200000},
        {"title": "Espresso", "author": "Sabrina Carpenter", "video_count": 3800000}
    ]
}

Anti-Bot Detection: What TikTok Actually Checks

TikTok's bot detection in 2026 is among the most sophisticated of any social platform. Here's exactly what they look for and how to handle each vector:

Browser Fingerprint Checks

| Signal | What TikTok Checks | How to Handle |
| --- | --- | --- |
| Canvas fingerprint | Consistent canvas rendering | Use full Chromium (not headless-shell) |
| WebGL renderer | Headless browsers report "SwiftShader" | playwright-extra with stealth plugin |
| Navigator properties | navigator.webdriver is true in automation | Stealth plugin patches this |
| Screen dimensions | Must match viewport | Set realistic viewport (1280x800, 1920x1080) |
| Timezone | Must match IP geolocation | Set timezone_id in browser context |
| Language | Must match Accept-Language header | Set locale in browser context |

Behavioral Analysis

TikTok tracks mouse movement patterns, scroll velocity, and time between actions. A scraper that instantly scrolls at perfect intervals looks robotic.

# Bad: fixed timing (easily detected)
for _ in range(5):
    await page.evaluate("window.scrollBy(0, 800)")
    await asyncio.sleep(1.5)  # too regular

# Good: human-like variation
for i in range(5):
    scroll_amount = random.randint(400, 1200)
    await page.evaluate(f"window.scrollBy(0, {scroll_amount})")
    await asyncio.sleep(random.uniform(1.0, 3.5))

    # Occasionally pause longer (humans get distracted)
    if random.random() < 0.2:
        await asyncio.sleep(random.uniform(3.0, 8.0))

IP Reputation

Datacenter IPs get flagged immediately. For production use, residential proxies are essential. ThorData provides residential proxies with rotating IPs that work well for TikTok — their pool covers consumer ISP addresses that TikTok doesn't block.

# Using a proxy with Playwright
context = await browser.new_context(
    proxy={
        "server": "http://proxy.thordata.com:9000",
        "username": "your_username",
        "password": "your_password",
    },
    # Match the browser fingerprint to proxy location
    timezone_id="America/New_York",
    locale="en-US",
)

Pro tip: Match the browser's timezone and locale to the proxy's geographic location. A request from a New York residential IP with Asia/Tokyo timezone is an obvious red flag.
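One way to automate that pairing is a small lookup keyed on the proxy's exit region. The mapping below is an illustrative subset, not an exhaustive list — extend it to whatever regions your proxy pool actually covers:

```python
# Illustrative subset: proxy region code -> (timezone_id, locale).
REGION_FINGERPRINT = {
    "us-east": ("America/New_York", "en-US"),
    "us-west": ("America/Los_Angeles", "en-US"),
    "gb": ("Europe/London", "en-GB"),
    "de": ("Europe/Berlin", "de-DE"),
    "jp": ("Asia/Tokyo", "ja-JP"),
}


def context_args_for_proxy(region: str, proxy: dict) -> dict:
    """Build Playwright new_context kwargs whose timezone and
    locale agree with the proxy's exit region."""
    tz, locale = REGION_FINGERPRINT.get(
        region, ("America/New_York", "en-US")  # conservative default
    )
    return {"proxy": proxy, "timezone_id": tz, "locale": locale}
```

Then `await browser.new_context(**context_args_for_proxy("jp", proxy))` keeps the fingerprint and the IP geolocation consistent without hand-editing each context.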

Rate Limiting: The Hard Numbers

TikTok rate limits are aggressive. Based on community observations in 2026:

| Action | Safe Rate | Danger Zone | Hard Block |
| --- | --- | --- | --- |
| Profile views | 1 per 3 seconds | > 1/sec | > 5/sec |
| Video page loads | 1 per 2 seconds | > 2/sec | > 10/sec |
| Comment fetching | 1 per 4 seconds | > 1/sec | > 3/sec |
| Same profile repeat | Max 50 req/session | > 100/session | > 200/session |
| Session length | Under 10 min per IP | 10-30 min | > 30 min |

Exceeding these doesn't immediately block you — TikTok often returns empty results or serves a CAPTCHA page instead of a hard 429. Always check that your responses actually contain data, not just HTTP 200s with empty arrays.

# Defensive check — TikTok returns 200 with empty data
# when rate-limited
data = await response.json()
items = data.get("itemList", [])
if not items and data.get("statusCode") == 0:
    print("WARNING: Empty response — likely rate-limited")
    await asyncio.sleep(30)  # back off significantly
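The safe rates above translate naturally into a minimum-interval throttle with jitter — one awaitable gate per action type, shared by all requests of that type. A sketch (tune the intervals to the table):

```python
import asyncio
import random
import time


class Throttle:
    """Enforce a minimum (jittered) gap between requests."""

    def __init__(self, min_interval: float, jitter: float = 0.5):
        self.min_interval = min_interval
        self.jitter = jitter
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def wait(self):
        async with self._lock:
            gap = self.min_interval + random.uniform(0, self.jitter)
            sleep_for = self._last + gap - time.monotonic()
            if sleep_for > 0:
                await asyncio.sleep(sleep_for)
            self._last = time.monotonic()


# One throttle per action type, matching the table above
profile_throttle = Throttle(3.0)   # profile views: 1 per 3 s
comment_throttle = Throttle(4.0)   # comment fetches: 1 per 4 s
```

Call `await profile_throttle.wait()` before each `page.goto` to a profile; the lock means the cap holds even when several scrape tasks run concurrently.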

Exporting Data for Analysis

Once you have the data, here's how to export it for common use cases:

import csv
import json


def export_to_csv(
    videos: list[dict], filename: str = "tiktok_data.csv"
):
    """Export video data to CSV for spreadsheet analysis."""
    if not videos:
        return

    fieldnames = [
        "id", "description", "views", "likes", "comments",
        "shares", "saves", "duration", "sound", "hashtags",
        "created", "url",
    ]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for v in videos:
            row = {
                **v,
                "hashtags": ", ".join(v.get("hashtags", [])),
            }
            writer.writerow(
                {k: row.get(k, "") for k in fieldnames}
            )

    print(f"Exported {len(videos)} videos to {filename}")


def engagement_summary(videos: list[dict]):
    """Print engagement rate analysis."""
    if not videos:
        return

    total_views = sum(v["views"] for v in videos)
    total_likes = sum(v["likes"] for v in videos)
    total_comments = sum(v["comments"] for v in videos)
    avg_engagement = (
        (total_likes + total_comments) / total_views * 100
        if total_views > 0
        else 0
    )

    print(f"\nEngagement Summary ({len(videos)} videos):")
    print(f"  Total views:    {total_views:>15,}")
    print(f"  Total likes:    {total_likes:>15,}")
    print(f"  Total comments: {total_comments:>15,}")
    print(f"  Avg engagement: {avg_engagement:>14.2f}%")
    print(f"  Avg views/video:{total_views // len(videos):>15,}")

Example Engagement Output

Engagement Summary (30 videos):
  Total views:      2,847,300,000
  Total likes:        198,500,000
  Total comments:      12,400,000
  Avg engagement:            7.40%
  Avg views/video:      94,910,000
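Per-video engagement is often more telling than the aggregate — it shows which posts over- or under-perform the account's baseline. A self-contained helper operating on the same record shape the scraper returns:

```python
def per_video_engagement(videos: list[dict]) -> list[dict]:
    """Annotate each record with its engagement rate
    ((likes + comments) / views, as a percentage) and
    return the list sorted best-first."""
    out = []
    for v in videos:
        views = v.get("views", 0)
        rate = (
            (v.get("likes", 0) + v.get("comments", 0)) / views * 100
            if views else 0.0
        )
        out.append({**v, "engagement_pct": round(rate, 2)})
    return sorted(out, key=lambda v: v["engagement_pct"], reverse=True)
```

Running this over `data["videos"]` from the profile scraper gives a quick read on which content formats actually convert views into interaction.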

Skip the Setup: Ready-Made Scraper

If you need TikTok data without maintaining infrastructure, there's a free TikTok Scraper on Apify that handles anti-bot measures, IP rotation, and pagination. You provide usernames or video URLs and get back structured JSON with video stats, comments, and profile data. Useful as a baseline before deciding whether to build your own pipeline.

Summary

| Method | Effort | Reliability | Cost | Best For |
| --- | --- | --- | --- | --- |
| Official Research API | High (application) | High | Free (if approved) | Academic research |
| Web endpoint scraping | Medium | Medium | Proxy cost | Quick prototypes |
| Playwright automation | Medium | High | Proxy + compute | Production scraping |
| Mobile API replay | High | Low (fragile) | Dev time | Specific data points |
| Apify/third-party | Low | Medium | Usage-based | One-off data pulls |

For most projects, Playwright with residential proxies hits the right balance. Use domcontentloaded, intercept API responses rather than parsing the DOM, keep your request rate under 1 per 3 seconds, and rotate IPs regularly. Match your browser fingerprint to your proxy location, add human-like timing variation, and always validate that responses contain actual data rather than empty 200s.