How to Scrape Twitter/X in 2026: What Still Works

Twitter's transformation into X came with a brutal API repricing that effectively closed off data access for anyone without a serious budget. In 2023, Elon Musk's team killed the free developer tier and restructured pricing into tiers that make Reddit's 2023 debacle look tame by comparison.

Here's the current pricing reality and what still works for scraping in 2026 — including complete Python code, anti-detection techniques, proxy rotation strategies, and how to handle rate limits at scale.

What Twitter/X API Access Costs Now

The pricing history tells the story. Free read access disappeared in early 2023; the Basic tier now costs $100/month for roughly 10,000 tweet reads, Pro runs $5,000/month for one million reads, and Enterprise plans start around $42,000/month.

For context: $100/month works out to roughly 333 tweets per day. A single trending topic can generate that many tweets in seconds. The pricing is deliberately prohibitive for anyone except large enterprises.
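To make that concrete, here is the back-of-the-envelope arithmetic, assuming the Basic tier's widely reported 10,000-read monthly cap:

```python
# Back-of-the-envelope: what $100/month buys on the Basic tier,
# assuming the widely reported 10,000-tweet monthly read cap
MONTHLY_COST_USD = 100
MONTHLY_READ_CAP = 10_000

tweets_per_day = MONTHLY_READ_CAP / 30                # ~333 tweets per day
cost_per_tweet = MONTHLY_COST_USD / MONTHLY_READ_CAP  # $0.01 per tweet read

print(f"{tweets_per_day:.0f} tweets/day at ${cost_per_tweet:.2f} each")
```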

This pushed most developers toward unofficial methods. The two primary approaches that still work in 2026 are the guest token approach (using Twitter's own internal GraphQL API) and scraping rendered HTML for simpler data extraction.
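The HTML route only works if you can first obtain rendered markup (for example via a headless browser), since twitter.com is a JavaScript app. Once you have the HTML, extraction is straightforward. A stdlib-only sketch pulling tweet text out of markup — the `data-testid="tweetText"` attribute matches Twitter's current web markup, but treat it as an assumption that can change at any time:

```python
from html.parser import HTMLParser

# Void elements never emit a closing tag, so they must not affect depth
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "source"}

class TweetTextParser(HTMLParser):
    """Collect the text content of elements marked data-testid="tweetText"."""

    def __init__(self):
        super().__init__()
        self._depth = 0            # nesting depth inside a tweetText element
        self._buf: list[str] = []  # text fragments of the current tweet
        self.tweets: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("data-testid") == "tweetText":
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.tweets.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

parser = TweetTextParser()
parser.feed('<div data-testid="tweetText"><span>Hello</span> <span>world</span></div>')
print(parser.tweets)  # ['Hello world']
```

This is deliberately minimal; a production scraper would use a real HTML parser library and a browser automation layer to produce the rendered page in the first place.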

The Guest Token Approach

Twitter's web client — the browser version you see at twitter.com — doesn't require you to log in to view public content. Under the hood, it authenticates using a guest token that Twitter issues automatically to unauthenticated clients.

This is not a bug or an exploit. It's how the public-facing web app works. But it's also the mechanism that powers most Twitter scrapers today.

Step 1: Get a Guest Token

Twitter issues guest tokens via a simple POST request:

import httpx
import time
import random

# This is the publicly known static Bearer token used by Twitter's web app
# It's embedded in the JavaScript bundle at twitter.com and is well-documented
APP_BEARER = (
    "AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I4xsZZZHhjA%3D"
    "Uoblefzeg0AgDisF3Td0dDAitKR6KCJL4dGnB24FFVoaA2l7"
)

BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)

def get_guest_token() -> str:
    """Fetch a fresh guest token from Twitter's activate endpoint."""
    resp = httpx.post(
        "https://api.twitter.com/1.1/guest/activate.json",
        headers={
            "Authorization": f"Bearer {APP_BEARER}",
            "User-Agent": BROWSER_UA,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["guest_token"]

The APP_BEARER token above is Twitter's own app-level token embedded in their public web client. It's been publicly documented and extracted from Twitter's JavaScript bundle for years — searching "Twitter app Bearer token" will turn up the same value across hundreds of GitHub repos and blog posts.

Step 2: Fetch User Profile via GraphQL

Twitter's internal API uses GraphQL endpoints at twitter.com/i/api/graphql/. With a guest token, you can query public profile data:

import json

def build_headers(guest_token: str) -> dict:
    """Build the full header set that mimics the web client."""
    return {
        "Authorization": f"Bearer {APP_BEARER}",
        "x-guest-token": guest_token,
        "User-Agent": BROWSER_UA,
        "x-twitter-client-language": "en",
        "x-twitter-active-user": "yes",
        "Content-Type": "application/json",
        "Accept": "*/*",
        "Referer": "https://twitter.com/",
        "Origin": "https://twitter.com",
    }

def get_user_profile(username: str, guest_token: str) -> dict:
    variables = {
        "screen_name": username,
        "withSafetyModeUserFields": True,
    }
    features = {
        "hidden_profile_likes_enabled": False,
        "responsive_web_graphql_exclude_directive_enabled": True,
        "verified_phone_label_enabled": False,
        "subscriptions_verification_info_is_identity_verified_enabled": False,
        "subscriptions_verification_info_verified_since_enabled": True,
        "highlights_tweets_tab_ui_enabled": True,
        "creator_subscriptions_tweet_preview_api_enabled": True,
        "responsive_web_graphql_skip_user_profile_image_extensions_enabled": False,
        "responsive_web_graphql_timeline_navigation_enabled": True,
    }

    resp = httpx.get(
        "https://twitter.com/i/api/graphql/G3KGOASz96M-Qu0nwmGXNg/UserByScreenName",
        params={
            "variables": json.dumps(variables),
            "features": json.dumps(features),
        },
        headers=build_headers(guest_token),
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"]["user"]["result"]

# Usage
token = get_guest_token()
user = get_user_profile("elonmusk", token)
legacy = user["legacy"]
print(f"Name: {legacy['name']}")
print(f"Followers: {legacy['followers_count']:,}")
print(f"Tweets: {legacy['statuses_count']:,}")
print(f"Bio: {legacy['description']}")
print(f"Location: {legacy['location']}")
print(f"Verified: {legacy.get('verified', False)}")

Step 3: Fetch User Timeline

def get_user_tweets(user_id: str, guest_token: str, count: int = 20, cursor: str | None = None) -> tuple:
    """
    Fetch tweets from a user's timeline.
    Returns (tweets_list, next_cursor) for pagination.
    """
    variables = {
        "userId": user_id,
        "count": count,
        "includePromotedContent": False,
        "withQuickPromoteEligibilityTweetFields": True,
        "withVoice": True,
        "withV2Timeline": True,
    }
    if cursor:
        variables["cursor"] = cursor

    features = {
        "rweb_lists_timeline_redesign_enabled": True,
        "responsive_web_graphql_exclude_directive_enabled": True,
        "verified_phone_label_enabled": False,
        "creator_subscriptions_tweet_preview_api_enabled": True,
        "responsive_web_graphql_timeline_navigation_enabled": True,
        "responsive_web_graphql_skip_user_profile_image_extensions_enabled": False,
        "tweetypie_unmention_optimization_enabled": True,
        "responsive_web_edit_tweet_api_enabled": True,
        "graphql_is_translatable_rweb_tweet_is_translatable_enabled": True,
        "view_counts_everywhere_api_enabled": True,
        "longform_notetweets_consumption_enabled": True,
        "responsive_web_twitter_article_tweet_consumption_enabled": False,
        "tweet_awards_web_tipping_enabled": False,
        "freedom_of_speech_not_reach_fetch_enabled": True,
        "standardized_nudges_misinfo": True,
        "tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled": True,
        "longform_notetweets_rich_text_read_enabled": True,
        "longform_notetweets_inline_media_enabled": True,
        "responsive_web_enhance_cards_enabled": False,
    }

    resp = httpx.get(
        "https://twitter.com/i/api/graphql/V1ze5q3ijDS1VeLwLY0m7g/UserTweets",
        params={
            "variables": json.dumps(variables),
            "features": json.dumps(features),
        },
        headers=build_headers(guest_token),
        timeout=15,
    )
    resp.raise_for_status()

    tweets = []
    next_cursor = None
    instructions = (
        resp.json()
        .get("data", {})
        .get("user", {})
        .get("result", {})
        .get("timeline_v2", {})
        .get("timeline", {})
        .get("instructions", [])
    )
    for instr in instructions:
        for entry in instr.get("entries", []):
            entry_id = entry.get("entryId", "")
            content = entry.get("content", {})

            # Cursor entries for pagination
            if content.get("entryType") == "TimelineTimelineCursor":
                if content.get("cursorType") == "Bottom":
                    next_cursor = content.get("value")
                continue

            if content.get("entryType") == "TimelineTimelineItem":
                tweet_result = (
                    content.get("itemContent", {})
                    .get("tweet_results", {})
                    .get("result", {})
                )
                if tweet_result:
                    core = tweet_result.get("legacy", {})
                    tweets.append({
                        "id": core.get("id_str"),
                        "text": core.get("full_text"),
                        "likes": core.get("favorite_count"),
                        "retweets": core.get("retweet_count"),
                        "replies": core.get("reply_count"),
                        "quotes": core.get("quote_count"),
                        "bookmarks": core.get("bookmark_count"),
                        "views": tweet_result.get("views", {}).get("count"),
                        "created_at": core.get("created_at"),
                        "lang": core.get("lang"),
                        "is_retweet": "retweeted_status_result" in tweet_result,
                        "media": [
                            m.get("media_url_https")
                            for m in core.get("extended_entities", {}).get("media", [])
                        ],
                    })
    return tweets, next_cursor


def get_all_user_tweets(user_id: str, max_tweets: int = 200) -> list:
    """Paginate through a user's full timeline."""
    all_tweets = []
    token = get_guest_token()
    cursor = None
    request_count = 0

    while len(all_tweets) < max_tweets:
        # Refresh guest token every 50 requests
        if request_count > 0 and request_count % 50 == 0:
            token = get_guest_token()

        tweets, cursor = get_user_tweets(user_id, token, count=20, cursor=cursor)
        all_tweets.extend(tweets)
        request_count += 1

        if not cursor or not tweets:
            break

        # Rate limit: 1-2 seconds between requests
        time.sleep(random.uniform(1.0, 2.5))

    return all_tweets[:max_tweets]

Step 4: Search Tweets

Twitter's search endpoint works similarly:

def search_tweets(query: str, guest_token: str, count: int = 20, cursor: str | None = None) -> tuple:
    """Search for tweets matching a query."""
    variables = {
        "rawQuery": query,
        "count": count,
        "querySource": "typed_query",
        "product": "Latest",  # or "Top" for algorithmic ranking
    }
    if cursor:
        variables["cursor"] = cursor

    features = {
        "responsive_web_graphql_exclude_directive_enabled": True,
        "verified_phone_label_enabled": False,
        "creator_subscriptions_tweet_preview_api_enabled": True,
        "responsive_web_graphql_timeline_navigation_enabled": True,
        "responsive_web_graphql_skip_user_profile_image_extensions_enabled": False,
        "longform_notetweets_consumption_enabled": True,
        "responsive_web_twitter_article_tweet_consumption_enabled": False,
        "tweet_awards_web_tipping_enabled": False,
        "freedom_of_speech_not_reach_fetch_enabled": True,
        "standardized_nudges_misinfo": True,
        "tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled": True,
        "longform_notetweets_rich_text_read_enabled": True,
        "longform_notetweets_inline_media_enabled": True,
        "responsive_web_enhance_cards_enabled": False,
    }

    resp = httpx.get(
        "https://twitter.com/i/api/graphql/nK1dw4oV3k4w5TdtcAdSww/SearchTimeline",
        params={
            "variables": json.dumps(variables),
            "features": json.dumps(features),
        },
        headers=build_headers(guest_token),
        timeout=15,
    )
    resp.raise_for_status()

    tweets = []
    next_cursor = None
    data = resp.json()
    instructions = (
        data.get("data", {})
        .get("search_by_raw_query", {})
        .get("search_timeline", {})
        .get("timeline", {})
        .get("instructions", [])
    )

    for instr in instructions:
        for entry in instr.get("entries", []):
            content = entry.get("content", {})
            if content.get("cursorType") == "Bottom":
                next_cursor = content.get("value")
            elif content.get("entryType") == "TimelineTimelineItem":
                tweet_result = (
                    content.get("itemContent", {})
                    .get("tweet_results", {})
                    .get("result", {})
                )
                if tweet_result:
                    core = tweet_result.get("legacy", {})
                    tweets.append({
                        "text": core.get("full_text"),
                        "user": tweet_result.get("core", {})
                            .get("user_results", {})
                            .get("result", {})
                            .get("legacy", {})
                            .get("screen_name"),
                        "likes": core.get("favorite_count"),
                        "retweets": core.get("retweet_count"),
                        "created_at": core.get("created_at"),
                    })

    return tweets, next_cursor

# Example: search for tweets about a topic
token = get_guest_token()
results, cursor = search_tweets("#Python programming", token, count=20)
for t in results[:5]:
    print(f"@{t['user']}: {t['text'][:100]}")

Rate Limits and Avoiding Blocks

Guest tokens have aggressive rate limits, and Twitter actively detects scraping patterns:

What Triggers Blocks

Bursts of requests from a single IP, perfectly regular request intervals, missing or inconsistent browser headers, traffic from known datacenter IP ranges, and reusing one guest token long past its budget are the main signals. Any of them can earn you a 429 response or an outright IP block.

Anti-Detection Checklist

import random
import time

# 1. Always rotate guest tokens frequently
class TwitterSession:
    def __init__(self, proxy_url=None):
        self.proxy_url = proxy_url
        self.token = None
        self.request_count = 0
        self.token_lifetime = random.randint(40, 70)  # randomize rotation

    def get_client(self):
        kwargs = {"timeout": 15}
        if self.proxy_url:
            kwargs["proxy"] = self.proxy_url
        return httpx.Client(**kwargs)

    def get_headers(self):
        if not self.token or self.request_count >= self.token_lifetime:
            self.token = get_guest_token()
            self.request_count = 0
            self.token_lifetime = random.randint(40, 70)
        self.request_count += 1
        return build_headers(self.token)

    def sleep(self):
        """Human-like delay between requests."""
        time.sleep(random.uniform(1.5, 4.0))


# 2. Use a pool of user agents
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


# 3. Add jitter to timing — never request at exact intervals
def smart_sleep(base=2.0, jitter=1.5):
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)

Using Proxies for Scale

For high-volume scraping — monitoring hashtags, tracking competitor accounts, building datasets — the rate limits on guest tokens become the bottleneck quickly.

Rotating residential proxies are the standard solution. Each new IP effectively gets fresh rate limit headroom. Twitter actively blocks datacenter IP ranges (AWS, GCP, DigitalOcean), so residential proxies are a hard requirement for sustained scraping.

ThorData provides residential proxy pools with 200M+ IPs across 190+ countries, specifically suited for social media scraping. Their sticky session feature lets you maintain a consistent IP identity across a session, which looks more natural than rotating on every request.

# ThorData proxy configuration
PROXY_USER = "your_username"
PROXY_PASS = "your_password"
PROXY_HOST = "proxy.thordata.net"
PROXY_PORT = 9000

def make_proxy_url(country_code=None):
    """Build a ThorData proxy URL with optional country targeting."""
    user = PROXY_USER
    if country_code:
        # Country-targeted residential IP
        user = f"{PROXY_USER}-country-{country_code}"
    return f"http://{user}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"

# US residential proxy (best for Twitter's main content)
session_us = TwitterSession(proxy_url=make_proxy_url("us"))

# UK residential proxy (for UK trending topics)
session_uk = TwitterSession(proxy_url=make_proxy_url("gb"))

# Usage with proxy
with httpx.Client(proxy=make_proxy_url("us"), timeout=15) as client:
    # All requests through US residential IP
    resp = client.post(
        "https://api.twitter.com/1.1/guest/activate.json",
        headers={"Authorization": f"Bearer {APP_BEARER}"},
    )
    token = resp.json()["guest_token"]

Parsing and Storing Tweet Data

Once you have tweets, store them properly for analysis:

import sqlite3
from datetime import datetime

def setup_database(db_path="tweets.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tweets (
            id TEXT PRIMARY KEY,
            username TEXT,
            text TEXT,
            likes INTEGER,
            retweets INTEGER,
            replies INTEGER,
            quotes INTEGER,
            views INTEGER,
            created_at TEXT,
            lang TEXT,
            is_retweet BOOLEAN,
            scraped_at TEXT
        )
    """)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_username ON tweets(username)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_created ON tweets(created_at)")
    conn.commit()
    return conn


def save_tweets(conn, tweets: list, username: str):
    """Save tweets to SQLite, ignoring duplicates."""
    now = datetime.utcnow().isoformat()
    rows = []
    for t in tweets:
        rows.append((
            t.get("id"),
            username,
            t.get("text"),
            t.get("likes", 0),
            t.get("retweets", 0),
            t.get("replies", 0),
            t.get("quotes", 0),
            t.get("views", 0),
            t.get("created_at"),
            t.get("lang"),
            t.get("is_retweet", False),
            now,
        ))

    conn.executemany(
        """INSERT OR IGNORE INTO tweets
           (id, username, text, likes, retweets, replies, quotes, views,
            created_at, lang, is_retweet, scraped_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        rows
    )
    conn.commit()
    return len(rows)


# Full pipeline
conn = setup_database()
token = get_guest_token()
user = get_user_profile("python", token)
user_id = user["rest_id"]

tweets = get_all_user_tweets(user_id, max_tweets=100)
saved = save_tweets(conn, tweets, "python")
print(f"Saved {saved} tweets to database")

Scraping Trending Topics

Twitter's trending topics (the data behind the Explore tab) are still served by the legacy v1.1 trends endpoint, which accepts a guest token:

def get_trending_topics(woeid: int = 1, guest_token: str | None = None) -> list:
    """
    Get trending topics.
    woeid=1 is worldwide; use country/city WOEID for local trends.
    Common WOEIDs: 23424977=USA, 44418=London, 615702=Paris
    """
    if not guest_token:
        guest_token = get_guest_token()

    resp = httpx.get(
        f"https://api.twitter.com/1.1/trends/place.json",
        params={"id": woeid},
        headers=build_headers(guest_token),
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    if data and len(data) > 0:
        return data[0].get("trends", [])
    return []

trends = get_trending_topics(woeid=1)
for t in trends[:10]:
    tweet_vol = t.get("tweet_volume") or "N/A"
    print(f"{t['name']}: {tweet_vol} tweets")

Handling Errors and Retries

Production scrapers need robust error handling:

import httpx
from typing import Optional

class RateLimitError(Exception):
    pass

class TwitterAPIError(Exception):
    pass

def safe_request(url: str, params: dict, headers: dict,
                 max_retries: int = 3, proxy: Optional[str] = None) -> dict:
    """Make a Twitter API request with retry logic."""
    kwargs = {"params": params, "headers": headers, "timeout": 15}
    if proxy:
        kwargs["proxy"] = proxy

    for attempt in range(max_retries):
        try:
            resp = httpx.get(url, **kwargs)

            if resp.status_code == 429:
                wait = 60 * (attempt + 1)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
                continue

            if resp.status_code == 401:
                # Guest token expired — refresh
                raise RateLimitError("Token expired, need refresh")

            if resp.status_code == 403:
                raise TwitterAPIError(f"Forbidden: {resp.text[:200]}")

            resp.raise_for_status()
            data = resp.json()

            # Check for embedded errors
            if "errors" in data:
                for err in data["errors"]:
                    if err.get("code") in [88, 326]:  # rate limit / account locked
                        raise RateLimitError(f"API error: {err}")

            return data

        except httpx.TimeoutException:
            if attempt == max_retries - 1:
                raise
            time.sleep(5 * (attempt + 1))

        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Get fresh token
            headers = build_headers(get_guest_token())
            time.sleep(30)

    raise TwitterAPIError("Max retries exceeded")

The Nitter Alternative

Nitter was an open-source Twitter frontend that let you browse and scrape Twitter without using any official API. At peak it had hundreds of public instances.

Current status in 2026: Most public Nitter instances are dead. Twitter blocked the IP ranges systematically, and maintaining a working instance requires constant updates as Twitter changes its internals. A few private instances still exist but aren't publicly listed.

Don't build anything production-critical on Nitter in 2026.

What You Can and Can't Get

With the guest token approach:

Data Type                     Available   Notes
Public tweets & timelines     Yes         Up to ~3,200 recent tweets
User profiles                 Yes         Full bio, stats, metadata
Engagement metrics            Yes         Likes, RTs, replies, quotes, views, bookmarks
Search (recent tweets)        Yes         Recent only; full history requires the paid API
Trending topics               Yes         Global and per-country
Replies and quote tweets      Yes         With pagination
Media (images, videos)        Yes         URL extraction from tweet data
DMs                           No          Requires an authenticated account
Private accounts              No          Requires a follow relationship
Historical tweets (>3,200)    No          Twitter API limit
Real-time streaming           No          Requires the paid streaming API

Complete Working Example

import httpx
import json
import time
import random
import sqlite3
from datetime import datetime

# Configuration: the same public app Bearer token as before. This example
# reuses get_guest_token, get_user_profile, get_user_tweets, setup_database,
# and save_tweets from the sections above.
APP_BEARER = (
    "AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I4xsZZZHhjA%3D"
    "Uoblefzeg0AgDisF3Td0dDAitKR6KCJL4dGnB24FFVoaA2l7"
)

def main():
    """Full pipeline: get user profile and recent tweets."""
    # Setup
    conn = setup_database()
    token = get_guest_token()

    # Target accounts to monitor
    accounts = ["python", "fastapi", "github"]

    for username in accounts:
        print(f"\nFetching @{username}...")

        # Get profile
        try:
            profile = get_user_profile(username, token)
            user_id = profile["rest_id"]
            legacy = profile["legacy"]
            print(f"  {legacy['name']}: {legacy['followers_count']:,} followers")
        except Exception as e:
            print(f"  Error fetching profile: {e}")
            continue

        # Get tweets with pagination
        tweets, cursor = get_user_tweets(user_id, token, count=20)
        all_tweets = list(tweets)

        # Fetch 2 more pages
        for _ in range(2):
            if not cursor:
                break
            time.sleep(random.uniform(2, 4))
            tweets, cursor = get_user_tweets(user_id, token, count=20, cursor=cursor)
            all_tweets.extend(tweets)

        saved = save_tweets(conn, all_tweets, username)
        print(f"  Saved {saved} tweets")

        # Polite delay between accounts
        time.sleep(random.uniform(3, 7))

    conn.close()
    print("\nDone!")

if __name__ == "__main__":
    main()

Summary

Twitter's API pricing in 2026 is designed for enterprises, not developers. But the guest token + GraphQL approach gives you a practical path to public tweet data without spending $100-5,000/month:

  1. POST to /1.1/guest/activate.json with the public app Bearer token
  2. Use the guest token in x-guest-token header on GraphQL requests
  3. Stay under ~50 req/15min per IP, rotate tokens every 40-70 requests
  4. Use residential proxies (like ThorData) for any serious scale — datacenter IPs are reliably blocked
  5. Store results in SQLite and implement proper retry logic

The endpoints change occasionally when Twitter pushes updates, but the core mechanism — guest tokens authenticating against the same GraphQL API the web app uses — has been stable for years and is likely to remain the primary free access path for the foreseeable future.