Scrape LeetCode Problems: Difficulty, Tags & Acceptance Rates (2026)
LeetCode has over 3,000 problems and they're adding more every week. If you're building a study tracker, analyzing which topics are most tested, or creating a recommendation engine for coding practice, you need programmatic access to that problem data.
LeetCode doesn't have a documented public API, but their frontend talks to a GraphQL endpoint. That's your way in.
LeetCode's GraphQL API
The frontend at leetcode.com uses https://leetcode.com/graphql/ for all data fetching. You can use the same queries the website makes. Every problem listing, difficulty filter, tag lookup, and submission stat flows through this endpoint.
Dependencies and Setup
pip install requests
Only requests is needed for the examples below; add beautifulsoup4 if you also want to parse the HTML problem descriptions returned by the detail endpoint.
Basic Problem List Query
import requests
import time
import json
LEETCODE_GRAPHQL = "https://leetcode.com/graphql/"
SESSION = requests.Session()
SESSION.headers.update({
"Content-Type": "application/json",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Referer": "https://leetcode.com/problemset/",
"Accept": "application/json",
"Accept-Language": "en-US,en;q=0.9",
})
def get_problem_list(skip=0, limit=50, filters=None):
"""Fetch a page of LeetCode problems."""
query = """
query problemsetQuestionList($categorySlug: String, $limit: Int,
$skip: Int, $filters: QuestionListFilterInput) {
problemsetQuestionList: questionList(
categorySlug: $categorySlug
limit: $limit
skip: $skip
filters: $filters
) {
total: totalNum
questions: data {
questionId
questionFrontendId
title
titleSlug
difficulty
acRate
isPaidOnly
topicTags {
name
slug
}
stats
status
likes
dislikes
}
}
}
"""
variables = {
"categorySlug": "all-code-essentials",
"skip": skip,
"limit": limit,
"filters": filters or {},
}
response = SESSION.post(
LEETCODE_GRAPHQL,
json={"query": query, "variables": variables},
timeout=30,
)
response.raise_for_status()
data = response.json()
if "errors" in data:
print(f"GraphQL errors: {data['errors']}")
return [], 0
result = data.get("data", {}).get("problemsetQuestionList", {})
return result.get("questions", []), result.get("total", 0)
# Fetch first page of problems
problems, total = get_problem_list(skip=0, limit=20)
print(f"Total problems: {total}")
for p in problems[:5]:
tags = ", ".join(t["name"] for t in p["topicTags"])
print(f"#{p['questionFrontendId']} {p['title']} "
f"[{p['difficulty']}] {p['acRate']:.1f}% - {tags}")
Fetching All Problems with Full Pagination
LeetCode paginates at 50 problems per request. To get the full set:
def fetch_all_problems(delay: float = 1.5) -> list:
"""Fetch every LeetCode problem with pagination."""
all_problems = []
skip = 0
limit = 50
# First request to get total
batch, total = get_problem_list(skip=0, limit=limit)
all_problems.extend(batch)
print(f"Total problems to fetch: {total}")
skip = limit
while skip < total:
batch, _ = get_problem_list(skip=skip, limit=limit)
if not batch:
print(f"Empty batch at skip={skip}, stopping")
break
all_problems.extend(batch)
print(f" Fetched {len(all_problems)}/{total}")
skip += limit
time.sleep(delay) # Respect rate limits
return all_problems
all_problems = fetch_all_problems()
print(f"\nCollected {len(all_problems)} problems total")
# Save raw data
with open("leetcode_problems.json", "w") as f:
json.dump(all_problems, f, indent=2)
Getting Detailed Problem Data
The list endpoint gives you metadata, but for full problem details — description, hints, solution count, company tags — you need per-problem queries:
def get_problem_detail(title_slug: str) -> dict | None:
"""Get full details for a single problem including hints and code snippets."""
query = """
query questionData($titleSlug: String!) {
question(titleSlug: $titleSlug) {
questionId
questionFrontendId
title
titleSlug
content
difficulty
likes
dislikes
categoryTitle
isPaidOnly
stats
hints
similarQuestions
topicTags {
name
slug
}
codeSnippets {
lang
langSlug
code
}
sampleTestCase
metaData
judgerAvailable
judgeType
mysqlSchemas
enableRunCode
enableTestMode
}
}
"""
response = SESSION.post(
LEETCODE_GRAPHQL,
json={"query": query, "variables": {"titleSlug": title_slug}},
timeout=30,
)
response.raise_for_status()
data = response.json()
if "errors" in data:
return None
return data.get("data", {}).get("question")
# Get details for a specific problem
detail = get_problem_detail("two-sum")
if detail:
stats = json.loads(detail.get("stats", "{}"))
print(f"Problem: {detail['title']}")
print(f"Difficulty: {detail['difficulty']}")
    print(f"Total submissions: {stats.get('totalSubmissionRaw', 0):,}")
    print(f"Total accepted: {stats.get('totalAcceptedRaw', 0):,}")
print(f"Hints: {len(detail.get('hints', []))}")
similar = json.loads(detail.get('similarQuestions', '[]'))
print(f"Similar problems: {len(similar)}")
print(f"Languages with snippets: {len(detail.get('codeSnippets', []))}")
print(f"Topic tags: {', '.join(t['name'] for t in detail.get('topicTags', []))}")
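The `content` field in the detail response is HTML. If you need plain text for search indexing or display, a minimal stripper using only the standard library works (beautifulsoup4 from the setup step handles messier markup better):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Accumulates text nodes, discarding tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def content_to_text(html: str) -> str:
    """Strip HTML markup from a problem description and collapse whitespace."""
    extractor = _TextExtractor()
    extractor.feed(html or "")
    return " ".join("".join(extractor.parts).split())

sample = "<p>Given an array of integers <code>nums</code>, return indices.</p>"
print(content_to_text(sample))  # Given an array of integers nums, return indices.
```

Feed it `detail["content"]` to get a searchable plain-text version of any problem statement.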
Filtering by Topic and Difficulty
The filters parameter supports topic tags, difficulty, and status filtering:
def get_problems_by_topic(topic_slug: str, difficulty: str | None = None) -> list:
"""Get problems filtered by topic tag and optional difficulty."""
filters = {"tags": [topic_slug]}
if difficulty:
filters["difficulty"] = difficulty.upper()
all_problems = []
skip = 0
while True:
batch, total = get_problem_list(skip=skip, limit=50, filters=filters)
if not batch:
break
all_problems.extend(batch)
print(f" {len(all_problems)}/{total} problems with tag '{topic_slug}'")
skip += 50
if skip >= total:
break
time.sleep(1.5)
return all_problems
def get_problems_by_difficulty(difficulty: str = "Hard") -> list:
"""Get all problems of a specific difficulty level."""
filters = {"difficulty": difficulty.upper()}
return _paginate_problems(filters)
def _paginate_problems(filters: dict, delay: float = 1.5) -> list:
"""Generic paginator for filtered problem sets."""
all_problems = []
skip = 0
while True:
batch, total = get_problem_list(skip=skip, limit=50, filters=filters)
if not batch:
break
all_problems.extend(batch)
skip += 50
if skip >= total:
break
time.sleep(delay)
return all_problems
# Examples
hard_dp = get_problems_by_topic("dynamic-programming", difficulty="Hard")
print(f"\nHard DP problems: {len(hard_dp)}")
for p in sorted(hard_dp, key=lambda x: x["acRate"])[:5]:
print(f" #{p['questionFrontendId']} {p['title']} - {p['acRate']:.1f}% acceptance")
# Get all binary search problems
binary_search = get_problems_by_topic("binary-search")
print(f"\nBinary search problems: {len(binary_search)}")
avg_acceptance = sum(p["acRate"] for p in binary_search) / len(binary_search)
print(f"Average acceptance rate: {avg_acceptance:.1f}%")
Fetching Company-Tagged Problems (Premium)
Company tags (which companies ask which problems) require LeetCode Premium. If you have an account, you can authenticate and query this data. Note that scripted password login frequently trips LeetCode's captcha; copying your browser's LEETCODE_SESSION and csrftoken cookies into a requests.Session is often the more reliable route:
def authenticate_leetcode(username: str, password: str) -> requests.Session:
"""
Authenticate with LeetCode to access premium features.
Returns an authenticated session.
"""
session = requests.Session()
session.headers.update({
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Referer": "https://leetcode.com/",
})
# Get CSRF token from login page
login_page = session.get("https://leetcode.com/accounts/login/")
csrf_token = session.cookies.get("csrftoken")
if not csrf_token:
raise ValueError("Could not obtain CSRF token")
# Submit login
login_resp = session.post(
"https://leetcode.com/accounts/login/",
data={
"login": username,
"password": password,
"csrfmiddlewaretoken": csrf_token,
},
headers={"X-CSRFToken": csrf_token},
)
if "profile" not in login_resp.url:
raise ValueError("Login failed — check credentials")
# Update CSRF token for subsequent API requests
new_csrf = session.cookies.get("csrftoken")
if new_csrf:
session.headers["X-CSRFToken"] = new_csrf
return session
def get_company_problems(company_slug: str, auth_session: requests.Session) -> list:
"""Get problems associated with a specific company (requires Premium)."""
query = """
query getCompanyTag($slug: String!) {
companyTag(slug: $slug) {
name
questions {
questionId
questionFrontendId
title
titleSlug
difficulty
acRate
topicTags { name slug }
frequencyTimebar
}
}
}
"""
resp = auth_session.post(
LEETCODE_GRAPHQL,
json={"query": query, "variables": {"slug": company_slug}},
timeout=30,
)
data = resp.json()
if "errors" in data:
return []
tag_data = data.get("data", {}).get("companyTag")
if not tag_data:
return []
return tag_data.get("questions", [])
Analysis: Topic Distribution and Difficulty Curves
from collections import Counter, defaultdict
def analyze_problem_set(problems: list) -> dict:
"""Analyze the full problem set for patterns."""
difficulty_count = Counter()
topic_count = Counter()
topic_by_difficulty = defaultdict(Counter)
acceptance_by_difficulty = defaultdict(list)
for p in problems:
diff = p["difficulty"]
difficulty_count[diff] += 1
acceptance_by_difficulty[diff].append(p["acRate"])
for tag in p["topicTags"]:
topic_count[tag["name"]] += 1
topic_by_difficulty[diff][tag["name"]] += 1
print("=== Difficulty Distribution ===")
for diff in ["Easy", "Medium", "Hard"]:
count = difficulty_count[diff]
rates = acceptance_by_difficulty[diff]
avg_rate = sum(rates) / len(rates) if rates else 0
print(f" {diff}: {count} problems, avg acceptance: {avg_rate:.1f}%")
print("\n=== Top 15 Topics by Problem Count ===")
for topic, count in topic_count.most_common(15):
hard_count = topic_by_difficulty["Hard"][topic]
print(f" {topic}: {count} problems ({hard_count} Hard)")
# Find hardest topics by average acceptance rate
topic_rates = defaultdict(list)
for p in problems:
for tag in p["topicTags"]:
topic_rates[tag["name"]].append(p["acRate"])
print("\n=== Hardest Topics (lowest avg acceptance, min 10 problems) ===")
avg_rates = {
topic: sum(rates) / len(rates)
for topic, rates in topic_rates.items()
if len(rates) >= 10
}
for topic, rate in sorted(avg_rates.items(), key=lambda x: x[1])[:10]:
print(f" {topic}: {rate:.1f}% avg acceptance ({len(topic_rates[topic])} problems)")
print("\n=== Easiest Topics (highest avg acceptance, min 10 problems) ===")
for topic, rate in sorted(avg_rates.items(), key=lambda x: x[1], reverse=True)[:5]:
print(f" {topic}: {rate:.1f}% avg acceptance")
# Compute likes/dislikes ratio by difficulty
print("\n=== Community Rating by Difficulty ===")
for diff in ["Easy", "Medium", "Hard"]:
diff_probs = [p for p in problems if p["difficulty"] == diff and p.get("likes", 0) + p.get("dislikes", 0) > 0]
if diff_probs:
avg_ratio = sum(
p["likes"] / (p["likes"] + p["dislikes"])
for p in diff_probs
) / len(diff_probs)
print(f" {diff}: avg like ratio {avg_ratio:.2f}")
return {
"difficulty_count": dict(difficulty_count),
"topic_count": dict(topic_count.most_common(30)),
"avg_acceptance_by_difficulty": {
d: round(sum(r) / len(r), 1)
for d, r in acceptance_by_difficulty.items()
},
"topic_avg_acceptance": avg_rates,
}
stats = analyze_problem_set(all_problems)
Building a Study Path Recommender
The acceptance rate and topic data enables smart study path recommendations:
def recommend_study_path(
problems: list,
target_topics: list,
current_level: str = "Easy",
min_acceptance_rate: float = 30.0,
) -> list:
"""
Recommend problems for a study path based on target topics and skill level.
Ordering logic:
1. Filter by target topics
2. Start with Easy or Medium problems that have high acceptance rates
3. Progress to harder problems as user builds foundation
"""
DIFFICULTY_ORDER = {"Easy": 0, "Medium": 1, "Hard": 2}
START_LEVEL = DIFFICULTY_ORDER.get(current_level, 0)
target_slugs = {t.lower().replace(" ", "-") for t in target_topics}
# Filter problems matching target topics
matching = [
p for p in problems
if any(tag["slug"] in target_slugs for tag in p["topicTags"])
and not p.get("isPaidOnly", False)
]
# Sort by difficulty progression, then by acceptance rate (easier/higher first)
recommended = sorted(
matching,
key=lambda p: (
max(0, DIFFICULTY_ORDER.get(p["difficulty"], 1) - START_LEVEL),
-p["acRate"],
)
)
# Group into phases
phases = {
"warmup": [p for p in recommended if p["difficulty"] == "Easy" and p["acRate"] >= 60],
"core": [p for p in recommended if p["difficulty"] == "Medium" and p["acRate"] >= min_acceptance_rate],
"challenge": [p for p in recommended if p["difficulty"] == "Hard"],
}
print(f"Study path for: {', '.join(target_topics)}")
for phase, probs in phases.items():
print(f" {phase.capitalize()}: {len(probs)} problems")
for p in probs[:3]:
tags = ", ".join(t["name"] for t in p["topicTags"])
print(f" #{p['questionFrontendId']} {p['title']} ({p['acRate']:.0f}%) — {tags}")
return phases
# Example: FAANG interview prep path
study_path = recommend_study_path(
all_problems,
    target_topics=["array", "dynamic-programming", "graph"],
current_level="Medium",
)
Anti-Bot Measures and Rate Limits
LeetCode uses several protections on their GraphQL endpoint:
- CSRF tokens — The site sets a `csrftoken` cookie that must be sent as an `X-CSRFToken` header for mutations. Read-only queries usually work without it.
- Session-based rate limiting — Too many requests from one IP triggers 429 responses or temporary blocks. The threshold is roughly 20-30 requests per minute for unauthenticated users.
- Cloudflare protection — LeetCode sits behind Cloudflare, which fingerprints your TLS stack and blocks known bot signatures.
- Premium content gating — Company tags and some problem details require a paid LeetCode Premium account.
For collecting the full problem set (3,000+ problems at 50 per request), you'll make roughly 60-70 requests. At 1.5 seconds between requests, that's under 2 minutes — usually fine from a residential IP. But if you're running repeated collection jobs or scraping from a datacenter, a rotating proxy avoids Cloudflare blocks.
def create_leetcode_session(proxy_url: str = None) -> requests.Session:
"""Create a session with optional proxy for LeetCode scraping."""
session = requests.Session()
session.headers.update({
"Content-Type": "application/json",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 Chrome/126.0.0.0 Safari/537.36",
"Referer": "https://leetcode.com/problemset/",
"Accept": "application/json",
"Accept-Language": "en-US,en;q=0.9",
})
if proxy_url:
session.proxies = {"http": proxy_url, "https": proxy_url}
# Fetch the main page to get CSRF cookie — important for some mutations
resp = session.get("https://leetcode.com/problemset/", timeout=30)
csrf = session.cookies.get("csrftoken")
if csrf:
session.headers["X-CSRFToken"] = csrf
return session
# For bulk collection, [ThorData](https://thordata.partnerstack.com/partner/0a0x4nzq)
# or [Oxylabs](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=2066&url_id=174)
# residential proxies handle Cloudflare's TLS fingerprinting:
PROXY_URL = "http://YOUR_USER:[email protected]:9000"
session_with_proxy = create_leetcode_session(proxy_url=PROXY_URL)
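When a 429 does slip through, retrying with exponential backoff recovers gracefully. A minimal sketch (the delay constants are assumptions to tune; the function only needs a requests-style `.post()` method, so it drops in wherever `SESSION.post` is called):

```python
import random
import time

def post_with_backoff(session, url, payload, max_retries=5, base_delay=2.0):
    """POST with exponential backoff on 429/5xx responses.

    Works with anything exposing a requests-style .post() that returns an
    object with a .status_code attribute (e.g. requests.Session).
    """
    for attempt in range(max_retries):
        response = session.post(url, json=payload, timeout=30)
        if response.status_code == 429 or response.status_code >= 500:
            # 2s, 4s, 8s, ... plus jitter so parallel workers desynchronize
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Got {response.status_code}, retrying in {delay:.1f}s")
            time.sleep(delay)
            continue
        return response
    raise RuntimeError(f"Still throttled after {max_retries} attempts")
```

Wrapping the `SESSION.post` call in `get_problem_list` with this keeps long collection runs alive through transient throttling.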
Tracking Changes Over Time
LeetCode adds new problems weekly and updates acceptance rates continuously. Set up a scheduled collection to track what's new:
import sqlite3
import json
from datetime import datetime
def init_leetcode_db(db_path: str = "leetcode.db") -> sqlite3.Connection:
"""Create or connect to the LeetCode problems database."""
conn = sqlite3.connect(db_path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS problems (
question_id INTEGER,
frontend_id TEXT,
title TEXT,
slug TEXT,
difficulty TEXT,
acceptance_rate REAL,
is_paid BOOLEAN,
topic_tags TEXT,
likes INTEGER,
dislikes INTEGER,
collected_at TEXT,
PRIMARY KEY (question_id, collected_at)
);
CREATE TABLE IF NOT EXISTS problem_snapshots (
question_id INTEGER,
acceptance_rate REAL,
likes INTEGER,
dislikes INTEGER,
snapshot_date TEXT,
PRIMARY KEY (question_id, snapshot_date)
);
CREATE INDEX IF NOT EXISTS idx_problems_slug ON problems(slug);
CREATE INDEX IF NOT EXISTS idx_problems_difficulty ON problems(difficulty);
CREATE INDEX IF NOT EXISTS idx_problems_date ON problems(collected_at);
""")
conn.commit()
return conn
def store_problems(problems: list, db_path: str = "leetcode.db"):
"""Store problems with historical tracking."""
conn = init_leetcode_db(db_path)
collected_at = datetime.now().strftime("%Y-%m-%d")
for p in problems:
# Full record with date
conn.execute(
"INSERT OR REPLACE INTO problems VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
(
p["questionId"], p["questionFrontendId"],
p["title"], p["titleSlug"], p["difficulty"],
p["acRate"], p.get("isPaidOnly", False),
json.dumps([t["name"] for t in p["topicTags"]]),
p.get("likes", 0), p.get("dislikes", 0),
collected_at,
),
)
# Lightweight snapshot for trending analysis
conn.execute(
"INSERT OR REPLACE INTO problem_snapshots VALUES (?, ?, ?, ?, ?)",
(p["questionId"], p["acRate"], p.get("likes", 0), p.get("dislikes", 0), collected_at),
)
conn.commit()
conn.close()
print(f"Stored {len(problems)} problems for {collected_at}")
def find_new_problems(db_path: str = "leetcode.db") -> list:
"""Find problems added since last collection."""
conn = sqlite3.connect(db_path)
cursor = conn.execute("""
SELECT DISTINCT p1.frontend_id, p1.title, p1.difficulty, p1.collected_at
FROM problems p1
WHERE p1.collected_at = (SELECT MAX(collected_at) FROM problems)
AND p1.question_id NOT IN (
SELECT question_id FROM problems
WHERE collected_at < (SELECT MAX(collected_at) FROM problems)
)
ORDER BY CAST(p1.frontend_id AS INTEGER)
""")
new_problems = cursor.fetchall()
conn.close()
if new_problems:
print(f"New problems since last run: {len(new_problems)}")
for pid, title, diff, date in new_problems:
print(f" #{pid} {title} [{diff}] — added {date}")
return new_problems
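The diff logic in find_new_problems is easy to get subtly wrong, so it helps to exercise the same query shape against a throwaway database first (a self-contained sketch with a deliberately minimal schema; IDs and dates are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE problems (
        question_id INTEGER, frontend_id TEXT, title TEXT,
        difficulty TEXT, collected_at TEXT,
        PRIMARY KEY (question_id, collected_at)
    )
""")
conn.executemany("INSERT INTO problems VALUES (?, ?, ?, ?, ?)", [
    (1, "1", "Two Sum", "Easy", "2026-01-01"),
    (2, "2", "Add Two Numbers", "Medium", "2026-01-01"),
    (1, "1", "Two Sum", "Easy", "2026-01-08"),
    (2, "2", "Add Two Numbers", "Medium", "2026-01-08"),
    (9999, "9999", "Brand New Problem", "Hard", "2026-01-08"),
])

# Same shape as the diff query: rows from the latest collection whose
# question_id never appeared in any earlier collection
new = conn.execute("""
    SELECT frontend_id, title FROM problems
    WHERE collected_at = (SELECT MAX(collected_at) FROM problems)
      AND question_id NOT IN (
          SELECT question_id FROM problems
          WHERE collected_at < (SELECT MAX(collected_at) FROM problems)
      )
""").fetchall()
print(new)  # [('9999', 'Brand New Problem')]
```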
def find_trending_problems(db_path: str = "leetcode.db", days: int = 30) -> list:
"""Find problems with the biggest acceptance rate changes recently."""
conn = sqlite3.connect(db_path)
cursor = conn.execute("""
SELECT
p1.question_id,
p1.acceptance_rate as current_rate,
p2.acceptance_rate as old_rate,
(p1.acceptance_rate - p2.acceptance_rate) as rate_change,
prob.title,
prob.difficulty
FROM problem_snapshots p1
JOIN problem_snapshots p2 ON p1.question_id = p2.question_id
JOIN (
SELECT question_id, title, difficulty
FROM problems
WHERE collected_at = (SELECT MAX(collected_at) FROM problems)
) prob ON p1.question_id = prob.question_id
WHERE p1.snapshot_date = (SELECT MAX(snapshot_date) FROM problem_snapshots)
AND p2.snapshot_date <= date(p1.snapshot_date, '-' || ? || ' days')
ORDER BY ABS(rate_change) DESC
LIMIT 20
""", (days,))
trending = cursor.fetchall()
conn.close()
return trending
Acceptance Rate Analysis: Understanding Problem Difficulty
Acceptance rates carry finer-grained signal than the three coarse difficulty labels:
def analyze_acceptance_patterns(problems: list) -> dict:
"""Deep analysis of acceptance rate distributions by difficulty and topic."""
import statistics
by_diff = defaultdict(list)
for p in problems:
by_diff[p["difficulty"]].append(p["acRate"])
analysis = {}
for diff, rates in by_diff.items():
analysis[diff] = {
"count": len(rates),
"mean": round(statistics.mean(rates), 1),
"median": round(statistics.median(rates), 1),
"stdev": round(statistics.stdev(rates), 1) if len(rates) > 1 else 0,
"min": round(min(rates), 1),
"max": round(max(rates), 1),
"below_30pct": sum(1 for r in rates if r < 30),
"above_60pct": sum(1 for r in rates if r > 60),
}
# Find mismatched problems: Easy with low acceptance or Hard with high acceptance
mislabeled_suspects = {
"easy_but_hard": [
p for p in problems
if p["difficulty"] == "Easy" and p["acRate"] < 30
],
"hard_but_approachable": [
p for p in problems
if p["difficulty"] == "Hard" and p["acRate"] > 50
],
}
print("Acceptance Rate Analysis:")
for diff, stats in analysis.items():
print(f"\n {diff}:")
print(f" Count: {stats['count']}")
print(f" Mean: {stats['mean']}% | Median: {stats['median']}% | StdDev: {stats['stdev']}%")
print(f" Range: {stats['min']}% - {stats['max']}%")
print(f" Below 30%: {stats['below_30pct']} | Above 60%: {stats['above_60pct']}")
print(f"\n 'Easy' problems with acceptance < 30%: {len(mislabeled_suspects['easy_but_hard'])}")
print(f" 'Hard' problems with acceptance > 50%: {len(mislabeled_suspects['hard_but_approachable'])}")
return {**analysis, "mislabeled_suspects": mislabeled_suspects}
Complete Pipeline: Daily Collection and Analysis
def run_daily_collection(proxy_url: str = None, db_path: str = "leetcode.db"):
"""Full pipeline: fetch all problems, store, detect changes, analyze."""
print("=== LeetCode Daily Collection ===")
# Set up session
if proxy_url:
SESSION.proxies = {"http": proxy_url, "https": proxy_url}
# Fetch all problems
print("\nFetching problem list...")
problems = fetch_all_problems(delay=1.5)
print(f"Total fetched: {len(problems)}")
# Store with historical tracking
print("\nStoring to database...")
store_problems(problems, db_path=db_path)
# Detect new additions
print("\nChecking for new problems...")
new_probs = find_new_problems(db_path=db_path)
# Enrich new problems with details
if new_probs:
print(f"\nEnriching {min(len(new_probs), 10)} new problems...")
for pid, title, diff, _ in new_probs[:10]:
# Find the slug for this problem
matching = [p for p in problems if p["questionFrontendId"] == pid]
if matching:
slug = matching[0]["titleSlug"]
detail = get_problem_detail(slug)
if detail:
hints = len(detail.get("hints", []))
snippets = len(detail.get("codeSnippets", []))
print(f" #{pid} {title}: {hints} hints, {snippets} language snippets")
time.sleep(2)
# Run analytics
print("\n=== Analysis ===")
stats = analyze_problem_set(problems)
return problems, stats
if __name__ == "__main__":
PROXY_URL = "http://YOUR_USER:[email protected]:9000"
problems, stats = run_daily_collection(proxy_url=PROXY_URL)
Practical Use Cases
Interview preparation tracker. Store your attempt history alongside the problem database; a query like "which Hard graph and DP problems have I not attempted yet, ordered by acceptance rate?" becomes a personalized study queue.
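A sketch of that study-queue query, with a hypothetical attempts table standing in for your own practice log (schema and rows are illustrative, not from the scraped data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE problems (
        question_id INTEGER PRIMARY KEY, title TEXT, difficulty TEXT,
        acceptance_rate REAL, topic_tags TEXT
    );
    CREATE TABLE attempts (question_id INTEGER, attempted_at TEXT);
""")
conn.executemany("INSERT INTO problems VALUES (?, ?, ?, ?, ?)", [
    (10, "Regular Expression Matching", "Hard", 28.5, '["dynamic-programming"]'),
    (42, "Trapping Rain Water", "Hard", 64.1, '["dynamic-programming"]'),
    (1, "Two Sum", "Easy", 55.0, '["arrays"]'),
])
conn.execute("INSERT INTO attempts VALUES (42, '2026-01-05')")

# Unattempted Hard problems in a target topic, friendliest first
queue = conn.execute("""
    SELECT title, acceptance_rate FROM problems
    WHERE difficulty = 'Hard'
      AND topic_tags LIKE '%dynamic-programming%'
      AND question_id NOT IN (SELECT question_id FROM attempts)
    ORDER BY acceptance_rate DESC
""").fetchall()
print(queue)  # [('Regular Expression Matching', 28.5)]
```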
Topic gap analysis. Map your company's interview history (scraped from Glassdoor or blind reports) against LeetCode topic tags. Identify which topics appear frequently in your target company's interviews but you haven't practiced.
Content creation. Problems with heavy traffic but unusually low acceptance rates are the ones learners struggle with most, which makes them prime material for blog posts or YouTube explainers.
Hiring tools. For technical recruiters, the problem difficulty distribution by topic provides a calibration baseline for interview question selection — ensuring consistent difficulty across candidates.
Summary
LeetCode's undocumented GraphQL API gives you full access to the problem database — metadata, topics, difficulty, acceptance rates, and code snippets. The list endpoint handles batch collection while the detail endpoint gives you per-problem specifics. Rate limiting is the main obstacle — keep requests spaced at 1-2 seconds and you'll collect the full set without issues.
For ongoing tracking, store snapshots in SQLite and diff between collections to catch new problems and acceptance rate changes. The dataset is valuable for building study tools, analyzing interview trends, or just understanding what the tech industry considers "must know" algorithms. With ThorData's residential proxies routing your requests, Cloudflare's TLS fingerprinting checks stop being an obstacle for sustained collection jobs.