Scrape LeetCode Problems: Difficulty, Tags & Acceptance Rates (2026)
LeetCode has over 3,000 problems and they're adding more every week. If you're building a study tracker, analyzing which topics are most tested, or creating a recommendation engine for coding practice, you need programmatic access to that problem data.
LeetCode doesn't have a documented public API, but their frontend talks to a GraphQL endpoint. That's your way in.
LeetCode's GraphQL API
The frontend at leetcode.com uses https://leetcode.com/graphql/ for all data fetching. You can use the same queries the website makes. Every problem listing, difficulty filter, tag lookup, and submission stat flows through this endpoint.
Dependencies and Setup
pip install requests
Only requests is needed for the examples below; add beautifulsoup4 if you also want to parse the HTML problem descriptions returned by the detail endpoint.
Basic Problem List Query
import requests
import time
import json
LEETCODE_GRAPHQL = "https://leetcode.com/graphql/"
SESSION = requests.Session()
SESSION.headers.update({
"Content-Type": "application/json",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Referer": "https://leetcode.com/problemset/",
"Accept": "application/json",
"Accept-Language": "en-US,en;q=0.9",
})
def get_problem_list(skip=0, limit=50, filters=None):
"""Fetch a page of LeetCode problems."""
query = """
query problemsetQuestionList($categorySlug: String, $limit: Int,
$skip: Int, $filters: QuestionListFilterInput) {
problemsetQuestionList: questionList(
categorySlug: $categorySlug
limit: $limit
skip: $skip
filters: $filters
) {
total: totalNum
questions: data {
questionId
questionFrontendId
title
titleSlug
difficulty
acRate
isPaidOnly
topicTags {
name
slug
}
stats
status
likes
dislikes
}
}
}
"""
variables = {
"categorySlug": "all-code-essentials",
"skip": skip,
"limit": limit,
"filters": filters or {},
}
response = SESSION.post(
LEETCODE_GRAPHQL,
json={"query": query, "variables": variables},
timeout=30,
)
response.raise_for_status()
data = response.json()
if "errors" in data:
print(f"GraphQL errors: {data['errors']}")
return [], 0
result = data.get("data", {}).get("problemsetQuestionList", {})
return result.get("questions", []), result.get("total", 0)
# Fetch first page of problems
problems, total = get_problem_list(skip=0, limit=20)
print(f"Total problems: {total}")
for p in problems[:5]:
tags = ", ".join(t["name"] for t in p["topicTags"])
print(f"#{p['questionFrontendId']} {p['title']} "
f"[{p['difficulty']}] {p['acRate']:.1f}% - {tags}")
Fetching All Problems with Full Pagination
LeetCode paginates at 50 problems per request. To get the full set:
def fetch_all_problems(delay: float = 1.5) -> list:
"""Fetch every LeetCode problem with pagination."""
all_problems = []
skip = 0
limit = 50
# First request to get total
batch, total = get_problem_list(skip=0, limit=limit)
all_problems.extend(batch)
print(f"Total problems to fetch: {total}")
skip = limit
while skip < total:
batch, _ = get_problem_list(skip=skip, limit=limit)
if not batch:
print(f"Empty batch at skip={skip}, stopping")
break
all_problems.extend(batch)
print(f" Fetched {len(all_problems)}/{total}")
skip += limit
time.sleep(delay) # Respect rate limits
return all_problems
all_problems = fetch_all_problems()
print(f"\nCollected {len(all_problems)} problems total")
# Save raw data
with open("leetcode_problems.json", "w") as f:
json.dump(all_problems, f, indent=2)
Getting Detailed Problem Data
The list endpoint gives you metadata, but for full problem details — description, hints, solution count, company tags — you need per-problem queries:
def get_problem_detail(title_slug: str) -> dict | None:
"""Get full details for a single problem including hints and code snippets."""
query = """
query questionData($titleSlug: String!) {
question(titleSlug: $titleSlug) {
questionId
questionFrontendId
title
titleSlug
content
difficulty
likes
dislikes
categoryTitle
isPaidOnly
stats
hints
similarQuestions
topicTags {
name
slug
}
codeSnippets {
lang
langSlug
code
}
sampleTestCase
metaData
judgerAvailable
judgeType
mysqlSchemas
enableRunCode
enableTestMode
}
}
"""
response = SESSION.post(
LEETCODE_GRAPHQL,
json={"query": query, "variables": {"titleSlug": title_slug}},
timeout=30,
)
response.raise_for_status()
data = response.json()
if "errors" in data:
return None
return data.get("data", {}).get("question")
# Get details for a specific problem
detail = get_problem_detail("two-sum")
if detail:
stats = json.loads(detail.get("stats", "{}"))
print(f"Problem: {detail['title']}")
print(f"Difficulty: {detail['difficulty']}")
    print(f"Total submissions: {stats.get('totalSubmissionRaw', 0):,}")
    print(f"Total accepted: {stats.get('totalAcceptedRaw', 0):,}")
print(f"Hints: {len(detail.get('hints', []))}")
similar = json.loads(detail.get('similarQuestions', '[]'))
print(f"Similar problems: {len(similar)}")
print(f"Languages with snippets: {len(detail.get('codeSnippets', []))}")
print(f"Topic tags: {', '.join(t['name'] for t in detail.get('topicTags', []))}")
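The `content` field in the detail response is HTML. If you need plain text for search indexing or display, a minimal stripper using only the standard library works (beautifulsoup4 from the setup step handles messier markup better):

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Accumulates text nodes, discarding tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def content_to_text(html: str) -> str:
    """Strip HTML markup from a problem description and collapse whitespace."""
    extractor = _TextExtractor()
    extractor.feed(html or "")
    return " ".join("".join(extractor.parts).split())

sample = "<p>Given an array of integers <code>nums</code>, return indices.</p>"
print(content_to_text(sample))  # Given an array of integers nums, return indices.
```

Feed it `detail["content"]` to get a searchable plain-text version of any problem statement.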
Filtering by Topic and Difficulty
The filters parameter supports topic tags, difficulty, and status filtering:
def get_problems_by_topic(topic_slug: str, difficulty: str | None = None) -> list:
"""Get problems filtered by topic tag and optional difficulty."""
filters = {"tags": [topic_slug]}
if difficulty:
filters["difficulty"] = difficulty.upper()
all_problems = []
skip = 0
while True:
batch, total = get_problem_list(skip=skip, limit=50, filters=filters)
if not batch:
break
all_problems.extend(batch)
print(f" {len(all_problems)}/{total} problems with tag '{topic_slug}'")
skip += 50
if skip >= total:
break
time.sleep(1.5)
return all_problems
def get_problems_by_difficulty(difficulty: str = "Hard") -> list:
"""Get all problems of a specific difficulty level."""
filters = {"difficulty": difficulty.upper()}
return _paginate_problems(filters)
def _paginate_problems(filters: dict, delay: float = 1.5) -> list:
"""Generic paginator for filtered problem sets."""
all_problems = []
skip = 0
while True:
batch, total = get_problem_list(skip=skip, limit=50, filters=filters)
if not batch:
break
all_problems.extend(batch)
skip += 50
if skip >= total:
break
time.sleep(delay)
return all_problems
# Examples
hard_dp = get_problems_by_topic("dynamic-programming", difficulty="Hard")
print(f"\nHard DP problems: {len(hard_dp)}")
for p in sorted(hard_dp, key=lambda x: x["acRate"])[:5]:
print(f" #{p['questionFrontendId']} {p['title']} - {p['acRate']:.1f}% acceptance")
# Get all binary search problems
binary_search = get_problems_by_topic("binary-search")
print(f"\nBinary search problems: {len(binary_search)}")
avg_acceptance = sum(p["acRate"] for p in binary_search) / len(binary_search)
print(f"Average acceptance rate: {avg_acceptance:.1f}%")
Fetching Company-Tagged Problems (Premium)
Company tags (which companies ask which problems) require LeetCode Premium. If you have an account, you can authenticate and query this data. Note that scripted password login frequently trips LeetCode's captcha; copying your browser's LEETCODE_SESSION and csrftoken cookies into a requests.Session is often the more reliable route:
def authenticate_leetcode(username: str, password: str) -> requests.Session:
"""
Authenticate with LeetCode to access premium features.
Returns an authenticated session.
"""
session = requests.Session()
session.headers.update({
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Referer": "https://leetcode.com/",
})
# Get CSRF token from login page
login_page = session.get("https://leetcode.com/accounts/login/")
csrf_token = session.cookies.get("csrftoken")
if not csrf_token:
raise ValueError("Could not obtain CSRF token")
# Submit login
login_resp = session.post(
"https://leetcode.com/accounts/login/",
data={
"login": username,
"password": password,
"csrfmiddlewaretoken": csrf_token,
},
headers={"X-CSRFToken": csrf_token},
)
if "profile" not in login_resp.url:
raise ValueError("Login failed — check credentials")
# Update CSRF token for subsequent API requests
new_csrf = session.cookies.get("csrftoken")
if new_csrf:
session.headers["X-CSRFToken"] = new_csrf
return session
def get_company_problems(company_slug: str, auth_session: requests.Session) -> list:
"""Get problems associated with a specific company (requires Premium)."""
query = """
query getCompanyTag($slug: String!) {
companyTag(slug: $slug) {
name
questions {
questionId
questionFrontendId
title
titleSlug
difficulty
acRate
topicTags { name slug }
frequencyTimebar
}
}
}
"""
resp = auth_session.post(
LEETCODE_GRAPHQL,
json={"query": query, "variables": {"slug": company_slug}},
timeout=30,
)
data = resp.json()
if "errors" in data:
return []
tag_data = data.get("data", {}).get("companyTag")
if not tag_data:
return []
return tag_data.get("questions", [])
Analysis: Topic Distribution and Difficulty Curves
from collections import Counter, defaultdict
def analyze_problem_set(problems: list) -> dict:
"""Analyze the full problem set for patterns."""
difficulty_count = Counter()
topic_count = Counter()
topic_by_difficulty = defaultdict(Counter)
acceptance_by_difficulty = defaultdict(list)
for p in problems:
diff = p["difficulty"]
difficulty_count[diff] += 1
acceptance_by_difficulty[diff].append(p["acRate"])
for tag in p["topicTags"]:
topic_count[tag["name"]] += 1
topic_by_difficulty[diff][tag["name"]] += 1
print("=== Difficulty Distribution ===")
for diff in ["Easy", "Medium", "Hard"]:
count = difficulty_count[diff]
rates = acceptance_by_difficulty[diff]
avg_rate = sum(rates) / len(rates) if rates else 0
print(f" {diff}: {count} problems, avg acceptance: {avg_rate:.1f}%")
print("\n=== Top 15 Topics by Problem Count ===")
for topic, count in topic_count.most_common(15):
hard_count = topic_by_difficulty["Hard"][topic]
print(f" {topic}: {count} problems ({hard_count} Hard)")
# Find hardest topics by average acceptance rate
topic_rates = defaultdict(list)
for p in problems:
for tag in p["topicTags"]:
topic_rates[tag["name"]].append(p["acRate"])
print("\n=== Hardest Topics (lowest avg acceptance, min 10 problems) ===")
avg_rates = {
topic: sum(rates) / len(rates)
for topic, rates in topic_rates.items()
if len(rates) >= 10
}
for topic, rate in sorted(avg_rates.items(), key=lambda x: x[1])[:10]:
print(f" {topic}: {rate:.1f}% avg acceptance ({len(topic_rates[topic])} problems)")
print("\n=== Easiest Topics (highest avg acceptance, min 10 problems) ===")
for topic, rate in sorted(avg_rates.items(), key=lambda x: x[1], reverse=True)[:5]:
print(f" {topic}: {rate:.1f}% avg acceptance")
# Compute likes/dislikes ratio by difficulty
print("\n=== Community Rating by Difficulty ===")
for diff in ["Easy", "Medium", "Hard"]:
diff_probs = [p for p in problems if p["difficulty"] == diff and p.get("likes", 0) + p.get("dislikes", 0) > 0]
if diff_probs:
avg_ratio = sum(
p["likes"] / (p["likes"] + p["dislikes"])
for p in diff_probs
) / len(diff_probs)
print(f" {diff}: avg like ratio {avg_ratio:.2f}")
return {
"difficulty_count": dict(difficulty_count),
"topic_count": dict(topic_count.most_common(30)),
"avg_acceptance_by_difficulty": {
d: round(sum(r) / len(r), 1)
for d, r in acceptance_by_difficulty.items()
},
"topic_avg_acceptance": avg_rates,
}
stats = analyze_problem_set(all_problems)
Building a Study Path Recommender
The acceptance rate and topic data enables smart study path recommendations:
def recommend_study_path(
problems: list,
target_topics: list,
current_level: str = "Easy",
min_acceptance_rate: float = 30.0,
) -> list:
"""
Recommend problems for a study path based on target topics and skill level.
Ordering logic:
1. Filter by target topics
2. Start with Easy or Medium problems that have high acceptance rates
3. Progress to harder problems as user builds foundation
"""
DIFFICULTY_ORDER = {"Easy": 0, "Medium": 1, "Hard": 2}
START_LEVEL = DIFFICULTY_ORDER.get(current_level, 0)
target_slugs = {t.lower().replace(" ", "-") for t in target_topics}
# Filter problems matching target topics
matching = [
p for p in problems
if any(tag["slug"] in target_slugs for tag in p["topicTags"])
and not p.get("isPaidOnly", False)
]
# Sort by difficulty progression, then by acceptance rate (easier/higher first)
recommended = sorted(
matching,
key=lambda p: (
max(0, DIFFICULTY_ORDER.get(p["difficulty"], 1) - START_LEVEL),
-p["acRate"],
)
)
# Group into phases
phases = {
"warmup": [p for p in recommended if p["difficulty"] == "Easy" and p["acRate"] >= 60],
"core": [p for p in recommended if p["difficulty"] == "Medium" and p["acRate"] >= min_acceptance_rate],
"challenge": [p for p in recommended if p["difficulty"] == "Hard"],
}
print(f"Study path for: {', '.join(target_topics)}")
for phase, probs in phases.items():
print(f" {phase.capitalize()}: {len(probs)} problems")
for p in probs[:3]:
tags = ", ".join(t["name"] for t in p["topicTags"])
print(f" #{p['questionFrontendId']} {p['title']} ({p['acRate']:.0f}%) — {tags}")
return phases
# Example: FAANG interview prep path
study_path = recommend_study_path(
all_problems,
    target_topics=["array", "dynamic-programming", "graph"],
current_level="Medium",
)
Anti-Bot Measures and Rate Limits
LeetCode uses several protections on their GraphQL endpoint:
- CSRF tokens — The site sets a `csrftoken` cookie that must be sent as an `X-CSRFToken` header for mutations. Read-only queries usually work without it.
- Session-based rate limiting — Too many requests from one IP triggers 429 responses or temporary blocks. The threshold is roughly 20-30 requests per minute for unauthenticated users.
- Cloudflare protection — LeetCode sits behind Cloudflare, which fingerprints your TLS stack and blocks known bot signatures.
- Premium content gating — Company tags and some problem details require a paid LeetCode Premium account.
For collecting the full problem set (3,000+ problems at 50 per request), you'll make roughly 60-70 requests. At 1.5 seconds between requests, that's under 2 minutes — usually fine from a residential IP. But if you're running repeated collection jobs or scraping from a datacenter, a rotating proxy avoids Cloudflare blocks.
def create_leetcode_session(proxy_url: str = None) -> requests.Session:
"""Create a session with optional proxy for LeetCode scraping."""
session = requests.Session()
session.headers.update({
"Content-Type": "application/json",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 Chrome/126.0.0.0 Safari/537.36",
"Referer": "https://leetcode.com/problemset/",
"Accept": "application/json",
"Accept-Language": "en-US,en;q=0.9",
})
if proxy_url:
session.proxies = {"http": proxy_url, "https": proxy_url}
# Fetch the main page to get CSRF cookie — important for some mutations
resp = session.get("https://leetcode.com/problemset/", timeout=30)
csrf = session.cookies.get("csrftoken")
if csrf:
session.headers["X-CSRFToken"] = csrf
return session
# For bulk collection, [ThorData](https://thordata.partnerstack.com/partner/0a0x4nzq)
# or [Oxylabs](https://oxylabs.go2cloud.org/aff_c?offer_id=7&aff_id=2066&url_id=174)
# residential proxies handle Cloudflare's TLS fingerprinting:
PROXY_URL = "http://YOUR_USER:[email protected]:9000"
session_with_proxy = create_leetcode_session(proxy_url=PROXY_URL)
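When a 429 does slip through, retrying with exponential backoff recovers gracefully. A minimal sketch (the delay constants are assumptions to tune; the function only needs a requests-style `.post()` method, so it drops in wherever `SESSION.post` is called):

```python
import random
import time

def post_with_backoff(session, url, payload, max_retries=5, base_delay=2.0):
    """POST with exponential backoff on 429/5xx responses.

    Works with anything exposing a requests-style .post() that returns an
    object with a .status_code attribute (e.g. requests.Session).
    """
    for attempt in range(max_retries):
        response = session.post(url, json=payload, timeout=30)
        if response.status_code == 429 or response.status_code >= 500:
            # 2s, 4s, 8s, ... plus jitter so parallel workers desynchronize
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Got {response.status_code}, retrying in {delay:.1f}s")
            time.sleep(delay)
            continue
        return response
    raise RuntimeError(f"Still throttled after {max_retries} attempts")
```

Wrapping the `SESSION.post` call in `get_problem_list` with this keeps long collection runs alive through transient throttling.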
Tracking Changes Over Time
LeetCode adds new problems weekly and updates acceptance rates continuously. Set up a scheduled collection to track what's new:
import sqlite3
import json
from datetime import datetime
def init_leetcode_db(db_path: str = "leetcode.db") -> sqlite3.Connection:
"""Create or connect to the LeetCode problems database."""
conn = sqlite3.connect(db_path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS problems (
question_id INTEGER,
frontend_id TEXT,
title TEXT,
slug TEXT,
difficulty TEXT,
acceptance_rate REAL,
is_paid BOOLEAN,
topic_tags TEXT,
likes INTEGER,
dislikes INTEGER,
collected_at TEXT,
PRIMARY KEY (question_id, collected_at)
);
CREATE TABLE IF NOT EXISTS problem_snapshots (
question_id INTEGER,
acceptance_rate REAL,
likes INTEGER,
dislikes INTEGER,
snapshot_date TEXT,
PRIMARY KEY (question_id, snapshot_date)
);
CREATE INDEX IF NOT EXISTS idx_problems_slug ON problems(slug);
CREATE INDEX IF NOT EXISTS idx_problems_difficulty ON problems(difficulty);
CREATE INDEX IF NOT EXISTS idx_problems_date ON problems(collected_at);
""")
conn.commit()
return conn
def store_problems(problems: list, db_path: str = "leetcode.db"):
"""Store problems with historical tracking."""
conn = init_leetcode_db(db_path)
collected_at = datetime.now().strftime("%Y-%m-%d")
for p in problems:
# Full record with date
conn.execute(
"INSERT OR REPLACE INTO problems VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
(
p["questionId"], p["questionFrontendId"],
p["title"], p["titleSlug"], p["difficulty"],
p["acRate"], p.get("isPaidOnly", False),
json.dumps([t["name"] for t in p["topicTags"]]),
p.get("likes", 0), p.get("dislikes", 0),
collected_at,
),
)
# Lightweight snapshot for trending analysis
conn.execute(
"INSERT OR REPLACE INTO problem_snapshots VALUES (?, ?, ?, ?, ?)",
(p["questionId"], p["acRate"], p.get("likes", 0), p.get("dislikes", 0), collected_at),
)
conn.commit()
conn.close()
print(f"Stored {len(problems)} problems for {collected_at}")
def find_new_problems(db_path: str = "leetcode.db") -> list:
"""Find problems added since last collection."""
conn = sqlite3.connect(db_path)
cursor = conn.execute("""
SELECT DISTINCT p1.frontend_id, p1.title, p1.difficulty, p1.collected_at
FROM problems p1
WHERE p1.collected_at = (SELECT MAX(collected_at) FROM problems)
AND p1.question_id NOT IN (
SELECT question_id FROM problems
WHERE collected_at < (SELECT MAX(collected_at) FROM problems)
)
ORDER BY CAST(p1.frontend_id AS INTEGER)
""")
new_problems = cursor.fetchall()
conn.close()
if new_problems:
print(f"New problems since last run: {len(new_problems)}")
for pid, title, diff, date in new_problems:
print(f" #{pid} {title} [{diff}] — added {date}")
return new_problems
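The diff logic in find_new_problems is easy to get subtly wrong, so it helps to exercise the same query shape against a throwaway database first (a self-contained sketch with a deliberately minimal schema; IDs and dates are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE problems (
        question_id INTEGER, frontend_id TEXT, title TEXT,
        difficulty TEXT, collected_at TEXT,
        PRIMARY KEY (question_id, collected_at)
    )
""")
conn.executemany("INSERT INTO problems VALUES (?, ?, ?, ?, ?)", [
    (1, "1", "Two Sum", "Easy", "2026-01-01"),
    (2, "2", "Add Two Numbers", "Medium", "2026-01-01"),
    (1, "1", "Two Sum", "Easy", "2026-01-08"),
    (2, "2", "Add Two Numbers", "Medium", "2026-01-08"),
    (9999, "9999", "Brand New Problem", "Hard", "2026-01-08"),
])

# Same shape as the diff query: rows from the latest collection whose
# question_id never appeared in any earlier collection
new = conn.execute("""
    SELECT frontend_id, title FROM problems
    WHERE collected_at = (SELECT MAX(collected_at) FROM problems)
      AND question_id NOT IN (
          SELECT question_id FROM problems
          WHERE collected_at < (SELECT MAX(collected_at) FROM problems)
      )
""").fetchall()
print(new)  # [('9999', 'Brand New Problem')]
```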
def find_trending_problems(db_path: str = "leetcode.db", days: int = 30) -> list:
"""Find problems with the biggest acceptance rate changes recently."""
conn = sqlite3.connect(db_path)
cursor = conn.execute("""
SELECT
p1.question_id,
p1.acceptance_rate as current_rate,
p2.acceptance_rate as old_rate,
(p1.acceptance_rate - p2.acceptance_rate) as rate_change,
prob.title,
prob.difficulty
FROM problem_snapshots p1
JOIN problem_snapshots p2 ON p1.question_id = p2.question_id
JOIN (
SELECT question_id, title, difficulty
FROM problems
WHERE collected_at = (SELECT MAX(collected_at) FROM problems)
) prob ON p1.question_id = prob.question_id
WHERE p1.snapshot_date = (SELECT MAX(snapshot_date) FROM problem_snapshots)
AND p2.snapshot_date <= date(p1.snapshot_date, '-' || ? || ' days')
ORDER BY ABS(rate_change) DESC
LIMIT 20
""", (days,))
trending = cursor.fetchall()
conn.close()
return trending
Acceptance Rate Analysis: Understanding Problem Difficulty
Acceptance rates carry finer-grained signal than the three coarse difficulty labels:
def analyze_acceptance_patterns(problems: list) -> dict:
"""Deep analysis of acceptance rate distributions by difficulty and topic."""
import statistics
by_diff = defaultdict(list)
for p in problems:
by_diff[p["difficulty"]].append(p["acRate"])
analysis = {}
for diff, rates in by_diff.items():
analysis[diff] = {
"count": len(rates),
"mean": round(statistics.mean(rates), 1),
"median": round(statistics.median(rates), 1),
"stdev": round(statistics.stdev(rates), 1) if len(rates) > 1 else 0,
"min": round(min(rates), 1),
"max": round(max(rates), 1),
"below_30pct": sum(1 for r in rates if r < 30),
"above_60pct": sum(1 for r in rates if r > 60),
}
# Find mismatched problems: Easy with low acceptance or Hard with high acceptance
mislabeled_suspects = {
"easy_but_hard": [
p for p in problems
if p["difficulty"] == "Easy" and p["acRate"] < 30
],
"hard_but_approachable": [
p for p in problems
if p["difficulty"] == "Hard" and p["acRate"] > 50
],
}
print("Acceptance Rate Analysis:")
for diff, stats in analysis.items():
print(f"\n {diff}:")
print(f" Count: {stats['count']}")
print(f" Mean: {stats['mean']}% | Median: {stats['median']}% | StdDev: {stats['stdev']}%")
print(f" Range: {stats['min']}% - {stats['max']}%")
print(f" Below 30%: {stats['below_30pct']} | Above 60%: {stats['above_60pct']}")
print(f"\n 'Easy' problems with acceptance < 30%: {len(mislabeled_suspects['easy_but_hard'])}")
print(f" 'Hard' problems with acceptance > 50%: {len(mislabeled_suspects['hard_but_approachable'])}")
return {**analysis, "mislabeled_suspects": mislabeled_suspects}
Complete Pipeline: Daily Collection and Analysis
def run_daily_collection(proxy_url: str = None, db_path: str = "leetcode.db"):
"""Full pipeline: fetch all problems, store, detect changes, analyze."""
print("=== LeetCode Daily Collection ===")
# Set up session
if proxy_url:
SESSION.proxies = {"http": proxy_url, "https": proxy_url}
# Fetch all problems
print("\nFetching problem list...")
problems = fetch_all_problems(delay=1.5)
print(f"Total fetched: {len(problems)}")
# Store with historical tracking
print("\nStoring to database...")
store_problems(problems, db_path=db_path)
# Detect new additions
print("\nChecking for new problems...")
new_probs = find_new_problems(db_path=db_path)
# Enrich new problems with details
if new_probs:
print(f"\nEnriching {min(len(new_probs), 10)} new problems...")
for pid, title, diff, _ in new_probs[:10]:
# Find the slug for this problem
matching = [p for p in problems if p["questionFrontendId"] == pid]
if matching:
slug = matching[0]["titleSlug"]
detail = get_problem_detail(slug)
if detail:
hints = len(detail.get("hints", []))
snippets = len(detail.get("codeSnippets", []))
print(f" #{pid} {title}: {hints} hints, {snippets} language snippets")
time.sleep(2)
# Run analytics
print("\n=== Analysis ===")
stats = analyze_problem_set(problems)
return problems, stats
if __name__ == "__main__":
PROXY_URL = "http://YOUR_USER:[email protected]:9000"
problems, stats = run_daily_collection(proxy_url=PROXY_URL)
Practical Use Cases
Interview preparation tracker. Store your attempt history alongside the problem database; a query like "which Hard graph and DP problems have I not attempted yet, ordered by acceptance rate?" becomes a personalized study queue.
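A sketch of that study-queue query, with a hypothetical attempts table standing in for your own practice log (schema and rows are illustrative, not from the scraped data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE problems (
        question_id INTEGER PRIMARY KEY, title TEXT, difficulty TEXT,
        acceptance_rate REAL, topic_tags TEXT
    );
    CREATE TABLE attempts (question_id INTEGER, attempted_at TEXT);
""")
conn.executemany("INSERT INTO problems VALUES (?, ?, ?, ?, ?)", [
    (10, "Regular Expression Matching", "Hard", 28.5, '["dynamic-programming"]'),
    (42, "Trapping Rain Water", "Hard", 64.1, '["dynamic-programming"]'),
    (1, "Two Sum", "Easy", 55.0, '["arrays"]'),
])
conn.execute("INSERT INTO attempts VALUES (42, '2026-01-05')")

# Unattempted Hard problems in a target topic, friendliest first
queue = conn.execute("""
    SELECT title, acceptance_rate FROM problems
    WHERE difficulty = 'Hard'
      AND topic_tags LIKE '%dynamic-programming%'
      AND question_id NOT IN (SELECT question_id FROM attempts)
    ORDER BY acceptance_rate DESC
""").fetchall()
print(queue)  # [('Regular Expression Matching', 28.5)]
```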
Topic gap analysis. Map your company's interview history (scraped from Glassdoor or blind reports) against LeetCode topic tags. Identify which topics appear frequently in your target company's interviews but you haven't practiced.
Content creation. Problems with heavy traffic but unusually low acceptance rates are the ones learners struggle with most, which makes them prime material for blog posts or YouTube explainers.
Hiring tools. For technical recruiters, the problem difficulty distribution by topic provides a calibration baseline for interview question selection — ensuring consistent difficulty across candidates.
Summary
LeetCode's undocumented GraphQL API gives you full access to the problem database — metadata, topics, difficulty, acceptance rates, and code snippets. The list endpoint handles batch collection while the detail endpoint gives you per-problem specifics. Rate limiting is the main obstacle — keep requests spaced at 1-2 seconds and you'll collect the full set without issues.
For ongoing tracking, store snapshots in SQLite and diff between collections to catch new problems and acceptance rate changes. The dataset is valuable for building study tools, analyzing interview trends, or just understanding what the tech industry considers "must know" algorithms. With ThorData's residential proxies routing your requests, Cloudflare's TLS fingerprinting checks stop being an obstacle for sustained collection jobs.