How to Scrape Kickstarter Campaign Data with Python (2026 Guide)
Kickstarter is one of the richest sources of crowdfunding data on the web. Campaign funding amounts, backer counts, creator track records, reward tier pricing, funding velocity — it's all there, publicly accessible on every project page. Whether you're tracking market trends, analyzing what makes campaigns succeed, building a research dataset, or monitoring competitors, you need a reliable way to extract this data programmatically.
The challenge: Kickstarter has moved to a GraphQL API internally. The old REST-style discovery endpoints still partially work, but the real data — reward tiers, stretch goals, full descriptions, update counts — flows through GraphQL queries that the frontend fires on every page load. If you intercept those, you get clean structured JSON instead of parsing messy HTML.
This guide walks through both approaches: the lightweight discovery API for bulk collection and GraphQL interception for detailed campaign data. Plus data storage, anti-bot handling, and real analysis use cases.
Why Scrape Kickstarter?
Before diving into code, it's worth understanding why this data matters commercially:
- Market validation — Before launching your own product, analyze similar Kickstarter campaigns. What price points work? What reward tiers attract the most backers? What funding goals are realistic for your category?
- Trend detection — Track which categories and subcategories are growing. Spot emerging product niches months before they hit mainstream retail.
- Competitive intelligence — Monitor competing products in real time. Track their funding velocity, backer growth, and which reward tiers sell out first.
- Investment signals — Some venture firms and angel investors use Kickstarter success as a signal for early-stage startups worth watching.
- Academic research — Crowdfunding behavior is a rich area for economists, sociologists, and business researchers studying collective decision-making.
Method 1: The Discovery API (Bulk Campaign Listings)
Kickstarter exposes a public discovery API that returns campaign listings by category, location, or sort order. This endpoint doesn't require authentication and returns structured JSON:
import httpx
import time
import json
from dataclasses import dataclass, asdict
from datetime import datetime
@dataclass
class KickstarterProject:
project_id: int
name: str
blurb: str
slug: str
state: str # 'live', 'successful', 'failed', 'canceled'
category: str
subcategory: str
creator_name: str
goal: float
pledged: float
currency: str
backers_count: int
country: str
launched_at: int # Unix timestamp
deadline: int # Unix timestamp
funding_percentage: float
staff_pick: bool
url: str
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/125.0.0.0 Safari/537.36"
),
"Accept": "application/json",
}
BASE = "https://www.kickstarter.com/discover/advanced"
def scrape_category(
category_id: int,
sort: str = "newest",
max_pages: int = 10,
state: str = "live",
) -> list[KickstarterProject]:
"""
Scrape Kickstarter campaigns from a specific category.
Args:
category_id: Kickstarter's internal category ID.
sort: 'newest', 'popularity', 'end_date', 'most_funded',
'most_backed', 'magic' (recommended).
max_pages: Maximum number of pages to fetch (12 projects per page).
state: 'live', 'successful', or 'all'.
Returns:
List of KickstarterProject objects.
"""
client = httpx.Client(headers=HEADERS, timeout=15, follow_redirects=True)
projects = []
for page in range(1, max_pages + 1):
params = {
"category_id": str(category_id),
"sort": sort,
"format": "json",
"page": page,
}
if state != "all":
params["state"] = state
        resp = client.get(BASE, params=params)
        if resp.status_code == 429:
            print(f"Rate limited on page {page}. Waiting 30s...")
            time.sleep(30)
            resp = client.get(BASE, params=params)  # Retry this page once instead of skipping it
        resp.raise_for_status()
data = resp.json()
batch = data.get("projects", [])
if not batch:
print(f"No more projects after page {page - 1}")
break
for p in batch:
category_info = p.get("category", {})
projects.append(KickstarterProject(
project_id=p["id"],
name=p["name"],
blurb=p.get("blurb", ""),
slug=p.get("slug", ""),
state=p.get("state", ""),
category=category_info.get("parent_name",
category_info.get("name", "")),
subcategory=category_info.get("name", ""),
creator_name=p.get("creator", {}).get("name", ""),
goal=p.get("goal", 0),
pledged=p.get("pledged", 0),
currency=p.get("currency", "USD"),
backers_count=p.get("backers_count", 0),
country=p.get("country", ""),
launched_at=p.get("launched_at", 0),
deadline=p.get("deadline", 0),
funding_percentage=round(
(p.get("pledged", 0) / max(p.get("goal", 1), 1)) * 100, 1
),
staff_pick=p.get("staff_pick", False),
url=p.get("urls", {}).get("web", {}).get(
"project", f"https://www.kickstarter.com/projects/"
f"{p.get('creator', {}).get('slug', '')}/{p.get('slug', '')}"
),
))
print(f"Page {page}: {len(batch)} projects (total: {len(projects)})")
time.sleep(2.5) # Stay well under rate limits
client.close()
return projects
# Kickstarter category IDs
CATEGORIES = {
"art": 1, "comics": 3, "crafts": 26, "dance": 6,
"design": 7, "fashion": 9, "film_video": 11, "food": 10,
"games": 12, "journalism": 13, "music": 14, "photography": 15,
"publishing": 18, "technology": 16, "theater": 17,
}
# Usage — scrape live technology campaigns
tech_projects = scrape_category(
category_id=CATEGORIES["technology"],
sort="popularity",
max_pages=5,
)
for p in sorted(tech_projects, key=lambda x: x.pledged, reverse=True)[:5]:
print(f"${p.pledged:>10,.0f} / ${p.goal:>8,.0f} "
f"({p.funding_percentage}%) | {p.name[:50]}")
print(f" {p.backers_count} backers | {p.subcategory}")
Method 2: GraphQL API for Detailed Campaign Data
The discovery API gives you top-level campaign metadata, but for the really valuable data — reward tiers, backer counts per tier, full descriptions, FAQ sections, update counts — you need Kickstarter's internal GraphQL endpoint.
Open DevTools on any Kickstarter project page, filter the Network tab to "graph", and you'll see POST requests to https://www.kickstarter.com/graph. Here's how to replicate them:
GRAPHQL_URL = "https://www.kickstarter.com/graph"
PROJECT_QUERY = """
query Campaign($slug: String!) {
project(slug: $slug) {
name
description
story
risks
goal { amount currency }
pledged { amount currency }
backersCount
commentsCount
updatesCount
launchedAt
deadlineAt
state
stateChangedAt
isProjectWeLove
category { name parentCategory { name } }
location { displayableName country }
creator {
name
backingsCount
launchedProjects { totalCount }
}
rewards {
nodes {
name
description
amount { amount currency }
backersCount
estimatedDeliveryOn
remainingQuantity
limitPerBacker
shippingSummary
startsAt
endsAt
}
}
}
}
"""
def get_campaign_details(slug: str) -> dict | None:
"""
Fetch detailed campaign data via Kickstarter's GraphQL API.
Args:
slug: The project slug from the URL
(kickstarter.com/projects/creator/THIS-PART).
Returns:
Dict with full campaign data including rewards, or None on failure.
"""
payload = {
"query": PROJECT_QUERY,
"variables": {"slug": slug},
}
resp = httpx.post(
GRAPHQL_URL,
json=payload,
headers={**HEADERS, "Content-Type": "application/json"},
timeout=15,
)
if resp.status_code != 200:
print(f"GraphQL request failed: {resp.status_code}")
return None
data = resp.json()
if "errors" in data:
print(f"GraphQL errors: {data['errors']}")
return None
return data["data"]["project"]
# Usage
campaign = get_campaign_details("some-cool-gadget")
if campaign:
pledged = float(campaign["pledged"]["amount"])
goal = float(campaign["goal"]["amount"])
pct = round(pledged / goal * 100, 1)
print(f"{campaign['name']}")
print(f" ${pledged:,.0f} / ${goal:,.0f} ({pct}%)")
print(f" {campaign['backersCount']} backers")
print(f" {campaign['updatesCount']} updates")
print(f" Creator has {campaign['creator']['launchedProjects']['totalCount']} "
f"launched projects")
print(f"\n Reward tiers:")
for reward in campaign["rewards"]["nodes"]:
r_amount = float(reward["amount"]["amount"])
        # remainingQuantity is None for unlimited tiers; 0 means sold out
        remaining = reward.get("remainingQuantity")
        limit_str = f" ({remaining} left)" if remaining is not None else ""
print(f" ${r_amount:>8,.0f} — {reward['name'][:40]} "
f"({reward['backersCount']} backers){limit_str}")
Tracking Funding Velocity Over Time
Kickstarter doesn't expose historical funding data. But if you're monitoring live campaigns, you can poll periodically and build your own time series — this is where the real analytical value lies:
import sqlite3
from datetime import datetime, timezone
def init_tracking_db(db_path: str = "kickstarter_tracking.db"):
"""Create SQLite tables for campaign tracking."""
db = sqlite3.connect(db_path)
db.executescript("""
CREATE TABLE IF NOT EXISTS campaigns (
slug TEXT PRIMARY KEY,
name TEXT,
category TEXT,
goal REAL,
currency TEXT,
launched_at INTEGER,
deadline INTEGER,
creator_name TEXT,
added_at TEXT
);
CREATE TABLE IF NOT EXISTS snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT NOT NULL,
pledged REAL,
backers INTEGER,
comments INTEGER,
updates INTEGER,
timestamp TEXT,
FOREIGN KEY (slug) REFERENCES campaigns(slug)
);
CREATE TABLE IF NOT EXISTS reward_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT NOT NULL,
reward_name TEXT,
amount REAL,
backers INTEGER,
remaining_qty INTEGER,
timestamp TEXT,
FOREIGN KEY (slug) REFERENCES campaigns(slug)
);
CREATE INDEX IF NOT EXISTS idx_snapshots_slug
ON snapshots(slug, timestamp);
""")
return db
def record_snapshot(db: sqlite3.Connection, slug: str):
"""Fetch current campaign data and store a timestamped snapshot."""
campaign = get_campaign_details(slug)
if not campaign:
print(f"Failed to fetch {slug}")
return
    now = datetime.now(timezone.utc).isoformat()  # utcnow() is deprecated in Python 3.12+
# Upsert campaign metadata
db.execute(
"""INSERT OR REPLACE INTO campaigns
(slug, name, category, goal, currency, launched_at,
deadline, creator_name, added_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, COALESCE(
(SELECT added_at FROM campaigns WHERE slug = ?), ?
))""",
(
slug, campaign["name"],
campaign["category"]["name"],
float(campaign["goal"]["amount"]),
campaign["goal"]["currency"],
campaign.get("launchedAt", 0),
campaign.get("deadlineAt", 0),
campaign["creator"]["name"],
slug, now,
),
)
# Record funding snapshot
db.execute(
"INSERT INTO snapshots (slug, pledged, backers, comments, updates, timestamp) "
"VALUES (?, ?, ?, ?, ?, ?)",
(
slug,
float(campaign["pledged"]["amount"]),
campaign["backersCount"],
campaign.get("commentsCount", 0),
campaign.get("updatesCount", 0),
now,
),
)
# Record reward tier snapshots
for reward in campaign["rewards"]["nodes"]:
db.execute(
"INSERT INTO reward_snapshots "
"(slug, reward_name, amount, backers, remaining_qty, timestamp) "
"VALUES (?, ?, ?, ?, ?, ?)",
(
slug,
reward["name"],
float(reward["amount"]["amount"]),
reward["backersCount"],
reward.get("remainingQuantity"),
now,
),
)
db.commit()
pledged = float(campaign["pledged"]["amount"])
print(f"[{now[:16]}] {campaign['name'][:40]}: "
f"${pledged:,.0f} ({campaign['backersCount']} backers)")
def get_funding_velocity(
db: sqlite3.Connection,
slug: str,
) -> list[dict]:
"""Calculate hourly funding velocity from stored snapshots."""
cursor = db.execute(
"""SELECT timestamp, pledged, backers
FROM snapshots
WHERE slug = ?
ORDER BY timestamp""",
(slug,),
)
rows = cursor.fetchall()
velocity = []
for i in range(1, len(rows)):
prev_ts = datetime.fromisoformat(rows[i - 1][0])
curr_ts = datetime.fromisoformat(rows[i][0])
hours = (curr_ts - prev_ts).total_seconds() / 3600
if hours > 0:
dollar_per_hour = (rows[i][1] - rows[i - 1][1]) / hours
backers_per_hour = (rows[i][2] - rows[i - 1][2]) / hours
velocity.append({
"timestamp": rows[i][0],
"dollars_per_hour": round(dollar_per_hour, 2),
"backers_per_hour": round(backers_per_hour, 2),
"total_pledged": rows[i][1],
"total_backers": rows[i][2],
})
return velocity
# Usage — track campaigns every hour (run via cron or scheduler)
db = init_tracking_db()
slugs_to_track = ["cool-gadget-2026", "board-game-reprint", "indie-film-project"]
for slug in slugs_to_track:
record_snapshot(db, slug)
time.sleep(3) # Space out requests
# Later: analyze velocity
velocity = get_funding_velocity(db, "cool-gadget-2026")
for v in velocity[-5:]: # Last 5 data points
print(f" {v['timestamp'][:16]}: ${v['dollars_per_hour']:,.0f}/hr, "
f"{v['backers_per_hour']:.1f} backers/hr")
Run this every hour during a campaign's final 48 hours and you'll have granular funding velocity data — useful for predicting whether a campaign will hit its goal.
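One way to use that series is a naive linear projection of the final total. This is a sketch under a strong assumption (pledging is anything but linear; campaigns spike at launch and in the closing hours), so treat it as a rough floor estimate rather than a forecast:
from datetime import datetime, timezone

def project_final_pledged(
    db: sqlite3.Connection,
    slug: str,
    deadline_ts: int,
) -> float | None:
    """Extrapolate the final pledged amount from recent hourly velocity."""
    velocity = get_funding_velocity(db, slug)
    if not velocity:
        return None
    latest = velocity[-1]
    last_ts = datetime.fromisoformat(latest["timestamp"])
    if last_ts.tzinfo is None:  # Older snapshots may be naive UTC
        last_ts = last_ts.replace(tzinfo=timezone.utc)
    deadline = datetime.fromtimestamp(deadline_ts, tz=timezone.utc)
    hours_left = max((deadline - last_ts).total_seconds() / 3600, 0)
    # Smooth noise by averaging the last few observed rates
    recent = velocity[-5:]
    avg_rate = sum(v["dollars_per_hour"] for v in recent) / len(recent)
    return latest["total_pledged"] + hours_left * avg_rate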
Scraping All Categories at Scale
To build a comprehensive dataset across all of Kickstarter, iterate through every category:
def scrape_all_categories(
sort: str = "newest",
pages_per_category: int = 20,
state: str = "all",
) -> list[KickstarterProject]:
"""
Scrape projects across all Kickstarter categories.
At ~12 projects per page and 20 pages per category,
this collects up to ~3,600 projects across 15 categories.
"""
all_projects = []
for name, cat_id in CATEGORIES.items():
print(f"\n--- Scraping {name} (category {cat_id}) ---")
projects = scrape_category(
category_id=cat_id,
sort=sort,
max_pages=pages_per_category,
state=state,
)
all_projects.extend(projects)
print(f"{name}: {len(projects)} projects collected")
# Longer pause between categories to avoid triggering rate limits
time.sleep(5)
print(f"\nTotal: {len(all_projects)} projects across {len(CATEGORIES)} categories")
return all_projects
def export_to_csv(projects: list[KickstarterProject], filename: str):
"""Export projects to CSV for analysis in spreadsheets or pandas."""
import csv
fieldnames = [
"name", "category", "subcategory", "state", "goal", "pledged",
"currency", "funding_percentage", "backers_count", "creator_name",
"country", "launched_at", "deadline", "staff_pick", "url",
]
with open(filename, "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
for p in projects:
row = asdict(p)
# Convert timestamps to readable dates
row["launched_at"] = datetime.fromtimestamp(
p.launched_at
).strftime("%Y-%m-%d") if p.launched_at else ""
row["deadline"] = datetime.fromtimestamp(
p.deadline
).strftime("%Y-%m-%d") if p.deadline else ""
writer.writerow({k: row[k] for k in fieldnames})
print(f"Exported {len(projects)} projects to {filename}")
# Usage
all_projects = scrape_all_categories(sort="most_funded", pages_per_category=10)
export_to_csv(all_projects, "kickstarter_dataset.csv")
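From there the CSV loads straight into pandas (assuming you have it installed; the column names match the export above). For example, success rate by category over completed campaigns:
import pandas as pd

df = pd.read_csv("kickstarter_dataset.csv")
done = df[df["state"].isin(["successful", "failed"])]
success_by_cat = (
    done.groupby("category")["state"]
    .apply(lambda s: (s == "successful").mean() * 100)
    .sort_values(ascending=False)
)
print(success_by_cat.round(1))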
Handling Anti-Bot Measures
Kickstarter uses Cloudflare protection that activates under sustained load. Here's what to watch for and how to handle it:
Rate limiting signals:
- HTTP 403 with HTML instead of JSON — you've been flagged by Cloudflare
- HTTP 429 Too Many Requests — explicit rate limit hit
- CAPTCHA challenge pages — Cloudflare's JavaScript challenge
Recommended request pacing:
- Discovery API: 1 request per 2-3 seconds
- GraphQL endpoint: 1 request per 3-4 seconds (stricter)
- Between categories: 5-10 second pause
- After any 403/429: back off 30-60 seconds, then retry (see the sketch below)
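Here's one way to wire those rules into a helper. A sketch rather than a hardened client: it retries on 403/429 with a randomized 30-60 second wait and gives up after a few attempts (get_with_backoff is a name introduced here):
import random
import time

import httpx

def get_with_backoff(
    client: httpx.Client,
    url: str,
    params: dict | None = None,
    max_retries: int = 3,
) -> httpx.Response | None:
    """GET with backoff on 403/429, per the pacing guidelines above."""
    for attempt in range(max_retries + 1):
        resp = client.get(url, params=params)
        if resp.status_code not in (403, 429):
            return resp
        wait = random.uniform(30, 60)
        print(f"Got {resp.status_code}, backing off {wait:.0f}s "
              f"(attempt {attempt + 1}/{max_retries + 1})")
        time.sleep(wait)
    return None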
For serious data collection, you'll need rotating residential proxies. Kickstarter's Cloudflare protection fingerprints datacenter IPs aggressively. ThorData's residential network handles this well — their geo-targeting lets you appear as traffic from specific countries, which is useful since Kickstarter shows different projects by region.
# Rotating proxy configuration
proxy_url = "http://USER:[email protected]:9000"
client = httpx.Client(
headers=HEADERS,
proxy=proxy_url,
timeout=20,
)
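If your provider hands you a list of fixed endpoints instead of a single rotating gateway, cycling through them per client works too (the endpoints below are placeholders):
import itertools

PROXY_POOL = itertools.cycle([
    "http://USER:PASS@proxy1.example.com:9000",
    "http://USER:PASS@proxy2.example.com:9000",
])

def fresh_client() -> httpx.Client:
    """Build a new client on the next proxy in the pool."""
    return httpx.Client(headers=HEADERS, proxy=next(PROXY_POOL), timeout=20)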
Browser automation fallback: If the API endpoints get locked down, Playwright can render project pages directly:
from playwright.async_api import async_playwright
async def scrape_campaign_page(url: str) -> dict:
"""Fallback: scrape campaign data from the rendered page."""
async with async_playwright() as p:
browser = await p.chromium.launch(headless=True)
page = await browser.new_page()
await page.goto(url, wait_until="networkidle", timeout=30000)
# Extract key data from rendered DOM
title = await page.text_content("h2.project-name") or ""
pledged_el = await page.query_selector(".money.usd")
pledged = await pledged_el.text_content() if pledged_el else "0"
backers_el = await page.query_selector(
"[data-test-id='backers-count']"
)
backers = await backers_el.text_content() if backers_el else "0"
        # For structured data, intercept the page's GraphQL responses
        # instead of parsing the DOM (see the interception sketch below)
await browser.close()
return {"title": title, "pledged": pledged, "backers": backers}
Analyzing the Data: What Makes Campaigns Succeed?
Once you've collected campaign data, here are the metrics that matter most for predicting success:
import statistics
def analyze_category(projects: list[KickstarterProject]) -> dict:
"""Calculate success metrics for a set of campaigns."""
    successful = [p for p in projects if p.state == "successful"]
    failed = [p for p in projects if p.state == "failed"]
    # Base the rate on completed campaigns only, so live ones don't skew it
    decided = len(successful) + len(failed)
    success_rate = len(successful) / max(decided, 1) * 100
if successful:
avg_funding_pct = statistics.mean(
[p.funding_percentage for p in successful]
)
median_goal = statistics.median([p.goal for p in successful])
median_backers = statistics.median(
[p.backers_count for p in successful]
)
avg_pledge_per_backer = statistics.mean([
p.pledged / max(p.backers_count, 1) for p in successful
])
else:
avg_funding_pct = median_goal = median_backers = 0
avg_pledge_per_backer = 0
staff_pick_rate = (
len([p for p in successful if p.staff_pick])
/ max(len(successful), 1) * 100
)
return {
"total_projects": len(projects),
"success_rate": round(success_rate, 1),
"median_goal_successful": round(median_goal, 0),
"avg_funding_percentage": round(avg_funding_pct, 1),
"median_backers": round(median_backers, 0),
"avg_pledge_per_backer": round(avg_pledge_per_backer, 2),
"staff_pick_success_rate": round(staff_pick_rate, 1),
}
# Usage
# state="all" so the sample includes completed campaigns, not just live ones
tech_data = scrape_category(
    CATEGORIES["technology"], sort="newest", max_pages=50, state="all",
)
analysis = analyze_category(tech_data)
print("Technology Category Analysis:")
for key, value in analysis.items():
label = key.replace("_", " ").title()
if "rate" in key or "percentage" in key:
print(f" {label}: {value}%")
elif "goal" in key or "pledge" in key:
print(f" {label}: ${value:,.0f}")
else:
print(f" {label}: {value}")
Legal and Ethical Considerations
Kickstarter's robots.txt allows crawling of project pages. Their terms of service prohibit scraping for commercial redistribution of raw campaign data — you can't resell a Kickstarter database. However, research, analysis, internal business intelligence, and personal use are generally fine.
Key guidelines to follow:
- Respect rate limits — keep requests paced at 1 per 2-3 seconds minimum
- Identify yourself — use a real User-Agent string if doing academic research
- Don't hammer during peak hours — avoid heavy scraping during major campaign launches
- Don't redistribute raw data — analysis and insights are fine; raw databases are not
- Cache aggressively — if you've already fetched a campaign page, don't fetch it again unless you need updated numbers (a minimal caching sketch follows this list)
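That last point is easy to automate. A minimal caching sketch, assuming the get_campaign_details function from earlier (cached_campaign is a name introduced here):
import json
from pathlib import Path

CACHE_DIR = Path("kickstarter_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_campaign(slug: str) -> dict | None:
    """Return cached campaign details if present, else fetch and store."""
    cache_file = CACHE_DIR / f"{slug}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    data = get_campaign_details(slug)
    if data is not None:
        cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data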
Crowdfunding data is genuinely valuable for market research, product validation, and trend analysis. Treat the platform well and the endpoints will keep working.