Scraping Kickstarter Project Data (2026)
Kickstarter hosts tens of thousands of live crowdfunding campaigns across tech, games, film, design, and more. The data is genuinely valuable: funding velocity, backer counts, and creator track records are strong signals for trend analysis, competitor research, and investment screening. Unlike LinkedIn or Crunchbase, Kickstarter is relatively scraper-friendly — no authentication required, and the API endpoints return clean JSON. This guide covers how to extract everything useful from the platform, store it properly, and scale to continuous monitoring.
What Data You Can Extract
Each Kickstarter project exposes a rich set of fields:
- Project basics — name, slug, blurb, full description, category, subcategory
- Funding data — goal amount, amount pledged, currency, deadline
- Backer count — total backers, comments count, updates count
- Creator info — name, user ID, previous projects, biography
- Reward tiers — pledge amounts, descriptions, backer limits, delivery dates
- Timeline — launch date, deadline, duration
- Media — main image, video URL
- Stretch goals — additional funding milestones and unlocks
- Location — creator's city and country
- State — live, successful, failed, canceled, or suspended
The combination of funding progress and backer count over time gives you a daily funding rate you can use to project final totals before a campaign closes.
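The projection itself is a straight linear extrapolation — a minimal sketch (the figures in the usage line are made up):

```python
def project_final_total(pledged: float, elapsed_days: float, remaining_days: float) -> float:
    """Naive linear projection: assume the current daily funding rate holds to the deadline."""
    if elapsed_days <= 0:
        return pledged
    daily_rate = pledged / elapsed_days
    return pledged + daily_rate * remaining_days

# e.g. $12,000 pledged after 10 days with 20 days left projects to $36,000
print(project_final_total(12_000, 10, 20))  # → 36000.0
```

Real campaigns front-load pledges (launch spike, deadline surge), so treat the linear estimate as a floor for mid-campaign projects rather than a forecast.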
Understanding the Kickstarter Data Structure
Kickstarter exposes data through multiple pathways:
- The discover API (/discover/advanced?format=json) — returns paginated lists of projects with summary data
- Individual project pages — embed a full JSON blob in the data-initial attribute of the root element
- The GraphQL API — used by Kickstarter's own frontend, partially accessible without authentication
Each approach has tradeoffs: the discover API is fastest for broad sweeps, project pages have the most complete data, and GraphQL is useful for specific structured queries.
The Discover API
Kickstarter exposes a public discover endpoint that returns paginated JSON without any authentication. It supports sorting by magic score, newest, end date, most funded, and most backed.
import httpx
import json
import time
from typing import Optional
CATEGORY_IDS = {
"art": 1,
"comics": 3,
"crafts": 26,
"dance": 6,
"design": 7,
"fashion": 9,
"film": 11,
"food": 10,
"games": 12,
"journalism": 13,
"music": 14,
"photography": 15,
"publishing": 18,
"technology": 16,
"theater": 17,
}
# Sub-category IDs (selected)
SUBCATEGORY_IDS = {
"product-design": 329,
"tabletop-games": 220,
"video-games": 35,
"hardware": 31,
"software": 51,
"apps": 250,
"fiction": 281,
"nonfiction": 280,
"graphic-novels": 281,
}
def discover_projects(
category: Optional[str] = None,
subcategory: Optional[str] = None,
sort: str = "magic",
page: int = 1,
per_page: int = 20,
state: str = "live",
proxy: Optional[str] = None,
) -> list[dict]:
"""
Search Kickstarter projects via the discover API.
Args:
category: Category name from CATEGORY_IDS
subcategory: Optional subcategory name from SUBCATEGORY_IDS
sort: magic | newest | end_date | most_funded | most_backed
page: Page number (1-based)
state: live | successful | failed | canceled
proxy: Optional proxy URL
Returns:
List of project summary dicts
"""
url = "https://www.kickstarter.com/discover/advanced"
params = {
"format": "json",
"sort": sort,
"page": page,
"per_page": per_page,
"state": state,
}
if category and category in CATEGORY_IDS:
params["category_id"] = CATEGORY_IDS[category]
if subcategory and subcategory in SUBCATEGORY_IDS:
params["category_id"] = SUBCATEGORY_IDS[subcategory]
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept": "application/json, text/javascript, */*; q=0.01",
"X-Requested-With": "XMLHttpRequest",
"Referer": "https://www.kickstarter.com/discover",
}
client_kwargs = {"timeout": 15, "headers": headers, "follow_redirects": True}
    if proxy:
        client_kwargs["proxy"] = proxy  # httpx >= 0.26 takes a single proxy URL; the proxies dict was removed
with httpx.Client(**client_kwargs) as client:
resp = client.get(url, params=params)
if resp.status_code == 429:
retry_after = int(resp.headers.get("Retry-After", 30))
print(f"Rate limited. Wait {retry_after}s then retry.")
return []
resp.raise_for_status()
data = resp.json()
return data.get("projects", [])
def discover_all(
category: str,
sort: str = "magic",
max_pages: int = 10,
state: str = "live",
proxy: Optional[str] = None,
) -> list[dict]:
"""Paginate through discover results for a category."""
all_projects = []
for page in range(1, max_pages + 1):
batch = discover_projects(
category=category,
sort=sort,
page=page,
state=state,
proxy=proxy,
)
if not batch:
break
all_projects.extend(batch)
print(f"Page {page}: {len(batch)} projects (total: {len(all_projects)})")
time.sleep(1.0)
return all_projects
# Example: pull top funded tech projects
tech_projects = discover_all("technology", sort="most_funded", max_pages=5)
for p in tech_projects[:5]:
print(f"{p['name']} | ${float(p['pledged']):,.0f} pledged | {p['backers_count']} backers")
Each project object in the discover response includes id, name, slug, blurb, goal, pledged, currency, backers_count, state, deadline, launched_at, creator, category, location, and urls. That's enough for most analyses without hitting individual project pages.
Project Detail JSON Embedded in HTML
For full data — reward tiers, stretch goals, update count, full description — you need the project detail page. Kickstarter embeds the complete project object as JSON inside a data-initial attribute on the page's root element.
import httpx
from bs4 import BeautifulSoup
import json
from typing import Optional
def get_project_detail(project_slug: str, proxy: Optional[str] = None) -> dict:
"""
Get full project details from embedded JSON on the project page.
Args:
project_slug: The creator/project-name portion of the project URL
proxy: Optional proxy URL
Returns:
Full project dict with tiers, updates, stretch goals, etc.
"""
url = f"https://www.kickstarter.com/projects/{project_slug}"
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
}
client_kwargs = {
"timeout": 20,
"headers": headers,
"follow_redirects": True,
}
    if proxy:
        client_kwargs["proxy"] = proxy  # httpx >= 0.26 takes a single proxy URL
with httpx.Client(**client_kwargs) as client:
resp = client.get(url)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
root = soup.find(attrs={"data-initial": True})
if not root:
# Try looking in script tags for embedded JSON
scripts = soup.find_all("script", type="application/json")
for s in scripts:
try:
data = json.loads(s.string)
if "project" in data:
return data.get("project", {})
except (json.JSONDecodeError, TypeError):
continue
raise ValueError(f"No embedded JSON found for {project_slug}")
data = json.loads(root["data-initial"])
project = data.get("project", {})
return project
def extract_project_summary(project: dict) -> dict:
"""Extract the most useful fields from a full project detail dict."""
creator = project.get("creator", {})
rewards = project.get("rewards", {})
if isinstance(rewards, dict):
reward_list = rewards.get("rewards", [])
else:
reward_list = rewards or []
return {
"id": project.get("id"),
"name": project.get("name"),
"slug": project.get("slug"),
"blurb": project.get("blurb"),
"state": project.get("state"),
"goal": float(project.get("goal", 0)),
"pledged": float(project.get("pledged", 0)),
"currency": project.get("currency"),
"backers_count": project.get("backers_count", 0),
"comments_count": project.get("comments_count", 0),
"updates_count": project.get("updates_count", 0),
"launched_at": project.get("launched_at"),
"deadline": project.get("deadline"),
"creator_name": creator.get("name"),
"creator_id": creator.get("id"),
"category": project.get("category", {}).get("name"),
"subcategory": project.get("category", {}).get("parent", {}).get("name"),
"location": project.get("location", {}).get("displayable_name"),
"url": project.get("urls", {}).get("web", {}).get("project"),
"rewards": [
{
"id": r.get("id"),
"minimum": float(r.get("minimum", 0)),
"title": r.get("title", ""),
"description": r.get("description", ""),
"backers_count": r.get("backers_count", 0),
"limit": r.get("limit"),
"remaining": r.get("remaining"),
"estimated_delivery": r.get("estimated_delivery_on"),
}
for r in reward_list
],
}
# Full usage
detail = get_project_detail("someuser/my-cool-gadget")
summary = extract_project_summary(detail)
print(f"{summary['name']}: {summary['pledged']:.0f}/{summary['goal']:.0f} ({len(summary['rewards'])} tiers)")
The project_slug is the creator/project-name portion of the URL, which you get from the urls.web.project field in the discover API response.
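A small helper can pull that slug out of the URL — a sketch assuming the standard https://www.kickstarter.com/projects/creator/name URL shape:

```python
from urllib.parse import urlparse

def slug_from_url(project_url: str) -> str:
    """Extract the creator/project-name slug from a Kickstarter project URL."""
    path = urlparse(project_url).path.strip("/")
    parts = path.split("/")
    # Expected path shape: projects/<creator>/<project-name>
    if len(parts) >= 3 and parts[0] == "projects":
        return f"{parts[1]}/{parts[2]}"
    raise ValueError(f"Unrecognized project URL: {project_url}")

print(slug_from_url("https://www.kickstarter.com/projects/someuser/my-cool-gadget?ref=discovery"))
# → someuser/my-cool-gadget
```

urlparse drops the query string (?ref=...) that discover results often append, so the slug comes out clean.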
Extracting Reward Tier Data
Reward tiers are critical for understanding a campaign's monetization structure. Each tier shows how the creator is pricing different levels of backer access:
def analyze_reward_tiers(rewards: list[dict]) -> dict:
"""
Analyze the reward tier structure of a campaign.
Returns stats about pricing strategy, popular tiers, etc.
"""
if not rewards:
return {}
prices = [r["minimum"] for r in rewards if r["minimum"] > 0]
backer_by_tier = {r["minimum"]: r["backers_count"] for r in rewards}
total_backers = sum(r["backers_count"] for r in rewards)
# Find the most popular tier
most_popular = max(rewards, key=lambda r: r["backers_count"]) if rewards else None
# Estimate revenue contribution by tier
tier_revenue = [
{
"price": r["minimum"],
"backers": r["backers_count"],
"revenue_estimate": r["minimum"] * r["backers_count"],
"pct_of_backers": round(r["backers_count"] / total_backers * 100, 1) if total_backers else 0,
}
for r in rewards
if r["minimum"] > 0
]
return {
"tier_count": len(rewards),
"price_range": (min(prices), max(prices)) if prices else (0, 0),
"most_popular_price": most_popular["minimum"] if most_popular else None,
"most_popular_backers": most_popular["backers_count"] if most_popular else None,
"tiers_with_limits": sum(1 for r in rewards if r.get("limit")),
"tiers_sold_out": sum(1 for r in rewards if r.get("remaining") == 0),
"tier_revenue": sorted(tier_revenue, key=lambda x: x["revenue_estimate"], reverse=True),
}
Funding Progress and Backer Velocity
With goal, pledged, backers_count, launched_at, and deadline, you can calculate useful metrics:
from datetime import datetime, timezone
def funding_metrics(project: dict) -> dict:
"""
Calculate derived funding metrics from a project dict.
Works with both discover API summary and full detail objects.
"""
now = datetime.now(timezone.utc)
launched = datetime.fromtimestamp(project["launched_at"], tz=timezone.utc)
deadline = datetime.fromtimestamp(project["deadline"], tz=timezone.utc)
elapsed_days = (now - launched).total_seconds() / 86400
remaining_days = max((deadline - now).total_seconds() / 86400, 0)
total_days = (deadline - launched).total_seconds() / 86400
pledged = float(project["pledged"])
goal = float(project["goal"])
backers = project["backers_count"]
pct_funded = (pledged / goal * 100) if goal else 0
pct_time = (elapsed_days / total_days * 100) if total_days else 0
daily_rate = pledged / elapsed_days if elapsed_days > 0 else 0
daily_backers = backers / elapsed_days if elapsed_days > 0 else 0
projected_total = pledged + daily_rate * remaining_days
projected_pct = (projected_total / goal * 100) if goal else 0
avg_pledge = pledged / backers if backers else 0
# Funding health assessment
if pct_funded >= 100:
health = "funded"
elif pct_funded / max(pct_time, 1) >= 0.8: # on track
health = "healthy"
elif pct_funded / max(pct_time, 1) >= 0.4: # slightly behind
health = "at_risk"
else:
health = "struggling"
return {
"pct_funded": round(pct_funded, 1),
"pct_time_elapsed": round(pct_time, 1),
"daily_rate_usd": round(daily_rate, 2),
"daily_backers": round(daily_backers, 2),
"projected_total_usd": round(projected_total, 2),
"projected_pct_funded": round(projected_pct, 1),
"avg_pledge_usd": round(avg_pledge, 2),
"days_remaining": round(remaining_days, 1),
"health": health,
}
# Examples
# Project at 40% funded with 80% time elapsed = struggling
# Project at 200% funded with 60% time elapsed = healthy, watch for stretch goals
# Project at 100% funded with 1 day remaining = funded, coasting
metrics = funding_metrics(detail)
print(f"Health: {metrics['health']}")
print(f"Daily rate: ${metrics['daily_rate_usd']:,.2f}/day")
print(f"Projected final: ${metrics['projected_total_usd']:,.0f} ({metrics['projected_pct_funded']:.0f}%)")
A project at 40% funding with 80% time elapsed is in trouble. A project at 200% funded with 60% time remaining is worth watching for stretch goals. The health classification makes it easy to filter large datasets.
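Filtering works directly on the metrics dicts — a sketch over hypothetical rows of the kind funding_metrics produces (with a name field added):

```python
def filter_by_health(metrics_rows: list[dict], wanted: frozenset = frozenset({"healthy", "funded"})) -> list[dict]:
    """Keep rows whose precomputed health label is in the wanted set, highest projected total first."""
    keep = [r for r in metrics_rows if r.get("health") in wanted]
    return sorted(keep, key=lambda r: r.get("projected_total_usd", 0), reverse=True)

# Hypothetical rows, as produced by funding_metrics() plus a name field
rows = [
    {"name": "A", "health": "struggling", "projected_total_usd": 4_000},
    {"name": "B", "health": "funded", "projected_total_usd": 150_000},
    {"name": "C", "health": "healthy", "projected_total_usd": 22_000},
]
print([r["name"] for r in filter_by_health(rows)])  # → ['B', 'C']
```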
Creator Profile Data
The creator object inside each project contains the profile URL, name, and created/backed project counts. Cross-referencing with previous projects gives you a rough success rate:
def creator_profile(project: dict) -> dict:
"""Extract and enrich creator data from a project dict."""
creator = project.get("creator", {})
return {
"id": creator.get("id"),
"name": creator.get("name"),
"slug": creator.get("slug"),
"projects_created": creator.get("created_projects_count", 0),
"projects_backed": creator.get("backed_projects_count", 0),
"profile_url": creator.get("urls", {}).get("web", {}).get("user"),
"avatar": creator.get("avatar", {}).get("medium"),
"is_registered": creator.get("is_registered", False),
"is_superbacker": creator.get("is_superbacker", False),
}
def estimate_creator_success_rate(
creator_id: int,
proxy: Optional[str] = None,
) -> dict:
"""
Estimate creator success rate by checking their past campaigns
via the discover API.
"""
# Search for all projects by this creator
url = "https://www.kickstarter.com/discover/advanced"
params = {
"format": "json",
"creator_id": creator_id,
"per_page": 20,
}
headers = {"User-Agent": "Mozilla/5.0 (compatible; ResearchBot/1.0)"}
client_kwargs = {"timeout": 15, "headers": headers}
    if proxy:
        client_kwargs["proxy"] = proxy  # httpx >= 0.26 takes a single proxy URL
with httpx.Client(**client_kwargs) as client:
resp = client.get(url, params=params)
if resp.status_code != 200:
return {}
data = resp.json()
projects = data.get("projects", [])
if not projects:
return {"project_count": 0}
states = [p.get("state") for p in projects]
successful = states.count("successful")
failed = states.count("failed")
total_finished = successful + failed
return {
"project_count": len(projects),
"successful": successful,
"failed": failed,
"canceled": states.count("canceled"),
"success_rate": round(successful / total_finished * 100, 1) if total_finished else None,
"total_pledged": sum(float(p.get("pledged", 0)) for p in projects if p.get("state") == "successful"),
}
Anti-Bot Measures
Kickstarter's defenses are relatively light compared to LinkedIn or Crunchbase:
- Standard Cloudflare protection on HTML pages — usually passive fingerprint checks only
- Rate limiting around 60 requests per minute; exceeding this returns 429s
- The /discover/advanced?format=json endpoint is less guarded than project HTML pages
- No heavy JavaScript challenges or CAPTCHA walls for the discover API
- Some IP ranges (notably cloud hosting ASNs) get challenged more aggressively
For low-volume research (a few hundred projects), plain httpx with a realistic User-Agent and appropriate request headers is usually enough. The key headers to include are:
HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept": "application/json, text/html, */*",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep-alive",
"Referer": "https://www.kickstarter.com/discover",
}
Proxy Strategy with ThorData
For volume scraping — monitoring thousands of campaigns, running daily snapshots, or rotating through all categories — you will hit Cloudflare on HTML pages and need residential IPs.
ThorData's residential proxy network handles Kickstarter's Cloudflare layer cleanly. Their sticky session option is useful when you need to load a project page and follow a redirect without triggering a session mismatch:
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "gate.thordata.net"
THORDATA_PORT = 9000
def make_proxy(country: str = "us", session_id: Optional[str] = None) -> str:
"""Build a ThorData residential proxy URL."""
user = f"{THORDATA_USER}-country-{country}"
if session_id:
user += f"-session-{session_id}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
def scrape_with_proxy(slug: str) -> dict:
"""Scrape a project page using a residential proxy."""
import random
import string
# Sticky session keeps same IP for the full page load chain
session = "".join(random.choices(string.ascii_lowercase, k=8))
proxy = make_proxy(country="us", session_id=session)
return get_project_detail(slug, proxy=proxy)
# For discover API: rotate proxies per request (no session needed)
def discover_with_rotation(category: str, page: int) -> list[dict]:
proxy = make_proxy(country="us") # Fresh IP per discover request
return discover_projects(category=category, page=page, proxy=proxy)
Rotate proxies per request rather than per session for the discover API — Kickstarter does not require session continuity for read-only scraping there. For project detail pages, use sticky sessions to avoid mid-page IP changes.
Building a Funding Tracker in SQLite
To track how campaigns evolve over time, snapshot projects on a schedule:
import sqlite3
from datetime import datetime, timezone
def init_db(path: str = "kickstarter.db") -> sqlite3.Connection:
"""Initialize the Kickstarter tracking database."""
conn = sqlite3.connect(path)
conn.execute("""
CREATE TABLE IF NOT EXISTS projects (
id INTEGER PRIMARY KEY,
name TEXT,
slug TEXT UNIQUE,
blurb TEXT,
category TEXT,
subcategory TEXT,
creator_name TEXT,
creator_id INTEGER,
goal REAL,
currency TEXT,
launched_at INTEGER,
deadline INTEGER,
location TEXT,
url TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
project_id INTEGER NOT NULL,
pledged REAL NOT NULL,
backers_count INTEGER NOT NULL,
state TEXT NOT NULL,
comments_count INTEGER,
updates_count INTEGER,
captured_at TEXT NOT NULL,
FOREIGN KEY (project_id) REFERENCES projects(id)
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS reward_tiers (
id INTEGER PRIMARY KEY,
project_id INTEGER NOT NULL,
minimum REAL,
title TEXT,
backers_count INTEGER,
reward_limit INTEGER,
estimated_delivery TEXT,
FOREIGN KEY (project_id) REFERENCES projects(id)
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_snapshots_project ON snapshots(project_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_snapshots_time ON snapshots(captured_at)")
conn.commit()
return conn
def save_project(conn: sqlite3.Connection, project: dict):
"""Upsert a project and insert a new snapshot."""
now = datetime.now(timezone.utc).isoformat()
# Upsert project metadata
conn.execute("""
INSERT OR REPLACE INTO projects
(id, name, slug, blurb, category, creator_name, creator_id,
goal, currency, launched_at, deadline, location, url)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)
""", (
project["id"],
project.get("name"),
project.get("slug"),
project.get("blurb"),
project.get("category", {}).get("name") if isinstance(project.get("category"), dict) else project.get("category"),
project.get("creator", {}).get("name") if isinstance(project.get("creator"), dict) else None,
project.get("creator", {}).get("id") if isinstance(project.get("creator"), dict) else None,
float(project.get("goal", 0)),
project.get("currency"),
project.get("launched_at"),
project.get("deadline"),
project.get("location", {}).get("displayable_name") if isinstance(project.get("location"), dict) else None,
project.get("urls", {}).get("web", {}).get("project") if isinstance(project.get("urls"), dict) else None,
))
# Insert snapshot
conn.execute("""
INSERT INTO snapshots
(project_id, pledged, backers_count, state, comments_count, updates_count, captured_at)
VALUES (?,?,?,?,?,?,?)
""", (
project["id"],
float(project.get("pledged", 0)),
project.get("backers_count", 0),
project.get("state", "unknown"),
project.get("comments_count"),
project.get("updates_count"),
now,
))
conn.commit()
def get_velocity_trend(conn: sqlite3.Connection, project_id: int) -> list[dict]:
"""
Get daily funding velocity from snapshot history.
Useful for spotting viral moments or late surges.
"""
rows = conn.execute("""
SELECT pledged, backers_count, captured_at
FROM snapshots
WHERE project_id = ?
ORDER BY captured_at ASC
""", (project_id,)).fetchall()
if len(rows) < 2:
return []
    trend = []
    for i in range(1, len(rows)):
        prev = rows[i - 1]
        curr = rows[i]
        prev_dt = datetime.fromisoformat(prev[2])
        curr_dt = datetime.fromisoformat(curr[2])
hours = (curr_dt - prev_dt).total_seconds() / 3600
if hours > 0:
trend.append({
"timestamp": curr[2],
"pledged": curr[0],
"backers": curr[1],
"usd_per_hour": round((curr[0] - prev[0]) / hours, 2),
"backers_per_hour": round((curr[1] - prev[1]) / hours, 2),
})
return trend
Run this with a scheduler (cron, APScheduler, etc.) every few hours to build a time series. Comparing consecutive snapshots gives you hourly and daily funding velocity — useful for spotting viral moments or late surges.
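If you'd rather keep everything in-process than wire up cron, the scheduling loop can be this simple — a sketch, assuming a zero-argument snapshot function (e.g. a lambda wrapping your monitoring pass):

```python
import time

def seconds_until_next_pass(elapsed_s: float, interval_hours: float) -> float:
    """Sleep time remaining in the current interval (never negative)."""
    return max(interval_hours * 3600 - elapsed_s, 0)

def run_forever(snapshot_fn, interval_hours: float = 4.0):
    """Call snapshot_fn on a fixed cadence; one failed pass doesn't kill the loop."""
    while True:
        started = time.monotonic()
        try:
            snapshot_fn()
        except Exception as exc:
            print(f"Snapshot pass failed: {exc}")
        time.sleep(seconds_until_next_pass(time.monotonic() - started, interval_hours))
```

Subtracting the pass duration from the sleep keeps the cadence fixed even when a pass takes several minutes.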
Identifying Trending Projects
def find_trending_projects(conn: sqlite3.Connection, min_velocity_usd: float = 500) -> list[dict]:
"""
Find currently live projects with high funding velocity
by comparing the last two snapshots.
"""
rows = conn.execute("""
SELECT
p.id, p.name, p.category, p.url,
s1.pledged as pledged_now,
s2.pledged as pledged_before,
s1.captured_at,
s2.captured_at as prev_time,
p.goal
FROM projects p
JOIN snapshots s1 ON s1.project_id = p.id
JOIN snapshots s2 ON s2.project_id = p.id
WHERE s1.id = (SELECT MAX(id) FROM snapshots WHERE project_id = p.id)
AND s2.id = (SELECT MAX(id) FROM snapshots WHERE project_id = p.id AND id < s1.id)
AND s1.state = 'live'
""").fetchall()
trending = []
from datetime import datetime
for row in rows:
pledged_delta = row[4] - row[5]
try:
t1 = datetime.fromisoformat(row[6])
t2 = datetime.fromisoformat(row[7])
hours = (t1 - t2).total_seconds() / 3600
usd_per_hour = pledged_delta / hours if hours > 0 else 0
except Exception:
continue
if usd_per_hour >= min_velocity_usd:
trending.append({
"name": row[1],
"category": row[2],
"url": row[3],
"pledged": row[4],
"goal": row[8],
"pct_funded": round(row[4] / row[8] * 100, 1) if row[8] else 0,
"usd_per_hour": round(usd_per_hour, 0),
})
return sorted(trending, key=lambda x: x["usd_per_hour"], reverse=True)
Complete Monitoring Pipeline
import time
import random
def run_monitoring_pass(
categories: list[str],
db_path: str = "kickstarter.db",
proxy: Optional[str] = None,
):
"""
Discover live projects across categories and snapshot their data.
Run this on a schedule (e.g., every 4 hours) to build time series.
"""
conn = init_db(db_path)
seen_ids = set()
for category in categories:
print(f"\nScraping category: {category}")
for page in range(1, 6): # First 100 projects per category
projects = discover_projects(
category=category,
sort="most_funded",
page=page,
state="live",
proxy=proxy,
)
if not projects:
break
for project in projects:
pid = project.get("id")
if pid in seen_ids:
continue
seen_ids.add(pid)
save_project(conn, project)
print(f" Page {page}: saved {len(projects)} projects")
time.sleep(random.uniform(0.8, 1.5))
conn.close()
print(f"\nTotal unique projects tracked: {len(seen_ids)}")
# Run it
CATEGORIES = ["technology", "games", "design", "food", "publishing"]
run_monitoring_pass(CATEGORIES, proxy=make_proxy())
Legal Notes
Kickstarter's terms restrict automated scraping, but the data exposed is entirely public — no login, no paywall. US courts have held that scraping publicly accessible data does not violate the CFAA (hiQ v. LinkedIn, 9th Cir. 2022), though terms-of-service and contract claims remain a separate risk. Key guidelines:
- Respect rate limits — 429 responses are a signal, not a challenge
- Do not scrape backer personal data (emails are never exposed publicly on Kickstarter)
- Do not hammer the server — treat it like a polite crawl, not a bulk download
- Don't resell raw Kickstarter data — use it as input to analysis, tooling, or monitoring products
Summary
Kickstarter is one of the more accessible platforms to scrape: a clean JSON discover API, full project data embedded in page HTML, and comparatively light anti-bot defenses. The funding velocity and creator track record data you can derive are genuinely useful signals for market research and trend tracking.
For production workloads that need to run across thousands of projects daily, ThorData residential proxies will keep your requests flowing through Cloudflare without interruption. Start with the discover API, layer in project detail pages for the campaigns you care about, and snapshot to SQLite to build the time series that makes the data actionable.
The funding velocity metrics are the real payoff here — a snapshot every few hours across a few thousand projects gives you a live signal of which campaigns are going viral, which are stalling, and which niches are seeing unusually strong backer engagement. That's market intelligence the aggregator sites don't surface.