Scraping Thingiverse 3D Model Data and Remix Networks with Python (2026)
Thingiverse is the largest repository of 3D printable models — over 6 million Things uploaded by a community of makers, designers, and engineers. The data is genuinely useful: download counts as a proxy for popularity, remix chains that map creative influence across the community, creator profiles with follower counts and upload histories, and category distributions that show what the 3D printing community is actually making.
MakerBot runs a semi-public REST API that exposes most of this, and it still works in 2026 despite being largely undocumented since MakerBot's ownership changed hands. This guide covers what the API exposes, how to authenticate, how to handle rate limits, and working Python code for the most useful endpoints — plus proxy integration and SQLite storage for production-grade data collection.
What Data Is Available
The Thingiverse API exposes a rich set of fields for each Thing (model):
- Model metadata — name, description, creator username, upload date, license, category, tags
- Engagement counts — download count, like count (hearts), make count (users who printed it), comment count, remix count, collect count
- File information — STL/OBJ/AMF file names, sizes, download URLs, thumbnail images
- Remix relationships — each Thing has an ancestors field pointing to the Things it was remixed from, and a /remixes endpoint returns its direct children
- Collection membership — which public collections a Thing appears in
- Creator profiles — follower count, following count, total Things, total makes, account creation date, cover image
Why This Data Is Useful
Influence mapping: The remix graph reveals creative lineage — which original designs spawned dozens of derivatives. Identifying highly-remixed "root" designs helps you understand which creators are most influential in a niche.
Popularity signals: Download counts are one of the few public, verifiable metrics for 3D model popularity. Combined with like counts and make counts, you can build a multi-dimensional popularity score.
Category trend analysis: Tracking which categories are growing in total uploads and downloads shows you where the community's attention is shifting.
Creator analytics: Follower counts and upload rates help identify prolific creators in specific niches — useful for community building, sponsorships, or curated recommendation systems.
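These signals can be folded into a single ranking number. A minimal sketch of a multi-dimensional popularity score — the log-scaling keeps download counts from drowning out the rarer signals, and the weights are illustrative assumptions, not anything Thingiverse defines:

```python
import math

def popularity_score(thing: dict) -> float:
    """Combine engagement counts into one score.

    log1p damps each count so a 100k-download model doesn't
    erase every other signal; tune the weights for your niche.
    """
    weights = {
        "download_count": 1.0,
        "like_count": 2.0,   # likes are rarer than downloads
        "make_count": 5.0,   # a verified print is a strong signal
        "remix_count": 3.0,  # remixes imply creative influence
    }
    return sum(
        w * math.log1p(thing.get(field, 0))
        for field, w in weights.items()
    )

# Made-up counts for illustration
demo = {"download_count": 12000, "like_count": 800, "make_count": 45, "remix_count": 12}
print(round(popularity_score(demo), 2))
```

Because every term is log-scaled, the score rewards breadth of engagement over a single runaway metric.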
Authentication and Token Acquisition
The Thingiverse API requires a Bearer token. As of 2026, MakerBot has stopped approving new developer applications through the official portal. The practical approach is to extract a token from an authenticated browser session.
Extracting a Token from Browser DevTools
- Log into Thingiverse at thingiverse.com
- Open DevTools (F12 → Network tab)
- Filter requests for api.thingiverse.com
- Click any API request and look at the Authorization header: Bearer <long_token>
- Copy that token — it's typically valid for weeks to months
# Store the token securely as an environment variable
export THINGIVERSE_TOKEN="your_bearer_token_here"
If you have an existing registered application, the token comes through the standard OAuth flow. Store it the same way — never in code.
Verifying Your Token Works
import httpx
import os
TOKEN = os.environ.get("THINGIVERSE_TOKEN", "")
BASE_URL = "https://api.thingiverse.com"
def verify_token() -> bool:
"""Verify the token works by fetching the authenticated user's profile."""
resp = httpx.get(
f"{BASE_URL}/users/me",
headers={"Authorization": f"Bearer {TOKEN}"},
timeout=10,
)
if resp.status_code == 200:
user = resp.json()
print(f"Authenticated as: {user.get('name')} ({user.get('email')})")
return True
else:
print(f"Token verification failed: {resp.status_code}")
return False
verify_token()
Rate Limits and Anti-Bot Measures
The API enforces rate limits that are not publicly documented but are consistent in practice:
- Bearer token required — All endpoints reject unauthenticated requests with a 401. There is no anonymous access.
- Rate limit: ~300 requests/hour per token — Exceeding this returns a 429. The limit resets on the hour, not as a rolling window. Spreading requests with a 12-second delay between them keeps you safely under.
- IP-level blocking for abuse patterns — Rapid sequential requests from the same IP, even with a valid token, can trigger IP blocks. These are silent — you get connection timeouts rather than explicit error responses.
- Pagination caps — Search and listing endpoints cap at 30 results per page. Results become sparse past page 100.
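The 12-second cadence mentioned above is easiest to enforce centrally rather than with scattered `time.sleep()` calls. A small throttle sketch — call `wait()` before each request and it only sleeps for whatever part of the interval hasn't already elapsed:

```python
import time

class Throttle:
    """Enforce a minimum interval between requests.

    A 12 s interval keeps a single token under the
    ~300 requests/hour ceiling observed in practice.
    """

    def __init__(self, min_interval: float = 12.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the last call

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

throttle = Throttle(12.0)
# Before each API call:
# throttle.wait()
# resp = client.get(url)
```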
Proxy Strategy
When collecting data at scale — crawling tens of thousands of models across multiple categories — the per-IP rate limiting becomes the binding constraint. A single IP running at the safe request cadence takes days to cover meaningful swaths of the catalog.
ThorData's residential proxies handle this well: rotating residential IPs means each request originates from a fresh address, so you can run parallel workers without accumulating rate limit pressure on a single IP. Their geo-targeting is useful if you want to ensure consistent regional routing:
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "gate.thordata.net"
THORDATA_PORT = 9000
def make_proxy(country: str = "us", session_id: str | None = None) -> str:
"""Build a ThorData residential proxy URL."""
user = f"{THORDATA_USER}-country-{country}"
if session_id:
user += f"-session-{session_id}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
Setup and Base Client
uv pip install httpx
import httpx
import time
import os
import sqlite3
import json
from typing import Optional
BASE_URL = "https://api.thingiverse.com"
TOKEN = os.environ["THINGIVERSE_TOKEN"]
HEADERS = {
"Authorization": f"Bearer {TOKEN}",
"User-Agent": "Mozilla/5.0 (compatible; research-bot/1.0)",
"Accept": "application/json",
}
def get(
path: str,
    params: Optional[dict] = None,
proxy: Optional[str] = None,
retries: int = 5,
) -> dict:
"""
Make a Thingiverse API request with exponential backoff retry logic.
Args:
path: API path (e.g., "/things/12345")
params: Optional query parameters
proxy: Optional proxy URL
retries: Max retry attempts
Returns:
Parsed JSON response, or {} on failure
"""
url = f"{BASE_URL}{path}"
client_kwargs = {
"headers": HEADERS,
"timeout": 30,
}
    if proxy:
        # httpx >= 0.26 takes a single `proxy` argument (older releases used `proxies`)
        client_kwargs["proxy"] = proxy
for attempt in range(retries):
try:
with httpx.Client(**client_kwargs) as client:
resp = client.get(url, params=params or {})
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 429:
wait = 2 ** attempt * 15
print(f"Rate limited on {path}, waiting {wait}s (attempt {attempt + 1}/{retries})")
time.sleep(wait)
elif resp.status_code == 401:
raise Exception("Invalid or expired Bearer token — re-extract from browser session")
elif resp.status_code == 404:
return {} # Thing not found, return empty
else:
wait = 2 ** attempt * 5
print(f"HTTP {resp.status_code} on {path}, waiting {wait}s")
time.sleep(wait)
except httpx.TimeoutException:
wait = 2 ** attempt * 10
print(f"Timeout on {path}, waiting {wait}s")
time.sleep(wait)
        except httpx.HTTPError as e:
            # Catch transport errors only, so the 401 raise above propagates
            print(f"Error on {path}: {e}")
            if attempt == retries - 1:
                return {}
raise Exception(f"Failed after {retries} retries for {url}")
Searching Things
Search returns up to 30 results per page. Paginate with the page parameter.
def search_things(
query: str,
page: int = 1,
per_page: int = 30,
sort: str = "relevant",
proxy: Optional[str] = None,
) -> list[dict]:
"""
Search for Things by keyword.
Sort options: relevant | newest | popular | makes | derivatives
Returns summary objects with id, name, creator, download_count, etc.
"""
    from urllib.parse import quote  # encode spaces and special characters in the path

    results = get(
        f"/search/{quote(query)}",
params={
"type": "things",
"page": page,
"per_page": per_page,
"sort": sort,
},
proxy=proxy,
)
# Response is a list directly for search
if isinstance(results, list):
return results
return results.get("hits", [])
def search_all(
query: str,
max_pages: int = 10,
sort: str = "popular",
proxy: Optional[str] = None,
) -> list[dict]:
"""Paginate through search results up to max_pages."""
results = []
for page in range(1, max_pages + 1):
batch = search_things(query, page=page, sort=sort, proxy=proxy)
if not batch:
print(f"No results on page {page}, stopping")
break
results.extend(batch)
print(f"Page {page}: {len(batch)} results (total: {len(results)})")
time.sleep(12) # Stay under 300 req/hour limit
return results
# Example: find the most popular gothic architecture models
results = search_all("gothic architecture", sort="popular", max_pages=5)
print(f"Found {len(results)} Things")
for r in results[:5]:
print(f" {r['name']}: {r.get('download_count', 0)} downloads")
Fetching Thing Details
def get_thing(thing_id: int, proxy: Optional[str] = None) -> dict:
"""
Fetch full metadata for a Thing.
Key response fields:
id, name, description, url, public_url
creator.name, creator.public_url
added (ISO timestamp), modified
is_published, is_wip
like_count, collect_count, comment_count
download_count, make_count, remix_count
default_image.url, preview_image
license, categories (list), tags (list)
ancestors (list of dicts — remix parents)
is_featured, is_nsfw
"""
return get(f"/things/{thing_id}", proxy=proxy)
def get_thing_files(thing_id: int, proxy: Optional[str] = None) -> list[dict]:
"""
Fetch file metadata for a Thing's downloadable assets.
Key response fields per file:
id, name, size (bytes)
download_url, direct_url
date (upload timestamp)
thumbnail, default_image
"""
result = get(f"/things/{thing_id}/files", proxy=proxy)
return result if isinstance(result, list) else []
def extract_thing_summary(thing: dict) -> dict:
"""Extract the most analytically useful fields from a Thing."""
creator = thing.get("creator", {}) or {}
return {
"id": thing.get("id"),
"name": thing.get("name"),
"creator": creator.get("name"),
"creator_url": creator.get("public_url"),
"added": thing.get("added"),
"modified": thing.get("modified"),
"license": thing.get("license"),
"download_count": thing.get("download_count", 0),
"like_count": thing.get("like_count", 0),
"make_count": thing.get("make_count", 0),
"remix_count": thing.get("remix_count", 0),
"collect_count": thing.get("collect_count", 0),
"comment_count": thing.get("comment_count", 0),
"tags": [t.get("name") for t in (thing.get("tags") or []) if isinstance(t, dict)],
"categories": [c.get("name") for c in (thing.get("categories") or []) if isinstance(c, dict)],
"ancestors": [a.get("id") for a in (thing.get("ancestors") or []) if isinstance(a, dict)],
"is_featured": thing.get("is_featured", False),
"url": thing.get("public_url"),
}
Fetching and Traversing the Remix Network
The remix graph is directional. Each Thing knows its parents (via ancestors in the detail response) and you can fetch its children via the /remixes endpoint.
def get_remixes(thing_id: int, proxy: Optional[str] = None) -> list[dict]:
"""
Fetch direct remixes (children) of a Thing.
Each item has the same summary shape as search results:
id, name, creator, download_count, like_count, etc.
"""
result = get(f"/things/{thing_id}/remixes", proxy=proxy)
return result if isinstance(result, list) else []
def build_remix_tree(
root_id: int,
depth: int = 2,
    visited: Optional[set] = None,
proxy: Optional[str] = None,
) -> dict:
"""
Recursively build a remix tree starting from root_id.
Returns a nested dict: {id, name, download_count, children: [...]}
depth controls how many levels deep to traverse.
visited prevents infinite loops in cyclic remix references.
"""
if visited is None:
visited = set()
if root_id in visited:
return {"id": root_id, "name": "ALREADY_VISITED", "children": []}
visited.add(root_id)
thing = get_thing(root_id, proxy=proxy)
time.sleep(12)
node = {
"id": root_id,
"name": thing.get("name"),
"creator": thing.get("creator", {}).get("name") if isinstance(thing.get("creator"), dict) else None,
"download_count": thing.get("download_count", 0),
"like_count": thing.get("like_count", 0),
"make_count": thing.get("make_count", 0),
"added": thing.get("added"),
"children": [],
}
if depth > 0:
remixes = get_remixes(root_id, proxy=proxy)
time.sleep(12)
for child in remixes:
child_id = child.get("id")
if child_id and child_id not in visited:
node["children"].append(
build_remix_tree(child_id, depth=depth - 1, visited=visited, proxy=proxy)
)
time.sleep(12)
return node
def flatten_remix_tree(tree: dict, parent_id: int = None) -> list[dict]:
"""Convert a nested remix tree into a flat list of edges for graph analysis."""
edges = []
node_id = tree.get("id")
if parent_id is not None and node_id:
edges.append({
"parent_id": parent_id,
"child_id": node_id,
"child_name": tree.get("name"),
"child_downloads": tree.get("download_count", 0),
})
for child in tree.get("children", []):
edges.extend(flatten_remix_tree(child, parent_id=node_id))
return edges
# Example: map the remix tree for the iconic Flexi-Rex (Thing #763622)
# This is one of the most-remixed 3D models ever — has hundreds of derivatives
tree = build_remix_tree(763622, depth=2)
edges = flatten_remix_tree(tree)
print(f"Remix tree has {len(edges)} edges")
print(json.dumps(tree, indent=2)[:1000])
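The flat edge list is already enough for basic influence analysis without a graph library — in-degree over `parent_id` tells you which Things spawned the most direct derivatives. A small sketch using made-up edges:

```python
from collections import Counter

def top_remixed_parents(edges: list[dict], n: int = 5) -> list[tuple[int, int]]:
    """Rank parent Things by number of direct remix children
    in a flattened edge list (as produced by flatten_remix_tree)."""
    counts = Counter(e["parent_id"] for e in edges)
    return counts.most_common(n)

# Hypothetical edges for illustration
demo_edges = [
    {"parent_id": 763622, "child_id": 1001},
    {"parent_id": 763622, "child_id": 1002},
    {"parent_id": 1001, "child_id": 2001},
]
print(top_remixed_parents(demo_edges))  # → [(763622, 2), (1001, 1)]
```

For deeper metrics (transitive descendant counts, longest remix chains) the same edge list drops straight into networkx as a directed graph.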
Fetching Creator Profiles
def get_user(username: str, proxy: Optional[str] = None) -> dict:
"""
Fetch a creator's profile.
Key response fields:
id, name, first_name, last_name
public_url, thumbnail, cover_image
location, bio
follower_count, following_count
thing_count, make_count, like_count
skill_level (beginner/intermediate/advanced)
registered (ISO timestamp)
"""
return get(f"/users/{username}", proxy=proxy)
def get_user_things(
username: str,
page: int = 1,
per_page: int = 30,
proxy: Optional[str] = None,
) -> list[dict]:
"""Fetch Things uploaded by a specific user, paginated."""
result = get(
f"/users/{username}/things",
params={"page": page, "per_page": per_page},
proxy=proxy,
)
return result if isinstance(result, list) else []
def get_user_all_things(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch all Things by a user across all pages."""
all_things = []
page = 1
while True:
batch = get_user_things(username, page=page, proxy=proxy)
if not batch:
break
all_things.extend(batch)
print(f"User {username} - page {page}: {len(batch)} Things")
page += 1
time.sleep(12)
return all_things
def get_user_makes(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch all Makes (physical prints) by a user."""
result = get(f"/users/{username}/makes", proxy=proxy)
return result if isinstance(result, list) else []
def get_user_liked(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch Things that a user has liked (hearted)."""
result = get(f"/users/{username}/likes", proxy=proxy)
return result if isinstance(result, list) else []
def get_user_collected(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch collections created by a user."""
result = get(f"/users/{username}/collections", proxy=proxy)
return result if isinstance(result, list) else []
def creator_analytics(username: str, proxy: Optional[str] = None) -> dict:
"""
Build a comprehensive analytics profile for a creator.
Aggregates profile data with their Things' engagement metrics.
"""
profile = get_user(username, proxy=proxy)
time.sleep(12)
things = get_user_all_things(username, proxy=proxy)
# Aggregate engagement across all Things
total_downloads = sum(t.get("download_count", 0) for t in things)
total_likes = sum(t.get("like_count", 0) for t in things)
total_makes = sum(t.get("make_count", 0) for t in things)
total_remixes = sum(t.get("remix_count", 0) for t in things)
# Find most popular Things
top_things = sorted(things, key=lambda t: t.get("download_count", 0), reverse=True)[:5]
return {
"username": username,
"name": profile.get("name"),
"follower_count": profile.get("follower_count", 0),
"following_count": profile.get("following_count", 0),
"skill_level": profile.get("skill_level"),
"registered": profile.get("registered"),
"thing_count": profile.get("thing_count", len(things)),
"total_downloads": total_downloads,
"total_likes": total_likes,
"total_makes": total_makes,
"total_remixes": total_remixes,
"avg_downloads_per_thing": round(total_downloads / len(things), 1) if things else 0,
"top_things": [
{
"name": t.get("name"),
"downloads": t.get("download_count", 0),
"url": t.get("public_url"),
}
for t in top_things
],
}
Browsing Categories
Thingiverse organizes models into categories. You can browse by category to find things in a specific domain:
def get_categories(proxy: Optional[str] = None) -> list[dict]:
"""Fetch all top-level categories."""
result = get("/categories", proxy=proxy)
return result if isinstance(result, list) else []
def get_category_things(
category_name: str,
page: int = 1,
sort: str = "popular",
proxy: Optional[str] = None,
) -> list[dict]:
"""
Fetch Things in a specific category.
category_name: URL-encoded category name (e.g., "art", "gadgets", "household")
sort: popular | newest | makes | derivatives
"""
result = get(
f"/categories/{category_name}/things",
params={"page": page, "per_page": 30, "sort": sort},
proxy=proxy,
)
return result if isinstance(result, list) else []
# Example: Browse household category
household_things = []
for page in range(1, 4):
batch = get_category_things("household", page=page, sort="popular")
if not batch:
break
household_things.extend(batch)
time.sleep(12)
print(f"Found {len(household_things)} popular household Things")
Storing in SQLite
def init_db(db_path: str = "thingiverse.db") -> sqlite3.Connection:
"""Initialize the Thingiverse database with Things, creators, and remix graph."""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS things (
id INTEGER PRIMARY KEY,
name TEXT,
creator TEXT,
creator_url TEXT,
added TEXT,
modified TEXT,
license TEXT,
download_count INTEGER DEFAULT 0,
like_count INTEGER DEFAULT 0,
make_count INTEGER DEFAULT 0,
remix_count INTEGER DEFAULT 0,
collect_count INTEGER DEFAULT 0,
comment_count INTEGER DEFAULT 0,
tags TEXT,
categories TEXT,
ancestors TEXT,
is_featured INTEGER DEFAULT 0,
url TEXT,
last_scraped TEXT,
raw_json TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS remix_edges (
parent_id INTEGER NOT NULL,
child_id INTEGER NOT NULL,
PRIMARY KEY (parent_id, child_id)
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS creators (
username TEXT PRIMARY KEY,
name TEXT,
follower_count INTEGER,
following_count INTEGER,
thing_count INTEGER,
make_count INTEGER,
skill_level TEXT,
registered TEXT,
location TEXT,
last_scraped TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS download_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
thing_id INTEGER NOT NULL,
download_count INTEGER NOT NULL,
like_count INTEGER,
make_count INTEGER,
recorded_at TEXT NOT NULL
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_things_creator ON things(creator)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_things_downloads ON things(download_count DESC)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_remix_parent ON remix_edges(parent_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_remix_child ON remix_edges(child_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_snapshots_thing ON download_snapshots(thing_id)")
conn.commit()
return conn
def store_thing(conn: sqlite3.Connection, data: dict):
"""Save a Thing to the database, recording download count history."""
now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
# Check if download count changed (for trending detection)
prev = conn.execute(
"SELECT download_count FROM things WHERE id=?", (data.get("id"),)
).fetchone()
if prev is None or prev[0] != data.get("download_count", 0):
conn.execute("""
INSERT INTO download_snapshots (thing_id, download_count, like_count, make_count, recorded_at)
VALUES (?,?,?,?,?)
""", (
data.get("id"),
data.get("download_count", 0),
data.get("like_count", 0),
data.get("make_count", 0),
now,
))
conn.execute("""
INSERT OR REPLACE INTO things
(id, name, creator, creator_url, added, modified, license,
download_count, like_count, make_count, remix_count, collect_count, comment_count,
tags, categories, ancestors, is_featured, url, last_scraped, raw_json)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
""", (
data.get("id"),
data.get("name"),
data.get("creator", {}).get("name") if isinstance(data.get("creator"), dict) else data.get("creator"),
data.get("creator", {}).get("public_url") if isinstance(data.get("creator"), dict) else data.get("creator_url"),
data.get("added"),
data.get("modified"),
data.get("license"),
data.get("download_count", 0),
data.get("like_count", 0),
data.get("make_count", 0),
data.get("remix_count", 0),
data.get("collect_count", 0),
data.get("comment_count", 0),
json.dumps([t.get("name") for t in (data.get("tags") or []) if isinstance(t, dict)]),
json.dumps([c.get("name") for c in (data.get("categories") or []) if isinstance(c, dict)]),
json.dumps([a.get("id") for a in (data.get("ancestors") or []) if isinstance(a, dict)]),
1 if data.get("is_featured") else 0,
data.get("public_url"),
now,
json.dumps(data),
))
# Store remix edges from ancestors list
for ancestor in (data.get("ancestors") or []):
if isinstance(ancestor, dict) and ancestor.get("id"):
conn.execute(
"INSERT OR IGNORE INTO remix_edges (parent_id, child_id) VALUES (?,?)",
(ancestor["id"], data["id"]),
)
conn.commit()
def store_creator(conn: sqlite3.Connection, data: dict):
"""Save or update a creator profile."""
now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
conn.execute("""
INSERT OR REPLACE INTO creators
(username, name, follower_count, following_count, thing_count, make_count,
skill_level, registered, location, last_scraped)
VALUES (?,?,?,?,?,?,?,?,?,?)
""", (
data.get("name"),
data.get("name"),
data.get("follower_count", 0),
data.get("following_count", 0),
data.get("thing_count", 0),
data.get("make_count", 0),
data.get("skill_level"),
data.get("registered"),
data.get("location"),
now,
))
conn.commit()
Analytics Queries
def find_influential_things(conn: sqlite3.Connection, min_remixes: int = 5) -> list[dict]:
"""Find Things that have been highly remixed — creative influencers."""
rows = conn.execute("""
SELECT t.id, t.name, t.creator, t.download_count,
t.remix_count, t.like_count,
COUNT(re.child_id) as measured_remixes
FROM things t
LEFT JOIN remix_edges re ON re.parent_id = t.id
WHERE t.remix_count >= ?
GROUP BY t.id
ORDER BY measured_remixes DESC
LIMIT 50
""", (min_remixes,)).fetchall()
return [
{
"id": r[0],
"name": r[1],
"creator": r[2],
"downloads": r[3],
"remix_count": r[4],
"likes": r[5],
"measured_remixes": r[6],
}
for r in rows
]
def trending_things(conn: sqlite3.Connection, days: int = 7) -> list[dict]:
"""
Find Things with the highest download velocity in the last N days.
Compares latest snapshot to snapshot from N days ago.
"""
    from datetime import datetime, timedelta, timezone
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
rows = conn.execute("""
SELECT
t.id, t.name, t.creator,
s_now.download_count as downloads_now,
s_old.download_count as downloads_then,
(s_now.download_count - COALESCE(s_old.download_count, 0)) as gain
FROM things t
JOIN (
SELECT thing_id, download_count
FROM download_snapshots ds1
WHERE recorded_at = (SELECT MAX(recorded_at) FROM download_snapshots WHERE thing_id = ds1.thing_id)
) s_now ON s_now.thing_id = t.id
LEFT JOIN (
SELECT thing_id, download_count
FROM download_snapshots ds2
WHERE recorded_at <= ?
AND recorded_at = (SELECT MAX(recorded_at) FROM download_snapshots WHERE thing_id = ds2.thing_id AND recorded_at <= ?)
) s_old ON s_old.thing_id = t.id
        WHERE (s_now.download_count - COALESCE(s_old.download_count, 0)) > 0
ORDER BY gain DESC
LIMIT 20
""", (cutoff, cutoff)).fetchall()
return [
{
"id": r[0],
"name": r[1],
"creator": r[2],
"downloads_now": r[3],
"downloads_gain": r[5],
}
for r in rows
]
def category_distribution(conn: sqlite3.Connection) -> list[dict]:
"""Analyze download distribution by category."""
rows = conn.execute("""
SELECT categories, SUM(download_count) as total_downloads, COUNT(*) as thing_count
FROM things
WHERE categories != '[]' AND categories IS NOT NULL
GROUP BY categories
ORDER BY total_downloads DESC
LIMIT 30
""").fetchall()
results = []
for row in rows:
try:
cats = json.loads(row[0])
for cat in cats:
results.append({
"category": cat,
"total_downloads": row[1],
"thing_count": row[2],
"avg_downloads": round(row[1] / row[2], 1) if row[2] else 0,
})
except (json.JSONDecodeError, TypeError):
pass
return results
Putting It All Together
if __name__ == "__main__":
conn = init_db()
print("=== Phase 1: Search and collect Things ===")
queries = ["gothic architecture", "flexi", "cable management", "planters", "tools"]
for query in queries:
print(f"\nSearching: {query}")
results = search_all(query, max_pages=3, sort="popular")
for item in results:
time.sleep(12)
            thing_id = item.get("id") or item.get("thing_id")
            if not thing_id:
                continue
            detail = get_thing(thing_id)
if detail:
store_thing(conn, detail)
print(f" Stored: {detail.get('name')} (downloads: {detail.get('download_count', 0)})")
print("\n=== Phase 2: Build remix trees for top influencers ===")
influencers = find_influential_things(conn, min_remixes=3)
print(f"Found {len(influencers)} influential Things")
for thing in influencers[:3]:
print(f"\nBuilding remix tree for: {thing['name']} (ID: {thing['id']})")
tree = build_remix_tree(thing["id"], depth=2)
edges = flatten_remix_tree(tree)
print(f" {len(edges)} remix relationships found")
print("\n=== Phase 3: Analytics ===")
trending = trending_things(conn, days=7)
print(f"\nTop 5 trending Things this week:")
for t in trending[:5]:
print(f" {t['name']}: +{t['downloads_gain']} downloads")
cats = category_distribution(conn)
print(f"\nTop 5 categories by total downloads:")
seen = set()
for c in cats:
if c["category"] not in seen:
print(f" {c['category']}: {c['total_downloads']:,} total downloads")
seen.add(c["category"])
if len(seen) >= 5:
break
conn.close()
Scaling Considerations
A 12-second delay between requests keeps you under the ~300 req/hour threshold on a single token and IP. For larger crawls:
Multiple tokens — If you have access to more than one authenticated Thingiverse account, each token has an independent rate limit bucket. Distribute requests across tokens in round-robin.
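A round-robin rotation over several tokens can be sketched with a closure over `itertools.cycle` — how you load the tokens (env var, secrets manager) is up to you:

```python
import itertools

def token_cycler(tokens: list[str]):
    """Return a function that yields request headers, rotating
    through the supplied Bearer tokens round-robin. Each token
    keeps its own independent hourly quota on the server side."""
    cycle = itertools.cycle(tokens)

    def next_headers() -> dict:
        return {
            "Authorization": f"Bearer {next(cycle)}",
            "Accept": "application/json",
        }

    return next_headers

# Usage sketch: load tokens however you manage secrets, e.g.
# next_headers = token_cycler(os.environ["THINGIVERSE_TOKENS"].split(","))
# resp = httpx.get(url, headers=next_headers())
```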
Proxy rotation — Even with a single token, IP rotation removes the per-IP blocking risk. ThorData's residential proxy pool integrates cleanly with httpx:
import random
import string
def get_with_rotation(path: str, params: dict | None = None) -> dict:
"""Make API request with fresh residential IP per call."""
# Fresh IP per request — no session needed for Thingiverse API
session_id = "".join(random.choices(string.ascii_lowercase, k=6))
proxy = make_proxy(country="us", session_id=session_id)
return get(path, params=params, proxy=proxy)
Neither approach changes the per-token hourly quota, but combining them means you're never the bottleneck at the IP level — you can run parallel workers at higher throughput while keeping individual IP rates low.
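The worker-pool pattern can be sketched with `concurrent.futures` — each worker paces itself, so aggregate throughput is roughly workers / delay requests per second while any single rotated IP sees only a trickle. The `fetch` callable is an assumption standing in for a wrapper around the `get` helper with a rotated proxy:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def crawl_parallel(
    thing_ids: list[int],
    fetch: Callable[[int], dict],
    workers: int = 4,
    delay: float = 12.0,
) -> list[dict]:
    """Fetch Things with a small paced worker pool."""
    def worker(tid: int) -> dict:
        result = fetch(tid)
        time.sleep(delay)  # per-worker pacing
        return result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(worker, thing_ids))

# Usage sketch, assuming the helpers defined earlier:
# things = crawl_parallel(ids, lambda tid: get_with_rotation(f"/things/{tid}"))
```

Remember that parallelism over a single token still counts against that token's hourly quota; it only removes the IP-level bottleneck.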
Legal Considerations
Thingiverse's terms of service permit reasonable API usage for personal projects, research, and non-commercial tools. The models themselves are licensed individually — most use Creative Commons variants. Redistributing STL files depends on the individual license; metadata (names, counts, descriptions) is generally less restricted but check Thingiverse's current terms before building anything commercial.
The API is not guaranteed to be stable. MakerBot has not published a deprecation policy, and endpoint behavior has changed without notice in the past. Build your code defensively: log raw responses, handle missing fields with .get() defaults, and do not assume response shape is identical across all Thing types. The raw_json column in the SQLite schema preserves the original response for reprocessing if the schema evolves.