Scraping Thingiverse 3D Model Data and Remix Networks with Python (2026)
Thingiverse is the largest repository of 3D printable models — over 6 million Things uploaded by a community of makers, designers, and engineers. The data is genuinely useful: download counts as a proxy for popularity, remix chains that map creative influence across the community, creator profiles with follower counts and upload histories, and category distributions that show what the 3D printing community is actually making.
MakerBot runs a semi-public REST API that exposes most of this, and it still works in 2026 despite being largely undocumented since MakerBot's ownership changed hands. This guide covers what the API exposes, how to authenticate, how to handle rate limits, and working Python code for the most useful endpoints — plus proxy integration and SQLite storage for production-grade data collection.
What Data Is Available
The Thingiverse API exposes a rich set of fields for each Thing (model):
- Model metadata — name, description, creator username, upload date, license, category, tags
- Engagement counts — download count, like count (hearts), make count (users who printed it), comment count, remix count, collect count
- File information — STL/OBJ/AMF file names, sizes, download URLs, thumbnail images
- Remix relationships — each Thing has an ancestors field pointing to the Things it was remixed from, and a /remixes endpoint returns its direct children
- Collection membership — which public collections a Thing appears in
- Creator profiles — follower count, following count, total Things, total makes, account creation date, cover image
Why This Data Is Useful
Influence mapping: The remix graph reveals creative lineage — which original designs spawned dozens of derivatives. Identifying highly-remixed "root" designs helps you understand which creators are most influential in a niche.
Popularity signals: Download counts are one of the few public, verifiable metrics for 3D model popularity. Combined with like counts and make counts, you can build a multi-dimensional popularity score.
Category trend analysis: Tracking which categories are growing in total uploads and downloads shows you where the community's attention is shifting.
Creator analytics: Follower counts and upload rates help identify prolific creators in specific niches — useful for community building, sponsorships, or curated recommendation systems.
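These signals can be folded into a single ranking number. A minimal sketch of a multi-dimensional popularity score — the log-scaling keeps download counts from drowning out the rarer signals, and the weights are illustrative assumptions, not anything Thingiverse defines:

```python
import math

def popularity_score(thing: dict) -> float:
    """Combine engagement counts into one score.

    log1p damps each count so a 100k-download model doesn't
    erase every other signal; tune the weights for your niche.
    """
    weights = {
        "download_count": 1.0,
        "like_count": 2.0,   # likes are rarer than downloads
        "make_count": 5.0,   # a verified print is a strong signal
        "remix_count": 3.0,  # remixes imply creative influence
    }
    return sum(
        w * math.log1p(thing.get(field, 0))
        for field, w in weights.items()
    )

# Made-up counts for illustration
demo = {"download_count": 12000, "like_count": 800, "make_count": 45, "remix_count": 12}
print(round(popularity_score(demo), 2))
```

Because every term is log-scaled, the score rewards breadth of engagement over a single runaway metric.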
Authentication and Token Acquisition
The Thingiverse API requires a Bearer token. As of 2026, MakerBot has stopped approving new developer applications through the official portal. The practical approach is to extract a token from an authenticated browser session.
Extracting a Token from Browser DevTools
- Log into Thingiverse at thingiverse.com
- Open DevTools (F12 → Network tab)
- Filter requests for api.thingiverse.com
- Click any API request and look at the Authorization header: Bearer <long_token>
- Copy that token — it's typically valid for weeks to months
# Store the token securely as an environment variable
export THINGIVERSE_TOKEN="your_bearer_token_here"
If you have an existing registered application, the token comes through the standard OAuth flow. Store it the same way — never in code.
Verifying Your Token Works
import httpx
import os
TOKEN = os.environ.get("THINGIVERSE_TOKEN", "")
BASE_URL = "https://api.thingiverse.com"
def verify_token() -> bool:
"""Verify the token works by fetching the authenticated user's profile."""
resp = httpx.get(
f"{BASE_URL}/users/me",
headers={"Authorization": f"Bearer {TOKEN}"},
timeout=10,
)
if resp.status_code == 200:
user = resp.json()
print(f"Authenticated as: {user.get('name')} ({user.get('email')})")
return True
else:
print(f"Token verification failed: {resp.status_code}")
return False
verify_token()
Rate Limits and Anti-Bot Measures
The API enforces rate limits that are not publicly documented but are consistent in practice:
- Bearer token required — All endpoints reject unauthenticated requests with a 401. There is no anonymous access.
- Rate limit: ~300 requests/hour per token — Exceeding this returns a 429. The limit resets on the hour, not as a rolling window. Spreading requests with a 12-second delay between them keeps you safely under.
- IP-level blocking for abuse patterns — Rapid sequential requests from the same IP, even with a valid token, can trigger IP blocks. These are silent — you get connection timeouts rather than explicit error responses.
- Pagination caps — Search and listing endpoints cap at 30 results per page. Results become sparse past page 100.
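The 12-second cadence mentioned above is easiest to enforce centrally rather than with scattered `time.sleep()` calls. A small throttle sketch — call `wait()` before each request and it only sleeps for whatever part of the interval hasn't already elapsed:

```python
import time

class Throttle:
    """Enforce a minimum interval between requests.

    A 12 s interval keeps a single token under the
    ~300 requests/hour ceiling observed in practice.
    """

    def __init__(self, min_interval: float = 12.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the last call

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

throttle = Throttle(12.0)
# Before each API call:
# throttle.wait()
# resp = client.get(url)
```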
Proxy Strategy
When collecting data at scale — crawling tens of thousands of models across multiple categories — the per-IP rate limiting becomes the binding constraint. A single IP running at the safe request cadence takes days to cover meaningful swaths of the catalog.
ThorData's residential proxies handle this well: rotating residential IPs means each request originates from a fresh address, so you can run parallel workers without accumulating rate limit pressure on a single IP. Their geo-targeting is useful if you want to ensure consistent regional routing:
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "gate.thordata.net"
THORDATA_PORT = 9000
def make_proxy(country: str = "us", session_id: str | None = None) -> str:
"""Build a ThorData residential proxy URL."""
user = f"{THORDATA_USER}-country-{country}"
if session_id:
user += f"-session-{session_id}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
Setup and Base Client
uv pip install httpx
import httpx
import time
import os
import sqlite3
import json
from typing import Optional
BASE_URL = "https://api.thingiverse.com"
TOKEN = os.environ["THINGIVERSE_TOKEN"]
HEADERS = {
"Authorization": f"Bearer {TOKEN}",
"User-Agent": "Mozilla/5.0 (compatible; research-bot/1.0)",
"Accept": "application/json",
}
def get(
path: str,
    params: Optional[dict] = None,
proxy: Optional[str] = None,
retries: int = 5,
) -> dict:
"""
Make a Thingiverse API request with exponential backoff retry logic.
Args:
path: API path (e.g., "/things/12345")
params: Optional query parameters
proxy: Optional proxy URL
retries: Max retry attempts
Returns:
Parsed JSON response, or {} on failure
"""
url = f"{BASE_URL}{path}"
client_kwargs = {
"headers": HEADERS,
"timeout": 30,
}
    if proxy:
        # httpx >= 0.26 takes a single `proxy` argument (older releases used `proxies`)
        client_kwargs["proxy"] = proxy
for attempt in range(retries):
try:
with httpx.Client(**client_kwargs) as client:
resp = client.get(url, params=params or {})
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 429:
wait = 2 ** attempt * 15
print(f"Rate limited on {path}, waiting {wait}s (attempt {attempt + 1}/{retries})")
time.sleep(wait)
elif resp.status_code == 401:
raise Exception("Invalid or expired Bearer token — re-extract from browser session")
elif resp.status_code == 404:
return {} # Thing not found, return empty
else:
wait = 2 ** attempt * 5
print(f"HTTP {resp.status_code} on {path}, waiting {wait}s")
time.sleep(wait)
except httpx.TimeoutException:
wait = 2 ** attempt * 10
print(f"Timeout on {path}, waiting {wait}s")
time.sleep(wait)
        except httpx.HTTPError as e:
            # Catch transport errors only, so the 401 raise above propagates
            print(f"Error on {path}: {e}")
            if attempt == retries - 1:
                return {}
raise Exception(f"Failed after {retries} retries for {url}")
Searching Things
Search returns up to 30 results per page. Paginate with the page parameter.
def search_things(
query: str,
page: int = 1,
per_page: int = 30,
sort: str = "relevant",
proxy: Optional[str] = None,
) -> list[dict]:
"""
Search for Things by keyword.
Sort options: relevant | newest | popular | makes | derivatives
Returns summary objects with id, name, creator, download_count, etc.
"""
    from urllib.parse import quote  # encode spaces and special characters in the path

    results = get(
        f"/search/{quote(query)}",
params={
"type": "things",
"page": page,
"per_page": per_page,
"sort": sort,
},
proxy=proxy,
)
# Response is a list directly for search
if isinstance(results, list):
return results
return results.get("hits", [])
def search_all(
query: str,
max_pages: int = 10,
sort: str = "popular",
proxy: Optional[str] = None,
) -> list[dict]:
"""Paginate through search results up to max_pages."""
results = []
for page in range(1, max_pages + 1):
batch = search_things(query, page=page, sort=sort, proxy=proxy)
if not batch:
print(f"No results on page {page}, stopping")
break
results.extend(batch)
print(f"Page {page}: {len(batch)} results (total: {len(results)})")
time.sleep(12) # Stay under 300 req/hour limit
return results
# Example: find the most popular gothic architecture models
results = search_all("gothic architecture", sort="popular", max_pages=5)
print(f"Found {len(results)} Things")
for r in results[:5]:
print(f" {r['name']}: {r.get('download_count', 0)} downloads")
Fetching Thing Details
def get_thing(thing_id: int, proxy: Optional[str] = None) -> dict:
"""
Fetch full metadata for a Thing.
Key response fields:
id, name, description, url, public_url
creator.name, creator.public_url
added (ISO timestamp), modified
is_published, is_wip
like_count, collect_count, comment_count
download_count, make_count, remix_count
default_image.url, preview_image
license, categories (list), tags (list)
ancestors (list of dicts — remix parents)
is_featured, is_nsfw
"""
return get(f"/things/{thing_id}", proxy=proxy)
def get_thing_files(thing_id: int, proxy: Optional[str] = None) -> list[dict]:
"""
Fetch file metadata for a Thing's downloadable assets.
Key response fields per file:
id, name, size (bytes)
download_url, direct_url
date (upload timestamp)
thumbnail, default_image
"""
result = get(f"/things/{thing_id}/files", proxy=proxy)
return result if isinstance(result, list) else []
def extract_thing_summary(thing: dict) -> dict:
"""Extract the most analytically useful fields from a Thing."""
creator = thing.get("creator", {}) or {}
return {
"id": thing.get("id"),
"name": thing.get("name"),
"creator": creator.get("name"),
"creator_url": creator.get("public_url"),
"added": thing.get("added"),
"modified": thing.get("modified"),
"license": thing.get("license"),
"download_count": thing.get("download_count", 0),
"like_count": thing.get("like_count", 0),
"make_count": thing.get("make_count", 0),
"remix_count": thing.get("remix_count", 0),
"collect_count": thing.get("collect_count", 0),
"comment_count": thing.get("comment_count", 0),
"tags": [t.get("name") for t in (thing.get("tags") or []) if isinstance(t, dict)],
"categories": [c.get("name") for c in (thing.get("categories") or []) if isinstance(c, dict)],
"ancestors": [a.get("id") for a in (thing.get("ancestors") or []) if isinstance(a, dict)],
"is_featured": thing.get("is_featured", False),
"url": thing.get("public_url"),
}
Fetching and Traversing the Remix Network
The remix graph is directional. Each Thing knows its parents (via ancestors in the detail response) and you can fetch its children via the /remixes endpoint.
def get_remixes(thing_id: int, proxy: Optional[str] = None) -> list[dict]:
"""
Fetch direct remixes (children) of a Thing.
Each item has the same summary shape as search results:
id, name, creator, download_count, like_count, etc.
"""
result = get(f"/things/{thing_id}/remixes", proxy=proxy)
return result if isinstance(result, list) else []
def build_remix_tree(
root_id: int,
depth: int = 2,
    visited: Optional[set] = None,
proxy: Optional[str] = None,
) -> dict:
"""
Recursively build a remix tree starting from root_id.
Returns a nested dict: {id, name, download_count, children: [...]}
depth controls how many levels deep to traverse.
visited prevents infinite loops in cyclic remix references.
"""
if visited is None:
visited = set()
if root_id in visited:
return {"id": root_id, "name": "ALREADY_VISITED", "children": []}
visited.add(root_id)
thing = get_thing(root_id, proxy=proxy)
time.sleep(12)
node = {
"id": root_id,
"name": thing.get("name"),
"creator": thing.get("creator", {}).get("name") if isinstance(thing.get("creator"), dict) else None,
"download_count": thing.get("download_count", 0),
"like_count": thing.get("like_count", 0),
"make_count": thing.get("make_count", 0),
"added": thing.get("added"),
"children": [],
}
if depth > 0:
remixes = get_remixes(root_id, proxy=proxy)
time.sleep(12)
for child in remixes:
child_id = child.get("id")
if child_id and child_id not in visited:
node["children"].append(
build_remix_tree(child_id, depth=depth - 1, visited=visited, proxy=proxy)
)
time.sleep(12)
return node
def flatten_remix_tree(tree: dict, parent_id: int = None) -> list[dict]:
"""Convert a nested remix tree into a flat list of edges for graph analysis."""
edges = []
node_id = tree.get("id")
if parent_id is not None and node_id:
edges.append({
"parent_id": parent_id,
"child_id": node_id,
"child_name": tree.get("name"),
"child_downloads": tree.get("download_count", 0),
})
for child in tree.get("children", []):
edges.extend(flatten_remix_tree(child, parent_id=node_id))
return edges
# Example: map the remix tree for the iconic Flexi-Rex (Thing #763622)
# This is one of the most-remixed 3D models ever — has hundreds of derivatives
tree = build_remix_tree(763622, depth=2)
edges = flatten_remix_tree(tree)
print(f"Remix tree has {len(edges)} edges")
print(json.dumps(tree, indent=2)[:1000])
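The flat edge list is already enough for basic influence analysis without a graph library — in-degree over `parent_id` tells you which Things spawned the most direct derivatives. A small sketch using made-up edges:

```python
from collections import Counter

def top_remixed_parents(edges: list[dict], n: int = 5) -> list[tuple[int, int]]:
    """Rank parent Things by number of direct remix children
    in a flattened edge list (as produced by flatten_remix_tree)."""
    counts = Counter(e["parent_id"] for e in edges)
    return counts.most_common(n)

# Hypothetical edges for illustration
demo_edges = [
    {"parent_id": 763622, "child_id": 1001},
    {"parent_id": 763622, "child_id": 1002},
    {"parent_id": 1001, "child_id": 2001},
]
print(top_remixed_parents(demo_edges))  # → [(763622, 2), (1001, 1)]
```

For deeper metrics (transitive descendant counts, longest remix chains) the same edge list drops straight into networkx as a directed graph.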
Fetching Creator Profiles
def get_user(username: str, proxy: Optional[str] = None) -> dict:
"""
Fetch a creator's profile.
Key response fields:
id, name, first_name, last_name
public_url, thumbnail, cover_image
location, bio
follower_count, following_count
thing_count, make_count, like_count
skill_level (beginner/intermediate/advanced)
registered (ISO timestamp)
"""
return get(f"/users/{username}", proxy=proxy)
def get_user_things(
username: str,
page: int = 1,
per_page: int = 30,
proxy: Optional[str] = None,
) -> list[dict]:
"""Fetch Things uploaded by a specific user, paginated."""
result = get(
f"/users/{username}/things",
params={"page": page, "per_page": per_page},
proxy=proxy,
)
return result if isinstance(result, list) else []
def get_user_all_things(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch all Things by a user across all pages."""
all_things = []
page = 1
while True:
batch = get_user_things(username, page=page, proxy=proxy)
if not batch:
break
all_things.extend(batch)
print(f"User {username} - page {page}: {len(batch)} Things")
page += 1
time.sleep(12)
return all_things
def get_user_makes(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch all Makes (physical prints) by a user."""
result = get(f"/users/{username}/makes", proxy=proxy)
return result if isinstance(result, list) else []
def get_user_liked(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch Things that a user has liked (hearted)."""
result = get(f"/users/{username}/likes", proxy=proxy)
return result if isinstance(result, list) else []
def get_user_collected(username: str, proxy: Optional[str] = None) -> list[dict]:
"""Fetch collections created by a user."""
result = get(f"/users/{username}/collections", proxy=proxy)
return result if isinstance(result, list) else []
def creator_analytics(username: str, proxy: Optional[str] = None) -> dict:
"""
Build a comprehensive analytics profile for a creator.
Aggregates profile data with their Things' engagement metrics.
"""
profile = get_user(username, proxy=proxy)
time.sleep(12)
things = get_user_all_things(username, proxy=proxy)
# Aggregate engagement across all Things
total_downloads = sum(t.get("download_count", 0) for t in things)
total_likes = sum(t.get("like_count", 0) for t in things)
total_makes = sum(t.get("make_count", 0) for t in things)
total_remixes = sum(t.get("remix_count", 0) for t in things)
# Find most popular Things
top_things = sorted(things, key=lambda t: t.get("download_count", 0), reverse=True)[:5]
return {
"username": username,
"name": profile.get("name"),
"follower_count": profile.get("follower_count", 0),
"following_count": profile.get("following_count", 0),
"skill_level": profile.get("skill_level"),
"registered": profile.get("registered"),
"thing_count": profile.get("thing_count", len(things)),
"total_downloads": total_downloads,
"total_likes": total_likes,
"total_makes": total_makes,
"total_remixes": total_remixes,
"avg_downloads_per_thing": round(total_downloads / len(things), 1) if things else 0,
"top_things": [
{
"name": t.get("name"),
"downloads": t.get("download_count", 0),
"url": t.get("public_url"),
}
for t in top_things
],
}
Browsing Categories
Thingiverse organizes models into categories. You can browse by category to find things in a specific domain:
def get_categories(proxy: Optional[str] = None) -> list[dict]:
"""Fetch all top-level categories."""
result = get("/categories", proxy=proxy)
return result if isinstance(result, list) else []
def get_category_things(
category_name: str,
page: int = 1,
sort: str = "popular",
proxy: Optional[str] = None,
) -> list[dict]:
"""
Fetch Things in a specific category.
category_name: URL-encoded category name (e.g., "art", "gadgets", "household")
sort: popular | newest | makes | derivatives
"""
result = get(
f"/categories/{category_name}/things",
params={"page": page, "per_page": 30, "sort": sort},
proxy=proxy,
)
return result if isinstance(result, list) else []
# Example: Browse household category
household_things = []
for page in range(1, 4):
batch = get_category_things("household", page=page, sort="popular")
if not batch:
break
household_things.extend(batch)
time.sleep(12)
print(f"Found {len(household_things)} popular household Things")
Storing in SQLite
def init_db(db_path: str = "thingiverse.db") -> sqlite3.Connection:
"""Initialize the Thingiverse database with Things, creators, and remix graph."""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS things (
id INTEGER PRIMARY KEY,
name TEXT,
creator TEXT,
creator_url TEXT,
added TEXT,
modified TEXT,
license TEXT,
download_count INTEGER DEFAULT 0,
like_count INTEGER DEFAULT 0,
make_count INTEGER DEFAULT 0,
remix_count INTEGER DEFAULT 0,
collect_count INTEGER DEFAULT 0,
comment_count INTEGER DEFAULT 0,
tags TEXT,
categories TEXT,
ancestors TEXT,
is_featured INTEGER DEFAULT 0,
url TEXT,
last_scraped TEXT,
raw_json TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS remix_edges (
parent_id INTEGER NOT NULL,
child_id INTEGER NOT NULL,
PRIMARY KEY (parent_id, child_id)
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS creators (
username TEXT PRIMARY KEY,
name TEXT,
follower_count INTEGER,
following_count INTEGER,
thing_count INTEGER,
make_count INTEGER,
skill_level TEXT,
registered TEXT,
location TEXT,
last_scraped TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS download_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
thing_id INTEGER NOT NULL,
download_count INTEGER NOT NULL,
like_count INTEGER,
make_count INTEGER,
recorded_at TEXT NOT NULL
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_things_creator ON things(creator)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_things_downloads ON things(download_count DESC)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_remix_parent ON remix_edges(parent_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_remix_child ON remix_edges(child_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_snapshots_thing ON download_snapshots(thing_id)")
conn.commit()
return conn
def store_thing(conn: sqlite3.Connection, data: dict):
"""Save a Thing to the database, recording download count history."""
now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
# Check if download count changed (for trending detection)
prev = conn.execute(
"SELECT download_count FROM things WHERE id=?", (data.get("id"),)
).fetchone()
if prev is None or prev[0] != data.get("download_count", 0):
conn.execute("""
INSERT INTO download_snapshots (thing_id, download_count, like_count, make_count, recorded_at)
VALUES (?,?,?,?,?)
""", (
data.get("id"),
data.get("download_count", 0),
data.get("like_count", 0),
data.get("make_count", 0),
now,
))
conn.execute("""
INSERT OR REPLACE INTO things
(id, name, creator, creator_url, added, modified, license,
download_count, like_count, make_count, remix_count, collect_count, comment_count,
tags, categories, ancestors, is_featured, url, last_scraped, raw_json)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
""", (
data.get("id"),
data.get("name"),
data.get("creator", {}).get("name") if isinstance(data.get("creator"), dict) else data.get("creator"),
data.get("creator", {}).get("public_url") if isinstance(data.get("creator"), dict) else data.get("creator_url"),
data.get("added"),
data.get("modified"),
data.get("license"),
data.get("download_count", 0),
data.get("like_count", 0),
data.get("make_count", 0),
data.get("remix_count", 0),
data.get("collect_count", 0),
data.get("comment_count", 0),
json.dumps([t.get("name") for t in (data.get("tags") or []) if isinstance(t, dict)]),
json.dumps([c.get("name") for c in (data.get("categories") or []) if isinstance(c, dict)]),
json.dumps([a.get("id") for a in (data.get("ancestors") or []) if isinstance(a, dict)]),
1 if data.get("is_featured") else 0,
data.get("public_url"),
now,
json.dumps(data),
))
# Store remix edges from ancestors list
for ancestor in (data.get("ancestors") or []):
if isinstance(ancestor, dict) and ancestor.get("id"):
conn.execute(
"INSERT OR IGNORE INTO remix_edges (parent_id, child_id) VALUES (?,?)",
(ancestor["id"], data["id"]),
)
conn.commit()
def store_creator(conn: sqlite3.Connection, data: dict):
"""Save or update a creator profile."""
now = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
conn.execute("""
INSERT OR REPLACE INTO creators
(username, name, follower_count, following_count, thing_count, make_count,
skill_level, registered, location, last_scraped)
VALUES (?,?,?,?,?,?,?,?,?,?)
""", (
data.get("name"),
data.get("name"),
data.get("follower_count", 0),
data.get("following_count", 0),
data.get("thing_count", 0),
data.get("make_count", 0),
data.get("skill_level"),
data.get("registered"),
data.get("location"),
now,
))
conn.commit()
Analytics Queries
def find_influential_things(conn: sqlite3.Connection, min_remixes: int = 5) -> list[dict]:
"""Find Things that have been highly remixed — creative influencers."""
rows = conn.execute("""
SELECT t.id, t.name, t.creator, t.download_count,
t.remix_count, t.like_count,
COUNT(re.child_id) as measured_remixes
FROM things t
LEFT JOIN remix_edges re ON re.parent_id = t.id
WHERE t.remix_count >= ?
GROUP BY t.id
ORDER BY measured_remixes DESC
LIMIT 50
""", (min_remixes,)).fetchall()
return [
{
"id": r[0],
"name": r[1],
"creator": r[2],
"downloads": r[3],
"remix_count": r[4],
"likes": r[5],
"measured_remixes": r[6],
}
for r in rows
]
def trending_things(conn: sqlite3.Connection, days: int = 7) -> list[dict]:
"""
Find Things with the highest download velocity in the last N days.
Compares latest snapshot to snapshot from N days ago.
"""
    from datetime import datetime, timedelta, timezone
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
rows = conn.execute("""
SELECT
t.id, t.name, t.creator,
s_now.download_count as downloads_now,
s_old.download_count as downloads_then,
(s_now.download_count - COALESCE(s_old.download_count, 0)) as gain
FROM things t
JOIN (
SELECT thing_id, download_count
FROM download_snapshots ds1
WHERE recorded_at = (SELECT MAX(recorded_at) FROM download_snapshots WHERE thing_id = ds1.thing_id)
) s_now ON s_now.thing_id = t.id
LEFT JOIN (
SELECT thing_id, download_count
FROM download_snapshots ds2
WHERE recorded_at <= ?
AND recorded_at = (SELECT MAX(recorded_at) FROM download_snapshots WHERE thing_id = ds2.thing_id AND recorded_at <= ?)
) s_old ON s_old.thing_id = t.id
        WHERE (s_now.download_count - COALESCE(s_old.download_count, 0)) > 0
ORDER BY gain DESC
LIMIT 20
""", (cutoff, cutoff)).fetchall()
return [
{
"id": r[0],
"name": r[1],
"creator": r[2],
"downloads_now": r[3],
"downloads_gain": r[5],
}
for r in rows
]
def category_distribution(conn: sqlite3.Connection) -> list[dict]:
"""Analyze download distribution by category."""
rows = conn.execute("""
SELECT categories, SUM(download_count) as total_downloads, COUNT(*) as thing_count
FROM things
WHERE categories != '[]' AND categories IS NOT NULL
GROUP BY categories
ORDER BY total_downloads DESC
LIMIT 30
""").fetchall()
results = []
for row in rows:
try:
cats = json.loads(row[0])
for cat in cats:
results.append({
"category": cat,
"total_downloads": row[1],
"thing_count": row[2],
"avg_downloads": round(row[1] / row[2], 1) if row[2] else 0,
})
except (json.JSONDecodeError, TypeError):
pass
return results
Putting It All Together
if __name__ == "__main__":
conn = init_db()
print("=== Phase 1: Search and collect Things ===")
queries = ["gothic architecture", "flexi", "cable management", "planters", "tools"]
for query in queries:
print(f"\nSearching: {query}")
results = search_all(query, max_pages=3, sort="popular")
for item in results:
time.sleep(12)
            thing_id = item.get("id") or item.get("thing_id")
            if not thing_id:
                continue
            detail = get_thing(thing_id)
if detail:
store_thing(conn, detail)
print(f" Stored: {detail.get('name')} (downloads: {detail.get('download_count', 0)})")
print("\n=== Phase 2: Build remix trees for top influencers ===")
influencers = find_influential_things(conn, min_remixes=3)
print(f"Found {len(influencers)} influential Things")
for thing in influencers[:3]:
print(f"\nBuilding remix tree for: {thing['name']} (ID: {thing['id']})")
tree = build_remix_tree(thing["id"], depth=2)
edges = flatten_remix_tree(tree)
print(f" {len(edges)} remix relationships found")
print("\n=== Phase 3: Analytics ===")
trending = trending_things(conn, days=7)
print(f"\nTop 5 trending Things this week:")
for t in trending[:5]:
print(f" {t['name']}: +{t['downloads_gain']} downloads")
cats = category_distribution(conn)
print(f"\nTop 5 categories by total downloads:")
seen = set()
for c in cats:
if c["category"] not in seen:
print(f" {c['category']}: {c['total_downloads']:,} total downloads")
seen.add(c["category"])
if len(seen) >= 5:
break
conn.close()
Scaling Considerations
A 12-second delay between requests keeps you under the ~300 req/hour threshold on a single token and IP. For larger crawls:
Multiple tokens — If you have access to more than one authenticated Thingiverse account, each token has an independent rate limit bucket. Distribute requests across tokens in round-robin.
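A round-robin rotation over several tokens can be sketched with a closure over `itertools.cycle` — how you load the tokens (env var, secrets manager) is up to you:

```python
import itertools

def token_cycler(tokens: list[str]):
    """Return a function that yields request headers, rotating
    through the supplied Bearer tokens round-robin. Each token
    keeps its own independent hourly quota on the server side."""
    cycle = itertools.cycle(tokens)

    def next_headers() -> dict:
        return {
            "Authorization": f"Bearer {next(cycle)}",
            "Accept": "application/json",
        }

    return next_headers

# Usage sketch: load tokens however you manage secrets, e.g.
# next_headers = token_cycler(os.environ["THINGIVERSE_TOKENS"].split(","))
# resp = httpx.get(url, headers=next_headers())
```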
Proxy rotation — Even with a single token, IP rotation removes the per-IP blocking risk. ThorData's residential proxy pool integrates cleanly with httpx:
import random
import string
def get_with_rotation(path: str, params: dict | None = None) -> dict:
"""Make API request with fresh residential IP per call."""
# Fresh IP per request — no session needed for Thingiverse API
session_id = "".join(random.choices(string.ascii_lowercase, k=6))
proxy = make_proxy(country="us", session_id=session_id)
return get(path, params=params, proxy=proxy)
Neither approach changes the per-token hourly quota, but combining them means you're never the bottleneck at the IP level — you can run parallel workers at higher throughput while keeping individual IP rates low.
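The worker-pool pattern can be sketched with `concurrent.futures` — each worker paces itself, so aggregate throughput is roughly workers / delay requests per second while any single rotated IP sees only a trickle. The `fetch` callable is an assumption standing in for a wrapper around the `get` helper with a rotated proxy:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def crawl_parallel(
    thing_ids: list[int],
    fetch: Callable[[int], dict],
    workers: int = 4,
    delay: float = 12.0,
) -> list[dict]:
    """Fetch Things with a small paced worker pool."""
    def worker(tid: int) -> dict:
        result = fetch(tid)
        time.sleep(delay)  # per-worker pacing
        return result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(worker, thing_ids))

# Usage sketch, assuming the helpers defined earlier:
# things = crawl_parallel(ids, lambda tid: get_with_rotation(f"/things/{tid}"))
```

Remember that parallelism over a single token still counts against that token's hourly quota; it only removes the IP-level bottleneck.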
Legal Considerations
Thingiverse's terms of service permit reasonable API usage for personal projects, research, and non-commercial tools. The models themselves are licensed individually — most use Creative Commons variants. Redistributing STL files depends on the individual license; metadata (names, counts, descriptions) is generally less restricted but check Thingiverse's current terms before building anything commercial.
The API is not guaranteed to be stable. MakerBot has not published a deprecation policy, and endpoint behavior has changed without notice in the past. Build your code defensively: log raw responses, handle missing fields with .get() defaults, and do not assume response shape is identical across all Thing types. The raw_json column in the SQLite schema preserves the original response for reprocessing if the schema evolves.