How to Scrape NexusMods Data in 2026: Game Mods, Downloads & Endorsements
NexusMods is the dominant modding platform — over 500,000 mods across thousands of games, with some individual mods clocking tens of millions of downloads. If you're tracking modding trends, building a tool for mod authors, or doing research on game longevity and community engagement, this is where the data lives.
The site has an official REST API that covers most of what you need. For the things it doesn't cover — changelogs, full descriptions, image galleries — scraping the web pages is straightforward. This guide walks through both approaches, plus how to build useful analytics on top of the data.
What Data Is Available
Between the API and mod pages, you can collect:
- Mod metadata — name, summary, version, author, game, category, creation and update dates
- Download counts — total downloads, unique downloads (deduplicated by user)
- Endorsements — the site's upvote system; a proxy for quality and community approval
- File versions — individual file entries with sizes, version strings, and upload timestamps
- Changelogs — per-version change notes (not in the API, only on the page)
- Categories — the mod's classification within the game's category tree
- Tags — author-applied descriptive tags
- Images — screenshot URLs for the mod gallery
- Full description — rich text mod page body (API returns a truncated summary)
The API is the right starting point. Fall back to scraping only for what it misses.
NexusMods API Setup
NexusMods provides an official API at api.nexusmods.com. You need a free account to get an API key — go to Account Settings > API Keys after logging in. The key goes in every request as an apikey header.
Free tier: 100 requests per hour. Premium members get 2,500/hour. Plan accordingly.
pip install httpx beautifulsoup4
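Before writing any code, it helps to budget requests. Fetching one mod in full takes roughly three calls (details, files, changelogs — the endpoints used later in this guide), so the free tier caps you around 30 fully-fetched mods per hour. A quick back-of-envelope helper (illustrative only; the per-mod call count and reserve are assumptions, not API constants):

```python
def mods_per_hour(hourly_limit: int = 100, calls_per_mod: int = 3,
                  reserve: int = 10) -> int:
    """How many mods can be fully fetched in one hour, keeping a
    small reserve of requests for list endpoints and retries."""
    usable = max(0, hourly_limit - reserve)
    return usable // calls_per_mod

# Free tier:  (100 - 10) // 3  = 30 mods/hour
# Premium:    (2500 - 10) // 3 = 830 mods/hour
```

This is why the pipeline at the end of the guide processes only a handful of games per run on the free tier.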
Core API Client
Rather than calling raw endpoints, wrap the API in a client class that handles rate limiting automatically:
import httpx
import time
import json
import sqlite3
from datetime import datetime, timezone
from dataclasses import dataclass, field
@dataclass
class ModData:
"""Structured representation of a NexusMods mod."""
mod_id: int = 0
game: str = ""
name: str = ""
summary: str = ""
author: str = ""
version: str = ""
category_id: int = 0
endorsements: int = 0
downloads: int = 0
unique_downloads: int = 0
created: int = 0
updated: int = 0
tags: list[str] = field(default_factory=list)
adult_content: bool = False
picture_url: str = ""
files: list[dict] = field(default_factory=list)
description: str = ""
changelog: list[dict] = field(default_factory=list)
class NexusModsClient:
"""NexusMods API client with automatic rate limit handling."""
BASE = "https://api.nexusmods.com"
def __init__(self, api_key: str, delay: float = 1.0):
self.client = httpx.Client(
headers={
"apikey": api_key,
"Accept": "application/json",
},
timeout=20,
)
self.delay = delay
self._remaining = 100
self._reset_at = 0
def _get(self, path: str) -> dict:
"""Make a rate-limited GET request to the API."""
# Check if we need to wait for rate limit reset
if self._remaining < 3:
wait = max(0, self._reset_at - time.time()) + 5
if wait > 0:
print(f"Rate limit nearly exhausted. Waiting {wait:.0f}s...")
time.sleep(wait)
resp = self.client.get(f"{self.BASE}{path}")
# Update rate limit tracking from response headers
self._remaining = int(
resp.headers.get("X-RL-Hourly-Remaining", 99)
)
        # The reset header may be seconds-until-reset or an ISO
        # timestamp depending on API version; handle both defensively
        reset_raw = resp.headers.get("X-RL-Hourly-Reset", "3600")
        try:
            self._reset_at = time.time() + int(reset_raw)
        except ValueError:
            try:
                self._reset_at = datetime.fromisoformat(reset_raw).timestamp()
            except ValueError:
                self._reset_at = time.time() + 3600
resp.raise_for_status()
time.sleep(self.delay)
return resp.json()
@property
def requests_remaining(self) -> int:
return self._remaining
def get_game(self, domain: str) -> dict:
"""Fetch game info by domain name (e.g. 'skyrimspecialedition')."""
return self._get(f"/v1/games/{domain}.json")
def get_mod(self, domain: str, mod_id: int) -> dict:
"""Fetch detailed metadata for a single mod."""
return self._get(f"/v1/games/{domain}/mods/{mod_id}.json")
def get_mod_files(self, domain: str, mod_id: int) -> list[dict]:
"""List all downloadable files for a mod."""
data = self._get(f"/v1/games/{domain}/mods/{mod_id}/files.json")
return data.get("files", [])
def get_mod_changelogs(self, domain: str, mod_id: int) -> dict:
"""Fetch version changelogs (API v1 endpoint)."""
return self._get(
f"/v1/games/{domain}/mods/{mod_id}/changelogs.json"
)
def get_trending(self, domain: str) -> list[dict]:
"""Get currently trending mods for a game."""
return self._get(f"/v1/games/{domain}/mods/trending.json")
def get_latest_added(self, domain: str) -> list[dict]:
"""Get most recently uploaded mods."""
return self._get(f"/v1/games/{domain}/mods/latest_added.json")
def get_latest_updated(self, domain: str) -> list[dict]:
"""Get most recently updated mods."""
return self._get(f"/v1/games/{domain}/mods/latest_updated.json")
    def get_updated_since(self, domain: str, period: str = "1m") -> list[dict]:
        """Get mods updated within a period: '1d', '1w', or '1m' (28 days max)."""
        return self._get(
            f"/v1/games/{domain}/mods/updated.json?period={period}"
        )
Fetching Complete Mod Data
Combine API endpoints to build a full mod record. Add this method to NexusModsClient — note the self parameter:
def fetch_complete_mod(self, domain: str, mod_id: int) -> ModData:
"""Fetch all available data for a single mod."""
mod = self.get_mod(domain, mod_id)
files = self.get_mod_files(domain, mod_id)
# Try to get changelogs (sometimes 404s for mods without them)
changelog = {}
try:
changelog = self.get_mod_changelogs(domain, mod_id)
except httpx.HTTPStatusError:
pass
changelog_list = []
for version, entries in changelog.items():
changelog_list.append({
"version": version,
"notes": entries if isinstance(entries, list) else [entries],
})
return ModData(
mod_id=mod["mod_id"],
game=domain,
name=mod["name"],
summary=mod.get("summary", ""),
author=mod.get("author", ""),
version=mod.get("version", ""),
category_id=mod.get("category_id", 0),
endorsements=mod.get("endorsement_count", 0),
downloads=mod.get("mod_downloads", 0),
unique_downloads=mod.get("mod_unique_downloads", 0),
created=mod.get("created_timestamp", 0),
updated=mod.get("updated_timestamp", 0),
tags=[t["name"] for t in mod.get("tags", [])],
adult_content=mod.get("contains_adult_content", False),
picture_url=mod.get("picture_url", ""),
files=[{
"file_id": f["file_id"],
"name": f.get("name", ""),
"version": f.get("version", ""),
"size_kb": f.get("size", 0),
"uploaded": f.get("uploaded_timestamp", 0),
"category": f.get("category_name", ""),
"primary": f.get("is_primary", False),
} for f in files],
changelog=changelog_list,
)
Web Scraping for Full Descriptions
The API returns only a truncated summary. For the full mod description (which often contains installation instructions, compatibility notes, and credit lists), you need to scrape the HTML page.
from bs4 import BeautifulSoup
def scrape_mod_page(domain: str, mod_id: int) -> dict:
"""Scrape mod page for data the API doesn't provide."""
url = f"https://www.nexusmods.com/{domain}/mods/{mod_id}"
headers = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/124.0.0.0 Safari/537.36"
),
"Accept-Language": "en-US,en;q=0.9",
}
resp = httpx.get(url, headers=headers, timeout=20, follow_redirects=True)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
result = {}
# Full description tab
desc_el = soup.select_one("#tab-description")
if desc_el:
result["description"] = desc_el.get_text(
separator="\n", strip=True
)
# Also extract any requirement/compatibility notes
req_headers = desc_el.find_all(
["h2", "h3", "strong"],
string=lambda s: s and any(
k in s.lower()
for k in ("require", "compatib", "install", "depend")
),
)
requirements = []
for header in req_headers:
next_el = header.find_next_sibling(["ul", "ol", "p"])
if next_el:
requirements.append(next_el.get_text(strip=True))
if requirements:
result["requirements"] = requirements
# Gallery image URLs
images = [
img["src"]
for img in soup.select(".mod-images img[src]")
if "staticdelivery.nexusmods.com" in img.get("src", "")
]
result["images"] = images
# Mod stats sidebar (sometimes has data not in API)
    stats_block = soup.select(
        ".mod-sidebar .stat-number, .mod-sidebar .stat-value"
    )
if stats_block:
result["page_stats"] = {
s.get("data-stat", f"stat_{i}"): s.get_text(strip=True)
for i, s in enumerate(stats_block)
}
return result
NexusMods sits behind Cloudflare, but the bot detection for read-only browsing at modest rates isn't aggressive. A 2-4 second delay between page requests with a realistic User-Agent works fine. If you start hitting JS challenges, you've ramped up too fast.
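A fixed delay is a small fingerprint of its own. A jittered sleep helper — a sketch whose 2-4 second band mirrors the advice above — makes request timing look less mechanical:

```python
import random
import time

def polite_sleep(low: float = 2.0, high: float = 4.0) -> float:
    """Sleep for a random duration in [low, high] seconds and
    return the actual pause, so callers can log it."""
    pause = random.uniform(low, high)
    time.sleep(pause)
    return pause
```

Call it between `scrape_mod_page` requests instead of a bare `time.sleep(3)`.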
For heavy page crawling (thousands of mod pages across multiple games), rotating residential proxies help distribute the load. ThorData residential proxies work well here — their session controls let you look like organic browsing rather than a single client hammering the CDN. Don't bother with proxies for API calls though — those are keyed to your account regardless of IP.
SQLite Storage with History Tracking
Endorsement and download counts change over time. Store snapshots so you can track growth rates and detect trending mods early.
def init_db(path: str = "nexusmods.db") -> sqlite3.Connection:
"""Initialize database with mod data and history tables."""
conn = sqlite3.connect(path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS mods (
mod_id INTEGER,
game TEXT,
name TEXT,
author TEXT,
version TEXT,
category_id INTEGER,
endorsements INTEGER,
downloads INTEGER,
unique_downloads INTEGER,
tags TEXT,
description TEXT,
adult_content BOOLEAN,
created INTEGER,
updated INTEGER,
first_seen TEXT,
last_checked TEXT,
PRIMARY KEY (mod_id, game)
);
CREATE TABLE IF NOT EXISTS mod_files (
file_id INTEGER PRIMARY KEY,
mod_id INTEGER,
game TEXT,
name TEXT,
version TEXT,
size_kb INTEGER,
uploaded INTEGER,
is_primary BOOLEAN
);
CREATE TABLE IF NOT EXISTS mod_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
mod_id INTEGER NOT NULL,
game TEXT NOT NULL,
endorsements INTEGER,
downloads INTEGER,
unique_downloads INTEGER,
snapshot_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_snapshots_mod
ON mod_snapshots(mod_id, game);
CREATE INDEX IF NOT EXISTS idx_snapshots_date
ON mod_snapshots(snapshot_at);
""")
conn.commit()
return conn
def save_mod(conn: sqlite3.Connection, mod: ModData, page: dict | None = None):
"""Save mod data and record a snapshot of current stats."""
now = datetime.now(timezone.utc).isoformat()
conn.execute("""
INSERT INTO mods (mod_id, game, name, author, version,
category_id, endorsements, downloads,
unique_downloads, tags, description,
adult_content, created, updated,
first_seen, last_checked)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
ON CONFLICT(mod_id, game) DO UPDATE SET
name = excluded.name,
version = excluded.version,
endorsements = excluded.endorsements,
downloads = excluded.downloads,
unique_downloads = excluded.unique_downloads,
description = excluded.description,
updated = excluded.updated,
last_checked = excluded.last_checked
""", (
mod.mod_id, mod.game, mod.name, mod.author, mod.version,
mod.category_id, mod.endorsements, mod.downloads,
mod.unique_downloads, json.dumps(mod.tags),
page.get("description", mod.summary) if page else mod.summary,
mod.adult_content, mod.created, mod.updated,
now, now,
))
# Save files
for f in mod.files:
conn.execute("""
INSERT OR REPLACE INTO mod_files
VALUES (?,?,?,?,?,?,?,?)
""", (
f["file_id"], mod.mod_id, mod.game,
f["name"], f["version"], f["size_kb"],
f["uploaded"], int(f["primary"]),
))
# Record stats snapshot for trend analysis
conn.execute("""
INSERT INTO mod_snapshots
(mod_id, game, endorsements, downloads,
unique_downloads, snapshot_at)
VALUES (?, ?, ?, ?, ?, ?)
""", (
mod.mod_id, mod.game, mod.endorsements,
mod.downloads, mod.unique_downloads, now,
))
conn.commit()
Trend Analysis: Finding Rising Mods
The snapshots table lets you calculate growth rates and spot mods gaining momentum before they hit the trending page:
def find_rising_mods(
conn: sqlite3.Connection,
game: str,
days: int = 7,
min_downloads: int = 100,
) -> list[dict]:
"""Find mods with the highest download growth rate over N days.
Returns mods sorted by daily download velocity — useful for
discovering mods that are gaining traction but haven't hit
the official trending list yet.
"""
rows = conn.execute("""
WITH recent AS (
SELECT mod_id, game,
MAX(downloads) as latest_downloads,
MIN(downloads) as earliest_downloads,
MAX(endorsements) as latest_endorsements,
MIN(endorsements) as earliest_endorsements,
julianday(MAX(snapshot_at)) - julianday(MIN(snapshot_at)) as span_days
FROM mod_snapshots
WHERE game = ?
AND snapshot_at > datetime('now', ? || ' days')
GROUP BY mod_id, game
HAVING COUNT(*) >= 2
)
SELECT m.mod_id, m.name, m.author, m.endorsements, m.downloads,
r.latest_downloads - r.earliest_downloads as dl_growth,
r.latest_endorsements - r.earliest_endorsements as endorse_growth,
r.span_days,
CASE WHEN r.span_days > 0
THEN (r.latest_downloads - r.earliest_downloads) / r.span_days
ELSE 0 END as daily_velocity
FROM recent r
JOIN mods m ON m.mod_id = r.mod_id AND m.game = r.game
WHERE r.latest_downloads >= ?
AND r.latest_downloads > r.earliest_downloads
ORDER BY daily_velocity DESC
LIMIT 50
""", (game, f"-{days}", min_downloads)).fetchall()
return [{
"mod_id": r[0], "name": r[1], "author": r[2],
"endorsements": r[3], "downloads": r[4],
"download_growth": r[5], "endorsement_growth": r[6],
"span_days": round(r[7], 1),
"daily_velocity": round(r[8], 1),
} for r in rows]
def compare_games(
conn: sqlite3.Connection,
games: list[str],
) -> list[dict]:
"""Compare modding activity across multiple games.
Useful for understanding which game communities are most
active and where mod authors should focus effort.
"""
results = []
for game in games:
row = conn.execute("""
SELECT COUNT(*) as total_mods,
SUM(downloads) as total_downloads,
SUM(endorsements) as total_endorsements,
AVG(endorsements) as avg_endorsements,
COUNT(CASE WHEN updated > strftime('%s', 'now', '-30 days')
THEN 1 END) as active_30d
FROM mods WHERE game = ?
""", (game,)).fetchone()
results.append({
"game": game,
"total_mods": row[0],
"total_downloads": row[1] or 0,
"total_endorsements": row[2] or 0,
"avg_endorsements": round(row[3] or 0, 1),
"active_last_30d": row[4] or 0,
})
return sorted(results, key=lambda x: x["total_downloads"], reverse=True)
Running a Daily Collection Pipeline
Put it all together with a pipeline that tracks multiple games:
def daily_collection(
api_key: str,
games: list[str],
db_path: str = "nexusmods.db",
scrape_pages: bool = True,
):
"""Daily pipeline: fetch trending + updated mods, record snapshots.
Designed to run within the free API tier (100 req/hour).
Budget per game: ~15-20 API calls (trending + updated + details).
Safe for up to 4-5 games per run.
"""
client = NexusModsClient(api_key, delay=1.5)
conn = init_db(db_path)
for game in games:
print(f"\n=== {game} ===")
# Fetch trending and recently updated mod lists
trending = client.get_trending(game)
updated = client.get_latest_updated(game)
latest = client.get_latest_added(game)
# Combine and deduplicate by mod_id
all_mods = {}
for mod in trending + updated + latest:
all_mods[mod["mod_id"]] = mod
print(f" Found {len(all_mods)} unique mods to process")
print(f" API budget remaining: {client.requests_remaining}")
# Fetch detailed data for each mod
for i, (mod_id, _) in enumerate(all_mods.items()):
if client.requests_remaining < 10:
print(f" Stopping early — API budget low ({client.requests_remaining} left)")
break
try:
mod_data = client.fetch_complete_mod(game, mod_id)
                # Optionally scrape the HTML page for the full
                # description (the API only returns a truncated summary)
                page_data = {}
                if scrape_pages:
                    page_data = scrape_mod_page(game, mod_id)
                    time.sleep(3)  # extra delay for page scraping
save_mod(conn, mod_data, page_data)
print(
f" [{i+1}/{len(all_mods)}] {mod_data.name} "
f"— {mod_data.downloads:,} DL, "
f"{mod_data.endorsements:,} endorsements"
)
except Exception as e:
print(f" [{i+1}/{len(all_mods)}] Error on mod {mod_id}: {e}")
# Print trend analysis
print("\n=== Rising Mods (7-day velocity) ===")
for game in games:
rising = find_rising_mods(conn, game, days=7)
if rising:
print(f"\n{game}:")
for mod in rising[:5]:
print(
f" {mod['name']} — "
f"+{mod['download_growth']:,} downloads "
f"({mod['daily_velocity']:,.0f}/day)"
)
conn.close()
print(f"\nDone. API requests remaining: {client.requests_remaining}")
if __name__ == "__main__":
API_KEY = "YOUR_NEXUSMODS_API_KEY"
daily_collection(
api_key=API_KEY,
games=[
"skyrimspecialedition",
"baldursgate3",
"starfield",
"cyberpunk2077",
],
scrape_pages=True,
)
Legal and Ethical Notes
NexusMods' Terms of Service restrict automated access, but their API exists specifically to support tool developers and researchers. A few things to keep in mind:
- Don't redistribute mod files. Download URLs from the API are session-authenticated and mod authors retain their rights. Pulling metadata is fine; bulk-downloading and rehosting mods is not.
- Respect the endorsement system. The API has an endpoint to endorse mods on behalf of an authenticated user. Automating endorsements to inflate counts violates the ToS and undermines how mod authors are discovered.
- Use the API for bulk reads. Scraping the same data that the API returns is wasteful and more likely to get your IP flagged. Use page scraping only for what the API genuinely doesn't provide.
- Attribute authors. If you're building anything public-facing with mod data, show the mod author's name and link back to the original mod page.
- Stay within your tier. The 100 req/hour free limit is generous for daily monitoring. If you need more, NexusMods Premium is $4/month and bumps you to 2,500/hour — a fair trade if you're building a real tool on their data.
Practical Tips
Start with one game. Don't try to index all of NexusMods at once. Pick one game (Skyrim SE is great for testing — huge mod library, active community), get your pipeline working, then expand.
The changelog endpoint exists. The official API docs bury it, but GET /v1/games/{domain}/mods/{id}/changelogs.json returns structured version-by-version changelogs. No need to scrape the HTML page for these.
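The endpoint returns a plain mapping of version string to notes. Flattening it into ordered records (as fetch_complete_mod does earlier) takes a couple of lines — this standalone sketch assumes the {version: notes} shape, where notes may be a list or a bare string:

```python
def flatten_changelog(raw: dict) -> list[dict]:
    """Convert {version: notes} into a list of records,
    wrapping a bare string note into a one-element list."""
    return [
        {"version": v, "notes": n if isinstance(n, list) else [n]}
        for v, n in raw.items()
    ]
```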
Track endorsement-to-download ratio. A mod with 100,000 downloads and 50 endorsements is being installed but not loved (possibly a required dependency). A mod with 1,000 downloads and 500 endorsements is genuinely valued by its users. This ratio is a better quality signal than raw counts.
Snapshot daily, analyze weekly. The daily velocity numbers get noisy day-to-day. Smooth over 7-day windows for more meaningful trend signals. A mod that gains 500 downloads consistently over a week is more interesting than one that spikes 2,000 in one day from a YouTube video and flatlines after.
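Smoothing here is just a trailing average over daily deltas. A minimal dependency-free sketch, assuming one snapshot of the cumulative download total per day, oldest first:

```python
def smoothed_velocity(daily_totals: list[int], window: int = 7) -> list[float]:
    """Trailing-window average of day-over-day download gains.
    daily_totals holds cumulative download counts, one per day."""
    # Day-over-day gains between consecutive snapshots
    deltas = [b - a for a, b in zip(daily_totals, daily_totals[1:])]
    out = []
    for i in range(len(deltas)):
        chunk = deltas[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

A one-day spike of 2,000 downloads decays out of the smoothed series within a window, while a steady 500/week holds its level.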
Watch for game update cycles. Mod download spikes correlate strongly with game patches and DLC releases. When Bethesda ships a Starfield update, the mods that update fastest see the biggest download surges. Track game update dates alongside your mod data for richer analysis.
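One low-tech way to connect the two datasets: keep a list of known patch dates per game and flag snapshots that fall near one, so a download spike can be attributed to the update cycle rather than organic growth. A sketch using stdlib dates (the window size is a judgment call, and patch dates in any real use would come from your own tracking):

```python
from datetime import date

def near_patch(snapshot_day: date, patch_days: list[date],
               window_days: int = 3) -> bool:
    """True if a snapshot falls within window_days of any game
    patch, in either direction."""
    return any(
        abs((snapshot_day - p).days) <= window_days
        for p in patch_days
    )
```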