How to Scrape ESPN Sports Stats in 2026: Scores, Player Stats & Standings
Sports data is a massive market — fantasy leagues, betting models, analytics dashboards, and media all depend on real-time and historical stats. ESPN is the most comprehensive free source, covering the NFL, NBA, MLB, NHL, soccer, and dozens of other sports. The catch: there's no official public API. But the hidden API that powers ESPN's own website has been extensively reverse-engineered by the community, and it's been stable for years.
This guide covers ESPN's hidden API for live data, sports-reference.com scraping for historical stats, SQLite storage, proxy integration, and building a complete sports data pipeline in Python.
What Data Can You Extract?
Between ESPN and sports-reference, you get:
- Live scores — real-time game scores, play-by-play, game status
- Player stats — season averages, game logs, career totals
- Team standings — division rankings, win/loss records, conference standings
- Schedules — upcoming and past games with dates, times, venues
- Box scores — detailed game-level stats for every player
- Injury reports — player injury status and expected return dates
- Power rankings — ESPN's editorial rankings by sport
- Historical records — sports-reference has data going back decades
Anti-Bot Measures
ESPN and sports-reference handle bot traffic differently:
- ESPN hidden API — No authentication required. Rate limits are generous (~60 requests/minute) but undocumented. They return 403 if you burst too fast, and extended abuse leads to IP blocks lasting hours.
- ESPN web pages — Cloudflare-protected with JavaScript challenges. Much harder to scrape than the API.
- Sports-Reference — Strict anti-scraping. They rate-limit to ~20 requests/minute per IP, show CAPTCHAs after ~100 requests, and have explicitly asked scrapers to use their free data exports instead.
- IP blocking — Both sites maintain IP blacklists. Datacenter IPs last minutes on sports-reference; ESPN's API is more lenient but will eventually block persistent offenders.
For large-scale collection, ThorData residential proxies work well. Sports sites flag datacenter IPs aggressively, and ThorData's residential pool provides clean IPs that don't carry reputation damage from other scrapers.
ESPN Hidden API Structure
ESPN's API follows consistent URL patterns. No API key required.
Base: https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/
Endpoints:
scoreboard - live scores, game status
standings - league/division standings
teams - team list and info
teams/{id}/roster - team roster
athletes/{id}/statistics - player stats
calendar - season schedule dates
news - sport-specific news feed
Sports/leagues:
football/nfl, football/college-football
basketball/nba, basketball/mens-college-basketball
baseball/mlb
hockey/nhl
soccer/eng.1 (EPL), soccer/usa.1 (MLS), soccer/esp.1 (La Liga)
tennis/atp
golf/pga
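Since every endpoint follows the same pattern, URL construction can be centralized. The `build_espn_url` helper below is a hypothetical convenience wrapper (not part of any ESPN SDK) that mirrors the base URL and endpoint list above:

```python
ESPN_BASE = "https://site.api.espn.com/apis/site/v2/sports"

def build_espn_url(sport: str, league: str, endpoint: str = "scoreboard",
                   resource_id: str = None) -> str:
    """Compose an ESPN hidden-API URL from the patterns listed above.

    build_espn_url("basketball", "nba")              -> .../basketball/nba/scoreboard
    build_espn_url("football", "nfl", "teams", "12") -> .../football/nfl/teams/12
    """
    url = f"{ESPN_BASE}/{sport}/{league}/{endpoint}"
    if resource_id:
        url += f"/{resource_id}"
    return url
```

This keeps sport/league strings in one place, so adding a new league means adding one string pair rather than new URL code.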
Fetching Live Scores
import requests
import time
import random
import json
import sqlite3
from datetime import datetime, timedelta
ESPN_BASE = "https://site.api.espn.com/apis/site/v2/sports"
HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Accept": "application/json",
"Accept-Language": "en-US,en;q=0.9",
"Referer": "https://www.espn.com/",
}
def espn_get(endpoint: str, params: dict = None, proxy_url: str = None) -> dict:
    """Make a request to the ESPN hidden API with error handling."""
    proxies = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    for attempt in range(3):
        try:
            resp = requests.get(
                endpoint,
                headers=HEADERS,
                params=params or {},
                proxies=proxies,
                timeout=15,
            )
            # ESPN answers bursts with 403 as often as 429 — back off on both
            if resp.status_code in (403, 429):
                wait = 30 * (attempt + 1)
                print(f"Rate limited ({resp.status_code}), waiting {wait}s...")
                time.sleep(wait)
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == 2:
                raise
            time.sleep(5 * (attempt + 1))
    return {}
def get_scores(sport: str, league: str, dates: str = None,
proxy_url: str = None) -> list[dict]:
"""Fetch game scores from ESPN.
sport/league examples:
'football/nfl', 'basketball/nba', 'baseball/mlb',
'hockey/nhl', 'soccer/eng.1', 'football/college-football'
dates: 'YYYYMMDD' for specific date, None for today
"""
url = f"{ESPN_BASE}/{sport}/{league}/scoreboard"
params = {}
if dates:
params["dates"] = dates
data = espn_get(url, params, proxy_url)
games = []
for event in data.get("events", []):
competition = event["competitions"][0]
teams = competition.get("competitors", [])
home = next((t for t in teams if t.get("homeAway") == "home"), {})
away = next((t for t in teams if t.get("homeAway") == "away"), {})
# Game status details
status = event.get("status", {})
status_type = status.get("type", {})
# Odds if available
odds = competition.get("odds", [{}])
spread = odds[0].get("details", "") if odds else ""
games.append({
"game_id": event["id"],
"sport": sport,
"league": league,
"date": event.get("date"),
"name": event.get("name", ""),
"short_name": event.get("shortName", ""),
"status_state": status_type.get("state"), # pre, in, post
"status_detail": status_type.get("description"),
"period": status.get("period", 0),
"clock": status.get("displayClock", ""),
"home_team": home.get("team", {}).get("displayName"),
"home_team_id": home.get("team", {}).get("id"),
"home_abbrev": home.get("team", {}).get("abbreviation"),
"home_score": int(home.get("score", 0) or 0),
"home_record": home.get("records", [{}])[0].get("summary", "") if home.get("records") else "",
"away_team": away.get("team", {}).get("displayName"),
"away_team_id": away.get("team", {}).get("id"),
"away_abbrev": away.get("team", {}).get("abbreviation"),
"away_score": int(away.get("score", 0) or 0),
"away_record": away.get("records", [{}])[0].get("summary", "") if away.get("records") else "",
"venue": competition.get("venue", {}).get("fullName"),
"venue_city": competition.get("venue", {}).get("address", {}).get("city"),
"broadcast": competition.get("broadcasts", [{}])[0].get("names", [""])[0] if competition.get("broadcasts") else "",
"spread": spread,
"neutral_site": competition.get("neutralSite", False),
})
return games
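The dicts returned by get_scores are easy to render for dashboards or logs. `format_scoreline` below is a hypothetical formatter over that dict shape (period/clock display varies by sport, so the live-game label is kept generic):

```python
def format_scoreline(game: dict) -> str:
    """Render one game dict from get_scores() as a compact one-line score."""
    state = game.get("status_state")
    if state == "pre":
        tail = "(scheduled)"
    elif state == "in":
        tail = f"(live, period {game.get('period', 0)}, {game.get('clock', '')})"
    else:
        tail = "(Final)"
    return (f"{game.get('away_abbrev', '?')} {game.get('away_score', 0)} @ "
            f"{game.get('home_abbrev', '?')} {game.get('home_score', 0)} {tail}")
```

For example, a finished game renders as `DAL 102 @ LAL 110 (Final)`.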
Team Standings
def get_standings(sport: str, league: str, season: int = 2026,
proxy_url: str = None) -> list[dict]:
"""Fetch league standings from ESPN."""
url = f"{ESPN_BASE}/{sport}/{league}/standings"
params = {"season": season}
data = espn_get(url, params, proxy_url)
standings = []
for group in data.get("children", []):
group_name = group.get("name", "")
group_abbrev = group.get("abbreviation", "")
for entry in group.get("standings", {}).get("entries", []):
team = entry.get("team", {})
stats = {s["name"]: s["value"] for s in entry.get("stats", [])}
standings.append({
"sport": sport,
"league": league,
"season": season,
"group": group_name,
"group_abbrev": group_abbrev,
"team": team.get("displayName"),
"team_id": team.get("id"),
"abbreviation": team.get("abbreviation"),
"logo": team.get("logos", [{}])[0].get("href", "") if team.get("logos") else "",
"wins": int(stats.get("wins", 0)),
"losses": int(stats.get("losses", 0)),
"ties": int(stats.get("ties", 0)),
"win_pct": float(stats.get("winPercent", 0)),
"games_back": stats.get("gamesBehind", "-"),
"streak": stats.get("streak", ""),
"home_record": stats.get("Home", ""),
"away_record": stats.get("Away", ""),
"last_10": stats.get("Last 10 Games", ""),
"points_for": float(stats.get("avgPointsFor", 0)),
"points_against": float(stats.get("avgPointsAgainst", 0)),
"point_differential": float(stats.get("pointDifferential", 0)),
})
return standings
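ESPN's entries aren't guaranteed to arrive in rank order, so it's worth sorting yourself. The sketch below (`rank_standings` is a hypothetical helper; real league tiebreakers are far more involved than point differential) groups the rows from get_standings by division and assigns ranks:

```python
from collections import defaultdict

def rank_standings(standings: list[dict]) -> dict[str, list[dict]]:
    """Group standings rows by division/conference and rank by win percentage.

    Mutates each row to add a "rank" key. Ties are broken by point
    differential — a simplification of real league tiebreaker rules.
    """
    groups = defaultdict(list)
    for row in standings:
        groups[row.get("group", "")].append(row)
    for rows in groups.values():
        rows.sort(key=lambda r: (r.get("win_pct", 0.0),
                                 r.get("point_differential", 0.0)),
                  reverse=True)
        for rank, r in enumerate(rows, start=1):
            r["rank"] = rank
    return dict(groups)
```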
Player Stats and Rosters
def get_team_roster(sport: str, league: str, team_id: int,
proxy_url: str = None) -> list[dict]:
"""Fetch team roster with basic player info."""
url = f"{ESPN_BASE}/{sport}/{league}/teams/{team_id}/roster"
data = espn_get(url, proxy_url=proxy_url)
players = []
for group in data.get("athletes", []):
position_group = group.get("position", "")
for athlete in group.get("items", []):
players.append({
"id": athlete.get("id"),
"name": athlete.get("fullName"),
"first_name": athlete.get("firstName"),
"last_name": athlete.get("lastName"),
"position": athlete.get("position", {}).get("abbreviation"),
"position_group": position_group,
"jersey": athlete.get("jersey"),
"age": athlete.get("age"),
"height": athlete.get("displayHeight"),
"weight": athlete.get("displayWeight"),
"college": athlete.get("college", {}).get("name"),
"experience": athlete.get("experience", {}).get("years"),
"headshot": athlete.get("headshot", {}).get("href"),
"birthplace": athlete.get("birthPlace", {}).get("city"),
"nationality": athlete.get("birthPlace", {}).get("country"),
})
return players
def get_all_teams(sport: str, league: str, proxy_url: str = None) -> list[dict]:
"""Get all teams in a league."""
url = f"{ESPN_BASE}/{sport}/{league}/teams"
data = espn_get(url, proxy_url=proxy_url)
teams = []
for sport_data in data.get("sports", []):
for league_data in sport_data.get("leagues", []):
for team_data in league_data.get("teams", []):
team = team_data.get("team", {})
teams.append({
"id": team.get("id"),
"name": team.get("displayName"),
"short_name": team.get("shortDisplayName"),
"abbreviation": team.get("abbreviation"),
"nickname": team.get("name"),
"city": team.get("location"),
"color": team.get("color"),
"alt_color": team.get("alternateColor"),
"logo": team.get("logos", [{}])[0].get("href", "") if team.get("logos") else "",
})
return teams
def get_player_stats(sport: str, league: str, player_id: int,
season: int = None, proxy_url: str = None) -> dict:
"""Fetch player season statistics from ESPN."""
url = f"{ESPN_BASE}/{sport}/{league}/athletes/{player_id}/statistics"
params = {}
if season:
params["season"] = season
return espn_get(url, params, proxy_url)
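get_player_stats returns ESPN's raw nested JSON. A flattener like the sketch below makes it queryable — note the assumption that stats nest under splits → categories → stats, which holds for most leagues but isn't guaranteed by ESPN, and `flatten_player_stats` itself is a hypothetical helper:

```python
def flatten_player_stats(raw: dict) -> dict[str, float]:
    """Flatten ESPN's nested statistics JSON into {"category.statName": value}.

    Assumes the common splits -> categories -> stats layout; adjust for
    sports where the response shape differs.
    """
    flat = {}
    for category in raw.get("splits", {}).get("categories", []):
        cat_name = category.get("name", "misc")
        for stat in category.get("stats", []):
            try:
                flat[f"{cat_name}.{stat.get('name', '')}"] = float(stat.get("value", 0))
            except (TypeError, ValueError):
                continue  # skip non-numeric display-only stats
    return flat
```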
Game Box Scores
def get_box_score(sport: str, league: str, game_id: str,
proxy_url: str = None) -> dict:
"""Fetch detailed box score for a game."""
url = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/summary"
params = {"event": game_id}
data = espn_get(url, params, proxy_url)
box = {
"game_id": game_id,
"status": data.get("header", {}).get("competitions", [{}])[0].get("status", {}),
"teams": [],
"leaders": [],
}
# Box score by team
for team_box in data.get("boxscore", {}).get("teams", []):
team_stats = {
"team": team_box.get("team", {}).get("displayName"),
"home_away": team_box.get("homeAway"),
"stats": {}
}
for stat_group in team_box.get("statistics", []):
label = stat_group.get("label", "")
values = {}
for athlete in stat_group.get("athletes", []):
player_name = athlete.get("athlete", {}).get("displayName", "")
stats = {}
for i, key in enumerate(stat_group.get("labels", [])):
stats_list = athlete.get("stats", [])
if i < len(stats_list):
stats[key] = stats_list[i]
values[player_name] = stats
team_stats["stats"][label] = values
box["teams"].append(team_stats)
# Statistical leaders
for leader_group in data.get("leaders", []):
category = leader_group.get("displayName", "")
leaders = []
for leader in leader_group.get("leaders", [])[:3]:
athlete = leader.get("athlete", {})
leaders.append({
"name": athlete.get("displayName"),
"team": athlete.get("team", {}).get("abbreviation"),
"value": leader.get("displayValue"),
})
box["leaders"].append({"category": category, "leaders": leaders})
return box
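The nested structure get_box_score builds (teams → stat group label → player → stat dict) supports simple aggregation. `sum_team_stat` is a hypothetical helper over that exact shape; ESPN stat values are strings, so non-numeric entries like "4-10" splits are skipped:

```python
def sum_team_stat(team_box: dict, group_label: str, stat_key: str) -> float:
    """Sum one numeric column across all players in a box-score stat group.

    team_box is one entry of box["teams"] as built by get_box_score().
    """
    total = 0.0
    for player_stats in team_box.get("stats", {}).get(group_label, {}).values():
        try:
            total += float(player_stats.get(stat_key, ""))
        except (TypeError, ValueError):
            continue  # "DNP", dashes, and split stats are not summable
    return total
```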
def get_play_by_play(sport: str, league: str, game_id: str,
                     proxy_url: str = None) -> list[dict]:
    """Fetch play-by-play data for a game."""
    url = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/playbyplay"
    params = {"event": game_id}
    data = espn_get(url, params, proxy_url)
    # "plays" is a flat list for some sports and a paginated
    # {"items": [...]} object for others — handle both shapes
    raw_plays = data.get("plays", [])
    if isinstance(raw_plays, dict):
        raw_plays = raw_plays.get("items", [])
    plays = []
    for play in raw_plays:
        plays.append({
            "period": play.get("period", {}).get("number"),
            "clock": play.get("clock", {}).get("displayValue"),
            "team": play.get("team", {}).get("abbreviation"),
            "text": play.get("text"),
            "score_home": play.get("homeScore"),
            "score_away": play.get("awayScore"),
            "scoring_play": play.get("scoringPlay", False),
        })
    return plays
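Full play-by-play is verbose; often you only want the scoring plays. `scoring_summary` is a hypothetical reducer over the dicts get_play_by_play produces:

```python
def scoring_summary(plays: list[dict]) -> list[str]:
    """Condense play-by-play rows down to one line per scoring play."""
    lines = []
    for play in plays:
        if not play.get("scoring_play"):
            continue
        lines.append(
            f"P{play.get('period')} {play.get('clock', '')} "
            f"{play.get('team', '')}: {play.get('text', '')} "
            f"({play.get('score_away')}-{play.get('score_home')})"
        )
    return lines
```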
Scraping Sports-Reference for Historical Data
Sports-Reference (basketball-reference.com, pro-football-reference.com, etc.) has the deepest historical stats. Their pages are server-rendered HTML — no JavaScript required — though many tables are wrapped inside HTML comments to defeat naive parsers.
from bs4 import BeautifulSoup
import pandas as pd
def scrape_player_gamelog(player_slug: str, season: int,
sport: str = "basketball",
proxy_url: str = None) -> pd.DataFrame:
"""Scrape a player's game log from sports-reference.
player_slug: e.g., 'jamesle01' (LeBron James)
sport: 'basketball', 'baseball', 'football', 'hockey'
"""
domain_map = {
"basketball": "basketball-reference.com",
"baseball": "baseball-reference.com",
"football": "pro-football-reference.com",
"hockey": "hockey-reference.com",
}
domain = domain_map.get(sport, "basketball-reference.com")
if sport == "basketball":
url = f"https://www.{domain}/players/{player_slug[0]}/{player_slug}/gamelog/{season}"
table_id = "pgl_basic"
elif sport == "baseball":
url = f"https://www.{domain}/players/{player_slug[0]}/{player_slug}/batting_gamelogs/{season}"
table_id = "batting_gamelogs"
else:
url = f"https://www.{domain}/players/{player_slug[0]}/{player_slug}/gamelog/{season}"
table_id = "stats"
user_agents = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]
headers_sr = {"User-Agent": random.choice(user_agents)}
proxies = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    resp = requests.get(url, headers=headers_sr, proxies=proxies, timeout=20)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
table = soup.find("table", id=table_id)
if not table:
return pd.DataFrame()
rows = []
for tr in table.find("tbody").find_all("tr"):
if tr.get("class") and "thead" in tr.get("class", []):
continue
cells = tr.find_all(["td", "th"])
if len(cells) < 5:
continue
row = {
cell.get("data-stat", f"col_{i}"): cell.get_text(strip=True)
for i, cell in enumerate(cells)
}
if row.get("date_game") or row.get("game_date"):
rows.append(row)
return pd.DataFrame(rows)
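Everything scrape_player_gamelog returns is text, since it comes straight from HTML cells. A hedged cleanup step like the one below converts the stat columns to numbers; `coerce_gamelog_numeric` is a hypothetical helper, and the 50% threshold for keeping a conversion is a heuristic:

```python
import pandas as pd

def coerce_gamelog_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Convert game-log columns to numeric where possible.

    Textual columns (dates, opponents, results) are left alone; empty
    strings from DNP rows become NaN instead of raising.
    """
    out = df.copy()
    for col in out.columns:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Keep the conversion only if most values survived it
        if converted.notna().mean() > 0.5:
            out[col] = converted
    return out
```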
def scrape_season_stats_table(url: str, table_id: str,
                              proxy_url: str = None) -> pd.DataFrame:
    """Generic function to scrape any stats table from sports-reference."""
    from io import StringIO

    proxies = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    headers_sr = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    }
    resp = requests.get(url, headers=headers_sr, proxies=proxies, timeout=20)
    resp.raise_for_status()
    # Sports-reference hides many tables inside HTML comments to defeat
    # naive parsers — unwrap them before searching
    html = resp.text.replace("<!--", "").replace("-->", "")
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if not table:
        return pd.DataFrame()
    # Use pandas directly for clean table parsing
    try:
        # StringIO avoids pandas' deprecation warning for literal HTML input
        dfs = pd.read_html(StringIO(str(table)))
        if dfs:
            df = dfs[0]
            # Remove repeated header rows embedded in the table body
            df = df[df[df.columns[0]] != df.columns[0]]
            return df
    except Exception:
        pass
    return pd.DataFrame()
Collecting Full Season Data
def collect_league_scores(sport: str, league: str,
start: str, end: str,
proxy_url: str = None) -> list[dict]:
"""Collect all game scores for a date range.
start/end: 'YYYYMMDD'
"""
all_games = []
current = datetime.strptime(start, "%Y%m%d")
end_dt = datetime.strptime(end, "%Y%m%d")
    total_days = max((end_dt - current).days, 1)  # avoid division by zero for single-day ranges
day_count = 0
while current <= end_dt:
date_str = current.strftime("%Y%m%d")
try:
games = get_scores(sport, league, dates=date_str, proxy_url=proxy_url)
all_games.extend(games)
if games:
print(f"{date_str}: {len(games)} games")
except Exception as e:
print(f"{date_str}: error — {e}")
current += timedelta(days=1)
day_count += 1
# Progressive delay — be more careful later in the crawl
base_delay = 1.0 + (day_count / total_days) * 1.0
time.sleep(random.uniform(base_delay, base_delay * 1.5))
return all_games
def collect_all_rosters(sport: str, league: str,
proxy_url: str = None) -> dict[str, list]:
"""Fetch rosters for all teams in a league."""
teams = get_all_teams(sport, league, proxy_url)
all_rosters = {}
for team in teams:
team_id = team["id"]
team_name = team["name"]
print(f" Roster: {team_name}")
try:
roster = get_team_roster(sport, league, team_id, proxy_url)
all_rosters[team_name] = roster
except Exception as e:
print(f" Error fetching {team_name} roster: {e}")
all_rosters[team_name] = []
time.sleep(random.uniform(1.5, 3.0))
return all_rosters
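The `{team_name: [player, ...]}` dict from collect_all_rosters can be flattened into rows ready for a bulk `executemany` into the players table defined later. `rosters_to_rows` is a hypothetical adapter; note it stores the team name in the slot the schema calls team_id, since get_team_roster's output doesn't carry the numeric id — adapt if you keep ids alongside names:

```python
def rosters_to_rows(all_rosters: dict[str, list], sport: str,
                    league: str) -> list[tuple]:
    """Flatten per-team rosters into tuples matching the players table columns."""
    rows = []
    for team_name, roster in all_rosters.items():
        for p in roster:
            rows.append((
                p.get("id"), p.get("name"), p.get("position"),
                team_name,  # team name stands in for team_id here
                p.get("jersey"), p.get("age"), p.get("height"),
                p.get("weight"), p.get("college"), p.get("experience"),
                p.get("headshot"), sport, league,
            ))
    return rows
```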
SQLite Storage
def init_sports_db(db_path: str = "sports_data.db") -> sqlite3.Connection:
"""Initialize SQLite database for sports data."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("""
CREATE TABLE IF NOT EXISTS games (
game_id TEXT,
sport TEXT,
league TEXT,
game_date TEXT,
home_team TEXT,
home_team_id TEXT,
home_score INTEGER,
home_record TEXT,
away_team TEXT,
away_team_id TEXT,
away_score INTEGER,
away_record TEXT,
status_state TEXT,
status_detail TEXT,
period INTEGER,
venue TEXT,
venue_city TEXT,
broadcast TEXT,
spread TEXT,
neutral_site INTEGER DEFAULT 0,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (game_id, sport, league)
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS standings_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sport TEXT,
league TEXT,
season INTEGER,
group_name TEXT,
team TEXT,
team_id TEXT,
wins INTEGER,
losses INTEGER,
ties INTEGER,
win_pct REAL,
games_back TEXT,
streak TEXT,
points_for REAL,
points_against REAL,
point_differential REAL,
recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS players (
player_id TEXT PRIMARY KEY,
name TEXT,
position TEXT,
team_id TEXT,
jersey TEXT,
age INTEGER,
height TEXT,
weight TEXT,
college TEXT,
experience INTEGER,
headshot TEXT,
sport TEXT,
league TEXT,
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS player_gamelogs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
player_slug TEXT,
sport TEXT,
season INTEGER,
game_date TEXT,
opponent TEXT,
home_away TEXT,
result TEXT,
data TEXT, -- JSON of all stats
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(player_slug, sport, season, game_date)
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_games_league ON games(sport, league)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_games_date ON games(game_date)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_standings_league ON standings_snapshots(sport, league, season)")
conn.commit()
return conn
def save_games_batch(conn: sqlite3.Connection, games: list[dict]) -> int:
    """Bulk insert games, skip duplicates. Returns the number actually inserted."""
    saved = 0
    for game in games:
        try:
            cur = conn.execute(
                """INSERT OR IGNORE INTO games
                   (game_id, sport, league, game_date, home_team, home_team_id,
                    home_score, home_record, away_team, away_team_id, away_score,
                    away_record, status_state, status_detail, period, venue,
                    venue_city, broadcast, spread, neutral_site)
                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                (
                    game["game_id"], game["sport"], game["league"],
                    game.get("date"), game.get("home_team"), game.get("home_team_id"),
                    game.get("home_score"), game.get("home_record"),
                    game.get("away_team"), game.get("away_team_id"),
                    game.get("away_score"), game.get("away_record"),
                    game.get("status_state"), game.get("status_detail"),
                    game.get("period"), game.get("venue"), game.get("venue_city"),
                    game.get("broadcast"), game.get("spread"),
                    1 if game.get("neutral_site") else 0,
                )
            )
            saved += cur.rowcount  # rowcount is 0 when the insert was ignored
        except sqlite3.Error:
            continue
    conn.commit()
    return saved
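One caveat with INSERT OR IGNORE: a row written while a game is in progress keeps its stale score forever. If you poll live games, an upsert (SQLite 3.24+) is the better fit. Here is a minimal self-contained sketch using a cut-down schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        game_id TEXT, sport TEXT, league TEXT,
        home_score INTEGER, away_score INTEGER, status_state TEXT,
        PRIMARY KEY (game_id, sport, league)
    )
""")

UPSERT = """
    INSERT INTO games (game_id, sport, league, home_score, away_score, status_state)
    VALUES (?, ?, ?, ?, ?, ?)
    ON CONFLICT(game_id, sport, league) DO UPDATE SET
        home_score = excluded.home_score,
        away_score = excluded.away_score,
        status_state = excluded.status_state
"""

# First poll: game in progress
conn.execute(UPSERT, ("401", "basketball", "nba", 55, 60, "in"))
# Later poll: the final score replaces the stale row
conn.execute(UPSERT, ("401", "basketball", "nba", 110, 102, "post"))

row = conn.execute("SELECT home_score, status_state FROM games").fetchone()
print(row)  # (110, 'post')
```

The same ON CONFLICT clause can be dropped into save_games_batch against its primary key if you want updates instead of skips.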
def save_standings(conn: sqlite3.Connection, standings: list[dict]) -> None:
"""Save a standings snapshot."""
for s in standings:
conn.execute(
"""INSERT INTO standings_snapshots
(sport, league, season, group_name, team, team_id, wins, losses, ties,
win_pct, games_back, streak, points_for, points_against, point_differential)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
s["sport"], s["league"], s["season"], s["group"], s["team"],
s["team_id"], s["wins"], s["losses"], s["ties"],
s["win_pct"], s["games_back"], s["streak"],
s["points_for"], s["points_against"], s["point_differential"],
)
)
conn.commit()
Analytics: Building Insights from Scraped Data
def head_to_head_record(db_path: str, team1: str, team2: str,
                        sport: str = None) -> dict:
"""Calculate head-to-head record between two teams."""
conn = sqlite3.connect(db_path)
query = """
SELECT
SUM(CASE WHEN (home_team LIKE ? AND home_score > away_score)
OR (away_team LIKE ? AND away_score > home_score)
THEN 1 ELSE 0 END) as team1_wins,
SUM(CASE WHEN (home_team LIKE ? AND home_score < away_score)
OR (away_team LIKE ? AND away_score < home_score)
THEN 1 ELSE 0 END) as team1_losses,
COUNT(*) as total_games
FROM games
WHERE ((home_team LIKE ? AND away_team LIKE ?)
OR (home_team LIKE ? AND away_team LIKE ?))
AND status_state = 'post'
"""
args = [f"%{team1}%"] * 4 + [f"%{team1}%", f"%{team2}%", f"%{team2}%", f"%{team1}%"]
if sport:
query += " AND sport = ?"
args.append(sport)
row = conn.execute(query, args).fetchone()
conn.close()
return {
"team1": team1,
"team2": team2,
"team1_wins": row[0] or 0,
"team1_losses": row[1] or 0,
"total_games": row[2] or 0,
}
def score_distribution(db_path: str, team: str, home_away: str = "all") -> dict:
"""Get average and distribution of scores for a team."""
conn = sqlite3.connect(db_path)
if home_away == "home":
query = "SELECT home_score, away_score FROM games WHERE home_team LIKE ? AND status_state = 'post'"
rows = conn.execute(query, (f"%{team}%",)).fetchall()
scores = [(r[0], r[1]) for r in rows]
elif home_away == "away":
query = "SELECT away_score, home_score FROM games WHERE away_team LIKE ? AND status_state = 'post'"
rows = conn.execute(query, (f"%{team}%",)).fetchall()
scores = [(r[0], r[1]) for r in rows]
else:
home_rows = conn.execute(
"SELECT home_score, away_score FROM games WHERE home_team LIKE ? AND status_state = 'post'",
(f"%{team}%",)
).fetchall()
away_rows = conn.execute(
"SELECT away_score, home_score FROM games WHERE away_team LIKE ? AND status_state = 'post'",
(f"%{team}%",)
).fetchall()
scores = [(r[0], r[1]) for r in home_rows] + [(r[0], r[1]) for r in away_rows]
conn.close()
if not scores:
return {}
points_for = [s[0] for s in scores if s[0] is not None]
points_against = [s[1] for s in scores if s[1] is not None]
return {
"team": team,
"games": len(scores),
"avg_points_for": sum(points_for) / len(points_for) if points_for else 0,
"avg_points_against": sum(points_against) / len(points_against) if points_against else 0,
"wins": sum(1 for s in scores if s[0] > s[1]),
"losses": sum(1 for s in scores if s[0] < s[1]),
}
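The stored scoring averages also support classic derived metrics. Pythagorean expectation estimates a team's "true" win percentage from points scored and allowed; the exponent is sport-specific (roughly 1.83 for MLB, about 2.37 for the NFL, around 14 for the NBA), and the generic default below is an illustrative choice:

```python
def pythagorean_win_pct(avg_points_for: float, avg_points_against: float,
                        exponent: float = 2.0) -> float:
    """Expected win percentage from scoring averages (Pythagorean expectation)."""
    if avg_points_for <= 0 and avg_points_against <= 0:
        return 0.0
    pf = avg_points_for ** exponent
    pa = avg_points_against ** exponent
    return pf / (pf + pa)
```

Feeding it the avg_points_for/avg_points_against values from score_distribution gives a quick over/underperformance check against a team's actual record.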
Legal Considerations
ESPN's hidden API is undocumented and not officially intended for public use, but it has been available for over a decade and ESPN has not enforced restrictions aggressively. Sports-Reference explicitly asks scrapers to stay under roughly 20 requests per minute and points to its bulk export options (the Share & Export menu on every table, plus the Stathead query service) as the preferred alternative for research.
For commercial projects, consider an ESPN data partnership or licensed providers like SportsData.io, Sportradar, or Stats Perform. Fantasy and analytical tools that use publicly displayed stats have strong legal precedent from cases like C.B.C. Distribution v. MLB Advanced Media.
Key Takeaways
- ESPN's hidden API (site.api.espn.com) is the easiest path to live scores, standings, and rosters — no key needed, consistent URL structure, generous rate limits.
- Sports-Reference has unmatched historical depth but enforces strict rate limits. Use their CSV exports for bulk downloads when available — it's faster and kinder to their servers.
- For season-long data collection or multi-sport scraping, residential proxies prevent the IP blocks both sites enforce. ThorData's residential proxy pool keeps your scraping under the radar with clean, rotating IPs.
- Add 1-2 second delays between ESPN API calls and 3+ seconds for sports-reference pages.
- Store game data by game ID and date — you'll want to backfill and deduplicate as seasons progress.
- The box score and play-by-play endpoints are the richest data sources — they contain everything from shooting percentages to individual drive outcomes.