How to Scrape ESPN Sports Stats in 2026: Scores, Player Stats & Standings
Sports data is a massive market — fantasy leagues, betting models, analytics dashboards, and media all depend on real-time and historical stats. ESPN is the most comprehensive free source, covering the NFL, NBA, MLB, NHL, soccer, and dozens of other sports. The catch: there's no official public API. But the hidden API that powers ESPN's own website has been extensively reverse-engineered by the community, and it's been stable for years.
This guide covers ESPN's hidden API for live data, sports-reference.com scraping for historical stats, SQLite storage, proxy integration, and building a complete sports data pipeline in Python.
What Data Can You Extract?
Between ESPN and sports-reference, you get:
- Live scores — real-time game scores, play-by-play, game status
- Player stats — season averages, game logs, career totals
- Team standings — division rankings, win/loss records, conference standings
- Schedules — upcoming and past games with dates, times, venues
- Box scores — detailed game-level stats for every player
- Injury reports — player injury status and expected return dates
- Power rankings — ESPN's editorial rankings by sport
- Historical records — sports-reference has data going back decades
Anti-Bot Measures
ESPN and sports-reference handle bot traffic differently:
- ESPN hidden API — No authentication required. Rate limits are generous (~60 requests/minute) but undocumented. They return 403 if you burst too fast, and extended abuse leads to IP blocks lasting hours.
- ESPN web pages — Cloudflare-protected with JavaScript challenges. Much harder to scrape than the API.
- Sports-Reference — Strict anti-scraping. They rate-limit to ~20 requests/minute per IP, show CAPTCHAs after ~100 requests, and have explicitly asked scrapers to use their free data exports instead.
- IP blocking — Both sites maintain IP blacklists. Datacenter IPs last minutes on sports-reference; ESPN's API is more lenient but will eventually block persistent offenders.
For large-scale collection, ThorData residential proxies work well. Sports sites flag datacenter IPs aggressively, and ThorData's residential pool provides clean IPs that don't carry reputation damage from other scrapers.
ESPN Hidden API Structure
ESPN's API follows consistent URL patterns. No API key required.
Base: https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/
Endpoints:
scoreboard - live scores, game status
standings - league/division standings
teams - team list and info
teams/{id}/roster - team roster
athletes/{id}/statistics - player stats
calendar - season schedule dates
news - sport-specific news feed
Sports/leagues:
football/nfl, football/college-football
basketball/nba, basketball/mens-college-basketball
baseball/mlb
hockey/nhl
soccer/eng.1 (EPL), soccer/usa.1 (MLS), soccer/esp.1 (La Liga)
tennis/atp
golf/pga
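Since every endpoint follows the same pattern, URL construction can be centralized. The `build_espn_url` helper below is a hypothetical convenience wrapper (not part of any ESPN SDK) that mirrors the base URL and endpoint list above:

```python
ESPN_BASE = "https://site.api.espn.com/apis/site/v2/sports"

def build_espn_url(sport: str, league: str, endpoint: str = "scoreboard",
                   resource_id: str = None) -> str:
    """Compose an ESPN hidden-API URL from the patterns listed above.

    build_espn_url("basketball", "nba")              -> .../basketball/nba/scoreboard
    build_espn_url("football", "nfl", "teams", "12") -> .../football/nfl/teams/12
    """
    url = f"{ESPN_BASE}/{sport}/{league}/{endpoint}"
    if resource_id:
        url += f"/{resource_id}"
    return url
```

This keeps sport/league strings in one place, so adding a new league means adding one string pair rather than new URL code.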
Fetching Live Scores
import requests
import time
import random
import json
import sqlite3
from datetime import datetime, timedelta
ESPN_BASE = "https://site.api.espn.com/apis/site/v2/sports"
HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Accept": "application/json",
"Accept-Language": "en-US,en;q=0.9",
"Referer": "https://www.espn.com/",
}
def espn_get(endpoint: str, params: dict = None, proxy_url: str = None) -> dict:
    """Make a request to the ESPN hidden API with error handling."""
    proxies = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    for attempt in range(3):
        try:
            resp = requests.get(
                endpoint,
                headers=HEADERS,
                params=params or {},
                proxies=proxies,
                timeout=15,
            )
            # ESPN answers bursts with 403 as often as 429 — back off on both
            if resp.status_code in (403, 429):
                wait = 30 * (attempt + 1)
                print(f"Rate limited ({resp.status_code}), waiting {wait}s...")
                time.sleep(wait)
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == 2:
                raise
            time.sleep(5 * (attempt + 1))
    return {}
def get_scores(sport: str, league: str, dates: str = None,
proxy_url: str = None) -> list[dict]:
"""Fetch game scores from ESPN.
sport/league examples:
'football/nfl', 'basketball/nba', 'baseball/mlb',
'hockey/nhl', 'soccer/eng.1', 'football/college-football'
dates: 'YYYYMMDD' for specific date, None for today
"""
url = f"{ESPN_BASE}/{sport}/{league}/scoreboard"
params = {}
if dates:
params["dates"] = dates
data = espn_get(url, params, proxy_url)
games = []
for event in data.get("events", []):
competition = event["competitions"][0]
teams = competition.get("competitors", [])
home = next((t for t in teams if t.get("homeAway") == "home"), {})
away = next((t for t in teams if t.get("homeAway") == "away"), {})
# Game status details
status = event.get("status", {})
status_type = status.get("type", {})
# Odds if available
odds = competition.get("odds", [{}])
spread = odds[0].get("details", "") if odds else ""
games.append({
"game_id": event["id"],
"sport": sport,
"league": league,
"date": event.get("date"),
"name": event.get("name", ""),
"short_name": event.get("shortName", ""),
"status_state": status_type.get("state"), # pre, in, post
"status_detail": status_type.get("description"),
"period": status.get("period", 0),
"clock": status.get("displayClock", ""),
"home_team": home.get("team", {}).get("displayName"),
"home_team_id": home.get("team", {}).get("id"),
"home_abbrev": home.get("team", {}).get("abbreviation"),
"home_score": int(home.get("score", 0) or 0),
"home_record": home.get("records", [{}])[0].get("summary", "") if home.get("records") else "",
"away_team": away.get("team", {}).get("displayName"),
"away_team_id": away.get("team", {}).get("id"),
"away_abbrev": away.get("team", {}).get("abbreviation"),
"away_score": int(away.get("score", 0) or 0),
"away_record": away.get("records", [{}])[0].get("summary", "") if away.get("records") else "",
"venue": competition.get("venue", {}).get("fullName"),
"venue_city": competition.get("venue", {}).get("address", {}).get("city"),
"broadcast": competition.get("broadcasts", [{}])[0].get("names", [""])[0] if competition.get("broadcasts") else "",
"spread": spread,
"neutral_site": competition.get("neutralSite", False),
})
return games
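The dicts returned by get_scores are easy to render for dashboards or logs. `format_scoreline` below is a hypothetical formatter over that dict shape (period/clock display varies by sport, so the live-game label is kept generic):

```python
def format_scoreline(game: dict) -> str:
    """Render one game dict from get_scores() as a compact one-line score."""
    state = game.get("status_state")
    if state == "pre":
        tail = "(scheduled)"
    elif state == "in":
        tail = f"(live, period {game.get('period', 0)}, {game.get('clock', '')})"
    else:
        tail = "(Final)"
    return (f"{game.get('away_abbrev', '?')} {game.get('away_score', 0)} @ "
            f"{game.get('home_abbrev', '?')} {game.get('home_score', 0)} {tail}")
```

For example, a finished game renders as `DAL 102 @ LAL 110 (Final)`.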
Team Standings
def get_standings(sport: str, league: str, season: int = 2026,
proxy_url: str = None) -> list[dict]:
"""Fetch league standings from ESPN."""
url = f"{ESPN_BASE}/{sport}/{league}/standings"
params = {"season": season}
data = espn_get(url, params, proxy_url)
standings = []
for group in data.get("children", []):
group_name = group.get("name", "")
group_abbrev = group.get("abbreviation", "")
for entry in group.get("standings", {}).get("entries", []):
team = entry.get("team", {})
stats = {s["name"]: s["value"] for s in entry.get("stats", [])}
standings.append({
"sport": sport,
"league": league,
"season": season,
"group": group_name,
"group_abbrev": group_abbrev,
"team": team.get("displayName"),
"team_id": team.get("id"),
"abbreviation": team.get("abbreviation"),
"logo": team.get("logos", [{}])[0].get("href", "") if team.get("logos") else "",
"wins": int(stats.get("wins", 0)),
"losses": int(stats.get("losses", 0)),
"ties": int(stats.get("ties", 0)),
"win_pct": float(stats.get("winPercent", 0)),
"games_back": stats.get("gamesBehind", "-"),
"streak": stats.get("streak", ""),
"home_record": stats.get("Home", ""),
"away_record": stats.get("Away", ""),
"last_10": stats.get("Last 10 Games", ""),
"points_for": float(stats.get("avgPointsFor", 0)),
"points_against": float(stats.get("avgPointsAgainst", 0)),
"point_differential": float(stats.get("pointDifferential", 0)),
})
return standings
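ESPN's entries aren't guaranteed to arrive in rank order, so it's worth sorting yourself. The sketch below (`rank_standings` is a hypothetical helper; real league tiebreakers are far more involved than point differential) groups the rows from get_standings by division and assigns ranks:

```python
from collections import defaultdict

def rank_standings(standings: list[dict]) -> dict[str, list[dict]]:
    """Group standings rows by division/conference and rank by win percentage.

    Mutates each row to add a "rank" key. Ties are broken by point
    differential — a simplification of real league tiebreaker rules.
    """
    groups = defaultdict(list)
    for row in standings:
        groups[row.get("group", "")].append(row)
    for rows in groups.values():
        rows.sort(key=lambda r: (r.get("win_pct", 0.0),
                                 r.get("point_differential", 0.0)),
                  reverse=True)
        for rank, r in enumerate(rows, start=1):
            r["rank"] = rank
    return dict(groups)
```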
Player Stats and Rosters
def get_team_roster(sport: str, league: str, team_id: int,
proxy_url: str = None) -> list[dict]:
"""Fetch team roster with basic player info."""
url = f"{ESPN_BASE}/{sport}/{league}/teams/{team_id}/roster"
data = espn_get(url, proxy_url=proxy_url)
players = []
for group in data.get("athletes", []):
position_group = group.get("position", "")
for athlete in group.get("items", []):
players.append({
"id": athlete.get("id"),
"name": athlete.get("fullName"),
"first_name": athlete.get("firstName"),
"last_name": athlete.get("lastName"),
"position": athlete.get("position", {}).get("abbreviation"),
"position_group": position_group,
"jersey": athlete.get("jersey"),
"age": athlete.get("age"),
"height": athlete.get("displayHeight"),
"weight": athlete.get("displayWeight"),
"college": athlete.get("college", {}).get("name"),
"experience": athlete.get("experience", {}).get("years"),
"headshot": athlete.get("headshot", {}).get("href"),
"birthplace": athlete.get("birthPlace", {}).get("city"),
"nationality": athlete.get("birthPlace", {}).get("country"),
})
return players
def get_all_teams(sport: str, league: str, proxy_url: str = None) -> list[dict]:
"""Get all teams in a league."""
url = f"{ESPN_BASE}/{sport}/{league}/teams"
data = espn_get(url, proxy_url=proxy_url)
teams = []
for sport_data in data.get("sports", []):
for league_data in sport_data.get("leagues", []):
for team_data in league_data.get("teams", []):
team = team_data.get("team", {})
teams.append({
"id": team.get("id"),
"name": team.get("displayName"),
"short_name": team.get("shortDisplayName"),
"abbreviation": team.get("abbreviation"),
"nickname": team.get("name"),
"city": team.get("location"),
"color": team.get("color"),
"alt_color": team.get("alternateColor"),
"logo": team.get("logos", [{}])[0].get("href", "") if team.get("logos") else "",
})
return teams
def get_player_stats(sport: str, league: str, player_id: int,
season: int = None, proxy_url: str = None) -> dict:
"""Fetch player season statistics from ESPN."""
url = f"{ESPN_BASE}/{sport}/{league}/athletes/{player_id}/statistics"
params = {}
if season:
params["season"] = season
return espn_get(url, params, proxy_url)
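get_player_stats returns ESPN's raw nested JSON. A flattener like the sketch below makes it queryable — note the assumption that stats nest under splits → categories → stats, which holds for most leagues but isn't guaranteed by ESPN, and `flatten_player_stats` itself is a hypothetical helper:

```python
def flatten_player_stats(raw: dict) -> dict[str, float]:
    """Flatten ESPN's nested statistics JSON into {"category.statName": value}.

    Assumes the common splits -> categories -> stats layout; adjust for
    sports where the response shape differs.
    """
    flat = {}
    for category in raw.get("splits", {}).get("categories", []):
        cat_name = category.get("name", "misc")
        for stat in category.get("stats", []):
            try:
                flat[f"{cat_name}.{stat.get('name', '')}"] = float(stat.get("value", 0))
            except (TypeError, ValueError):
                continue  # skip non-numeric display-only stats
    return flat
```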
Game Box Scores
def get_box_score(sport: str, league: str, game_id: str,
proxy_url: str = None) -> dict:
"""Fetch detailed box score for a game."""
url = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/summary"
params = {"event": game_id}
data = espn_get(url, params, proxy_url)
box = {
"game_id": game_id,
"status": data.get("header", {}).get("competitions", [{}])[0].get("status", {}),
"teams": [],
"leaders": [],
}
# Box score by team
for team_box in data.get("boxscore", {}).get("teams", []):
team_stats = {
"team": team_box.get("team", {}).get("displayName"),
"home_away": team_box.get("homeAway"),
"stats": {}
}
for stat_group in team_box.get("statistics", []):
label = stat_group.get("label", "")
values = {}
for athlete in stat_group.get("athletes", []):
player_name = athlete.get("athlete", {}).get("displayName", "")
stats = {}
for i, key in enumerate(stat_group.get("labels", [])):
stats_list = athlete.get("stats", [])
if i < len(stats_list):
stats[key] = stats_list[i]
values[player_name] = stats
team_stats["stats"][label] = values
box["teams"].append(team_stats)
# Statistical leaders
for leader_group in data.get("leaders", []):
category = leader_group.get("displayName", "")
leaders = []
for leader in leader_group.get("leaders", [])[:3]:
athlete = leader.get("athlete", {})
leaders.append({
"name": athlete.get("displayName"),
"team": athlete.get("team", {}).get("abbreviation"),
"value": leader.get("displayValue"),
})
box["leaders"].append({"category": category, "leaders": leaders})
return box
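The nested structure get_box_score builds (teams → stat group label → player → stat dict) supports simple aggregation. `sum_team_stat` is a hypothetical helper over that exact shape; ESPN stat values are strings, so non-numeric entries like "4-10" splits are skipped:

```python
def sum_team_stat(team_box: dict, group_label: str, stat_key: str) -> float:
    """Sum one numeric column across all players in a box-score stat group.

    team_box is one entry of box["teams"] as built by get_box_score().
    """
    total = 0.0
    for player_stats in team_box.get("stats", {}).get(group_label, {}).values():
        try:
            total += float(player_stats.get(stat_key, ""))
        except (TypeError, ValueError):
            continue  # "DNP", dashes, and split stats are not summable
    return total
```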
def get_play_by_play(sport: str, league: str, game_id: str,
                     proxy_url: str = None) -> list[dict]:
    """Fetch play-by-play data for a game."""
    url = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/playbyplay"
    params = {"event": game_id}
    data = espn_get(url, params, proxy_url)
    # "plays" is a flat list for some sports and a paginated
    # {"items": [...]} object for others — handle both shapes
    raw_plays = data.get("plays", [])
    if isinstance(raw_plays, dict):
        raw_plays = raw_plays.get("items", [])
    plays = []
    for play in raw_plays:
        plays.append({
            "period": play.get("period", {}).get("number"),
            "clock": play.get("clock", {}).get("displayValue"),
            "team": play.get("team", {}).get("abbreviation"),
            "text": play.get("text"),
            "score_home": play.get("homeScore"),
            "score_away": play.get("awayScore"),
            "scoring_play": play.get("scoringPlay", False),
        })
    return plays
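Full play-by-play is verbose; often you only want the scoring plays. `scoring_summary` is a hypothetical reducer over the dicts get_play_by_play produces:

```python
def scoring_summary(plays: list[dict]) -> list[str]:
    """Condense play-by-play rows down to one line per scoring play."""
    lines = []
    for play in plays:
        if not play.get("scoring_play"):
            continue
        lines.append(
            f"P{play.get('period')} {play.get('clock', '')} "
            f"{play.get('team', '')}: {play.get('text', '')} "
            f"({play.get('score_away')}-{play.get('score_home')})"
        )
    return lines
```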
Scraping Sports-Reference for Historical Data
Sports-Reference (basketball-reference.com, pro-football-reference.com, etc.) has the deepest historical stats. Their pages are server-rendered HTML — no JavaScript required — though many tables are wrapped inside HTML comments to defeat naive parsers.
from bs4 import BeautifulSoup
import pandas as pd
def scrape_player_gamelog(player_slug: str, season: int,
sport: str = "basketball",
proxy_url: str = None) -> pd.DataFrame:
"""Scrape a player's game log from sports-reference.
player_slug: e.g., 'jamesle01' (LeBron James)
sport: 'basketball', 'baseball', 'football', 'hockey'
"""
domain_map = {
"basketball": "basketball-reference.com",
"baseball": "baseball-reference.com",
"football": "pro-football-reference.com",
"hockey": "hockey-reference.com",
}
domain = domain_map.get(sport, "basketball-reference.com")
if sport == "basketball":
url = f"https://www.{domain}/players/{player_slug[0]}/{player_slug}/gamelog/{season}"
table_id = "pgl_basic"
elif sport == "baseball":
url = f"https://www.{domain}/players/{player_slug[0]}/{player_slug}/batting_gamelogs/{season}"
table_id = "batting_gamelogs"
else:
url = f"https://www.{domain}/players/{player_slug[0]}/{player_slug}/gamelog/{season}"
table_id = "stats"
user_agents = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
]
headers_sr = {"User-Agent": random.choice(user_agents)}
proxies = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    resp = requests.get(url, headers=headers_sr, proxies=proxies, timeout=20)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
table = soup.find("table", id=table_id)
if not table:
return pd.DataFrame()
rows = []
for tr in table.find("tbody").find_all("tr"):
if tr.get("class") and "thead" in tr.get("class", []):
continue
cells = tr.find_all(["td", "th"])
if len(cells) < 5:
continue
row = {
cell.get("data-stat", f"col_{i}"): cell.get_text(strip=True)
for i, cell in enumerate(cells)
}
if row.get("date_game") or row.get("game_date"):
rows.append(row)
return pd.DataFrame(rows)
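Everything scrape_player_gamelog returns is text, since it comes straight from HTML cells. A hedged cleanup step like the one below converts the stat columns to numbers; `coerce_gamelog_numeric` is a hypothetical helper, and the 50% threshold for keeping a conversion is a heuristic:

```python
import pandas as pd

def coerce_gamelog_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Convert game-log columns to numeric where possible.

    Textual columns (dates, opponents, results) are left alone; empty
    strings from DNP rows become NaN instead of raising.
    """
    out = df.copy()
    for col in out.columns:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Keep the conversion only if most values survived it
        if converted.notna().mean() > 0.5:
            out[col] = converted
    return out
```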
def scrape_season_stats_table(url: str, table_id: str,
                              proxy_url: str = None) -> pd.DataFrame:
    """Generic function to scrape any stats table from sports-reference."""
    from io import StringIO

    proxies = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    headers_sr = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    }
    resp = requests.get(url, headers=headers_sr, proxies=proxies, timeout=20)
    resp.raise_for_status()
    # Sports-reference hides many tables inside HTML comments to defeat
    # naive parsers — unwrap them before searching
    html = resp.text.replace("<!--", "").replace("-->", "")
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if not table:
        return pd.DataFrame()
    # Use pandas directly for clean table parsing
    try:
        # StringIO avoids pandas' deprecation warning for literal HTML input
        dfs = pd.read_html(StringIO(str(table)))
        if dfs:
            df = dfs[0]
            # Remove repeated header rows embedded in the table body
            df = df[df[df.columns[0]] != df.columns[0]]
            return df
    except Exception:
        pass
    return pd.DataFrame()
Collecting Full Season Data
def collect_league_scores(sport: str, league: str,
start: str, end: str,
proxy_url: str = None) -> list[dict]:
"""Collect all game scores for a date range.
start/end: 'YYYYMMDD'
"""
all_games = []
current = datetime.strptime(start, "%Y%m%d")
end_dt = datetime.strptime(end, "%Y%m%d")
    total_days = max((end_dt - current).days, 1)  # avoid division by zero for single-day ranges
day_count = 0
while current <= end_dt:
date_str = current.strftime("%Y%m%d")
try:
games = get_scores(sport, league, dates=date_str, proxy_url=proxy_url)
all_games.extend(games)
if games:
print(f"{date_str}: {len(games)} games")
except Exception as e:
print(f"{date_str}: error — {e}")
current += timedelta(days=1)
day_count += 1
# Progressive delay — be more careful later in the crawl
base_delay = 1.0 + (day_count / total_days) * 1.0
time.sleep(random.uniform(base_delay, base_delay * 1.5))
return all_games
def collect_all_rosters(sport: str, league: str,
proxy_url: str = None) -> dict[str, list]:
"""Fetch rosters for all teams in a league."""
teams = get_all_teams(sport, league, proxy_url)
all_rosters = {}
for team in teams:
team_id = team["id"]
team_name = team["name"]
print(f" Roster: {team_name}")
try:
roster = get_team_roster(sport, league, team_id, proxy_url)
all_rosters[team_name] = roster
except Exception as e:
print(f" Error fetching {team_name} roster: {e}")
all_rosters[team_name] = []
time.sleep(random.uniform(1.5, 3.0))
return all_rosters
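The `{team_name: [player, ...]}` dict from collect_all_rosters can be flattened into rows ready for a bulk `executemany` into the players table defined later. `rosters_to_rows` is a hypothetical adapter; note it stores the team name in the slot the schema calls team_id, since get_team_roster's output doesn't carry the numeric id — adapt if you keep ids alongside names:

```python
def rosters_to_rows(all_rosters: dict[str, list], sport: str,
                    league: str) -> list[tuple]:
    """Flatten per-team rosters into tuples matching the players table columns."""
    rows = []
    for team_name, roster in all_rosters.items():
        for p in roster:
            rows.append((
                p.get("id"), p.get("name"), p.get("position"),
                team_name,  # team name stands in for team_id here
                p.get("jersey"), p.get("age"), p.get("height"),
                p.get("weight"), p.get("college"), p.get("experience"),
                p.get("headshot"), sport, league,
            ))
    return rows
```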
SQLite Storage
def init_sports_db(db_path: str = "sports_data.db") -> sqlite3.Connection:
"""Initialize SQLite database for sports data."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("""
CREATE TABLE IF NOT EXISTS games (
game_id TEXT,
sport TEXT,
league TEXT,
game_date TEXT,
home_team TEXT,
home_team_id TEXT,
home_score INTEGER,
home_record TEXT,
away_team TEXT,
away_team_id TEXT,
away_score INTEGER,
away_record TEXT,
status_state TEXT,
status_detail TEXT,
period INTEGER,
venue TEXT,
venue_city TEXT,
broadcast TEXT,
spread TEXT,
neutral_site INTEGER DEFAULT 0,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (game_id, sport, league)
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS standings_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
sport TEXT,
league TEXT,
season INTEGER,
group_name TEXT,
team TEXT,
team_id TEXT,
wins INTEGER,
losses INTEGER,
ties INTEGER,
win_pct REAL,
games_back TEXT,
streak TEXT,
points_for REAL,
points_against REAL,
point_differential REAL,
recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS players (
player_id TEXT PRIMARY KEY,
name TEXT,
position TEXT,
team_id TEXT,
jersey TEXT,
age INTEGER,
height TEXT,
weight TEXT,
college TEXT,
experience INTEGER,
headshot TEXT,
sport TEXT,
league TEXT,
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS player_gamelogs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
player_slug TEXT,
sport TEXT,
season INTEGER,
game_date TEXT,
opponent TEXT,
home_away TEXT,
result TEXT,
data TEXT, -- JSON of all stats
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(player_slug, sport, season, game_date)
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_games_league ON games(sport, league)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_games_date ON games(game_date)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_standings_league ON standings_snapshots(sport, league, season)")
conn.commit()
return conn
def save_games_batch(conn: sqlite3.Connection, games: list[dict]) -> int:
    """Bulk insert games, skip duplicates. Returns the number actually inserted."""
    saved = 0
    for game in games:
        try:
            cur = conn.execute(
                """INSERT OR IGNORE INTO games
                   (game_id, sport, league, game_date, home_team, home_team_id,
                    home_score, home_record, away_team, away_team_id, away_score,
                    away_record, status_state, status_detail, period, venue,
                    venue_city, broadcast, spread, neutral_site)
                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                (
                    game["game_id"], game["sport"], game["league"],
                    game.get("date"), game.get("home_team"), game.get("home_team_id"),
                    game.get("home_score"), game.get("home_record"),
                    game.get("away_team"), game.get("away_team_id"),
                    game.get("away_score"), game.get("away_record"),
                    game.get("status_state"), game.get("status_detail"),
                    game.get("period"), game.get("venue"), game.get("venue_city"),
                    game.get("broadcast"), game.get("spread"),
                    1 if game.get("neutral_site") else 0,
                )
            )
            saved += cur.rowcount  # rowcount is 0 when the insert was ignored
        except sqlite3.Error:
            continue
    conn.commit()
    return saved
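One caveat with INSERT OR IGNORE: a row written while a game is in progress keeps its stale score forever. If you poll live games, an upsert (SQLite 3.24+) is the better fit. Here is a minimal self-contained sketch using a cut-down schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE games (
        game_id TEXT, sport TEXT, league TEXT,
        home_score INTEGER, away_score INTEGER, status_state TEXT,
        PRIMARY KEY (game_id, sport, league)
    )
""")

UPSERT = """
    INSERT INTO games (game_id, sport, league, home_score, away_score, status_state)
    VALUES (?, ?, ?, ?, ?, ?)
    ON CONFLICT(game_id, sport, league) DO UPDATE SET
        home_score = excluded.home_score,
        away_score = excluded.away_score,
        status_state = excluded.status_state
"""

# First poll: game in progress
conn.execute(UPSERT, ("401", "basketball", "nba", 55, 60, "in"))
# Later poll: the final score replaces the stale row
conn.execute(UPSERT, ("401", "basketball", "nba", 110, 102, "post"))

row = conn.execute("SELECT home_score, status_state FROM games").fetchone()
print(row)  # (110, 'post')
```

The same ON CONFLICT clause can be dropped into save_games_batch against its primary key if you want updates instead of skips.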
def save_standings(conn: sqlite3.Connection, standings: list[dict]) -> None:
"""Save a standings snapshot."""
for s in standings:
conn.execute(
"""INSERT INTO standings_snapshots
(sport, league, season, group_name, team, team_id, wins, losses, ties,
win_pct, games_back, streak, points_for, points_against, point_differential)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
s["sport"], s["league"], s["season"], s["group"], s["team"],
s["team_id"], s["wins"], s["losses"], s["ties"],
s["win_pct"], s["games_back"], s["streak"],
s["points_for"], s["points_against"], s["point_differential"],
)
)
conn.commit()
Analytics: Building Insights from Scraped Data
def head_to_head_record(db_path: str, team1: str, team2: str,
                        sport: str = None) -> dict:
"""Calculate head-to-head record between two teams."""
conn = sqlite3.connect(db_path)
query = """
SELECT
SUM(CASE WHEN (home_team LIKE ? AND home_score > away_score)
OR (away_team LIKE ? AND away_score > home_score)
THEN 1 ELSE 0 END) as team1_wins,
SUM(CASE WHEN (home_team LIKE ? AND home_score < away_score)
OR (away_team LIKE ? AND away_score < home_score)
THEN 1 ELSE 0 END) as team1_losses,
COUNT(*) as total_games
FROM games
WHERE ((home_team LIKE ? AND away_team LIKE ?)
OR (home_team LIKE ? AND away_team LIKE ?))
AND status_state = 'post'
"""
args = [f"%{team1}%"] * 4 + [f"%{team1}%", f"%{team2}%", f"%{team2}%", f"%{team1}%"]
if sport:
query += " AND sport = ?"
args.append(sport)
row = conn.execute(query, args).fetchone()
conn.close()
return {
"team1": team1,
"team2": team2,
"team1_wins": row[0] or 0,
"team1_losses": row[1] or 0,
"total_games": row[2] or 0,
}
def score_distribution(db_path: str, team: str, home_away: str = "all") -> dict:
"""Get average and distribution of scores for a team."""
conn = sqlite3.connect(db_path)
if home_away == "home":
query = "SELECT home_score, away_score FROM games WHERE home_team LIKE ? AND status_state = 'post'"
rows = conn.execute(query, (f"%{team}%",)).fetchall()
scores = [(r[0], r[1]) for r in rows]
elif home_away == "away":
query = "SELECT away_score, home_score FROM games WHERE away_team LIKE ? AND status_state = 'post'"
rows = conn.execute(query, (f"%{team}%",)).fetchall()
scores = [(r[0], r[1]) for r in rows]
else:
home_rows = conn.execute(
"SELECT home_score, away_score FROM games WHERE home_team LIKE ? AND status_state = 'post'",
(f"%{team}%",)
).fetchall()
away_rows = conn.execute(
"SELECT away_score, home_score FROM games WHERE away_team LIKE ? AND status_state = 'post'",
(f"%{team}%",)
).fetchall()
scores = [(r[0], r[1]) for r in home_rows] + [(r[0], r[1]) for r in away_rows]
conn.close()
if not scores:
return {}
points_for = [s[0] for s in scores if s[0] is not None]
points_against = [s[1] for s in scores if s[1] is not None]
return {
"team": team,
"games": len(scores),
"avg_points_for": sum(points_for) / len(points_for) if points_for else 0,
"avg_points_against": sum(points_against) / len(points_against) if points_against else 0,
"wins": sum(1 for s in scores if s[0] > s[1]),
"losses": sum(1 for s in scores if s[0] < s[1]),
}
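The stored scoring averages also support classic derived metrics. Pythagorean expectation estimates a team's "true" win percentage from points scored and allowed; the exponent is sport-specific (roughly 1.83 for MLB, about 2.37 for the NFL, around 14 for the NBA), and the generic default below is an illustrative choice:

```python
def pythagorean_win_pct(avg_points_for: float, avg_points_against: float,
                        exponent: float = 2.0) -> float:
    """Expected win percentage from scoring averages (Pythagorean expectation)."""
    if avg_points_for <= 0 and avg_points_against <= 0:
        return 0.0
    pf = avg_points_for ** exponent
    pa = avg_points_against ** exponent
    return pf / (pf + pa)
```

Feeding it the avg_points_for/avg_points_against values from score_distribution gives a quick over/underperformance check against a team's actual record.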
Legal Considerations
ESPN's hidden API is undocumented and not officially intended for public use, but it has been available for over a decade and ESPN has not enforced restrictions aggressively. Sports-Reference explicitly asks scrapers to stay under roughly 20 requests per minute and points to its bulk export options (the Share & Export menu on every table, plus the Stathead query service) as the preferred alternative for research.
For commercial projects, consider an ESPN data partnership or licensed providers like SportsData.io, Sportradar, or Stats Perform. Fantasy and analytical tools that use publicly displayed stats have strong legal precedent from cases like C.B.C. Distribution v. MLB Advanced Media.
Key Takeaways
- ESPN's hidden API (site.api.espn.com) is the easiest path to live scores, standings, and rosters — no key needed, consistent URL structure, generous rate limits.
- Sports-Reference has unmatched historical depth but enforces strict rate limits. Use their CSV exports for bulk downloads when available — it's faster and kinder to their servers.
- For season-long data collection or multi-sport scraping, residential proxies prevent the IP blocks both sites enforce. ThorData's residential proxy pool keeps your scraping under the radar with clean, rotating IPs.
- Add 1-2 second delays between ESPN API calls and 3+ seconds for sports-reference pages.
- Store game data by game ID and date — you'll want to backfill and deduplicate as seasons progress.
- The box score and play-by-play endpoints are the richest data sources — they contain everything from shooting percentages to individual drive outcomes.