Scrape SofaScore: Live Sports Scores, Player Ratings & Match Stats with Python (2026)

SofaScore is one of the better sports data sites out there. Live scores across football, basketball, tennis, cricket, hockey — and their player rating system (the 0-10 scale) has become genuinely influential. Scouts use it. Fantasy players use it. Journalists cite it.

If you want programmatic access to that data, the good news is that SofaScore is entirely API-driven under the hood. Every piece of data you see — scores, lineups, ratings, stats — gets fetched from their internal JSON API via XHR requests in the browser. This means scraping HTML is useless. You want the API calls directly.

This guide covers how their API works, gives you working code to pull match stats, player ratings, and historical data, explains anti-detection, and shows how to store everything in a structured database.

How SofaScore Actually Delivers Data

SofaScore doesn't render match data server-side into HTML. Open DevTools in Chrome or Firefox, go to the Network tab, filter by Fetch/XHR, then load a match page. You'll see requests to api.sofascore.com. The response bodies are clean JSON.

The key insight: the HTML is just a shell. All meaningful data lives in the API. Scraping the HTML gives you almost nothing useful.

API Base URL and Key Endpoints

Base URL: https://api.sofascore.com/api/v1

Events (matches) by sport and date:

GET /sport/{sport}/scheduled-events/{date}

Where {sport} is football, basketball, tennis, cricket, ice-hockey, volleyball, handball, rugby, etc. and {date} is YYYY-MM-DD.

Live events (currently in progress):

GET /sport/{sport}/events/live

Match statistics:

GET /event/{id}/statistics

Lineups and player ratings:

GET /event/{id}/lineups

Match incidents (goals, cards, substitutions):

GET /event/{id}/incidents

Head-to-head history:

GET /event/{id}/h2h

Tournament standings:

GET /unique-tournament/{tournament_id}/season/{season_id}/standings/total

Player statistics for a tournament:

GET /unique-tournament/{tournament_id}/season/{season_id}/top-players/overall

The {id} is SofaScore's internal event ID. You get it from the scheduled-events endpoint or from the URL when you visit a match page (sofascore.com/football/[teams]/[event-id]).
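As a rough sketch of how these paths fit together, the templates above can be kept in one lookup table and filled in by name. The ENDPOINTS dictionary and build_url helper are illustrative names, not part of SofaScore's API, and the event ID is made up.

```python
BASE_URL = "https://api.sofascore.com/api/v1"

# Path templates copied from the endpoint list above.
ENDPOINTS = {
    "scheduled": "/sport/{sport}/scheduled-events/{date}",
    "live": "/sport/{sport}/events/live",
    "statistics": "/event/{id}/statistics",
    "lineups": "/event/{id}/lineups",
    "incidents": "/event/{id}/incidents",
    "h2h": "/event/{id}/h2h",
}


def build_url(name: str, **parts) -> str:
    """Fill a named path template and prepend the base URL."""
    return BASE_URL + ENDPOINTS[name].format(**parts)


print(build_url("scheduled", sport="football", date="2026-01-15"))
print(build_url("lineups", id=12345678))  # hypothetical event ID
```

Keeping the templates in one place makes it easier to patch a path if SofaScore moves an endpoint.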

Required Headers

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.sofascore.com/",
    "Origin": "https://www.sofascore.com",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
}

Setting a real Referer and Origin matters — SofaScore validates that API requests appear to come from their web frontend.

Core Data Fetching Functions

import requests
import json
import time
import random
from typing import Optional

BASE_URL = "https://api.sofascore.com/api/v1"


def api_get(
    path: str,
    params: dict | None = None,
    proxy_url: str | None = None,
    max_retries: int = 5,
) -> Optional[dict]:
    """Make a single API call with retry logic."""
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None

    for attempt in range(max_retries):
        try:
            resp = requests.get(
                f"{BASE_URL}{path}",
                headers=HEADERS,
                params=params,
                proxies=proxies,
                timeout=15,
            )
            if resp.status_code == 429:
                wait = (2 ** attempt) + random.uniform(0, 2)
                print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})...")
                time.sleep(wait)
                continue
            if resp.status_code == 404:
                return None
            resp.raise_for_status()
            return resp.json()
        except requests.HTTPError as e:
            if attempt == max_retries - 1:
                print(f"HTTP error after {max_retries} attempts: {e}")
                return None
            time.sleep(2 ** attempt)
        except requests.RequestException as e:
            if attempt == max_retries - 1:
                print(f"Request failed: {e}")
                return None
            time.sleep(2)
    return None


def get_scheduled_events(sport: str, date: str) -> list[dict]:
    """
    Get all scheduled events for a sport on a given date.
    date format: YYYY-MM-DD
    sport: football, basketball, tennis, cricket, ice-hockey, etc.
    """
    data = api_get(f"/sport/{sport}/scheduled-events/{date}")
    if not data:
        return []

    events = []
    for event in data.get("events", []):
        status = event.get("status", {})
        events.append({
            "id": event.get("id"),
            "slug": event.get("slug"),
            "home_team": event.get("homeTeam", {}).get("name"),
            "away_team": event.get("awayTeam", {}).get("name"),
            "home_id": event.get("homeTeam", {}).get("id"),
            "away_id": event.get("awayTeam", {}).get("id"),
            "home_score": event.get("homeScore", {}).get("current"),
            "away_score": event.get("awayScore", {}).get("current"),
            "start_timestamp": event.get("startTimestamp"),
            "status_type": status.get("type"),  # notstarted, inprogress, finished, cancelled
            "status_description": status.get("description"),
            "tournament": event.get("tournament", {}).get("name"),
            "tournament_id": event.get("tournament", {}).get("uniqueTournament", {}).get("id"),
            "season_id": event.get("season", {}).get("id"),
            "round": event.get("roundInfo", {}).get("round"),
        })

    return events


def get_live_events(sport: str) -> list[dict]:
    """Get all currently live events for a sport."""
    data = api_get(f"/sport/{sport}/events/live")
    if not data:
        return []
    return [
        {
            "id": e.get("id"),
            "home_team": e.get("homeTeam", {}).get("name"),
            "away_team": e.get("awayTeam", {}).get("name"),
            "home_score": e.get("homeScore", {}).get("current"),
            "away_score": e.get("awayScore", {}).get("current"),
            "minute": e.get("time", {}).get("played"),
            "period": e.get("time", {}).get("period"),
            "tournament": e.get("tournament", {}).get("name"),
        }
        for e in data.get("events", [])
    ]

Match Statistics

def get_match_stats(event_id: int) -> dict:
    """
    Fetch match statistics broken down by period.
    Returns possession, shots, passes, tackles, etc.
    """
    data = api_get(f"/event/{event_id}/statistics")
    if not data:
        return {}

    result = {"periods": {}}

    for period_data in data.get("statistics", []):
        period = period_data.get("period", "unknown")  # ALL, 1ST, 2ND
        stats = {}

        for group in period_data.get("groups", []):
            group_name = group.get("groupName", "")
            for item in group.get("statisticsItems", []):
                stat_name = item.get("name", "")
                stats[stat_name] = {
                    "group": group_name,  # category heading this stat belongs to
                    "home": item.get("home"),
                    "away": item.get("away"),
                    "compare_code": item.get("compareCode"),  # 1=home better, 2=away, 3=equal
                }

        # The group is stored per stat: a single per-period value would only
        # keep the last group's name and would fail for a period with no groups.
        result["periods"][period] = {"stats": stats}

    return result


def parse_possession(stats: dict) -> tuple[float | None, float | None]:
    """Extract home/away ball possession percentages."""
    all_stats = stats.get("periods", {}).get("ALL", {}).get("stats", {})
    possession = all_stats.get("Ball possession", {})
    home_poss = possession.get("home")
    away_poss = possession.get("away")

    def pct(val):
        if val is None:
            return None
        s = str(val).replace("%", "").strip()
        try:
            return float(s)
        except ValueError:
            return None

    return pct(home_poss), pct(away_poss)
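To make the nesting easier to see, here is a minimal sketch that flattens a statistics payload into one row per stat. The sample payload is hypothetical and trimmed to the fields the parser above reads; the stat name "Total shots" is an assumed example, and flatten_stats is an illustrative helper.

```python
# Hypothetical payload, trimmed to the fields the parser above reads.
sample = {
    "statistics": [
        {
            "period": "ALL",
            "groups": [
                {
                    "groupName": "Possession",
                    "statisticsItems": [
                        {"name": "Ball possession", "home": "58%", "away": "42%"},
                    ],
                },
                {
                    "groupName": "Shots",
                    "statisticsItems": [
                        {"name": "Total shots", "home": "14", "away": "9"},
                    ],
                },
            ],
        }
    ]
}


def flatten_stats(payload: dict) -> list[tuple[str, str, str, str, str]]:
    """Return (period, group, stat, home, away) rows, one per statistic."""
    rows = []
    for period_data in payload.get("statistics", []):
        period = period_data.get("period", "unknown")
        for group in period_data.get("groups", []):
            for item in group.get("statisticsItems", []):
                rows.append((
                    period,
                    group.get("groupName", ""),
                    item.get("name", ""),
                    str(item.get("home", "")),
                    str(item.get("away", "")),
                ))
    return rows


rows = flatten_stats(sample)
for row in rows:
    print(row)
```

Rows in this shape map directly onto a flat stats table keyed by event, stat name, and period.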

Player Ratings and Lineups

def get_player_ratings(event_id: int) -> dict:
    """
    Fetch starting lineups and player ratings for a completed match.
    Ratings are only available after the match finishes.
    Ratings range from 0-10, with 6.0 being average.
    """
    data = api_get(f"/event/{event_id}/lineups")
    if not data:
        return {"home": [], "away": [], "formation": {}}

    result = {"home": [], "away": [], "formation": {}}

    for side in ("home", "away"):
        side_data = data.get(side, {})
        result["formation"][side] = side_data.get("formation")

        players = side_data.get("players", [])
        for p in players:
            player_info = p.get("player", {})
            stats = p.get("statistics", {})
            result[side].append({
                "name": player_info.get("name"),
                "short_name": player_info.get("shortName"),
                "id": player_info.get("id"),
                "position": p.get("position"),  # G, D, M, F
                "shirt_number": p.get("shirtNumber"),
                "substitute": p.get("substitute", False),
                "captain": p.get("captain", False),
                # Performance metrics
                "rating": stats.get("rating"),
                "minutes_played": stats.get("minutesPlayed"),
                "goals": stats.get("goals", 0),
                "assists": stats.get("goalAssist", 0),
                # Counts on-target and blocked attempts only; off-target shots are not included.
                "shots": stats.get("onTargetScoringAttempt", 0) + stats.get("blockedScoringAttempt", 0),
                "shots_on_target": stats.get("onTargetScoringAttempt", 0),
                "passes": stats.get("totalPass", 0),
                "pass_accuracy_pct": stats.get("accuratePass", 0) / max(stats.get("totalPass", 1), 1) * 100,
                "tackles": stats.get("totalTackle", 0),
                "interceptions": stats.get("interceptionWon", 0),
                "yellow_cards": stats.get("yellowCard", 0),
                "red_cards": stats.get("redCard", 0),
                "dribbles_completed": stats.get("wonContest", 0),
            })

    return result


def get_match_incidents(event_id: int) -> list[dict]:
    """
    Fetch timeline of match events: goals, cards, substitutions, VAR decisions.
    """
    data = api_get(f"/event/{event_id}/incidents")
    if not data:
        return []

    incidents = []
    for inc in data.get("incidents", []):
        incidents.append({
            "type": inc.get("incidentType"),  # goal, card, substitution, periodStart, etc.
            "minute": inc.get("time"),
            "added_time": inc.get("addedTime"),
            # None when the incident carries no isHome flag, instead of defaulting to "away"
            "team_side": {True: "home", False: "away"}.get(inc.get("isHome")),
            "player": inc.get("player", {}).get("name") if inc.get("player") else None,
            "description": inc.get("incidentClass"),  # regular, ownGoal, penalty, yellowCard, etc.
            "player_in": inc.get("playerIn", {}).get("name") if inc.get("playerIn") else None,
            "player_out": inc.get("playerOut", {}).get("name") if inc.get("playerOut") else None,
        })

    return sorted(incidents, key=lambda x: x["minute"] or 0)

Head-to-Head History

def get_h2h(event_id: int, max_matches: int = 20) -> dict:
    """Get head-to-head history between the two teams in a match."""
    data = api_get(f"/event/{event_id}/h2h")
    if not data:
        return {}

    def parse_match_list(matches: list) -> list[dict]:
        parsed = []
        for m in matches[:max_matches]:
            parsed.append({
                "id": m.get("id"),
                "date": m.get("startTimestamp"),
                "home": m.get("homeTeam", {}).get("name"),
                "away": m.get("awayTeam", {}).get("name"),
                "home_score": m.get("homeScore", {}).get("current"),
                "away_score": m.get("awayScore", {}).get("current"),
                "tournament": m.get("tournament", {}).get("name"),
            })
        return parsed

    return {
        "teams_duels": data.get("teamDuel", {}),
        "manager_duels": data.get("managerDuel", {}),
        "previous_events": parse_match_list(data.get("previousEventList", [])),
    }
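The previous_events list lends itself to a quick win/draw/loss tally. This is a minimal sketch over made-up matches shaped like the output of get_h2h; h2h_record is a hypothetical helper and assumes the named team played in every listed match.

```python
from collections import Counter

# Hypothetical entries shaped like get_h2h()["previous_events"].
previous_events = [
    {"home": "Team X", "away": "Team Y", "home_score": 2, "away_score": 1},
    {"home": "Team Y", "away": "Team X", "home_score": 0, "away_score": 0},
    {"home": "Team Y", "away": "Team X", "home_score": 3, "away_score": 1},
    {"home": "Team X", "away": "Team Y", "home_score": None, "away_score": None},  # no result
]


def h2h_record(matches: list[dict], team: str) -> Counter:
    """Count wins, draws and losses for `team`, skipping matches without a score."""
    record = Counter(win=0, draw=0, loss=0)
    for m in matches:
        home_goals, away_goals = m.get("home_score"), m.get("away_score")
        if home_goals is None or away_goals is None:
            continue
        if home_goals == away_goals:
            record["draw"] += 1
        elif (home_goals > away_goals) == (m.get("home") == team):
            # Home side won and team was at home, or away side won and team was away.
            record["win"] += 1
        else:
            record["loss"] += 1
    return record


record = h2h_record(previous_events, "Team X")
print(dict(record))
```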

Tournament Standings and Top Players

def get_tournament_standings(
    tournament_id: int,
    season_id: int,
    standing_type: str = "total",  # total, home, away
) -> list[dict]:
    """Get league table / tournament standings."""
    data = api_get(
        f"/unique-tournament/{tournament_id}/season/{season_id}/standings/{standing_type}"
    )
    if not data:
        return []

    rows = []
    for group in data.get("standings", []):
        for row in group.get("rows", []):
            rows.append({
                "position": row.get("position"),
                "team": row.get("team", {}).get("name"),
                "team_id": row.get("team", {}).get("id"),
                "played": row.get("matches"),
                "wins": row.get("wins"),
                "draws": row.get("draws"),
                "losses": row.get("losses"),
                "goals_for": row.get("scoresFor"),
                "goals_against": row.get("scoresAgainst"),
                "goal_difference": row.get("goalDifference"),
                "points": row.get("points"),
                # Zone label (a qualification or relegation spot) when present; this is not recent form.
                "promotion": row.get("promotion", {}).get("text"),
            })

    return rows


def get_top_players(
    tournament_id: int,
    season_id: int,
    stat: str = "overall",  # overall, goals, assists, rating
) -> list[dict]:
    """Get top players in a tournament ranked by a statistic."""
    data = api_get(
        f"/unique-tournament/{tournament_id}/season/{season_id}/top-players/{stat}"
    )
    if not data:
        return []

    players = []
    for item in data.get("topPlayers", []):
        p = item.get("player", {})
        stats = item.get("statistics", {})
        players.append({
            "rank": len(players) + 1,
            "name": p.get("name"),
            "id": p.get("id"),
            "team": p.get("team", {}).get("name"),
            "position": p.get("position"),
            "nationality": p.get("country", {}).get("name"),
            "rating": stats.get("rating"),
            "goals": stats.get("goals"),
            "assists": stats.get("assists"),
            "matches": stats.get("appearances"),
            "minutes_per_goal": stats.get("minutesPerGoal"),
        })

    return players
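Mid-season tables often have teams on different numbers of games, so points per game can be a fairer ordering than raw points. The sketch below works on made-up rows shaped like the output of get_tournament_standings; points_per_game is an illustrative helper.

```python
# Hypothetical rows shaped like get_tournament_standings() output.
rows = [
    {"team": "Team A", "played": 20, "points": 45},
    {"team": "Team B", "played": 18, "points": 42},
    {"team": "Team C", "played": 0, "points": 0},  # season not started for this team
]


def points_per_game(row: dict) -> float:
    """Points divided by matches played; 0.0 when no matches have been played."""
    played = row.get("played") or 0
    points = row.get("points") or 0
    return round(points / played, 2) if played else 0.0


ranked = sorted(rows, key=points_per_game, reverse=True)
for row in ranked:
    print(f"{row['team']}: {points_per_game(row)} points per game")
```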

Anti-Detection Strategy

SofaScore actively defends its API against automated traffic, so plan for two layers: request headers and IP reputation.

The header set above gets you past header validation. For sustained collection, IP rotation via residential proxies is the main tool.

ThorData is a solid option — a large residential proxy pool with city-level targeting, useful for testing geo-specific content or spreading requests across locations. Sticky sessions help when you're paginating through tournament standings and don't want IP changes mid-sequence.

# ThorData proxy integration
PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_HOST = "proxy.thordata.com"
PROXY_PORT = 9000


def get_rotating_proxy() -> str:
    """Get a proxy URL with rotating IP."""
    return f"http://{PROXY_USER}-rotate:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"


def get_sticky_proxy(session_id: str) -> str:
    """Get a proxy URL that sticks to one IP for the session."""
    return f"http://{PROXY_USER}-session-{session_id}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
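One detail worth guarding against: if a proxy username or password contains characters such as "@" or ":", embedding it in the URL unescaped produces a URL that parses incorrectly. A minimal sketch, assuming the same user:pass@host:port layout as above, percent-encodes the credentials first. The build_proxies helper, the credentials, and the host are placeholders.

```python
from urllib.parse import quote


def build_proxies(user: str, password: str, host: str, port: int) -> dict[str, str]:
    """Build a requests-style proxies mapping with percent-encoded credentials."""
    # safe="" makes quote() also encode "/", so "@", ":" and "/" cannot break the URL.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"http://{auth}@{host}:{port}"
    return {"http": url, "https": url}


# Placeholder credentials and host; substitute the values from your provider.
proxies = build_proxies("your_user-session-abc123", "p@ss:word", "proxy.example.com", 9000)
print(proxies["https"])
```

Note that api_get already accepts a proxy_url argument, but the wrapper functions above call it without one, so they would need to pass the value through for the proxy to take effect.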

Storing Data in SQLite

import sqlite3
from datetime import datetime, timezone


def init_sofascore_db(path: str = "sofascore.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY,
            sport TEXT,
            home_team TEXT,
            away_team TEXT,
            home_score INTEGER,
            away_score INTEGER,
            date TEXT,
            tournament TEXT,
            tournament_id INTEGER,
            season_id INTEGER,
            round INTEGER,
            status TEXT,
            fetched_at TEXT
        );

        CREATE TABLE IF NOT EXISTS player_ratings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id INTEGER NOT NULL,
            player_id INTEGER,
            player_name TEXT,
            team_side TEXT,
            position TEXT,
            rating REAL,
            minutes_played INTEGER,
            goals INTEGER,
            assists INTEGER,
            shots INTEGER,
            passes INTEGER,
            tackles INTEGER,
            yellow_cards INTEGER,
            red_cards INTEGER,
            FOREIGN KEY(event_id) REFERENCES events(id)
        );

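        CREATE TABLE IF NOT EXISTS match_stats (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id INTEGER NOT NULL,
            stat_name TEXT,
            home_value TEXT,
            away_value TEXT,
            period TEXT DEFAULT 'ALL',
            FOREIGN KEY(event_id) REFERENCES events(id),
            -- Lets INSERT OR REPLACE overwrite a stat on re-runs instead of duplicating it
            UNIQUE(event_id, stat_name, period)
        );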

        CREATE TABLE IF NOT EXISTS live_scores (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id INTEGER NOT NULL,
            home_score INTEGER,
            away_score INTEGER,
            minute INTEGER,
            period TEXT,
            recorded_at TEXT NOT NULL
        );

        CREATE INDEX IF NOT EXISTS idx_events_date ON events(date);
        CREATE INDEX IF NOT EXISTS idx_ratings_event ON player_ratings(event_id);
        CREATE INDEX IF NOT EXISTS idx_live_event ON live_scores(event_id, recorded_at);
    """)
    conn.commit()
    return conn


def save_event(conn: sqlite3.Connection, event: dict, sport: str) -> None:
    # start_timestamp is Unix seconds; store the UTC calendar date.
    date_str = None
    if event.get("start_timestamp"):
        date_str = datetime.fromtimestamp(
            event["start_timestamp"], tz=timezone.utc
        ).strftime("%Y-%m-%d")

    conn.execute("""
        INSERT OR REPLACE INTO events
        (id, sport, home_team, away_team, home_score, away_score, date,
         tournament, tournament_id, season_id, round, status, fetched_at)
        VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)
    """, (
        event["id"], sport, event.get("home_team"), event.get("away_team"),
        event.get("home_score"), event.get("away_score"), date_str,
        event.get("tournament"), event.get("tournament_id"),
        event.get("season_id"), event.get("round"),
        event.get("status_type"),
        datetime.now(timezone.utc).isoformat(),
    ))
    conn.commit()


def save_player_ratings(
    conn: sqlite3.Connection,
    event_id: int,
    lineups: dict,
) -> None:
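    # Clear earlier rows for this event so repeated runs don't duplicate ratings.
    conn.execute("DELETE FROM player_ratings WHERE event_id = ?", (event_id,))
    for side in ("home", "away"):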
        for p in lineups.get(side, []):
            conn.execute("""
                INSERT INTO player_ratings
                (event_id, player_id, player_name, team_side, position,
                 rating, minutes_played, goals, assists, shots, passes,
                 tackles, yellow_cards, red_cards)
                VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
            """, (
                event_id, p.get("id"), p.get("name"), side,
                p.get("position"), p.get("rating"), p.get("minutes_played"),
                p.get("goals"), p.get("assists"), p.get("shots"),
                p.get("passes"), p.get("tackles"),
                p.get("yellow_cards"), p.get("red_cards"),
            ))
    conn.commit()


def record_live_score(
    conn: sqlite3.Connection,
    event_id: int,
    home_score: int,
    away_score: int,
    minute: int | None,
    period: str | None,
) -> None:
    conn.execute("""
        INSERT INTO live_scores (event_id, home_score, away_score, minute, period, recorded_at)
        VALUES (?,?,?,?,?,?)
    """, (
        event_id, home_score, away_score, minute, period,
        datetime.now(timezone.utc).isoformat(),
    ))
    conn.commit()
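With ratings stored, season averages are a single query away. The sketch below uses an in-memory database and a cut-down version of the player_ratings table so it runs on its own; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down version of the player_ratings table above, enough for the query.
conn.execute("""
    CREATE TABLE player_ratings (
        event_id INTEGER, player_id INTEGER, player_name TEXT, rating REAL
    )
""")
# Hypothetical rows; IDs, names and ratings are made up.
conn.executemany(
    "INSERT INTO player_ratings VALUES (?,?,?,?)",
    [
        (1, 10, "Player A", 7.5),
        (2, 10, "Player A", 6.5),
        (1, 11, "Player B", 8.0),
        (2, 11, "Player B", None),  # unrated appearance; AVG() and COUNT(rating) skip NULL
    ],
)

query = """
    SELECT player_name, COUNT(rating) AS rated_matches, ROUND(AVG(rating), 2) AS avg_rating
    FROM player_ratings
    GROUP BY player_id, player_name
    HAVING COUNT(rating) >= ?
    ORDER BY avg_rating DESC
"""
results = conn.execute(query, (1,)).fetchall()  # parameter: minimum rated matches
for row in results:
    print(row)
conn.close()
```

Raising the minimum-matches parameter filters out players whose average rests on one or two appearances.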

Complete Collection Pipeline

def collect_match_day(
    sport: str,
    date: str,
    proxy_url: str | None = None,
    collect_stats: bool = True,
    collect_ratings: bool = True,
) -> None:
    """
    Full pipeline for one day of matches:
    1. Get scheduled events
    2. For finished matches: pull stats, ratings, incidents
    3. Store everything in SQLite
    """
    conn = init_sofascore_db()
    print(f"Collecting {sport} events for {date}...")

    events = get_scheduled_events(sport, date)
    print(f"Found {len(events)} events")

    finished = [e for e in events if e.get("status_type") == "finished"]
    print(f"Finished: {len(finished)}, In progress: "
          f"{len([e for e in events if e.get('status_type') == 'inprogress'])}")

    for event in events:
        save_event(conn, event, sport)

    for event in finished:
        eid = event["id"]
        print(f"\n{event['home_team']} {event['home_score']}-{event['away_score']} {event['away_team']}")

        if collect_stats:
            stats = get_match_stats(eid)
            if stats:
                for period, period_data in stats.get("periods", {}).items():
                    for stat_name, values in period_data.get("stats", {}).items():
                        conn.execute("""
                            INSERT OR REPLACE INTO match_stats
                            (event_id, stat_name, home_value, away_value, period)
                            VALUES (?,?,?,?,?)
                        """, (eid, stat_name, str(values.get("home", "")),
                               str(values.get("away", "")), period))
                conn.commit()
                home_poss, away_poss = parse_possession(stats)
                print(f"  Possession: {home_poss}% / {away_poss}%")

        if collect_ratings:
            lineups = get_player_ratings(eid)
            save_player_ratings(conn, eid, lineups)
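            # Ignore unrated players: max() fails on an empty list and
            # the :.1f format fails on a None rating.
            rated = {
                side: [p for p in lineups.get(side, []) if p.get("rating") is not None]
                for side in ("home", "away")
            }
            if rated["home"] and rated["away"]:
                top_home = max(rated["home"], key=lambda x: x["rating"])
                top_away = max(rated["away"], key=lambda x: x["rating"])
                print(f"  Best player: {top_home['name']} ({top_home['rating']:.1f}) "
                      f"vs {top_away['name']} ({top_away['rating']:.1f})")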

        time.sleep(random.uniform(1, 2.5))

    conn.close()
    print(f"\nDone. Results saved to sofascore.db")


# Live score polling loop
def poll_live_scores(
    sport: str,
    interval_seconds: int = 60,
    proxy_url: str | None = None,
) -> None:
    """Poll live scores every interval_seconds and store time-series data."""
    conn = init_sofascore_db()
    print(f"Polling {sport} live scores every {interval_seconds}s. Ctrl+C to stop.")

    try:
        while True:
            live = get_live_events(sport)
            if live:
                print(f"{datetime.now(timezone.utc).strftime('%H:%M:%S')} — "
                      f"{len(live)} live matches")
                for event in live:
                    record_live_score(
                        conn,
                        event["id"],
                        event.get("home_score") or 0,
                        event.get("away_score") or 0,
                        event.get("minute"),
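                        # Keep a missing period as NULL rather than the string "None"
                        str(event["period"]) if event.get("period") is not None else None,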
                    )
                    print(f"  {event['home_team']} {event['home_score']}-"
                          f"{event['away_score']} {event['away_team']} "
                          f"({event.get('minute', '?')}')")
            else:
                print(f"{datetime.now(timezone.utc).strftime('%H:%M:%S')} — no live matches")

            time.sleep(interval_seconds)
    except KeyboardInterrupt:
        print("Polling stopped.")
    finally:
        conn.close()


if __name__ == "__main__":
    # Collect yesterday's football matches (every tournament, not one league)
    from datetime import date, timedelta
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    collect_match_day("football", yesterday, collect_stats=True, collect_ratings=True)

    # Or poll live scores
    # poll_live_scores("football", interval_seconds=60)
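The polling loop stores every snapshot, which is simple but produces many identical rows. If only score changes matter, one possible approach is to compare each poll against the previous one. The score_changes helper and the snapshot dictionaries below are hypothetical and not part of the pipeline above.

```python
Score = tuple[int, int]


def score_changes(
    previous: dict[int, Score],
    current: dict[int, Score],
) -> list[tuple[int, Score, Score]]:
    """Return (event_id, old_score, new_score) for events whose score moved between polls."""
    return [
        (event_id, previous[event_id], score)
        for event_id, score in current.items()
        if event_id in previous and previous[event_id] != score
    ]


# Hypothetical snapshots keyed by event ID: (home_score, away_score).
poll_1 = {101: (0, 0), 102: (1, 0)}
poll_2 = {101: (1, 0), 102: (1, 0), 103: (0, 0)}  # 103 just kicked off

changes = score_changes(poll_1, poll_2)
print(changes)
```

Events that appear for the first time are skipped here because there is no earlier score to compare against.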

Useful Tournament and Season IDs

Finding tournament/season IDs: browse to a tournament page on SofaScore, open DevTools, and check the XHR calls for the numeric IDs. Common ones:

Tournament          Tournament ID
Premier League      17
La Liga             8
Bundesliga          35
Serie A             23
Ligue 1             34
Champions League    7
NBA                 132

Season IDs change each year. Fetch current season from:

GET /unique-tournament/{tournament_id}/seasons

Common Gotchas

Ratings are post-match only: The lineups endpoint returns player data immediately (formation, starting XI), but statistics.rating is null until the match finishes. Don't scrape ratings for live matches.

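Timestamps are Unix: startTimestamp in event data is a Unix timestamp (seconds since epoch). Convert it with datetime.fromtimestamp(ts, tz=timezone.utc); without the tz argument you get the machine's local time, which can shift a late kickoff onto the wrong calendar date.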

Sport names are slugs: Use ice-hockey, not icehockey. Use american-football, not nfl. Check the URL on SofaScore's site if unsure.

API endpoint changes: SofaScore has changed endpoint paths before without notice. If something stops working, open DevTools on a fresh match page, trace the XHR calls, and update the path. The data structures stay fairly consistent even when paths change.

No auth for most endpoints: Most endpoints work without authentication. A handful of premium data points (player market values, detailed injury histories) require a session cookie from a paid account.

Score precision: homeScore.current is the current/final score. homeScore.period1 gives first-half score. For in-progress matches, period1 may be set while period2 is null.
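The timestamp and score gotchas above can be handled together when normalizing a raw event. This sketch assumes a raw event trimmed to the fields mentioned in this section; the values are made up and summarize_event is an illustrative helper.

```python
from datetime import datetime, timezone

# Hypothetical raw event, trimmed to the fields discussed above.
raw_event = {
    "startTimestamp": 1767225600,
    "homeScore": {"current": 2, "period1": 1, "period2": 1},
    "awayScore": {"current": 1, "period1": 1},  # period2 not reported
}


def summarize_event(event: dict) -> dict:
    """Convert the kickoff time to UTC and split the score by half."""
    kickoff = datetime.fromtimestamp(event["startTimestamp"], tz=timezone.utc)
    home = event.get("homeScore", {})
    away = event.get("awayScore", {})
    return {
        "kickoff_utc": kickoff.isoformat(),
        "score": (home.get("current"), away.get("current")),
        "first_half": (home.get("period1"), away.get("period1")),
        "second_half": (home.get("period2"), away.get("period2")),  # None until reported
    }


summary = summarize_event(raw_event)
print(summary)
```

Using .get() for the period fields keeps the function safe for in-progress matches where period2 has not been filled in yet.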

Because most endpoints need no authentication, the barrier to entry is low. Season-long player ratings, head-to-head records, and live scores at minute-by-minute resolution are a solid foundation for analysis tools, fantasy sports applications, or any project tracking live sports performance.