Scrape SofaScore: Live Sports Scores, Player Ratings & Match Stats with Python (2026)
SofaScore is one of the better sports data sites out there. Live scores across football, basketball, tennis, cricket, hockey — and their player rating system (the 0-10 scale) has become genuinely influential. Scouts use it. Fantasy players use it. Journalists cite it.
If you want programmatic access to that data, the good news is that SofaScore is entirely API-driven under the hood. Every piece of data you see — scores, lineups, ratings, stats — gets fetched from their internal JSON API via XHR requests in the browser. This means scraping HTML is useless. You want the API calls directly.
This guide covers how their API works, gives you working code to pull match stats, player ratings, and historical data, explains anti-detection, and shows how to store everything in a structured database.
How SofaScore Actually Delivers Data
SofaScore doesn't render match data server-side into HTML. Open DevTools in Chrome or Firefox, go to the Network tab, filter by Fetch/XHR, then load a match page. You'll see requests to api.sofascore.com. The response bodies are clean JSON.
The key insight: the HTML is just a shell. All meaningful data lives in the API. Scraping the HTML gives you almost nothing useful.
API Base URL and Key Endpoints
Base URL: https://api.sofascore.com/api/v1
Events (matches) by sport and date:
GET /sport/{sport}/scheduled-events/{date}
Where {sport} is football, basketball, tennis, cricket, ice-hockey, volleyball, handball, rugby, etc. and {date} is YYYY-MM-DD.
Live events (currently in progress):
GET /sport/{sport}/events/live
Match statistics:
GET /event/{id}/statistics
Lineups and player ratings:
GET /event/{id}/lineups
Match incidents (goals, cards, substitutions):
GET /event/{id}/incidents
Head-to-head history:
GET /event/{id}/h2h
Tournament standings:
GET /unique-tournament/{tournament_id}/season/{season_id}/standings/total
Player statistics for a tournament:
GET /unique-tournament/{tournament_id}/season/{season_id}/top-players/overall
The {id} is SofaScore's internal event ID. You get it from the scheduled-events endpoint or from the URL when you visit a match page (sofascore.com/football/[teams]/[event-id]).
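The endpoint list above can be captured as a small table of path templates. This is a minimal sketch: the `ENDPOINTS` dictionary and `build_path` helper are illustrative names, not part of SofaScore's API, and the event ID in the usage lines is a made-up placeholder.

```python
from datetime import date

# Path templates copied from the endpoint list above.
ENDPOINTS = {
    "scheduled": "/sport/{sport}/scheduled-events/{date}",
    "live": "/sport/{sport}/events/live",
    "statistics": "/event/{id}/statistics",
    "lineups": "/event/{id}/lineups",
    "incidents": "/event/{id}/incidents",
    "h2h": "/event/{id}/h2h",
}


def build_path(name: str, **values) -> str:
    """Fill one endpoint template; raises KeyError if a placeholder is missing."""
    if "date" in values:
        # Reject values that are not valid ISO dates before any request is sent.
        date.fromisoformat(values["date"])
    return ENDPOINTS[name].format(**values)


print(build_path("scheduled", sport="football", date="2026-01-15"))
print(build_path("lineups", id=12345678))  # hypothetical event ID
```

Keeping the templates in one place makes it easier to update paths if SofaScore changes them.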
Required Headers
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    # "br" is left out on purpose: requests only decodes Brotli when the
    # optional brotli package is installed, and resp.json() fails otherwise.
    "Accept-Encoding": "gzip, deflate",
    "Referer": "https://www.sofascore.com/",
    "Origin": "https://www.sofascore.com",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
}
Setting a real Referer and Origin matters — SofaScore validates that API requests appear to come from their web frontend.
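Before sending anything, it can help to check that a header set actually contains the fields this guide treats as required. The sketch below is a local check only. `missing_headers` is a hypothetical helper, and which headers SofaScore really inspects is this article's working assumption rather than documented behavior.

```python
REQUIRED = ("Referer", "Origin")
REQUIRED_PREFIX = "sec-fetch-"


def missing_headers(headers: dict[str, str]) -> list[str]:
    """Return the browser-style headers that are absent (case-insensitive)."""
    present = {name.lower() for name in headers}
    missing = [h for h in REQUIRED if h.lower() not in present]
    # At least one Sec-Fetch-* header is expected from a real browser.
    if not any(name.startswith(REQUIRED_PREFIX) for name in present):
        missing.append("Sec-Fetch-*")
    return missing


print(missing_headers({"User-Agent": "test"}))
```

Running this against your header dictionary at startup catches typos before they show up as unexplained failures.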
Core Data Fetching Functions
import random
import time

import requests

BASE_URL = "https://api.sofascore.com/api/v1"


def api_get(
    path: str,
    params: dict | None = None,
    proxy_url: str | None = None,
    max_retries: int = 5,
) -> dict | None:
    """Make a single API call with retry logic."""
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
    for attempt in range(max_retries):
        try:
            resp = requests.get(
                f"{BASE_URL}{path}",
                headers=HEADERS,
                params=params,
                proxies=proxies,
                timeout=15,
            )
            if resp.status_code == 429:
                # Exponential backoff plus jitter before the next attempt
                wait = (2 ** attempt) + random.uniform(0, 2)
                print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})...")
                time.sleep(wait)
                continue
            if resp.status_code == 404:
                return None
            resp.raise_for_status()
            return resp.json()
        except requests.HTTPError as e:
            if attempt == max_retries - 1:
                print(f"HTTP error after {max_retries} attempts: {e}")
                return None
            time.sleep(2 ** attempt)
        except requests.RequestException as e:
            if attempt == max_retries - 1:
                print(f"Request failed: {e}")
                return None
            time.sleep(2)
    return None

def get_scheduled_events(sport: str, date: str) -> list[dict]:
    """
    Get all scheduled events for a sport on a given date.
    date format: YYYY-MM-DD
    sport: football, basketball, tennis, cricket, ice-hockey, etc.
    """
    data = api_get(f"/sport/{sport}/scheduled-events/{date}")
    if not data:
        return []
    events = []
    for event in data.get("events", []):
        status = event.get("status", {})
        events.append({
            "id": event.get("id"),
            "slug": event.get("slug"),
            "home_team": event.get("homeTeam", {}).get("name"),
            "away_team": event.get("awayTeam", {}).get("name"),
            "home_id": event.get("homeTeam", {}).get("id"),
            "away_id": event.get("awayTeam", {}).get("id"),
            "home_score": event.get("homeScore", {}).get("current"),
            "away_score": event.get("awayScore", {}).get("current"),
            "start_timestamp": event.get("startTimestamp"),
            "status_type": status.get("type"),  # notstarted, inprogress, finished, cancelled
            "status_description": status.get("description"),
            "tournament": event.get("tournament", {}).get("name"),
            "tournament_id": event.get("tournament", {}).get("uniqueTournament", {}).get("id"),
            "season_id": event.get("season", {}).get("id"),
            "round": event.get("roundInfo", {}).get("round"),
        })
    return events

def get_live_events(sport: str) -> list[dict]:
    """Get all currently live events for a sport."""
    data = api_get(f"/sport/{sport}/events/live")
    if not data:
        return []
    return [
        {
            "id": e.get("id"),
            "home_team": e.get("homeTeam", {}).get("name"),
            "away_team": e.get("awayTeam", {}).get("name"),
            "home_score": e.get("homeScore", {}).get("current"),
            "away_score": e.get("awayScore", {}).get("current"),
            "minute": e.get("time", {}).get("played"),
            "period": e.get("time", {}).get("period"),
            "tournament": e.get("tournament", {}).get("name"),
        }
        for e in data.get("events", [])
    ]
Match Statistics
def get_match_stats(event_id: int) -> dict:
    """
    Fetch match statistics broken down by period.
    Returns possession, shots, passes, tackles, etc.
    """
    data = api_get(f"/event/{event_id}/statistics")
    if not data:
        return {}
    result = {"periods": {}}
    for period_data in data.get("statistics", []):
        period = period_data.get("period", "unknown")  # ALL, 1ST, 2ND
        stats = {}
        for group in period_data.get("groups", []):
            group_name = group.get("groupName", "")
            for item in group.get("statisticsItems", []):
                stat_name = item.get("name", "")
                stats[stat_name] = {
                    # Stored per stat: a period contains several groups
                    "group": group_name,
                    "home": item.get("home"),
                    "away": item.get("away"),
                    "compare_code": item.get("compareCode"),  # 1=home better, 2=away, 3=equal
                }
        result["periods"][period] = {"stats": stats}
    return result

def parse_possession(stats: dict) -> tuple[float | None, float | None]:
    """Extract home/away ball possession percentages."""
    all_stats = stats.get("periods", {}).get("ALL", {}).get("stats", {})
    possession = all_stats.get("Ball possession", {})
    home_poss = possession.get("home")
    away_poss = possession.get("away")

    def pct(val):
        # Values arrive as strings such as "54%"; strip the sign and parse
        if val is None:
            return None
        s = str(val).replace("%", "").strip()
        try:
            return float(s)
        except ValueError:
            return None

    return pct(home_poss), pct(away_poss)
Player Ratings and Lineups
def get_player_ratings(event_id: int) -> dict:
    """
    Fetch starting lineups and player ratings for a completed match.
    Ratings are only available after the match finishes.
    Ratings range from 0-10, with 6.0 being average.
    """
    data = api_get(f"/event/{event_id}/lineups")
    if not data:
        # Same keys as the populated result, so callers can rely on the shape
        return {"home": [], "away": [], "formation": {}}
    result = {"home": [], "away": [], "formation": {}}
    for side in ("home", "away"):
        side_data = data.get(side, {})
        result["formation"][side] = side_data.get("formation")
        players = side_data.get("players", [])
        for p in players:
            player_info = p.get("player", {})
            stats = p.get("statistics", {})
            result[side].append({
                "name": player_info.get("name"),
                "short_name": player_info.get("shortName"),
                "id": player_info.get("id"),
                "position": p.get("position"),  # G, D, M, F
                "shirt_number": p.get("shirtNumber"),
                "substitute": p.get("substitute", False),
                "captain": p.get("captain", False),
                # Performance metrics
                "rating": stats.get("rating"),
                "minutes_played": stats.get("minutesPlayed"),
                "goals": stats.get("goals", 0),
                "assists": stats.get("goalAssist", 0),
                "shots": stats.get("onTargetScoringAttempt", 0) + stats.get("blockedScoringAttempt", 0),
                "shots_on_target": stats.get("onTargetScoringAttempt", 0),
                "passes": stats.get("totalPass", 0),
                # max(..., 1) avoids division by zero for players with no passes
                "pass_accuracy_pct": stats.get("accuratePass", 0) / max(stats.get("totalPass", 1), 1) * 100,
                "tackles": stats.get("totalTackle", 0),
                "interceptions": stats.get("interceptionWon", 0),
                "yellow_cards": stats.get("yellowCard", 0),
                "red_cards": stats.get("redCard", 0),
                "dribbles_completed": stats.get("wonContest", 0),
            })
    return result

def get_match_incidents(event_id: int) -> list[dict]:
    """
    Fetch timeline of match events: goals, cards, substitutions, VAR decisions.
    """
    data = api_get(f"/event/{event_id}/incidents")
    if not data:
        return []
    incidents = []
    for inc in data.get("incidents", []):
        # Incidents with no isHome flag (such as period markers) belong to
        # neither team, so leave team_side empty instead of defaulting to "away".
        is_home = inc.get("isHome")
        incidents.append({
            "type": inc.get("incidentType"),  # goal, card, substitution, periodStart, etc.
            "minute": inc.get("time"),
            "added_time": inc.get("addedTime"),
            "team_side": None if is_home is None else ("home" if is_home else "away"),
            "player": (inc.get("player") or {}).get("name"),
            "description": inc.get("incidentClass"),  # regular, ownGoal, penalty, yellowCard, etc.
            # For substitutions: from_player comes on, to_player goes off
            "from_player": (inc.get("playerIn") or {}).get("name"),
            "to_player": (inc.get("playerOut") or {}).get("name"),
        })
    return sorted(incidents, key=lambda x: x["minute"] or 0)
Head-to-Head History
def get_h2h(event_id: int, max_matches: int = 20) -> dict:
    """Get head-to-head history between the two teams in a match."""
    data = api_get(f"/event/{event_id}/h2h")
    if not data:
        return {}

    def parse_match_list(matches: list) -> list[dict]:
        parsed = []
        for m in matches[:max_matches]:
            parsed.append({
                "id": m.get("id"),
                "date": m.get("startTimestamp"),
                "home": m.get("homeTeam", {}).get("name"),
                "away": m.get("awayTeam", {}).get("name"),
                "home_score": m.get("homeScore", {}).get("current"),
                "away_score": m.get("awayScore", {}).get("current"),
                "tournament": m.get("tournament", {}).get("name"),
            })
        return parsed

    return {
        "teams_duels": data.get("teamDuel", {}),
        "manager_duels": data.get("managerDuel", {}),
        "previous_events": parse_match_list(data.get("previousEventList", [])),
    }
Tournament Standings and Top Players
def get_tournament_standings(
    tournament_id: int,
    season_id: int,
    standing_type: str = "total",  # total, home, away
) -> list[dict]:
    """Get league table / tournament standings."""
    data = api_get(
        f"/unique-tournament/{tournament_id}/season/{season_id}/standings/{standing_type}"
    )
    if not data:
        return []
    rows = []
    for group in data.get("standings", []):
        for row in group.get("rows", []):
            rows.append({
                "position": row.get("position"),
                "team": row.get("team", {}).get("name"),
                "team_id": row.get("team", {}).get("id"),
                "played": row.get("matches"),
                "wins": row.get("wins"),
                "draws": row.get("draws"),
                "losses": row.get("losses"),
                "goals_for": row.get("scoresFor"),
                "goals_against": row.get("scoresAgainst"),
                "goal_difference": row.get("goalDifference"),
                "points": row.get("points"),
                # The promotion field labels the qualification or relegation
                # zone for this position; it is not a recent-form string.
                "promotion": (row.get("promotion") or {}).get("text"),
            })
    return rows

def get_top_players(
    tournament_id: int,
    season_id: int,
    stat: str = "overall",  # overall, goals, assists, rating
) -> list[dict]:
    """Get top players in a tournament ranked by a statistic."""
    data = api_get(
        f"/unique-tournament/{tournament_id}/season/{season_id}/top-players/{stat}"
    )
    if not data:
        return []
    players = []
    for item in data.get("topPlayers", []):
        # Skip anything that is not a player record. The response shape is not
        # guaranteed, so inspect the raw JSON if this returns an empty list.
        if not isinstance(item, dict):
            continue
        p = item.get("player", {})
        stats = item.get("statistics", {})
        players.append({
            "rank": len(players) + 1,
            "name": p.get("name"),
            "id": p.get("id"),
            "team": p.get("team", {}).get("name"),
            "position": p.get("position"),
            "nationality": p.get("country", {}).get("name"),
            "rating": stats.get("rating"),
            "goals": stats.get("goals"),
            "assists": stats.get("assists"),
            "matches": stats.get("appearances"),
            "minutes_per_goal": stats.get("minutesPerGoal"),
        })
    return players
Anti-Detection Strategy
SofaScore's defenses are real. Several things they do:
- Cloudflare: on the web frontend (sofascore.com), not on the API domain (api.sofascore.com). The API is accessible without Cloudflare challenges if you send correct headers.
- IP rate limiting: more than ~60 requests/minute from one IP starts returning 429s or silently failing.
- Header fingerprinting: missing Referer, Origin, or Sec-Fetch-* headers trips detection.
- API key/token requirements: some premium endpoints require authentication (player market value data, some advanced stats).
The header set above gets you past header validation. For sustained collection, IP rotation via residential proxies is the main tool.
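Independently of proxies, staying under the per-IP limit mentioned in the list is mostly a matter of pacing. The following is a sketch of a fixed-interval throttle. The `Throttle` class is illustrative, and the 30 requests/minute in the usage line is an arbitrary safety margin below the roughly 60/minute figure, not a documented limit.

```python
import time


class Throttle:
    """Enforce a minimum gap between calls (simple fixed-interval limiter)."""

    def __init__(self, max_per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this call proceeds.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


throttle = Throttle(max_per_minute=30)
throttle.wait()  # call this before each api_get(...) request
```

The clock and sleep functions are injectable so the pacing logic can be tested without real waiting.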
ThorData is a solid option — a large residential proxy pool with city-level targeting, useful for testing geo-specific content or spreading requests across locations. Sticky sessions help when you're paginating through tournament standings and don't want IP changes mid-sequence.
# ThorData proxy integration
PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_HOST = "proxy.thordata.com"
PROXY_PORT = 9000


def get_rotating_proxy() -> str:
    """Get a proxy URL with rotating IP."""
    return f"http://{PROXY_USER}-rotate:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"


def get_sticky_proxy(session_id: str) -> str:
    """Get a proxy URL that sticks to one IP for the session."""
    return f"http://{PROXY_USER}-session-{session_id}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
Storing Data in SQLite
import sqlite3
from datetime import datetime, timezone


def init_sofascore_db(path: str = "sofascore.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY,
            sport TEXT,
            home_team TEXT,
            away_team TEXT,
            home_score INTEGER,
            away_score INTEGER,
            date TEXT,
            tournament TEXT,
            tournament_id INTEGER,
            season_id INTEGER,
            round INTEGER,
            status TEXT,
            fetched_at TEXT
        );
        CREATE TABLE IF NOT EXISTS player_ratings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id INTEGER NOT NULL,
            player_id INTEGER,
            player_name TEXT,
            team_side TEXT,
            position TEXT,
            rating REAL,
            minutes_played INTEGER,
            goals INTEGER,
            assists INTEGER,
            shots INTEGER,
            passes INTEGER,
            tackles INTEGER,
            yellow_cards INTEGER,
            red_cards INTEGER,
            FOREIGN KEY(event_id) REFERENCES events(id)
        );
        CREATE TABLE IF NOT EXISTS match_stats (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id INTEGER NOT NULL,
            stat_name TEXT,
            home_value TEXT,
            away_value TEXT,
            period TEXT DEFAULT 'ALL',
            FOREIGN KEY(event_id) REFERENCES events(id),
            -- Without this constraint INSERT OR REPLACE never replaces
            -- anything, and re-running a day would duplicate every stat.
            UNIQUE(event_id, stat_name, period)
        );
        CREATE TABLE IF NOT EXISTS live_scores (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id INTEGER NOT NULL,
            home_score INTEGER,
            away_score INTEGER,
            minute INTEGER,
            period TEXT,
            recorded_at TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_events_date ON events(date);
        CREATE INDEX IF NOT EXISTS idx_ratings_event ON player_ratings(event_id);
        CREATE INDEX IF NOT EXISTS idx_live_event ON live_scores(event_id, recorded_at);
    """)
    conn.commit()
    return conn

def save_event(conn: sqlite3.Connection, event: dict, sport: str) -> None:
    # Store the kickoff date in UTC so rows do not depend on the local timezone
    date_str = None
    if event.get("start_timestamp"):
        date_str = datetime.fromtimestamp(
            event["start_timestamp"], tz=timezone.utc
        ).strftime("%Y-%m-%d")
    conn.execute("""
        INSERT OR REPLACE INTO events
        (id, sport, home_team, away_team, home_score, away_score, date,
         tournament, tournament_id, season_id, round, status, fetched_at)
        VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)
    """, (
        event["id"], sport, event.get("home_team"), event.get("away_team"),
        event.get("home_score"), event.get("away_score"), date_str,
        event.get("tournament"), event.get("tournament_id"),
        event.get("season_id"), event.get("round"),
        event.get("status_type"),
        datetime.now(timezone.utc).isoformat(),
    ))
    conn.commit()

def save_player_ratings(
    conn: sqlite3.Connection,
    event_id: int,
    lineups: dict,
) -> None:
    # Re-running a day would otherwise append a second copy of every player,
    # so clear the event's old rows first (only when there is new data).
    if lineups.get("home") or lineups.get("away"):
        conn.execute("DELETE FROM player_ratings WHERE event_id = ?", (event_id,))
    for side in ("home", "away"):
        for p in lineups.get(side, []):
            conn.execute("""
                INSERT INTO player_ratings
                (event_id, player_id, player_name, team_side, position,
                 rating, minutes_played, goals, assists, shots, passes,
                 tackles, yellow_cards, red_cards)
                VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
            """, (
                event_id, p.get("id"), p.get("name"), side,
                p.get("position"), p.get("rating"), p.get("minutes_played"),
                p.get("goals"), p.get("assists"), p.get("shots"),
                p.get("passes"), p.get("tackles"),
                p.get("yellow_cards"), p.get("red_cards"),
            ))
    conn.commit()

def record_live_score(
    conn: sqlite3.Connection,
    event_id: int,
    home_score: int,
    away_score: int,
    minute: int | None,
    period: str | None,
) -> None:
    conn.execute("""
        INSERT INTO live_scores (event_id, home_score, away_score, minute, period, recorded_at)
        VALUES (?,?,?,?,?,?)
    """, (
        event_id, home_score, away_score, minute, period,
        datetime.now(timezone.utc).isoformat(),
    ))
    conn.commit()
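
Once ratings are stored, season-level questions become plain SQL. The sketch below runs against an in-memory database with a cut-down version of the player_ratings table and invented rows, so it can be tried without scraping anything. `average_ratings` is a hypothetical helper, not part of the pipeline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE player_ratings (
        event_id INTEGER, player_id INTEGER, player_name TEXT,
        rating REAL, minutes_played INTEGER
    )
""")
conn.executemany(
    "INSERT INTO player_ratings VALUES (?,?,?,?,?)",
    [
        (1, 10, "Player A", 7.5, 90),
        (2, 10, "Player A", 6.5, 90),
        (1, 11, "Player B", 8.0, 30),
        (2, 11, "Player B", None, 0),  # unused substitute: no rating
    ],
)


def average_ratings(conn: sqlite3.Connection, min_matches: int = 1) -> list[tuple]:
    """Average rating per player, ignoring matches where no rating was given."""
    return conn.execute("""
        SELECT player_name,
               ROUND(AVG(rating), 2) AS avg_rating,
               COUNT(rating) AS rated_matches
        FROM player_ratings
        WHERE rating IS NOT NULL
        GROUP BY player_id, player_name
        HAVING COUNT(rating) >= ?
        ORDER BY avg_rating DESC
    """, (min_matches,)).fetchall()


for name, avg, n in average_ratings(conn):
    print(f"{name}: {avg} over {n} rated match(es)")
```

Raising min_matches filters out players whose average rests on a single appearance.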
Complete Collection Pipeline
def collect_match_day(
    sport: str,
    date: str,
    proxy_url: str | None = None,
    collect_stats: bool = True,
    collect_ratings: bool = True,
) -> None:
    """
    Full pipeline for one day of matches:
    1. Get scheduled events
    2. For finished matches: pull stats and ratings
    3. Store everything in SQLite

    Note: proxy_url is not forwarded yet, because the fetch helpers call
    api_get() without a proxy argument.
    """
    conn = init_sofascore_db()
    print(f"Collecting {sport} events for {date}...")
    events = get_scheduled_events(sport, date)
    print(f"Found {len(events)} events")
    finished = [e for e in events if e.get("status_type") == "finished"]
    print(f"Finished: {len(finished)}, In progress: "
          f"{len([e for e in events if e.get('status_type') == 'inprogress'])}")
    for event in events:
        save_event(conn, event, sport)

    def best(players: list[dict]) -> dict | None:
        # Ignore unrated players (e.g. unused substitutes) before picking the top one
        rated = [p for p in players if p.get("rating") is not None]
        return max(rated, key=lambda p: p["rating"]) if rated else None

    for event in finished:
        eid = event["id"]
        print(f"\n{event['home_team']} {event['home_score']}-{event['away_score']} {event['away_team']}")
        if collect_stats:
            stats = get_match_stats(eid)
            if stats:
                for period, period_data in stats.get("periods", {}).items():
                    for stat_name, values in period_data.get("stats", {}).items():
                        conn.execute("""
                            INSERT OR REPLACE INTO match_stats
                            (event_id, stat_name, home_value, away_value, period)
                            VALUES (?,?,?,?,?)
                        """, (eid, stat_name, str(values.get("home", "")),
                              str(values.get("away", "")), period))
                conn.commit()
                home_poss, away_poss = parse_possession(stats)
                print(f"  Possession: {home_poss}% / {away_poss}%")
        if collect_ratings:
            lineups = get_player_ratings(eid)
            save_player_ratings(conn, eid, lineups)
            top_home = best(lineups.get("home", []))
            top_away = best(lineups.get("away", []))
            if top_home and top_away:
                print(f"  Best player: {top_home['name']} ({top_home['rating']:.1f}) "
                      f"vs {top_away['name']} ({top_away['rating']:.1f})")
        # Pause between matches to stay well under the rate limit
        time.sleep(random.uniform(1, 2.5))
    conn.close()
    print("\nDone. Results saved to sofascore.db")

# Live score polling loop
def poll_live_scores(
    sport: str,
    interval_seconds: int = 60,
    proxy_url: str | None = None,
) -> None:
    """Poll live scores every interval_seconds and store time-series data."""
    conn = init_sofascore_db()
    print(f"Polling {sport} live scores every {interval_seconds}s. Ctrl+C to stop.")
    try:
        while True:
            live = get_live_events(sport)
            now = datetime.now(timezone.utc).strftime("%H:%M:%S")
            if live:
                print(f"{now} - {len(live)} live matches")
                for event in live:
                    minute = event.get("minute")
                    period = event.get("period")
                    record_live_score(
                        conn,
                        event["id"],
                        event.get("home_score") or 0,
                        event.get("away_score") or 0,
                        minute,
                        # Keep a missing period as NULL, not the string "None"
                        str(period) if period is not None else None,
                    )
                    print(f"  {event['home_team']} {event['home_score']}-"
                          f"{event['away_score']} {event['away_team']} "
                          f"({minute if minute is not None else '?'}')")
            else:
                print(f"{now} - no live matches")
            time.sleep(interval_seconds)
    except KeyboardInterrupt:
        print("Polling stopped.")
    finally:
        conn.close()

if __name__ == "__main__":
    # Collect yesterday's football matches (every tournament, not one league)
    from datetime import date, timedelta
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    collect_match_day("football", yesterday, collect_stats=True, collect_ratings=True)
    # Or poll live scores
    # poll_live_scores("football", interval_seconds=60)
Useful Tournament and Season IDs
Finding tournament/season IDs: browse to a tournament page on SofaScore, open DevTools, and check the XHR calls for the numeric IDs. Common ones:
| Tournament | Tournament ID |
|---|---|
| Premier League | 17 |
| La Liga | 8 |
| Bundesliga | 35 |
| Serie A | 23 |
| Ligue 1 | 34 |
| Champions League | 7 |
| NBA | 132 |
Season IDs change each year. Fetch current season from:
GET /unique-tournament/{tournament_id}/seasons
Common Gotchas
Ratings are post-match only: The lineups endpoint returns player data immediately (formation, starting XI), but statistics.rating is null until the match finishes. Don't scrape ratings for live matches.
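Timestamps are Unix: startTimestamp in event data is a Unix timestamp (seconds since epoch). Use datetime.fromtimestamp(ts, tz=timezone.utc) to convert; without the tz argument the result is in the machine's local time, which shifts match dates depending on where the scraper runs.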
Sport names are slugs: Use ice-hockey, not icehockey. Use american-football, not nfl. Check the URL on SofaScore's site if unsure.
API endpoint changes: SofaScore has changed endpoint paths before without notice. If something stops working, open DevTools on a fresh match page, trace the XHR calls, and update the path. The data structures stay fairly consistent even when paths change.
No auth for most endpoints: Most endpoints work without authentication. A handful of premium data points (player market values, detailed injury histories) require a session cookie from a paid account.
Score precision: homeScore.current is the current/final score. homeScore.period1 gives first-half score. For in-progress matches, period1 may be set while period2 is null.
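For the Unix-timestamp gotcha, a timezone-aware conversion is a two-line helper. This is a small sketch using only the standard library. `kickoff_utc` is an illustrative name and the timestamp is an arbitrary example value.

```python
from datetime import datetime, timezone


def kickoff_utc(start_timestamp: int) -> datetime:
    """Convert a startTimestamp (seconds since epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(start_timestamp, tz=timezone.utc)


ts = 1700000000  # arbitrary example value
kickoff = kickoff_utc(ts)
print(kickoff.isoformat())           # 2023-11-14T22:13:20+00:00
print(kickoff.strftime("%Y-%m-%d"))  # 2023-11-14
```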
Because most endpoints work without authentication, the API is genuinely accessible. The data it exposes (player ratings across an entire season, head-to-head stats, live scores at minute-by-minute resolution) is a solid foundation for analysis tools, fantasy sports applications, or any project tracking live sports performance.