
Scraping Transfermarkt Player Values and Transfer History with Python (2026)

Transfermarkt is the definitive source for football transfer data. Every serious analysis of player valuations, transfer market trends, or club spending patterns starts here. The site tracks estimated market values for over 800,000 players across more than 1,000 competitions worldwide, along with complete transfer histories, contract details, injury records, and active transfer rumors.

There is no public API. Transfermarkt briefly offered a paid data product through a partnership with a sports data company, but for independent researchers and developers, scraping the HTML is the only path. The site is server-rendered (no JavaScript framework needed for most pages), which actually makes extraction easier than many modern platforms — but their anti-bot defenses have caught up in 2026.

What Data Is Available

Transfermarkt pages expose an enormous amount of structured football data: player profile details (position, nationality, height, preferred foot, agent, shirt number), current and historical market values, complete transfer histories with fees, injury records, and active transfer rumors.

The valuation history is particularly valuable — it gives you time-series data on how a player's perceived worth has changed across their career, correlated with performance, injuries, age, and contract status.
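To show what that time-series enables, here is a hypothetical helper that computes the change between consecutive valuation entries. It assumes the list-of-dicts shape produced by the value-history scraper later in this post, with values formatted like '€45.00m' or '€900k':

```python
def value_deltas(history: list[dict]) -> list[dict]:
    """Compute the absolute change between consecutive valuation entries.

    Assumes each entry has a 'value' string like '€45.00m' or '€900k'.
    """
    def to_eur(v: str) -> float:
        v = (v or "").replace("€", "").strip().lower()
        try:
            if v.endswith("m"):
                return float(v[:-1]) * 1_000_000
            if v.endswith("k"):
                return float(v[:-1]) * 1_000
            return float(v)
        except ValueError:
            return 0.0

    deltas = []
    for prev, cur in zip(history, history[1:]):
        deltas.append({
            "date": cur.get("date"),
            "change_eur": to_eur(cur.get("value")) - to_eur(prev.get("value")),
        })
    return deltas
```

Joined against injury dates or transfer dates, these deltas let you ask questions like how much value a player lost during a long layoff.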

Why Football Data Is Commercially Useful

Football analytics has grown from a niche hobby into a major industry. Beyond hobbyist use, Transfermarkt data feeds commercial work ranging from transfer-market research and club-spending analysis to football analytics products and betting models.

Anti-Bot Measures

Transfermarkt has progressively hardened its defenses in 2026:

Cloudflare with aggressive bot scoring. Transfermarkt uses Cloudflare's Bot Management product (not just the basic WAF). This evaluates TLS fingerprint, IP reputation, request patterns, and JavaScript challenge results. Datacenter IPs are blocked outright in most cases.

Mandatory User-Agent and headers. Requests without a complete set of browser-like headers (Accept, Accept-Language, Accept-Encoding, plus a valid User-Agent) get 403 responses even before the Cloudflare layer.

Rate limiting. Sustained request rates above roughly 20 requests per minute from a single IP trigger temporary blocks. The blocks escalate — first a Cloudflare challenge page, then a hard 403 for the IP that can last hours.

Cookie validation. Transfermarkt sets multiple tracking cookies on first visit. Subsequent requests without these cookies are treated as new sessions and face repeated challenges. Maintaining a persistent cookie jar across requests is critical.

Page structure obfuscation. While the site is server-rendered HTML, class names and element IDs change periodically. Transfermarkt appears to rotate some CSS class names on a roughly monthly basis, likely to break hardcoded selectors in scrapers.

Setting Up the Scraper

Because Transfermarkt is server-rendered, you can often get by with httpx plus proper headers — you don't always need a full browser. But Cloudflare challenges on datacenter IPs push most setups toward Playwright anyway.

pip install httpx playwright playwright-stealth parsel
playwright install chromium

The httpx approach (works with residential IPs that pass Cloudflare):

import httpx
import time
import random
from parsel import Selector

def create_session(proxy: str | None = None) -> httpx.Client:
    """Create a persistent session with browser-like headers."""
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36"
        ),
        "Accept": (
            "text/html,application/xhtml+xml,"
            "application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"
        ),
        "Accept-Language": "en-US,en;q=0.9,de;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.transfermarkt.com/",
        "DNT": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "same-origin",
    }

    client_kwargs = {
        "headers": headers,
        "follow_redirects": True,
        "timeout": 30.0,
    }
    if proxy:
        client_kwargs["proxy"] = proxy

    client = httpx.Client(**client_kwargs)

    # Warm up session with homepage
    resp = client.get("https://www.transfermarkt.com/")
    if resp.status_code == 200:
        print("Session established successfully")
    else:
        print(f"Warning: homepage returned {resp.status_code}")

    time.sleep(random.uniform(1.5, 3.0))
    return client

Scraping Player Profiles

Transfermarkt player URLs follow a predictable pattern: /player-name/profil/spieler/{player_id}. The player page contains the core profile data and current market value:

def scrape_player(player_url: str, session: httpx.Client) -> dict | None:
    """Scrape a player profile page for current metadata."""
    resp = session.get(player_url)
    if resp.status_code != 200:
        print(f"Failed to fetch {player_url}: {resp.status_code}")
        return None

    sel = Selector(text=resp.text)
    result = {"url": player_url}

    # Player name — primary h1
    result["name"] = sel.css(
        "h1.data-header__headline-wrapper::text"
    ).get("").strip()

    # Market value (the main headline figure)
    value_parts = sel.css(
        "a.data-header__market-value-wrapper::text"
    ).getall()
    result["market_value"] = " ".join(
        v.strip() for v in value_parts if v.strip()
    )

    # Profile detail items from the info table
    info_items = sel.css("ul.data-header__items li")
    for item in info_items:
        label = item.css("span.data-header__label::text").get("").strip().lower()
        value = item.css(
            "span.data-header__content::text, a::text"
        ).get("").strip()

        if "date of birth" in label:
            result["birth_date"] = value
        elif "citizenship" in label:
            result["nationality"] = value
        elif "height" in label:
            result["height"] = value
        elif "position" in label:
            result["position"] = value
        elif "foot" in label:
            result["preferred_foot"] = value
        elif "agent" in label:
            result["agent"] = value
        elif "shirt" in label or "number" in label:
            result["shirt_number"] = value

    # Current club
    club = sel.css("span.data-header__club a::text").get("").strip()
    result["current_club"] = club if club else None

    # Contract expiration
    contract = sel.css(
        "span.data-header__content[itemprop='endDate']::text"
    ).get("").strip()
    result["contract_until"] = contract if contract else None

    # Player ID from URL
    parts = player_url.rstrip("/").split("/")
    result["player_id"] = parts[-1] if parts else None

    return result
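The remaining endpoints used in this post all follow the same /{slug}/{page}/spieler/{id} shape, so a small helper can build them in one place. The German path segments below are the ones used by the scrapers throughout this post; the slug itself is mostly cosmetic, since the numeric ID does the routing (the later scrapers use a generic "player" slug for exactly that reason):

```python
def player_urls(player_id: str, slug: str = "player") -> dict[str, str]:
    """Build the per-player endpoint URLs used in this post."""
    base = "https://www.transfermarkt.com"
    pages = {
        "profile": "profil",
        "value_history": "marktwertverlauf",
        "transfers": "transfers",
        "rumors": "geruechte",
        "injuries": "verletzungen",
    }
    return {key: f"{base}/{slug}/{page}/spieler/{player_id}"
            for key, page in pages.items()}
```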

Extracting Market Value History

The value history is loaded on the player's market value page and embedded as JavaScript data for the chart rendering:

import json
import re

def scrape_value_history(player_id: str,
                          session: httpx.Client) -> list[dict]:
    """Extract historical market value data from the chart JavaScript."""
    url = (f"https://www.transfermarkt.com/player/"
           f"marktwertverlauf/spieler/{player_id}")
    resp = session.get(url)
    if resp.status_code != 200:
        return []

    # The chart data is embedded in a JavaScript variable
    # Try multiple patterns Transfermarkt has used over the years
    patterns = [
        r"var\s+chartData\s*=\s*(\[.*?\]);",
        r"'data':\s*(\[.*?\])\s*[,}]",
        r"marktwertverlauf\s*=\s*(\[.*?\]);",
    ]

    match = None
    for pattern in patterns:
        match = re.search(pattern, resp.text, re.DOTALL)
        if match:
            break

    if not match:
        return []

    try:
        raw = match.group(1)
        # Normalize JavaScript object notation to valid JSON:
        # quote bare keys, then convert single-quoted strings.
        # Caveat: values containing apostrophes (some club names) will
        # break the naive quote replacement, so treat this as best-effort.
        raw = re.sub(r"(?<=[{,\[])\s*(\w+)\s*:", r'"\1":', raw)
        raw = raw.replace("'", '"')
        data = json.loads(raw)

        values = []
        for point in data:
            values.append({
                "date": point.get("datum_mw") or point.get("x") or point.get("date"),
                "value": point.get("mw") or point.get("y") or point.get("value"),
                "club": point.get("verein") or point.get("club"),
                "age": point.get("age"),
                "nationality": point.get("nat"),
            })
        return values

    except (json.JSONDecodeError, TypeError) as e:
        print(f"Failed to parse value history for player {player_id}: {e}")
        return []

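The normalization step above is worth exercising in isolation. Here is a standalone sketch of the same two-step transform, with the caveat that apostrophes inside string values (some club names) will break the naive quote replacement:

```python
import json
import re

def js_to_json(raw: str):
    """Best-effort conversion of simple JS array/object literals to JSON:
    quote bare keys, then turn single-quoted strings into double-quoted."""
    raw = re.sub(r"(?<=[{,\[])\s*(\w+)\s*:", r'"\1":', raw)
    raw = raw.replace("'", '"')
    return json.loads(raw)
```

Keeping this as a pure function makes it trivial to regression-test against new chart snippets whenever Transfermarkt changes its embedded data format.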

def get_peak_value(value_history: list[dict]) -> dict | None:
    """Find the peak market value entry in a player's history."""
    if not value_history:
        return None

    def parse_value(v: str) -> float:
        """Parse '€45.00m' or '€900k' to float."""
        if not v:
            return 0.0
        v = v.replace("€", "").replace(",", "").strip()
        if "m" in v.lower():
            return float(v.lower().replace("m", "")) * 1_000_000
        elif "k" in v.lower():
            return float(v.lower().replace("k", "")) * 1_000
        try:
            return float(v)
        except ValueError:
            return 0.0

    return max(value_history,
               key=lambda x: parse_value(str(x.get("value", "0"))))

Transfer History Extraction

The transfer history table is standard HTML and parseable without JavaScript:

def scrape_transfers(player_id: str,
                      session: httpx.Client) -> list[dict]:
    """Extract complete transfer history for a player."""
    url = (f"https://www.transfermarkt.com/player/"
           f"transfers/spieler/{player_id}")
    resp = session.get(url)
    if resp.status_code != 200:
        return []

    sel = Selector(text=resp.text)
    transfers = []

    # Main transfer table rows
    rows = sel.css("div.grid-view table.items tbody tr")
    for row in rows:
        cells = row.css("td")
        if len(cells) < 5:
            continue

        raw_fee = cells[5].css("a::text, ::text").get("").strip() if len(cells) > 5 else ""
        mv_at_time = cells[6].css("::text").get("").strip() if len(cells) > 6 else None

        transfer = {
            "season": cells[0].css("::text").get("").strip(),
            "date": cells[1].css("::text").get("").strip(),
            "from_club": cells[2].css("a::text").get("").strip(),
            "from_club_country": cells[2].css("img::attr(title)").get(""),
            "to_club": cells[4].css("a::text").get("").strip(),
            "to_club_country": cells[4].css("img::attr(title)").get(""),
            "fee": raw_fee,
            "market_value_at_time": mv_at_time,
        }

        # Classify transfer type from fee text
        fee_lower = raw_fee.lower()
        if "loan" in fee_lower or "leihe" in fee_lower:
            transfer["type"] = "loan"
        elif "free" in fee_lower or "ablösefrei" in fee_lower:
            transfer["type"] = "free_transfer"
        elif "end of loan" in fee_lower:
            transfer["type"] = "loan_end"
        elif any(c in raw_fee for c in ["€", "$", "£", "¥"]):
            transfer["type"] = "paid"
        elif "?" in raw_fee:
            transfer["type"] = "undisclosed"
        else:
            transfer["type"] = "unknown"

        if transfer["from_club"] or transfer["to_club"]:
            transfers.append(transfer)

    time.sleep(random.uniform(1.5, 3.0))
    return transfers

Transfer Rumors

The rumors page aggregates active transfer speculation:

def scrape_rumors(player_id: str,
                   session: httpx.Client) -> list[dict]:
    """Scrape active transfer rumors for a player."""
    url = (f"https://www.transfermarkt.com/player/"
           f"geruechte/spieler/{player_id}")
    resp = session.get(url)
    if resp.status_code != 200:
        return []

    sel = Selector(text=resp.text)
    rumors = []

    for row in sel.css("div.large-8 table.items tbody tr"):
        destination = row.css("td.hauptlink a::text").get("").strip()
        if not destination:
            continue

        rumors.append({
            "destination_club": destination,
            "destination_league": row.css(
                "td.hauptlink img::attr(title)"
            ).get(""),
            "source": row.css("td.zentriert a::text").get("").strip(),
            "probability": row.css(
                "td.zentriert img::attr(title)"
            ).get(""),
            "date_reported": row.css("td.zentriert::text").get("").strip(),
            "fee_expectation": row.css(
                "td.rechts::text"
            ).get("").strip(),
        })

    return rumors


def scrape_injury_history(player_id: str,
                            session: httpx.Client) -> list[dict]:
    """Scrape injury history for a player."""
    url = (f"https://www.transfermarkt.com/player/"
           f"verletzungen/spieler/{player_id}")
    resp = session.get(url)
    if resp.status_code != 200:
        return []

    sel = Selector(text=resp.text)
    injuries = []

    for row in sel.css("table.items tbody tr"):
        cells = row.css("td")
        if len(cells) < 5:
            continue

        injuries.append({
            "season": cells[0].css("::text").get("").strip(),
            "injury": cells[1].css("::text").get("").strip(),
            "from_date": cells[2].css("::text").get("").strip(),
            "until_date": cells[3].css("::text").get("").strip(),
            "days_missed": cells[4].css("::text").get("").strip(),
            "games_missed": cells[5].css("::text").get("").strip() if len(cells) > 5 else "",
        })

    return injuries

Proxy Configuration

Transfermarkt's Cloudflare Bot Management is the hardest obstacle in this entire scraping pipeline. With a datacenter IP, you won't get past the challenge page consistently — the IP reputation score alone is enough to trigger a block, regardless of how well your headers and TLS fingerprint match a real browser.

ThorData's residential proxies are the practical solution. Residential IPs carry the trust score needed to pass Cloudflare's checks on the first request. For Transfermarkt specifically, European residential IPs (Germany, UK, Spain) tend to work best since the site is based in Hamburg and the majority of its traffic is European.

PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_HOST = "proxy.thordata.com"
PROXY_PORT = 9000

def get_eu_proxy() -> str:
    """Get a European residential proxy URL for Transfermarkt."""
    # Rotate between Germany, UK, and Spain for best results.
    # Note: country-targeting syntax varies by provider — check your
    # provider's docs (some encode the country in the username instead).
    country = random.choice(["de", "gb", "es"])
    auth = f"{PROXY_USER}:{PROXY_PASS}"
    return f"http://{auth}@{PROXY_HOST}:{PROXY_PORT}?country={country}"


# Create session with European proxy
session = create_session(proxy=get_eu_proxy())

Keep your request rate under 15 per minute. Transfermarkt's rate detection is IP-based, and even residential IPs get flagged if the request cadence looks automated. Add time.sleep(random.uniform(4.0, 8.0)) between page requests.
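Both rules, steady pacing plus backing off when a block appears, can be folded into one request helper. This is a sketch; the retry counts and delays are judgment calls based on the rate limits described above, not published thresholds:

```python
import random
import time

def polite_get(client, url: str, max_retries: int = 3,
               base_delay: float = 5.0,
               pace: tuple[float, float] = (4.0, 8.0)):
    """GET with randomized pacing, plus exponential backoff on 403/429.

    `client` is any object with a .get(url) method (e.g. httpx.Client).
    Returns the response, or None if every attempt was blocked.
    """
    for attempt in range(max_retries):
        resp = client.get(url)
        if resp.status_code not in (403, 429):
            # Normal pacing: stays well under ~15 requests per minute
            time.sleep(random.uniform(*pace))
            return resp
        # Blocked: wait exponentially longer before the next attempt
        wait = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
        print(f"Got {resp.status_code}, backing off {wait:.1f}s")
        time.sleep(wait)
    return None
```

Routing every page fetch through a wrapper like this keeps the pacing policy in one place instead of scattering sleep calls across scrapers.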

Storing Results in SQLite

import sqlite3

def init_db(path: str = "transfermarkt.db") -> sqlite3.Connection:
    """Initialize database schema for Transfermarkt data."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS players (
            id TEXT PRIMARY KEY,
            name TEXT,
            nationality TEXT,
            position TEXT,
            height TEXT,
            preferred_foot TEXT,
            current_club TEXT,
            market_value TEXT,
            birth_date TEXT,
            contract_until TEXT,
            agent TEXT,
            shirt_number TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS transfers (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            player_id TEXT,
            season TEXT,
            date TEXT,
            from_club TEXT,
            from_club_country TEXT,
            to_club TEXT,
            to_club_country TEXT,
            fee TEXT,
            type TEXT,
            market_value_at_time TEXT,
            FOREIGN KEY (player_id) REFERENCES players(id)
        );

        CREATE TABLE IF NOT EXISTS valuations (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            player_id TEXT,
            date TEXT,
            value TEXT,
            club TEXT,
            age TEXT,
            FOREIGN KEY (player_id) REFERENCES players(id)
        );

        CREATE TABLE IF NOT EXISTS injuries (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            player_id TEXT,
            season TEXT,
            injury TEXT,
            from_date TEXT,
            until_date TEXT,
            days_missed TEXT,
            games_missed TEXT,
            FOREIGN KEY (player_id) REFERENCES players(id)
        );

        CREATE INDEX IF NOT EXISTS idx_transfers_player
            ON transfers(player_id);
        CREATE INDEX IF NOT EXISTS idx_valuations_player
            ON valuations(player_id);
    """)
    conn.commit()
    return conn


def save_player_complete(conn: sqlite3.Connection,
                          player: dict,
                          transfers: list[dict],
                          valuations: list[dict],
                          injuries: list[dict] | None = None) -> None:
    """Save complete player data including history tables."""
    pid = player.get("player_id")
    if not pid:
        return

    conn.execute("""
        INSERT OR REPLACE INTO players
        (id, name, nationality, position, height, preferred_foot,
         current_club, market_value, birth_date, contract_until,
         agent, shirt_number)
        VALUES (?,?,?,?,?,?,?,?,?,?,?,?)
    """, (
        pid, player.get("name"), player.get("nationality"),
        player.get("position"), player.get("height"),
        player.get("preferred_foot"), player.get("current_club"),
        player.get("market_value"), player.get("birth_date"),
        player.get("contract_until"), player.get("agent"),
        player.get("shirt_number"),
    ))

    for t in transfers:
        conn.execute("""
            INSERT INTO transfers
            (player_id, season, date, from_club, from_club_country,
             to_club, to_club_country, fee, type, market_value_at_time)
            VALUES (?,?,?,?,?,?,?,?,?,?)
        """, (
            pid, t.get("season"), t.get("date"),
            t.get("from_club"), t.get("from_club_country"),
            t.get("to_club"), t.get("to_club_country"),
            t.get("fee"), t.get("type"),
            t.get("market_value_at_time"),
        ))

    for v in valuations:
        conn.execute("""
            INSERT INTO valuations (player_id, date, value, club, age)
            VALUES (?,?,?,?,?)
        """, (pid, v.get("date"), v.get("value"),
               v.get("club"), v.get("age")))

    if injuries:
        for inj in injuries:
            conn.execute("""
                INSERT INTO injuries
                (player_id, season, injury, from_date,
                 until_date, days_missed, games_missed)
                VALUES (?,?,?,?,?,?,?)
            """, (
                pid, inj.get("season"), inj.get("injury"),
                inj.get("from_date"), inj.get("until_date"),
                inj.get("days_missed"), inj.get("games_missed"),
            ))

    conn.commit()

Full Scrape Pipeline

def scrape_player_complete(player_id: str,
                             player_name_slug: str,
                             session: httpx.Client,
                             conn: sqlite3.Connection) -> None:
    """Scrape all data for a single player and save to DB."""
    base_url = "https://www.transfermarkt.com"

    # Profile
    profile_url = f"{base_url}/{player_name_slug}/profil/spieler/{player_id}"
    profile = scrape_player(profile_url, session)
    if not profile:
        print(f"Failed to scrape profile for {player_id}")
        return

    print(f"  {profile.get('name', player_id)}: "
          f"{profile.get('market_value', 'N/A')}")
    time.sleep(random.uniform(3, 6))

    # Transfer history
    transfers = scrape_transfers(player_id, session)
    print(f"  {len(transfers)} transfers")
    time.sleep(random.uniform(3, 6))

    # Value history
    valuations = scrape_value_history(player_id, session)
    print(f"  {len(valuations)} valuation data points")
    time.sleep(random.uniform(3, 6))

    # Injury history
    injuries = scrape_injury_history(player_id, session)
    print(f"  {len(injuries)} injury records")
    time.sleep(random.uniform(3, 6))

    # Save everything
    save_player_complete(conn, profile, transfers, valuations, injuries)

Transfermarkt's Terms of Service prohibit automated scraping, and the site has sent cease-and-desist letters to projects that published large-scale scraped datasets. While market value estimates are Transfermarkt's editorial product (not raw facts), transfer records are factual data and may receive different legal treatment depending on your jurisdiction. The EU Database Directive's sui generis right may also protect the compiled dataset. Keep your usage limited, don't redistribute raw data, and consult local regulations before building anything public.

Key Takeaways

- Transfermarkt is server-rendered HTML, so httpx with complete browser-like headers and a persistent cookie jar is often enough; no headless browser is required for most pages.
- Cloudflare Bot Management blocks datacenter IPs on reputation alone, so European residential proxies are the practical way in.
- Keep request rates under roughly 15 per minute with randomized delays; blocks escalate from challenge pages to multi-hour IP bans.
- CSS class names rotate periodically, so build selector fallbacks rather than hardcoding a single path.
- Market values are Transfermarkt's editorial estimates; keep usage limited and don't redistribute raw data.

Advanced Market Value Analysis

With a player database populated, you can run sophisticated analysis on the football transfer market:

def market_value_analysis(conn: sqlite3.Connection) -> None:
    """Analyze market value patterns across collected players."""
    print("=== Transfermarkt Market Value Analysis ===\n")

    # Distribution of current values.
    # Values are stored as display strings ('€45.00m', '€900k'), so
    # SQL string matching is unreliable — classify numerically in Python.
    def to_eur(v: str) -> float:
        v = (v or "").replace("€", "").strip().lower()
        try:
            if v.endswith("m"):
                return float(v[:-1]) * 1_000_000
            if v.endswith("k"):
                return float(v[:-1]) * 1_000
            return float(v)
        except ValueError:
            return 0.0

    tiers = {"Elite (€50m+)": 0, "High (€20-50m)": 0,
             "Mid (€10-20m)": 0, "Low (< €10m)": 0, "Unknown": 0}
    for (mv,) in conn.execute("SELECT market_value FROM players"):
        value = to_eur(mv)
        if value >= 50_000_000:
            tiers["Elite (€50m+)"] += 1
        elif value >= 20_000_000:
            tiers["High (€20-50m)"] += 1
        elif value >= 10_000_000:
            tiers["Mid (€10-20m)"] += 1
        elif value > 0:
            tiers["Low (< €10m)"] += 1
        else:
            tiers["Unknown"] += 1

    print("Player value tiers:")
    for tier, count in tiers.items():
        print(f"  {tier:20}: {count} players")

    # Most expensive positions
    print("\nMost transferred positions:")
    for row in conn.execute("""
        SELECT position, COUNT(*) as transfer_count,
               COUNT(DISTINCT player_id) as players
        FROM transfers t
        JOIN players p ON t.player_id = p.id
        WHERE t.type = 'paid'
          AND p.position IS NOT NULL
        GROUP BY position
        ORDER BY transfer_count DESC LIMIT 10
    """):
        print(f"  {row[0]:20}: {row[1]:4} paid transfers "
              f"({row[2]} players)")

    # Transfer fee escalation over seasons
    print("\nAverage transfer activity by season:")
    for row in conn.execute("""
        SELECT season, COUNT(*) as transfers,
               COUNT(CASE WHEN type = 'paid' THEN 1 END) as paid
        FROM transfers
        WHERE season IS NOT NULL AND season != ''
        GROUP BY season
        ORDER BY season DESC LIMIT 8
    """):
        print(f"  {row[0]}: {row[1]:4} total transfers, "
              f"{row[2]:3} paid")

    # Players with most clubs
    print("\nMost-traveled players (most clubs):")
    for row in conn.execute("""
        SELECT p.name, p.nationality, p.position,
               COUNT(DISTINCT t.to_club) as clubs
        FROM players p
        JOIN transfers t ON p.id = t.player_id
        WHERE t.to_club != '' AND t.to_club IS NOT NULL
        GROUP BY p.id
        ORDER BY clubs DESC LIMIT 10
    """):
        print(f"  {row[0]:25} ({row[1]}, {row[2]}): {row[3]} clubs")


def find_value_trajectory(conn: sqlite3.Connection,
                            player_id: str) -> None:
    """Analyze how a player's value changed over their career."""
    player = conn.execute(
        "SELECT name, position, current_club FROM players WHERE id = ?",
        (player_id,)
    ).fetchone()

    if not player:
        print(f"Player {player_id} not found")
        return

    print(f"\n=== Value History: {player[0]} ({player[1]}) ===\n")

    # Dates are stored as display strings, so this ordering is lexical —
    # fine for a quick printout, but convert to ISO dates for real
    # time-series work.
    history = conn.execute("""
        SELECT date, value, club
        FROM valuations
        WHERE player_id = ?
        ORDER BY date
    """, (player_id,)).fetchall()

    if not history:
        print("No valuation history available")
        return

    for entry in history:
        print(f"  {entry[0]:12}: {entry[1]:12} — {entry[2]}")

    # Find peak
    print(f"\nCurrent club: {player[2]}")
    print(f"Career entries: {len(history)}")


def transfer_network_analysis(conn: sqlite3.Connection,
                                top_n_clubs: int = 15) -> dict:
    """Analyze transfer flows between clubs."""
    print(f"\n=== Transfer Flow Analysis (Top {top_n_clubs} Clubs) ===\n")

    # Most active clubs as buyers
    print("Top buying clubs (by paid transfer count):")
    for row in conn.execute("""
        SELECT to_club, COUNT(*) as purchases
        FROM transfers
        WHERE type = 'paid' AND to_club IS NOT NULL AND to_club != ''
        GROUP BY to_club
        ORDER BY purchases DESC LIMIT 10
    """):
        print(f"  {row[0]:30}: {row[1]:3} purchases")

    # Most active clubs as sellers
    print("\nTop selling clubs:")
    for row in conn.execute("""
        SELECT from_club, COUNT(*) as sales
        FROM transfers
        WHERE type = 'paid'
          AND from_club IS NOT NULL AND from_club != ''
        GROUP BY from_club
        ORDER BY sales DESC LIMIT 10
    """):
        print(f"  {row[0]:30}: {row[1]:3} sales")

    # Most common transfer corridors
    print("\nMost common transfer routes:")
    for row in conn.execute("""
        SELECT from_club, to_club, COUNT(*) as moves
        FROM transfers
        WHERE type IN ('paid', 'free_transfer')
          AND from_club != '' AND to_club != ''
          AND from_club IS NOT NULL AND to_club IS NOT NULL
        GROUP BY from_club, to_club
        ORDER BY moves DESC LIMIT 10
    """):
        print(f"  {row[0]:25} -> {row[1]:25}: {row[2]} players")

    return {}

These analytical capabilities — combined with the injury and rumor data — give you the building blocks for a proper football analytics product. The transfer market data is particularly valuable for betting models, where historical transfer activity between clubs and player career trajectory analysis can inform match outcome predictions.