How to Scrape Last.fm Music Data with Python (2026)

2026-04-09 [python lastfm scraping music-data api]

Last.fm has been tracking listening habits since 2002, and its dataset is massive: billions of scrobbles across millions of users. What makes it attractive for data collection is that Last.fm still offers a generous free API with no OAuth required for read-only access. An API key is all you need to pull artist metadata, track stats, album info, user listening history, and tag data.

This guide walks through all the major Last.fm API endpoints with working Python code, covering everything from basic artist info to large-scale user scrobble collection.

Getting Your API Key

Last.fm hands out API keys for free at last.fm/api/account/create. Fill in an app name and description, and you'll get an API key and shared secret immediately. The key is all you need for read operations.

import httpx
import time

API_KEY = "your_lastfm_api_key"
BASE_URL = "https://ws.audioscrobbler.com/2.0/"  # HTTPS keeps the API key out of plaintext traffic

def lastfm_request(method: str, params: dict = None) -> dict:
    """Make a request to the Last.fm API."""
    request_params = {
        "method": method,
        "api_key": API_KEY,
        "format": "json",
    }
    if params:
        request_params.update(params)

    response = httpx.get(BASE_URL, params=request_params, timeout=30)
    response.raise_for_status()
    data = response.json()

    # Last.fm returns errors inside a 200 response
    if "error" in data:
        raise Exception(f"Last.fm API error {data['error']}: {data.get('message', 'Unknown error')}")

    return data
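
Hard-coding the key as above is fine for a quick test. For anything you commit or share, one option is to read it from the environment instead. A minimal sketch, assuming an environment variable named LASTFM_API_KEY (a name picked for this example, not something Last.fm defines):

```python
import os


def load_api_key(env_var: str = "LASTFM_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Last.fm API key")
    return key
```

With this in place, API_KEY = load_api_key() could replace the hard-coded assignment.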

Fetching Artist Information

The artist.getInfo method returns biography, stats, tags, and similar artists:

def get_artist_info(artist: str, mbid: str = None) -> dict:
    """Get detailed info about an artist."""
    params = {"artist": artist} if not mbid else {"mbid": mbid}
    data = lastfm_request("artist.getInfo", params)
    artist_data = data["artist"]

    return {
        "name": artist_data["name"],
        "mbid": artist_data.get("mbid"),
        "listeners": int(artist_data["stats"]["listeners"]),
        "playcount": int(artist_data["stats"]["playcount"]),
        "bio_summary": artist_data["bio"]["summary"],
        "bio_full": artist_data["bio"].get("content", ""),
        "published": artist_data["bio"].get("published"),
        "tags": [t["name"] for t in artist_data["tags"]["tag"]],
        "similar": [s["name"] for s in artist_data["similar"]["artist"]],
        "url": artist_data["url"],
        "image": next(
            (img["#text"] for img in artist_data.get("image", []) if img["size"] == "extralarge"),
            None
        ),
    }


info = get_artist_info("Radiohead")
print(f"{info['name']}: {info['listeners']:,} listeners, {info['playcount']:,} plays")
print(f"Tags: {', '.join(info['tags'])}")
print(f"Similar: {', '.join(info['similar'][:5])}")
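
Bio text returned by artist.getInfo can contain HTML, typically a trailing "Read more on Last.fm" link. If you plan to store or display the summary as plain text, a small cleanup step helps. The helper below is a sketch using only the standard library; clean_bio is a name introduced here, and a real HTML parser would be more robust for unusual markup.

```python
import html
import re

_TAG_RE = re.compile(r"<[^>]+>")
_LINK_RE = re.compile(r"<a\b[^>]*>.*?</a>", flags=re.DOTALL | re.IGNORECASE)


def clean_bio(text: str) -> str:
    """Drop links and other HTML tags, then unescape entities."""
    without_links = _LINK_RE.sub("", text)         # remove anchors and their link text
    without_tags = _TAG_RE.sub("", without_links)  # remove any remaining tags
    return html.unescape(without_tags).strip()
```

Calling clean_bio(info["bio_summary"]) would then return the summary without the trailing link.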

Scraping User Listening History

This is where Last.fm data gets really interesting. The user.getRecentTracks endpoint returns a user's full scrobble history, paginated at up to 200 tracks per page:

def get_user_scrobbles(username: str, limit: int = 200, page: int = 1,
                        from_ts: int = None, to_ts: int = None) -> dict:
    """Get a user's recent scrobbles with optional time range."""
    params = {
        "user": username,
        "limit": limit,
        "page": page,
    }
    if from_ts:
        params["from"] = from_ts
    if to_ts:
        params["to"] = to_ts

    data = lastfm_request("user.getRecentTracks", params)

    tracks_data = data["recenttracks"]
    total_pages = int(tracks_data["@attr"]["totalPages"])
    total_tracks = int(tracks_data["@attr"]["total"])

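    raw_tracks = tracks_data["track"]
    if isinstance(raw_tracks, dict):
        raw_tracks = [raw_tracks]  # a lone result may arrive as a dict, not a list

    tracks = []
    for t in raw_tracks: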
        # Skip currently playing tracks (no date)
        if "@attr" in t and t["@attr"].get("nowplaying") == "true":
            continue
        tracks.append({
            "artist": t["artist"]["#text"],
            "artist_mbid": t["artist"].get("mbid"),
            "track": t["name"],
            "album": t["album"]["#text"],
            "album_mbid": t["album"].get("mbid"),
            "timestamp": int(t["date"]["uts"]) if t.get("date") else None,
            "url": t["url"],
            "image": next(
                (img["#text"] for img in t.get("image", []) if img["size"] == "medium"),
                None
            ),
        })

    return {
        "tracks": tracks,
        "total_pages": total_pages,
        "total_tracks": total_tracks,
        "page": page
    }


def get_full_history(username: str, max_pages: int = None) -> list:
    """Fetch multiple pages of scrobble history."""
    all_tracks = []

    # Get first page to learn total
    first = get_user_scrobbles(username, page=1)
    all_tracks.extend(first["tracks"])
    total = first["total_pages"]
    if max_pages:
        total = min(total, max_pages)

    print(f"{username}: {first['total_tracks']:,} total scrobbles across {first['total_pages']} pages")

    for page in range(2, total + 1):
        time.sleep(0.3)  # ~3 req/s, under the 5 req/s limit
        result = get_user_scrobbles(username, page=page)
        all_tracks.extend(result["tracks"])
        if page % 10 == 0:
            print(f"  Page {page}/{total} -- {len(all_tracks)} tracks collected")

    return all_tracks


history = get_full_history("RJ", max_pages=5)
print(f"\nCollected {len(history)} scrobbles")
for t in history[:5]:
    print(f"  {t['artist']} -- {t['track']} (album: {t['album']})")
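
The from_ts and to_ts arguments of get_user_scrobbles expect Unix timestamps, which are awkward to write by hand. The sketch below converts calendar dates to timestamps at midnight UTC. utc_range is a helper name introduced for this example, and interpreting the dates as UTC is an assumption you may want to change.

```python
from datetime import datetime, timezone


def utc_range(start: str, end: str) -> tuple[int, int]:
    """Convert two ISO dates (YYYY-MM-DD) to Unix timestamps at midnight UTC."""
    def to_ts(day: str) -> int:
        parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())

    start_ts, end_ts = to_ts(start), to_ts(end)
    if end_ts <= start_ts:
        raise ValueError("end must be after start")
    return start_ts, end_ts
```

For example, utc_range("2024-01-01", "2024-02-01") gives the bounds for January 2024, which can be passed straight through as from_ts and to_ts.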

Track and Album Statistics

Pull detailed stats for individual tracks and albums:

def get_track_info(artist: str, track: str) -> dict:
    """Get detailed track info including listener counts and tags."""
    data = lastfm_request("track.getInfo", {
        "artist": artist,
        "track": track,
    })

    if "track" not in data:
        return None

    t = data["track"]
    return {
        "name": t["name"],
        "artist": t["artist"]["name"],
        "listeners": int(t.get("listeners", 0)),
        "playcount": int(t.get("playcount", 0)),
        "duration": int(t.get("duration", 0)),  # milliseconds
        "album": t.get("album", {}).get("title"),
        "tags": [tag["name"] for tag in t.get("toptags", {}).get("tag", [])],
        "url": t["url"],
        "mbid": t.get("mbid"),
    }


def get_top_tracks(artist: str, limit: int = 20) -> list:
    """Get an artist's top tracks by play count."""
    data = lastfm_request("artist.getTopTracks", {
        "artist": artist,
        "limit": limit,
    })

    return [
        {
            "name": t["name"],
            "playcount": int(t["playcount"]),
            "listeners": int(t["listeners"]),
            "url": t["url"],
            "rank": int(t["@attr"]["rank"]),
        }
        for t in data["toptracks"]["track"]
    ]


def get_album_info(artist: str, album: str) -> dict:
    """Get album details including track listing and stats."""
    data = lastfm_request("album.getInfo", {
        "artist": artist,
        "album": album,
    })
    album_data = data["album"]

    tracks = album_data.get("tracks", {}).get("track", [])
    if isinstance(tracks, dict):
        tracks = [tracks]  # single-track albums come as dict, not list

    return {
        "name": album_data["name"],
        "artist": album_data["artist"],
        "mbid": album_data.get("mbid"),
        "listeners": int(album_data.get("listeners", 0)),
        "playcount": int(album_data.get("playcount", 0)),
        "tracks": [
            {
                "name": t["name"],
                "duration": int(t.get("duration", 0)),
                "rank": int(t.get("@attr", {}).get("rank", 0)),
                "url": t.get("url"),
            }
            for t in tracks
        ],
        "tags": [t["name"] for t in album_data.get("tags", {}).get("tag", [])],
        "url": album_data["url"],
    }


top = get_top_tracks("Daft Punk")
for t in top[:5]:
    print(f"{t['rank']}. {t['name']} -- {t['playcount']:,} plays, {t['listeners']:,} listeners")
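
get_track_info returns the duration in milliseconds, which is not very readable in reports. A small formatting helper, sketched under the assumption that a duration of 0 means the length is unknown (format_duration_ms is a name introduced here):

```python
def format_duration_ms(duration_ms: int) -> str:
    """Format a millisecond duration as M:SS; 0 or negative is reported as 'unknown'."""
    if not duration_ms or duration_ms < 0:
        return "unknown"
    total_seconds = round(duration_ms / 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"
```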

Chart Data and Tag Exploration

Last.fm tags are user-generated and surprisingly useful for music classification:

def get_tag_info(tag: str) -> dict:
    """Get info about a tag including description and stats."""
    data = lastfm_request("tag.getInfo", {"tag": tag})
    tag_data = data["tag"]

    return {
        "name": tag_data["name"],
        "reach": int(tag_data.get("reach", 0)),     # number of listeners who use this tag
        "total": int(tag_data.get("total", 0)),      # total number of times applied
        "wiki": tag_data.get("wiki", {}).get("summary", ""),
        "url": tag_data.get("url", ""),
    }


def get_tag_top_artists(tag: str, limit: int = 50) -> list:
    """Get top artists for a given tag/genre."""
    data = lastfm_request("tag.getTopArtists", {
        "tag": tag,
        "limit": limit,
    })

    return [
        {
            "name": a["name"],
            "url": a["url"],
            "rank": int(a["@attr"]["rank"]),
            "mbid": a.get("mbid"),
        }
        for a in data["topartists"]["artist"]
    ]


def get_tag_top_tracks(tag: str, limit: int = 50) -> list:
    """Get top tracks for a given tag."""
    data = lastfm_request("tag.getTopTracks", {
        "tag": tag,
        "limit": limit,
    })

    return [
        {
            "name": t["name"],
            "artist": t["artist"]["name"],
            "url": t["url"],
            "rank": int(t["@attr"]["rank"]),
        }
        for t in data["tracks"]["track"]
    ]


def get_user_top_artists(username: str, period: str = "12month",
                          limit: int = 50) -> list:
    """Get a user's top artists. Period: overall, 7day, 1month, 3month, 6month, 12month."""
    data = lastfm_request("user.getTopArtists", {
        "user": username,
        "period": period,
        "limit": limit,
    })

    return [
        {
            "name": a["name"],
            "playcount": int(a["playcount"]),
            "rank": int(a["@attr"]["rank"]),
            "url": a["url"],
            "mbid": a.get("mbid"),
        }
        for a in data["topartists"]["artist"]
    ]


shoegaze = get_tag_top_artists("shoegaze", limit=10)
print("Top shoegaze artists on Last.fm:")
for a in shoegaze:
    print(f"  {a['rank']}. {a['name']}")
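
Because tags are free-form, the same tag shows up with different capitalization across artists. When aggregating tags over a set of artists, it can help to normalize before counting. The function below is a sketch that works on plain lists of tag names, such as the "tags" field returned by get_artist_info; tag_frequencies is a name introduced for this example.

```python
from collections import Counter


def tag_frequencies(tag_lists: list[list[str]], top_n: int = 5) -> list[tuple[str, int]]:
    """Count how many artists carry each tag, ignoring case and stray whitespace."""
    counts = Counter()
    for tags in tag_lists:
        # A set, so one artist contributes at most once per tag
        counts.update({t.strip().lower() for t in tags})
    return counts.most_common(top_n)
```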

User Stats and Profiles

Get statistics and profile info for any public user:

def get_user_info(username: str) -> dict:
    """Get user profile information."""
    data = lastfm_request("user.getInfo", {"user": username})
    user = data["user"]

    return {
        "name": user["name"],
        "realname": user.get("realname", ""),
        "country": user.get("country"),
        "age": user.get("age"),
        "subscriber": user.get("subscriber") == "1",
        "playcount": int(user.get("playcount", 0)),
        "artist_count": int(user.get("artist_count", 0)),
        "track_count": int(user.get("track_count", 0)),
        "album_count": int(user.get("album_count", 0)),
        "playlists": int(user.get("playlists", 0)),
        "registered": user.get("registered", {}).get("unixtime"),
        "url": user["url"],
        "image": next(
            (img["#text"] for img in user.get("image", []) if img["size"] == "large"),
            None
        ),
    }


def get_user_top_tracks(username: str, period: str = "overall",
                         limit: int = 50) -> list:
    """Get a user's most-played tracks."""
    data = lastfm_request("user.getTopTracks", {
        "user": username,
        "period": period,
        "limit": limit,
    })

    return [
        {
            "name": t["name"],
            "artist": t["artist"]["name"],
            "playcount": int(t["playcount"]),
            "rank": int(t["@attr"]["rank"]),
            "url": t["url"],
        }
        for t in data["toptracks"]["track"]
    ]


user = get_user_info("RJ")
print(f"{user['name']}: {user['playcount']:,} scrobbles, {user['artist_count']:,} artists")
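
The playcount and registered fields together give a rough measure of how active an account is. The sketch below computes average scrobbles per day since registration. scrobbles_per_day is a name introduced here, and the now parameter exists only to make the result reproducible.

```python
from datetime import datetime, timezone
from typing import Optional


def scrobbles_per_day(playcount: int, registered_unixtime,
                      now: Optional[datetime] = None) -> float:
    """Average scrobbles per day between registration and `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    # int() accepts the timestamp whether the API delivered it as a string or a number
    registered = datetime.fromtimestamp(int(registered_unixtime), tz=timezone.utc)
    days = max((now - registered).total_seconds() / 86400, 1.0)  # floor at one day
    return playcount / days
```

With the profile above, scrobbles_per_day(user["playcount"], user["registered"]) would give the account's lifetime average, provided the registered field is present.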

Searching for Artists and Tracks

Last.fm's search API lets you find artists and tracks by name:

def search_artists(query: str, limit: int = 20) -> list:
    """Search for artists by name."""
    data = lastfm_request("artist.search", {
        "artist": query,
        "limit": limit,
    })

    return [
        {
            "name": a["name"],
            "listeners": int(a.get("listeners", 0)),
            "url": a["url"],
            "mbid": a.get("mbid"),
        }
        for a in data["results"]["artistmatches"]["artist"]
    ]


def search_tracks(query: str, artist: str = None, limit: int = 20) -> list:
    """Search for tracks."""
    params = {"track": query, "limit": limit}
    if artist:
        params["artist"] = artist

    data = lastfm_request("track.search", params)

    return [
        {
            "name": t["name"],
            "artist": t["artist"],
            "listeners": int(t.get("listeners", 0)),
            "url": t["url"],
        }
        for t in data["results"]["trackmatches"]["track"]
    ]


results = search_artists("the national")
for r in results[:5]:
    print(f"{r['name']}: {r['listeners']:,} listeners")
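
Search results often include several near-identical names, so picking the first hit is not always right. One possible heuristic, sketched below, prefers an exact case-insensitive name match and falls back to the result with the most listeners. pick_best_match is a name introduced here, and the heuristic itself is an assumption rather than anything the API prescribes.

```python
def pick_best_match(query: str, results: list[dict]):
    """Prefer an exact case-insensitive name match; otherwise take the most-listened result."""
    if not results:
        return None
    wanted = query.strip().casefold()
    exact = [r for r in results if r["name"].strip().casefold() == wanted]
    pool = exact or results
    return max(pool, key=lambda r: r.get("listeners", 0))
```

It takes the list returned by search_artists directly, for example pick_best_match("the national", results).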

Anti-Bot Measures and Rate Limits

Last.fm's API is relatively open, but there are limits you'll hit on larger collection jobs:

Rate limiting. The official limit is 5 requests per second per API key. Exceed that consistently, and you'll get 429 responses. The user.getRecentTracks endpoint is stricter — heavy pagination triggers temporary blocks faster than other methods.

API key suspension. Last.fm monitors for abusive usage patterns. Collecting full scrobble histories for thousands of users in a short period can get your API key revoked without warning.

IP-based throttling. Even with a valid API key, datacenter IPs get rate-limited more aggressively. If you're running collection from a VPS, your effective rate limit may be lower than the documented 5 req/s.

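Pagination depth. Some users have 500,000+ scrobbles. Paginating through all of that at 200 per page means 2,500+ requests for a single user. At the full 5 req/s that is 500 seconds, a little over 8 minutes per user; with the 0.3-second pause used in get_full_history above, it is at least 12.5 minutes.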
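
When a request is rejected for rate limiting, waiting and retrying usually works better than failing the whole job. The helper below is a generic sketch of exponential backoff with jitter. with_backoff and its parameters are names introduced for this example, and the delay values are illustrative rather than tuned against Last.fm.

```python
import random
import time


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0,
                 retry_on: tuple = (Exception,), sleep=time.sleep):
    """Run call() and retry with exponential backoff plus jitter when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # 1s, 2s, 4s, ... plus up to 0.5s of random jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)
```

A call such as with_backoff(lambda: get_user_scrobbles("RJ", page=3), retry_on=(httpx.HTTPStatusError,)) would retry only on HTTP errors raised by raise_for_status and let other exceptions propagate.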

For large-scale music data projects — like building a recommendation engine from thousands of user profiles — you need to spread requests across multiple IPs. ThorData's residential proxies work well here because their IPs rotate automatically and don't carry the datacenter stigma that triggers Last.fm's stricter throttling:

import random
import httpx

# Placeholder credentials and host: substitute the values from your proxy provider's dashboard
PROXY_URL = "http://USER:PASS@PROXY_HOST:9000"

def lastfm_request_proxied(method: str, params: dict = None) -> dict:
    """Make a proxied Last.fm API request via ThorData."""
    request_params = {
        "method": method,
        "api_key": API_KEY,
        "format": "json",
    }
    if params:
        request_params.update(params)

    with httpx.Client(proxy=PROXY_URL, timeout=30) as client:
        response = client.get(BASE_URL, params=request_params)
        response.raise_for_status()

    data = response.json()
    if "error" in data:
        raise Exception(f"Last.fm error {data['error']}: {data.get('message')}")

    return data

# Scrape multiple users with proxy rotation
users_to_scrape = ["RJ", "user2", "user3", "user4", "user5"]
for username in users_to_scrape:
    try:
        data = lastfm_request_proxied("user.getTopArtists", {
            "user": username,
            "period": "12month",
            "limit": 50,
        })
        count = len(data.get("topartists", {}).get("artist", []))
        print(f"{username}: {count} top artists")
    except Exception as e:
        print(f"{username}: error - {e}")

    time.sleep(random.uniform(0.3, 0.8))

Storing Data in SQLite

For large collection jobs, store results locally:

import sqlite3
import json

def init_db(db_path: str = "lastfm.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS artists (
            name TEXT PRIMARY KEY,
            mbid TEXT,
            listeners INTEGER,
            playcount INTEGER,
            bio TEXT,
            tags TEXT,
            similar TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS scrobbles (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            username TEXT NOT NULL,
            artist TEXT,
            track TEXT,
            album TEXT,
            timestamp INTEGER,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS users (
            username TEXT PRIMARY KEY,
            playcount INTEGER,
            artist_count INTEGER,
            track_count INTEGER,
            album_count INTEGER,
            country TEXT,
            registered TEXT,
            subscriber INTEGER DEFAULT 0,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE INDEX IF NOT EXISTS idx_scrobbles_user ON scrobbles(username);
        CREATE INDEX IF NOT EXISTS idx_scrobbles_artist ON scrobbles(artist);
        CREATE INDEX IF NOT EXISTS idx_scrobbles_ts ON scrobbles(timestamp);
    """)
    conn.commit()
    return conn

def save_artist(conn: sqlite3.Connection, artist: dict) -> None:
    conn.execute(
        """INSERT OR REPLACE INTO artists
           (name, mbid, listeners, playcount, bio, tags, similar)
           VALUES (?,?,?,?,?,?,?)""",
        (
            artist["name"],
            artist.get("mbid"),
            artist.get("listeners", 0),
            artist.get("playcount", 0),
            artist.get("bio_summary", ""),
            json.dumps(artist.get("tags", [])),
            json.dumps(artist.get("similar", [])),
        )
    )
    conn.commit()

def save_scrobbles(conn: sqlite3.Connection, username: str,
                    tracks: list) -> None:
    rows = [
        (username, t["artist"], t["track"], t["album"], t.get("timestamp"))
        for t in tracks
    ]
    conn.executemany(
        "INSERT INTO scrobbles (username, artist, track, album, timestamp) VALUES (?,?,?,?,?)",
        rows
    )
    conn.commit()

conn = init_db()
artist_data = get_artist_info("Radiohead")
save_artist(conn, artist_data)
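
One thing to watch with the scrobbles table is that re-running a collection job inserts the same rows again. A possible fix, sketched below, is a unique index combined with INSERT OR IGNORE. The choice of (username, artist, track, timestamp) as the identity of a scrobble is an assumption made for this example, and save_scrobbles_dedup is a name introduced here.

```python
import sqlite3


def save_scrobbles_dedup(conn: sqlite3.Connection, username: str, tracks: list) -> int:
    """Insert scrobbles, skipping rows already stored. Returns the number of rows added."""
    # Assumed identity rule: same user, artist, track and timestamp means same scrobble
    conn.execute(
        """CREATE UNIQUE INDEX IF NOT EXISTS idx_scrobbles_unique
           ON scrobbles(username, artist, track, timestamp)"""
    )
    before = conn.total_changes
    conn.executemany(
        """INSERT OR IGNORE INTO scrobbles (username, artist, track, album, timestamp)
           VALUES (?,?,?,?,?)""",
        [(username, t["artist"], t["track"], t["album"], t.get("timestamp")) for t in tracks],
    )
    conn.commit()
    return conn.total_changes - before
```

Creating the index fails if the table already contains duplicates, so on an existing database those would need to be removed first.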

Building a Music Recommendation Dataset

Last.fm data is particularly well-suited for building collaborative filtering recommendation systems:

def build_cf_dataset(conn: sqlite3.Connection,
                      min_scrobbles_per_user: int = 100,
                      output_file: str = "cf_matrix.csv") -> None:
    """
    Export user-artist play count data for collaborative filtering.
    Suitable for use with scikit-learn, Surprise, or LightFM.
    """
    import csv

    users = conn.execute("""
        SELECT username, COUNT(*) as total
        FROM scrobbles
        GROUP BY username
        HAVING total >= ?
        ORDER BY total DESC
    """, (min_scrobbles_per_user,)).fetchall()

    print(f"Building CF dataset from {len(users)} users...")

    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "artist", "play_count"])

        for username, _ in users:
            artist_counts = conn.execute("""
                SELECT artist, COUNT(*) as plays
                FROM scrobbles
                WHERE username = ?
                GROUP BY artist
                ORDER BY plays DESC
                LIMIT 100
            """, (username,)).fetchall()

            for artist, plays in artist_counts:
                if artist:
                    writer.writerow([username, artist, plays])

    print(f"CF dataset saved to {output_file}")


def get_similar_users_jaccard(conn: sqlite3.Connection,
                               target_user: str,
                               top_n: int = 10) -> list[dict]:
    """
    Find users with similar taste using Jaccard similarity
    on top-50 artist sets.
    """
    target_artists = set(
        row[0] for row in conn.execute("""
            SELECT artist FROM scrobbles
            WHERE username = ?
            GROUP BY artist
            ORDER BY COUNT(*) DESC LIMIT 50
        """, (target_user,))
    )

    if not target_artists:
        return []

    other_users = conn.execute("""
        SELECT DISTINCT username FROM scrobbles WHERE username != ?
    """, (target_user,)).fetchall()

    similarities = []
    for (username,) in other_users:
        user_artists = set(
            row[0] for row in conn.execute("""
                SELECT artist FROM scrobbles
                WHERE username = ?
                GROUP BY artist
                ORDER BY COUNT(*) DESC LIMIT 50
            """, (username,))
        )

        intersection = len(target_artists & user_artists)
        union = len(target_artists | user_artists)
        jaccard = intersection / union if union > 0 else 0

        if jaccard > 0.1:  # Only include reasonably similar users
            similarities.append({
                "username": username,
                "similarity": jaccard,
                "shared_artists": intersection,
            })

    return sorted(similarities, key=lambda x: -x["similarity"])[:top_n]
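
To make the similarity score concrete, here is the Jaccard calculation on its own, with two small made-up artist sets. The user names and artist lists are purely illustrative.

```python
def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


user_a = {"Radiohead", "Portishead", "Massive Attack", "Björk"}
user_b = {"Radiohead", "Björk", "Aphex Twin", "Boards of Canada"}

# 2 shared artists out of 6 distinct ones gives 2/6
score = jaccard(user_a, user_b)
```

For two full top-50 sets with i shared artists, the score is i / (100 - i), so the 0.1 cut-off used in get_similar_users_jaccard corresponds to at least 10 artists in common.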

Genre Classification with Last.fm Tags

Last.fm's tag system is one of the most comprehensive genre/style taxonomies available:

GENRE_MAP = {
    "rock": ["rock", "alternative rock", "indie rock", "classic rock",
             "hard rock", "progressive rock", "post-rock", "grunge"],
    "pop": ["pop", "indie pop", "synth-pop", "art pop", "electropop",
            "dance pop", "k-pop"],
    "electronic": ["electronic", "electronica", "ambient", "techno",
                   "house", "edm", "idm", "dubstep", "drum and bass",
                   "trance"],
    "hip-hop": ["hip-hop", "hip hop", "rap", "trap", "r&b", "soul",
                "grime"],
    "metal": ["metal", "heavy metal", "death metal", "black metal",
              "doom metal", "post-metal", "progressive metal", "thrash"],
    "jazz": ["jazz", "bebop", "jazz fusion", "free jazz", "cool jazz"],
    "classical": ["classical", "orchestral", "contemporary classical",
                  "opera", "chamber music", "baroque"],
    "folk": ["folk", "folk rock", "acoustic", "singer-songwriter",
             "americana", "country", "bluegrass"],
}


def classify_artists_by_genre(artist_names: list[str]) -> dict:
    """Classify a list of artists into broad genre buckets."""
    results = {}

    for artist_name in artist_names:
        try:
            info = get_artist_info(artist_name)
            tags = [t.lower() for t in info.get("tags", [])]

            genres = set()
            for genre, keywords in GENRE_MAP.items():
                if any(kw in tags for kw in keywords):
                    genres.add(genre)

            results[artist_name] = {
                "genres": sorted(genres) or ["other"],  # sorted for a stable order
                "raw_tags": tags[:10],
                "listeners": info.get("listeners", 0),
            }
            time.sleep(0.3)

        except Exception as e:
            results[artist_name] = {"genres": ["unknown"], "error": str(e)}

    return results
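
classify_artists_by_genre can assign several buckets to one artist. If a single label per artist is needed, one option is to pick the bucket with the most matching tags. The sketch below works on a tag list alone, without calling the API. primary_genre and the trimmed-down SAMPLE_GENRE_MAP are introduced for this example, and the full GENRE_MAP above could be passed in instead.

```python
# Trimmed-down, illustrative map; not the full GENRE_MAP
SAMPLE_GENRE_MAP = {
    "rock": ["rock", "indie rock", "post-rock"],
    "electronic": ["electronic", "ambient", "idm"],
}


def primary_genre(tags: list[str], genre_map: dict[str, list[str]]) -> str:
    """Return the bucket with the most matching tags, or 'other' if nothing matches."""
    normalized = {t.strip().lower() for t in tags}
    scores = {
        genre: sum(1 for kw in keywords if kw in normalized)
        for genre, keywords in genre_map.items()
    }
    # On a tie, max() keeps the bucket that appears first in the map
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else "other"
```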

Practical Tips

Cache aggressively. Artist and album metadata doesn't change often. Store results in SQLite and only re-fetch weekly. User scrobble history that's more than 24 hours old is effectively immutable.
Use MBID where possible. Last.fm returns MusicBrainz IDs for many entities. Using MBID instead of text names avoids ambiguity issues with artists that share names.
Handle missing data. Not all fields are always present. Bio sections, images, and tags can be empty — always use .get() with fallbacks.
Use time ranges for incremental updates. The user.getRecentTracks endpoint accepts from and to Unix timestamps. For updating a user's history, only fetch since the last stored timestamp instead of re-paginating everything.
Combine with MusicBrainz. Last.fm has great social data (scrobbles, listeners) but limited structured metadata. Cross-reference with MusicBrainz for release dates, labels, and relationships.
Parallel collection with multiple API keys. If you need to collect at higher speeds, create multiple Last.fm API keys under different accounts and distribute the load. Keep each key under the 5 req/s limit.
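
The incremental-update tip can be sketched against the scrobbles table from the SQLite section: look up the newest stored timestamp for a user and resume from just after it. next_from_ts is a name introduced here, and adding one second assumes the from bound may be inclusive, which is worth checking against real responses.

```python
import sqlite3
from typing import Optional


def next_from_ts(conn: sqlite3.Connection, username: str) -> Optional[int]:
    """Return the timestamp to resume from, or None if nothing is stored for the user."""
    row = conn.execute(
        "SELECT MAX(timestamp) FROM scrobbles WHERE username = ?", (username,)
    ).fetchone()
    latest = row[0]
    # +1 so the newest stored scrobble is not fetched again
    return latest + 1 if latest is not None else None
```

The result can be passed as from_ts to get_user_scrobbles; a None result means the user needs a full first-time fetch.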

Conclusion

Last.fm remains one of the best public sources for music listening data in 2026. The API is straightforward, the data is deep, and with reasonable rate limiting you can build impressive music datasets without much friction. For large-scale collection across many users or artists, rotating residential proxies via ThorData help you stay under IP-based throttling limits while keeping your API key in good standing. The combination of scrobble history, artist metadata, and tag taxonomy makes Last.fm genuinely valuable for music recommendation systems, listening analytics dashboards, and music discovery tools.