Extracting Spotify Data in 2026: Playlists, Tracks, Audio Features, and Artist Analytics via the Web API

2026-04-09 spotify web-scraping api music python

Spotify is unusual among major platforms -- they actually want developers using their data. The Spotify Web API is free, well-documented, and gives you access to a staggering amount of metadata: 100 million+ tracks, artist profiles, album details, playlist contents, and even audio analysis features like tempo, key, danceability, and acousticness. No scraping required for most use cases.

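The catch? Rate limits are strict, Spotify restricted several endpoints for newly registered apps in November 2024 (audio features, audio analysis, recommendations, related artists, and featured playlists among them), and certain data (like actual play counts) is deliberately withheld. Here's the practical guide to getting everything you need.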

1. Setting Up Authentication
2. Client Credentials vs. Authorization Code Flow
3. Extracting Playlist Data and Tracks
4. Audio Features: The Hidden Gold
5. Artist Deep Dives: Profiles, Top Tracks, and Albums
6. Album Data and Track Listings
7. Search Across the Catalog
8. New Releases and Category Browsing
9. Related Artists and Recommendation Seeds
10. User Data with Authorization Code Flow
11. Pagination: Handling Large Result Sets
12. Rate Limits and How to Handle Them
13. Storing Spotify Data: SQLite Schema
14. Building Complete Datasets
15. Spotify Web Playback and Embed APIs
16. Real Use Cases
17. Common Errors and Fixes

1. Setting Up Authentication {#auth}

Spotify uses OAuth 2.0. For data extraction (no user-specific data), the Client Credentials flow is simplest -- you get an app token without any user login.

Creating a Spotify App

1. Go to the Spotify Developer Dashboard and create an app.
2. Set a redirect URI (even http://localhost:8080 works for Client Credentials, since it's not used).
3. Note your Client ID and Client Secret.

No approval process is required for basic API access.

import requests
import base64
import time
import json
from functools import lru_cache

class SpotifyClient:
    """Spotify API client with automatic token refresh and retry logic."""

    BASE_URL = "https://api.spotify.com/v1"
    TOKEN_URL = "https://accounts.spotify.com/api/token"

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self._token = None
        self._token_expires = 0
        self._request_count = 0

    def _get_token(self) -> str:
        """Get or refresh access token using Client Credentials flow."""
        if self._token and time.time() < self._token_expires - 60:
            return self._token

        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()

        resp = requests.post(
            self.TOKEN_URL,
            headers={
                "Authorization": f"Basic {auth}",
                "Content-Type": "application/x-www-form-urlencoded",
            },
            data={"grant_type": "client_credentials"},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()

        self._token = data["access_token"]
        self._token_expires = time.time() + data["expires_in"]
        return self._token

    def get(self, endpoint: str, params: dict = None,
            retries: int = 3) -> dict:
        """Make authenticated GET request with rate limit handling."""
        url = endpoint if endpoint.startswith("http") else f"{self.BASE_URL}/{endpoint}"
        headers = {"Authorization": f"Bearer {self._get_token()}"}
        self._request_count += 1

        for attempt in range(retries):
            resp = requests.get(url, headers=headers,
                                params=params or {}, timeout=15)

            if resp.status_code == 429:
                retry_after = int(resp.headers.get("Retry-After", 5))
                print(f"Rate limited, waiting {retry_after}s "
                      f"(total requests: {self._request_count})")
                time.sleep(retry_after + 1)
                continue

            if resp.status_code == 401:
                # Token expired mid-session
                self._token = None
                headers["Authorization"] = f"Bearer {self._get_token()}"
                continue

            if resp.status_code == 503:
                time.sleep(2 ** attempt)
                continue

            resp.raise_for_status()
            return resp.json()

        raise Exception(f"Failed after {retries} retries: {url}")

# Initialize client
spotify = SpotifyClient("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
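
The placeholders above are fine for a first test, but you probably don't want real secrets sitting in a script. Here is a minimal sketch that reads them from the environment instead. The variable names SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET and the helper load_spotify_credentials are assumptions for this example, not something Spotify mandates.

```python
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret) from an environment-style mapping.

    The variable names are a convention chosen for this sketch.
    """
    env = os.environ if env is None else env
    required = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of sending empty credentials
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["SPOTIFY_CLIENT_ID"], env["SPOTIFY_CLIENT_SECRET"]
```

With that in place, the client can be created as `SpotifyClient(*load_spotify_credentials())`.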

2. Client Credentials vs. Authorization Code Flow {#auth-flows}

Feature	Client Credentials	Authorization Code
User data	No	Yes
Public catalog	Yes	Yes
Rate limit	Per-app	Per-user + per-app
Setup complexity	Low	Medium
Use case	Data collection	User integrations

For data collection, always use Client Credentials. Authorization Code is only needed for accessing user-specific data (saved tracks, listening history, etc.).

Authorization Code Flow (for user data)

import urllib.parse
import secrets

class SpotifyUserClient(SpotifyClient):
    """Extended client supporting user authorization."""

    AUTHORIZE_URL = "https://accounts.spotify.com/authorize"

    def get_auth_url(self, redirect_uri: str,
                     scopes: list[str]) -> tuple[str, str]:
        """Generate authorization URL and state token."""
        state = secrets.token_urlsafe(16)
        params = {
            "client_id": self.client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": " ".join(scopes),
            "state": state,
        }
        url = f"{self.AUTHORIZE_URL}?{urllib.parse.urlencode(params)}"
        return url, state

    def exchange_code_for_token(self, code: str,
                                 redirect_uri: str) -> dict:
        """Exchange authorization code for access + refresh tokens."""
        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()

        resp = requests.post(
            self.TOKEN_URL,
            headers={"Authorization": f"Basic {auth}"},
            data={
                "grant_type": "authorization_code",
                "code": code,
                "redirect_uri": redirect_uri,
            },
            timeout=10,  # avoid hanging forever on a stalled connection
        )
        resp.raise_for_status()
        return resp.json()

    def refresh_user_token(self, refresh_token: str) -> str:
        """Refresh an expired user access token."""
        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()
        resp = requests.post(
            self.TOKEN_URL,
            headers={"Authorization": f"Basic {auth}"},
            data={"grant_type": "refresh_token", "refresh_token": refresh_token},
            timeout=10,  # avoid hanging forever on a stalled connection
        )
        resp.raise_for_status()
        return resp.json()["access_token"]
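
get_auth_url hands back a state token, and that token only helps if you check it when the user returns. A minimal sketch of the callback side might look like this. The function name parse_auth_callback is hypothetical, and it assumes you kept the state from get_auth_url somewhere (a session, for instance).

```python
import secrets
from urllib.parse import parse_qs, urlparse


def parse_auth_callback(callback_url: str, expected_state: str) -> str:
    """Validate the redirect Spotify sent the user to and return the auth code."""
    query = parse_qs(urlparse(callback_url).query)

    # The user denied access or something else went wrong
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")

    # Constant-time comparison of the state we issued vs. the one returned
    returned_state = query.get("state", [""])[0]
    if not secrets.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("State mismatch: discard this response")

    if "code" not in query:
        raise ValueError("Callback URL contains no authorization code")
    return query["code"][0]
```

The returned code is what you pass to exchange_code_for_token, together with the same redirect URI you used to build the authorization URL.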

3. Extracting Playlist Data and Tracks {#playlists}

Playlists are the most common target. A single playlist can have up to 10,000 tracks, returned in pages of 100:

def get_playlist_info(playlist_id: str) -> dict:
    """Get playlist metadata."""
    data = spotify.get(f"playlists/{playlist_id}", params={
        "fields": "id,name,description,owner,followers,public,"
                  "snapshot_id,images,tracks.total"
    })
    return {
        "id": data["id"],
        "name": data["name"],
        "description": data.get("description", ""),
        "owner": data["owner"]["display_name"],
        "owner_id": data["owner"]["id"],
        "followers": data.get("followers", {}).get("total", 0),
        "is_public": data.get("public"),
        "total_tracks": data["tracks"]["total"],
        "snapshot_id": data.get("snapshot_id"),
        # "images" can be an empty list or null, so guard before indexing
        "image": (data.get("images") or [{}])[0].get("url"),
    }

def get_playlist_tracks(playlist_id: str,
                         market: str = "US") -> list[dict]:
    """Get all tracks from a Spotify playlist."""
    tracks = []
    offset = 0
    limit = 100

    while True:
        data = spotify.get(
            f"playlists/{playlist_id}/tracks",
            params={
                "offset": offset,
                "limit": limit,
                "market": market,
                "fields": ("items(added_at,added_by.id,"
                           "track(id,name,artists,album,duration_ms,"
                           "popularity,explicit,preview_url,"
                           "external_urls,is_local,type)),"
                           "next,total"),
            }
        )

        for item in data.get("items", []):
            track = item.get("track")
            if not track or not track.get("id"):
                # Skip local files and unavailable tracks
                continue

            tracks.append({
                "id": track["id"],
                "name": track["name"],
                "artists": [{"id": a["id"], "name": a["name"]}
                            for a in track.get("artists", [])],
                "artist_names": ", ".join(a["name"]
                                          for a in track.get("artists", [])),
                "album": track.get("album", {}).get("name"),
                "album_id": track.get("album", {}).get("id"),
                "album_type": track.get("album", {}).get("album_type"),
                "release_date": track.get("album", {}).get("release_date"),
                "duration_ms": track["duration_ms"],
                "duration_seconds": track["duration_ms"] / 1000,
                "popularity": track.get("popularity", 0),
                "explicit": track.get("explicit", False),
                "preview_url": track.get("preview_url"),
                "spotify_url": track.get("external_urls", {}).get("spotify"),
                "added_at": item.get("added_at"),
                # added_by can be null on very old playlists
                "added_by": (item.get("added_by") or {}).get("id"),
            })

        if not data.get("next"):
            break

        offset += limit
        time.sleep(0.3)  # Polite delay

    return tracks

def get_playlist_full(playlist_id: str) -> dict:
    """Get complete playlist data including all tracks and metadata."""
    info = get_playlist_info(playlist_id)
    tracks = get_playlist_tracks(playlist_id)
    info["tracks"] = tracks
    info["actual_track_count"] = len(tracks)
    return info

# Example: Get Today's Top Hits
top_hits = get_playlist_full("37i9dQZF1DXcBWIGoYBM5M")
print(f"Playlist: {top_hits['name']}")
print(f"Followers: {top_hits['followers']:,}")
print(f"Tracks: {len(top_hits['tracks'])}")

4. Audio Features: The Hidden Gold {#audio-features}

This is what makes Spotify's API special. The /audio-features endpoint returns machine-analyzed attributes for every track. You can request up to 100 tracks in a single batch call:

def get_audio_features(track_ids: list[str]) -> dict[str, dict]:
    """Get audio features for tracks (batch of up to 100). Returns dict keyed by track ID."""
    all_features = {}

    for i in range(0, len(track_ids), 100):
        batch = track_ids[i:i+100]
        data = spotify.get("audio-features", params={"ids": ",".join(batch)})

        for feat in data.get("audio_features", []):
            if not feat:
                continue
            all_features[feat["id"]] = {
                "id": feat["id"],
                # Rhythm and energy
                "danceability": feat["danceability"],    # 0.0-1.0: dance-ability
                "energy": feat["energy"],                # 0.0-1.0: intensity
                "tempo": feat["tempo"],                  # BPM
                "time_signature": feat["time_signature"],# beats per bar (3,4,5,6,7)
                "loudness": feat["loudness"],            # dB, typically -60 to 0
                # Tone and mood
                "key": feat["key"],                      # -1=no key, 0=C, 1=C#...11=B
                "mode": feat["mode"],                    # 0=minor, 1=major
                "valence": feat["valence"],              # 0=sad/negative, 1=happy
                "speechiness": feat["speechiness"],      # 0=no speech, 1=all speech
                "acousticness": feat["acousticness"],    # 0=electric, 1=acoustic
                "instrumentalness": feat["instrumentalness"], # >0.5 = likely no vocals
                "liveness": feat["liveness"],            # >0.8 = likely live recording
                # Duration
                "duration_ms": feat["duration_ms"],
            }

        time.sleep(0.5)

    return all_features

# Key legend
KEY_NAMES = {-1: "No key", 0: "C", 1: "C#/Db", 2: "D", 3: "D#/Eb",
             4: "E", 5: "F", 6: "F#/Gb", 7: "G", 8: "G#/Ab",
             9: "A", 10: "A#/Bb", 11: "B"}

def describe_audio_features(features: dict) -> str:
    """Human-readable description of audio features."""
    key = KEY_NAMES.get(features.get("key", -1), "Unknown")
    mode = "major" if features.get("mode") == 1 else "minor"
    return (
        f"Key: {key} {mode} | "
        f"Tempo: {features.get('tempo', 0):.0f} BPM | "
        f"Energy: {features.get('energy', 0):.2f} | "
        f"Danceability: {features.get('danceability', 0):.2f} | "
        f"Valence: {features.get('valence', 0):.2f} | "
        f"Acousticness: {features.get('acousticness', 0):.2f}"
    )

# Usage example
track_ids = ["4iV5W9uYEdYUVa79Axb7Rh", "1301WleyT98MSxVHPZCA6M"]
features = get_audio_features(track_ids)
for tid, feat in features.items():
    print(f"{tid}: {describe_audio_features(feat)}")
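
Per-track numbers get more useful once you aggregate them. As a rough sketch, this averages a handful of features across whatever get_audio_features returned, giving you a "profile" for a playlist or album. The function name summarize_features and the choice of features in PROFILE_KEYS are assumptions for illustration.

```python
from statistics import mean

# Which features to average; adjust to taste
PROFILE_KEYS = ("danceability", "energy", "valence", "acousticness", "tempo")


def summarize_features(features_by_id: dict) -> dict:
    """Average selected audio features over a dict keyed by track ID."""
    summary = {}
    for key in PROFILE_KEYS:
        # Skip tracks where a value is missing rather than treating it as 0
        values = [row[key] for row in features_by_id.values()
                  if row.get(key) is not None]
        if values:
            summary[key] = round(mean(values), 3)
    summary["track_count"] = len(features_by_id)
    return summary
```

Calling `summarize_features(features)` on the usage example above would return one dict with the mean of each feature plus the number of tracks.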

Audio Analysis (More Granular, More Expensive)

For beat-level analysis (individual beats, bars, sections, tatums), use the /audio-analysis endpoint -- but it's one track at a time and significantly slower:

def get_audio_analysis(track_id: str) -> dict:
    """Get detailed audio analysis for a single track."""
    data = spotify.get(f"audio-analysis/{track_id}")
    return {
        "track_id": track_id,
        "duration": data.get("track", {}).get("duration"),
        "tempo": data.get("track", {}).get("tempo"),
        "key": data.get("track", {}).get("key"),
        "time_signature": data.get("track", {}).get("time_signature"),
        "bars": len(data.get("bars", [])),
        "beats": len(data.get("beats", [])),
        "sections": len(data.get("sections", [])),
        "segments": len(data.get("segments", [])),
        "tatums": len(data.get("tatums", [])),
        "sections_data": [
            {
                "start": s["start"],
                "duration": s["duration"],
                "tempo": s["tempo"],
                "key": s["key"],
                "loudness": s["loudness"],
            }
            for s in data.get("sections", [])
        ],
    }
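
One thing you can do with the section data is spot where a track changes pace. This is a minimal sketch that walks the sections_data list produced above and reports tempo jumps. The function name tempo_shifts and the 5 BPM default threshold are arbitrary choices for the example.

```python
def tempo_shifts(sections: list[dict], threshold_bpm: float = 5.0) -> list[dict]:
    """Return the points where tempo changes by at least threshold_bpm
    between two consecutive sections."""
    shifts = []
    for prev, cur in zip(sections, sections[1:]):
        if abs(cur["tempo"] - prev["tempo"]) >= threshold_bpm:
            shifts.append({
                "at_seconds": cur["start"],
                "from_bpm": prev["tempo"],
                "to_bpm": cur["tempo"],
            })
    return shifts
```

For example, `tempo_shifts(get_audio_analysis(track_id)["sections_data"])` would list the timestamps where the song speeds up or slows down noticeably.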

5. Artist Deep Dives: Profiles, Top Tracks, and Albums {#artists}

def get_artist_complete(artist_id: str, market: str = "US") -> dict:
    """Get comprehensive artist data including profile, top tracks, and discography."""
    # Base profile
    artist = spotify.get(f"artists/{artist_id}")

    # Top tracks in market
    top_tracks_data = spotify.get(
        f"artists/{artist_id}/top-tracks",
        params={"market": market}
    )

    # Albums and singles
    albums_data = spotify.get(
        f"artists/{artist_id}/albums",
        params={
            "limit": 50,
            "include_groups": "album,single,compilation",
            "market": market,
        }
    )

    # Related artists
    related_data = spotify.get(f"artists/{artist_id}/related-artists")

    return {
        "id": artist["id"],
        "name": artist["name"],
        "genres": artist.get("genres", []),
        "followers": artist.get("followers", {}).get("total", 0),
        "popularity": artist["popularity"],
        "images": [img["url"] for img in artist.get("images", [])],
        "external_url": artist.get("external_urls", {}).get("spotify"),
        "top_tracks": [
            {
                "id": t["id"],
                "name": t["name"],
                "album": t["album"]["name"],
                "popularity": t["popularity"],
                "preview_url": t.get("preview_url"),
                "duration_ms": t["duration_ms"],
            }
            for t in top_tracks_data.get("tracks", [])
        ],
        "discography": [
            {
                "id": a["id"],
                "name": a["name"],
                "type": a["album_type"],
                "release_date": a["release_date"],
                "total_tracks": a["total_tracks"],
                # "images" can be an empty list, so guard before indexing
                "image": (a.get("images") or [{}])[0].get("url"),
            }
            for a in albums_data.get("items", [])
        ],
        "related_artists": [
            {
                "id": r["id"],
                "name": r["name"],
                "genres": r.get("genres", []),
                "followers": r.get("followers", {}).get("total", 0),
                "popularity": r["popularity"],
            }
            for r in related_data.get("artists", [])
        ],
    }

def get_multiple_artists(artist_ids: list[str]) -> list[dict]:
    """Batch fetch artist profiles (up to 50 per request)."""
    results = []
    for i in range(0, len(artist_ids), 50):
        batch = artist_ids[i:i+50]
        data = spotify.get("artists", params={"ids": ",".join(batch)})
        for artist in data.get("artists", []):
            if artist:
                results.append({
                    "id": artist["id"],
                    "name": artist["name"],
                    "genres": artist.get("genres", []),
                    "followers": artist.get("followers", {}).get("total", 0),
                    "popularity": artist["popularity"],
                })
        time.sleep(0.3)
    return results

6. Album Data and Track Listings {#albums}

def get_album_complete(album_id: str, market: str = "US") -> dict:
    """Get album data with all tracks."""
    album = spotify.get(f"albums/{album_id}", params={"market": market})

    tracks = []
    page = spotify.get(f"albums/{album_id}/tracks",
                       params={"limit": 50, "market": market})
    while True:
        for t in page.get("items", []):
            tracks.append({
                "id": t["id"],
                "name": t["name"],
                "track_number": t["track_number"],
                "disc_number": t["disc_number"],
                "duration_ms": t["duration_ms"],
                "explicit": t.get("explicit", False),
                "artists": [a["name"] for a in t.get("artists", [])],
                "preview_url": t.get("preview_url"),
            })
        if not page.get("next"):
            break
        page = spotify.get(page["next"])
        time.sleep(0.3)

    return {
        "id": album["id"],
        "name": album["name"],
        "type": album["album_type"],
        "artists": [a["name"] for a in album.get("artists", [])],
        "release_date": album["release_date"],
        "total_tracks": album["total_tracks"],
        "label": album.get("label"),
        "copyright": [c["text"] for c in album.get("copyrights", [])],
        "genres": album.get("genres", []),
        "popularity": album.get("popularity"),
        # "images" can be an empty list, so guard before indexing
        "image": (album.get("images") or [{}])[0].get("url"),
        "tracks": tracks,
        "external_url": album.get("external_urls", {}).get("spotify"),
    }

def get_albums_batch(album_ids: list[str], market: str = "US") -> list[dict]:
    """Fetch up to 20 albums in a single request."""
    results = []
    for i in range(0, len(album_ids), 20):
        batch = album_ids[i:i+20]
        data = spotify.get("albums", params={
            "ids": ",".join(batch),
            "market": market,
        })
        for album in data.get("albums", []):
            if album:
                results.append({
                    "id": album["id"],
                    "name": album["name"],
                    "release_date": album["release_date"],
                    "total_tracks": album["total_tracks"],
                    "artists": [a["name"] for a in album.get("artists", [])],
                    "popularity": album.get("popularity"),
                })
        time.sleep(0.3)
    return results
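
The batch functions in this guide all repeat the same slicing loop with different sizes: 100 for audio features, 50 for artists, 20 for albums. A small shared helper can do the slicing and drop duplicate or empty IDs first, which saves requests when the IDs come from playlists with repeated artists. The name chunked_unique is hypothetical.

```python
from typing import Iterable, Iterator


def chunked_unique(ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield lists of at most `size` IDs, with duplicates and empty values
    removed and the original order preserved."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # dict.fromkeys keeps first-seen order while removing duplicates
    unique = list(dict.fromkeys(i for i in ids if i))
    for start in range(0, len(unique), size):
        yield unique[start:start + size]
```

Inside get_albums_batch, for instance, the loop could become `for batch in chunked_unique(album_ids, 20):`.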

7. Search Across the Catalog {#search}

def search_spotify(query: str, search_types: list[str] = None,
                   market: str = "US", limit: int = 50) -> dict:
    """Search Spotify catalog. Types: track, artist, album, playlist, show, episode."""
    if search_types is None:
        search_types = ["track"]

    results = {t: [] for t in search_types}
    offset = 0

    while offset < limit:
        batch_size = min(50, limit - offset)
        data = spotify.get("search", params={
            "q": query,
            "type": ",".join(search_types),
            "limit": batch_size,
            "offset": offset,
            "market": market,
        })

        for search_type in search_types:
            items_key = f"{search_type}s"
            items = data.get(items_key, {}).get("items", [])
            results[search_type].extend(items)

        # Check if any type has more results
        has_more = any(
            data.get(f"{t}s", {}).get("next")
            for t in search_types
        )
        if not has_more:
            break

        offset += batch_size
        time.sleep(0.3)

    return results

def search_tracks(query: str, limit: int = 50) -> list[dict]:
    """Search for tracks and return cleaned results."""
    raw = search_spotify(query, ["track"], limit=limit)
    return [
        {
            "id": t["id"],
            "name": t["name"],
            "artists": [a["name"] for a in t.get("artists", [])],
            "album": t.get("album", {}).get("name"),
            "release_date": t.get("album", {}).get("release_date"),
            "popularity": t.get("popularity", 0),
            "duration_ms": t["duration_ms"],
            "explicit": t.get("explicit", False),
            "preview_url": t.get("preview_url"),
        }
        for t in raw["track"]
        if t  # filter out None entries
    ]

def search_by_genre(genre: str, limit: int = 50) -> list[dict]:
    """Search for tracks in a specific genre."""
    return search_tracks(f"genre:{genre}", limit=limit)

def search_artist_discography(artist_name: str) -> dict:
    """Search for an artist and get their full discography."""
    results = search_spotify(artist_name, ["artist"], limit=5)
    artists = results.get("artist", [])
    if not artists:
        return {}

    # Take the most popular result
    artist = max(artists, key=lambda a: a.get("popularity", 0))
    return get_artist_complete(artist["id"])
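
search_by_genre uses one of Spotify's field filters (genre:), and the same q string accepts others such as artist:, album:, track:, and year:. Here is a small sketch for composing those queries without hand-building strings. The helper build_search_query is hypothetical, and it passes values through unquoted, which is an assumption that matches the simple examples in Spotify's documentation.

```python
ALLOWED_FILTERS = {"artist", "album", "track", "year", "genre", "isrc", "upc"}


def build_search_query(keywords: str = "", **filters) -> str:
    """Combine free-text keywords and field filters into one q string."""
    parts = [keywords.strip()] if keywords.strip() else []
    for field, value in filters.items():
        if field not in ALLOWED_FILTERS:
            raise ValueError(f"Unsupported search filter: {field}")
        if value in (None, ""):
            continue  # ignore filters the caller left empty
        parts.append(f"{field}:{value}")
    return " ".join(parts)
```

Usage: `search_tracks(build_search_query("remaster", track="Doxy", artist="Miles Davis"))`, or `build_search_query(genre="jazz", year="1955-1960")` for a year range.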

8. New Releases and Category Browsing {#new-releases}

def get_new_releases(country: str = "US",
                     limit: int = 50) -> list[dict]:
    """Get new album releases in a country."""
    all_releases = []
    offset = 0

    while offset < limit:
        batch_size = min(50, limit - offset)
        data = spotify.get("browse/new-releases", params={
            "country": country,
            "limit": batch_size,
            "offset": offset,
        })

        albums = data.get("albums", {})
        for album in albums.get("items", []):
            all_releases.append({
                "id": album["id"],
                "name": album["name"],
                "type": album["album_type"],
                "artists": [a["name"] for a in album.get("artists", [])],
                "release_date": album["release_date"],
                "total_tracks": album["total_tracks"],
                # "images" can be an empty list, so guard before indexing
                "image": (album.get("images") or [{}])[0].get("url"),
            })

        if not albums.get("next"):
            break
        offset += batch_size
        time.sleep(0.3)

    return all_releases

def get_featured_playlists(country: str = "US",
                            limit: int = 20) -> list[dict]:
    """Get Spotify's editorially featured playlists."""
    data = spotify.get("browse/featured-playlists", params={
        "country": country,
        "limit": limit,
    })
    return [
        {
            "id": p["id"],
            "name": p["name"],
            "description": p.get("description", ""),
            "followers": p.get("followers", {}).get("total"),
            "total_tracks": p.get("tracks", {}).get("total"),
            # "images" can be an empty list or null, so guard before indexing
            "image": (p.get("images") or [{}])[0].get("url"),
        }
        for p in data.get("playlists", {}).get("items", [])
        if p
    ]

def get_categories() -> list[dict]:
    """Get Spotify's browse categories."""
    categories = []
    offset = 0
    while True:
        data = spotify.get("browse/categories", params={
            "limit": 50,
            "offset": offset,
            "country": "US",
        })
        items = data.get("categories", {}).get("items", [])
        if not items:
            break
        categories.extend([{"id": c["id"], "name": c["name"]} for c in items])
        if not data.get("categories", {}).get("next"):
            break
        offset += 50
    return categories

def get_category_playlists(category_id: str,
                            limit: int = 20) -> list[dict]:
    """Get playlists for a specific Spotify category."""
    data = spotify.get(f"browse/categories/{category_id}/playlists",
                       params={"limit": limit, "country": "US"})
    playlists = data.get("playlists", {}).get("items", [])
    return [
        {"id": p["id"], "name": p["name"],
         "description": p.get("description", "")}
        for p in playlists if p
    ]

9. Related Artists and Recommendation Seeds

def get_recommendations(seed_artists: list[str] = None,
                          seed_tracks: list[str] = None,
                          seed_genres: list[str] = None,
                          target_features: dict = None,
                          limit: int = 100) -> list[dict]:
    """Get track recommendations based on seeds and audio feature targets."""
    params = {
        "limit": min(limit, 100),
        "market": "US",
    }

    if seed_artists:
        params["seed_artists"] = ",".join(seed_artists[:2])
    if seed_tracks:
        params["seed_tracks"] = ",".join(seed_tracks[:2])
    if seed_genres:
        params["seed_genres"] = ",".join(seed_genres[:1])

    # Tunable attributes are sent as target_*, min_*, or max_* parameters.
    # Bare names such as "energy" are treated as targets; keys that already
    # carry a prefix are passed through unchanged.
    if target_features:
        for key, val in target_features.items():
            if val is None:
                continue
            prefixed = key.startswith(("target_", "min_", "max_"))
            params[key if prefixed else f"target_{key}"] = val

    data = spotify.get("recommendations", params=params)
    return [
        {
            "id": t["id"],
            "name": t["name"],
            "artists": [a["name"] for a in t.get("artists", [])],
            "album": t.get("album", {}).get("name"),
            "popularity": t.get("popularity", 0),
            "duration_ms": t["duration_ms"],
            "preview_url": t.get("preview_url"),
        }
        for t in data.get("tracks", [])
    ]

# Example: Find high-energy dance tracks similar to a seed
recs = get_recommendations(
    seed_genres=["edm"],
    target_features={
        "danceability": 0.9,
        "energy": 0.85,
        "valence": 0.7,
        "min_popularity": 40,
    },
    limit=50
)

def get_available_genre_seeds() -> list[str]:
    """Get all available genre seeds for recommendations."""
    data = spotify.get("recommendations/available-genre-seeds")
    return data.get("genres", [])

10. User Data with Authorization Code Flow {#user-data}

When users authorize your app, you can access their personal Spotify data:

def get_user_top_tracks(user_token: str,
                         time_range: str = "medium_term",
                         limit: int = 50) -> list[dict]:
    """Get a user's top tracks. time_range: short_term/medium_term/long_term."""
    headers = {"Authorization": f"Bearer {user_token}"}
    all_tracks = []
    offset = 0

    while offset < limit:
        resp = requests.get(
            "https://api.spotify.com/v1/me/top/tracks",
            headers=headers,
            params={
                "time_range": time_range,
                "limit": min(50, limit - offset),
                "offset": offset,
            },
            timeout=15,  # avoid hanging forever on a stalled connection
        )
        resp.raise_for_status()
        data = resp.json()
        items = data.get("items", [])
        if not items:
            break

        for item in items:
            all_tracks.append({
                "id": item["id"],
                "name": item["name"],
                "artists": [a["name"] for a in item.get("artists", [])],
                "popularity": item.get("popularity"),
            })

        if not data.get("next"):
            break
        offset += 50

    return all_tracks

def get_user_saved_tracks(user_token: str,
                           limit: int = 200) -> list[dict]:
    """Get tracks saved to a user's library."""
    headers = {"Authorization": f"Bearer {user_token}"}
    saved = []
    offset = 0

    while len(saved) < limit:
        resp = requests.get(
            "https://api.spotify.com/v1/me/tracks",
            headers=headers,
            params={"limit": 50, "offset": offset, "market": "US"},
            timeout=15,  # avoid hanging forever on a stalled connection
        )
        resp.raise_for_status()
        data = resp.json()
        items = data.get("items", [])
        if not items:
            break

        for item in items:
            track = item.get("track")
            if track and track.get("id"):
                saved.append({
                    "id": track["id"],
                    "name": track["name"],
                    "artists": [a["name"] for a in track.get("artists", [])],
                    "added_at": item.get("added_at"),
                })

        if not data.get("next"):
            break
        offset += 50
        time.sleep(0.3)

    return saved[:limit]
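
These calls only work if the user granted the right scopes during authorization: user-top-read for top tracks and artists, user-library-read for saved tracks. A small lookup keeps the scope list you pass to get_auth_url in sync with the endpoints you actually call. The mapping below covers only a few endpoints, and the helper scopes_for is hypothetical.

```python
# Scope required by each user endpoint used in this guide (plus one extra)
ENDPOINT_SCOPES = {
    "me/top/tracks": "user-top-read",
    "me/top/artists": "user-top-read",
    "me/tracks": "user-library-read",
    "me/player/recently-played": "user-read-recently-played",
}


def scopes_for(endpoints: list[str]) -> list[str]:
    """Return the sorted, de-duplicated scopes needed for the given endpoints."""
    unknown = [e for e in endpoints if e not in ENDPOINT_SCOPES]
    if unknown:
        raise KeyError(f"No scope mapping for: {', '.join(unknown)}")
    return sorted({ENDPOINT_SCOPES[e] for e in endpoints})
```

For the two functions above, `scopes_for(["me/top/tracks", "me/tracks"])` gives the list to request.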

11. Pagination: Handling Large Result Sets {#pagination}

Most list endpoints paginate with offset and limit and return a ready-made next URL for the following page; a few (such as recently played tracks) use cursors instead:

def paginate_spotify_endpoint(endpoint: str,
                               params: dict = None,
                               items_key: str = "items",
                               max_items: int = None) -> list:
    """Generic paginator for any Spotify endpoint using offset/limit."""
    all_items = []
    params = dict(params or {})
    params.setdefault("limit", 50)

    while True:
        data = spotify.get(endpoint, params=params)

        # Look for the list under items_key, falling back to "items".
        # Stop when the page has no list or the list is empty.
        items = data.get(items_key)
        if not isinstance(items, list):
            items = data.get("items")
        if not items:
            break

        all_items.extend([i for i in items if i])  # filter None

        if max_items and len(all_items) >= max_items:
            all_items = all_items[:max_items]
            break

        next_url = data.get("next")
        if not next_url:
            break

        # Use the full next URL directly
        endpoint = next_url
        params = {}
        time.sleep(0.3)

    return all_items

# Examples
all_playlist_items = paginate_spotify_endpoint(
    "playlists/37i9dQZF1DXcBWIGoYBM5M/tracks",
    params={"market": "US", "fields": "items(track(id,name)),next"},
    max_items=500
)
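
If you'd rather not hold every item in memory, the same next-URL loop also works as a generator. This sketch takes the fetch function as an argument so it can be tested without network access. The name iter_pages is hypothetical, and it assumes the list lives under "items" at the top level of each page.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch: Callable[[str, Optional[dict]], dict],
               endpoint: str,
               params: Optional[dict] = None,
               max_items: Optional[int] = None) -> Iterator[dict]:
    """Yield items one at a time, following each page's "next" URL."""
    url, query, yielded = endpoint, dict(params or {}), 0
    while url:
        page = fetch(url, query)
        for item in page.get("items") or []:
            if item is None:
                continue  # some endpoints return null entries
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        # The next URL already carries offset and limit, so drop our params
        url, query = page.get("next"), None
```

With the client from section 1, usage would look like `for item in iter_pages(spotify.get, "playlists/<id>/tracks", {"market": "US"}): ...`. Because items are yielded lazily, stopping early also stops further requests.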

12. Rate Limits and How to Handle Them {#rate-limits}

Spotify's rate limits are per-app, not per-endpoint. Based on real-world testing:

- Client Credentials flow: Roughly 100-200 requests per 30 seconds
- 429 responses include a Retry-After header (in seconds)
- Token lifetime: 3600 seconds (1 hour), then needs refresh
- Batch endpoints count as 1 request regardless of IDs included, so always use them
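
Given those points, a reasonable retry policy is: if the server sent Retry-After, wait that long; otherwise back off exponentially with some randomness so parallel workers don't retry in lockstep. The sketch below only computes the delay. The function name retry_delay, the 1-second base, and the 60-second cap are assumptions for the example, not values from Spotify.

```python
import random


def retry_delay(attempt: int, retry_after=None,
                base: float = 1.0, cap: float = 60.0,
                rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor the server's instruction, plus a 1-second safety margin
            return max(0.0, float(retry_after)) + 1.0
        except (TypeError, ValueError):
            pass  # unparseable header: fall through to backoff
    # "Full jitter": pick a random delay up to the exponential ceiling
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

In a request loop this would be used as `time.sleep(retry_delay(attempt, resp.headers.get("Retry-After")))`.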

import time
import threading
from collections import deque

class TokenBucketRateLimiter:
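    """Sliding-window rate limiter for the Spotify API.

    Despite the class name, this tracks request timestamps in a rolling
    window rather than implementing a token bucket.
    """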

    def __init__(self, max_requests: int = 90,
                 window_seconds: int = 30):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = deque()
        self._lock = threading.Lock()

    def wait(self):
        with self._lock:
            now = time.time()
            # Remove old requests outside the window
            while self.requests and now - self.requests[0] > self.window:
                self.requests.popleft()

            if len(self.requests) >= self.max_requests:
                sleep_time = self.window - (now - self.requests[0]) + 0.1
                time.sleep(sleep_time)

            self.requests.append(time.time())

rate_limiter = TokenBucketRateLimiter(max_requests=80, window_seconds=30)

# For large-scale collection using multiple app credentials
def create_rotating_client_pool(credentials: list[dict]) -> list[SpotifyClient]:
    """Create multiple clients to distribute rate limits."""
    return [SpotifyClient(c["client_id"], c["client_secret"])
            for c in credentials]

For large-scale extraction (mapping entire genres or building recommendation datasets), the practical path is to rotate across several Spotify app credentials, each with its own rate limit bucket. For supplementary scraping of other music sites (lyrics sites, chart data, setlist databases), routing through ThorData residential proxies keeps that scraping stable and has no effect on your Spotify API rate limits.
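
One simple way to spread requests over such a pool is round-robin selection. The sketch below uses placeholder strings where real `SpotifyClient` objects would go; `round_robin` is a hypothetical helper name, not part of any Spotify library.

```python
import itertools
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def round_robin(clients: Sequence[T]) -> Iterator[T]:
    """Cycle through clients forever so consecutive requests use different apps."""
    if not clients:
        raise ValueError("need at least one client")
    return itertools.cycle(clients)


# Placeholder strings stand in for SpotifyClient instances
pool = round_robin(["client_a", "client_b", "client_c"])
first_four = [next(pool) for _ in range(4)]
```

In practice each client would also keep its own rate limiter instance, since the buckets are independent.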

13. Storing Spotify Data: SQLite Schema {#storage}

import sqlite3
import json
import time

def init_spotify_db(db_path: str = "spotify.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")

    conn.execute("""
        CREATE TABLE IF NOT EXISTS artists (
            id TEXT PRIMARY KEY,
            name TEXT,
            genres TEXT,
            followers INTEGER,
            popularity INTEGER,
            images TEXT,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS albums (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist_ids TEXT,
            artist_names TEXT,
            release_date TEXT,
            total_tracks INTEGER,
            album_type TEXT,
            label TEXT,
            popularity INTEGER,
            image_url TEXT,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS tracks (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist_ids TEXT,
            artist_names TEXT,
            album_id TEXT,
            album_name TEXT,
            release_date TEXT,
            duration_ms INTEGER,
            popularity INTEGER,
            explicit INTEGER,
            preview_url TEXT,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS audio_features (
            track_id TEXT PRIMARY KEY,
            danceability REAL,
            energy REAL,
            key INTEGER,
            loudness REAL,
            mode INTEGER,
            speechiness REAL,
            acousticness REAL,
            instrumentalness REAL,
            liveness REAL,
            valence REAL,
            tempo REAL,
            time_signature INTEGER,
            duration_ms INTEGER,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS playlist_tracks (
            playlist_id TEXT,
            track_id TEXT,
            position INTEGER,
            added_at TEXT,
            added_by TEXT,
            PRIMARY KEY (playlist_id, track_id)
        )
    """)

    # artist_ids holds a JSON array, so this index only helps exact-match lookups
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist_ids)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_album ON tracks(album_id)")
    # audio_features.track_id is the PRIMARY KEY and is therefore already indexed

    conn.commit()
    return conn

def save_track_with_features(conn: sqlite3.Connection,
                               track: dict, features: dict = None):
    """Save a track and its audio features in a single transaction."""
    now = time.time()
    artists = track.get("artists") or []
    # "album" may be a nested API object or an already-flattened name string
    album = track.get("album")
    album_obj = album if isinstance(album, dict) else {}
    album_name = album if isinstance(album, str) else album_obj.get("name")

    # `with conn` commits on success and rolls back if either insert fails
    with conn:
        conn.execute("""
            INSERT OR REPLACE INTO tracks VALUES (?,?,?,?,?,?,?,?,?,?,?,?)
        """, (
            track["id"], track["name"],
            json.dumps([a["id"] for a in artists]),
            track.get("artist_names") or ", ".join(a.get("name", "") for a in artists),
            track.get("album_id") or album_obj.get("id"),
            album_name,
            # Raw API track objects carry release_date on the album, not the track
            track.get("release_date") or album_obj.get("release_date"),
            track.get("duration_ms"),
            track.get("popularity", 0),
            int(bool(track.get("explicit", False))),
            track.get("preview_url"),
            now
        ))
        if features:
            conn.execute("""
                INSERT OR REPLACE INTO audio_features VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
            """, (
                features["id"],
                features.get("danceability"), features.get("energy"),
                features.get("key"), features.get("loudness"),
                features.get("mode"), features.get("speechiness"),
                features.get("acousticness"), features.get("instrumentalness"),
                features.get("liveness"), features.get("valence"),
                features.get("tempo"), features.get("time_signature"),
                features.get("duration_ms"), now
            ))
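
Reading the data back is a plain join between the two tables. The following is a minimal sketch: `fetch_feature_rows` is a hypothetical helper, and the demo builds a throwaway in-memory database with only the columns the query touches, filled with made-up rows.

```python
import sqlite3


def fetch_feature_rows(conn: sqlite3.Connection, min_popularity: int = 0) -> list:
    """Return (name, popularity, danceability, energy, valence, tempo) rows."""
    return conn.execute(
        """
        SELECT t.name, t.popularity, f.danceability, f.energy, f.valence, f.tempo
        FROM tracks AS t
        JOIN audio_features AS f ON f.track_id = t.id
        WHERE t.popularity >= ?
        ORDER BY t.popularity DESC
        """,
        (min_popularity,),
    ).fetchall()


# Demo on an in-memory database with illustrative data
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE tracks (id TEXT PRIMARY KEY, name TEXT, popularity INTEGER)")
demo.execute("CREATE TABLE audio_features (track_id TEXT PRIMARY KEY, danceability REAL, "
             "energy REAL, valence REAL, tempo REAL)")
demo.executemany("INSERT INTO tracks VALUES (?,?,?)",
                 [("t1", "Song A", 80), ("t2", "Song B", 40), ("t3", "Song C", 90)])
demo.executemany("INSERT INTO audio_features VALUES (?,?,?,?,?)",
                 [("t1", 0.7, 0.8, 0.6, 120.0), ("t2", 0.4, 0.3, 0.2, 90.0)])
rows = fetch_feature_rows(demo, min_popularity=50)
```

The inner join silently drops tracks that have no audio features row ("Song C" here); switch to a LEFT JOIN if you want those tracks returned with NULL feature columns.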

14. Building Complete Datasets {#datasets}

Here's how to build a full genre dataset with audio features for machine learning or analysis:

def build_genre_dataset(genres: list[str],
                         tracks_per_genre: int = 200,
                         db_path: str = "spotify_genre_dataset.db") -> dict:
    """Build a labeled dataset of tracks by genre with audio features."""
    conn = init_spotify_db(db_path)
    dataset_summary = {}

    for genre in genres:
        print(f"\nCollecting genre: {genre}")
        genre_tracks = []

        # Search for tracks in this genre
        results = search_tracks(f"genre:{genre}", limit=min(tracks_per_genre, 1000))
        genre_tracks.extend(results)

        # Also get playlist tracks for this genre category
        playlists = get_category_playlists(genre, limit=5)
        for pl in playlists[:3]:
            pl_tracks = get_playlist_tracks(pl["id"])
            genre_tracks.extend(pl_tracks[:50])

        # Deduplicate by track ID
        seen = set()
        unique_tracks = []
        for t in genre_tracks:
            if t["id"] not in seen:
                seen.add(t["id"])
                unique_tracks.append(t)

        unique_tracks = unique_tracks[:tracks_per_genre]

        # Get audio features in batches
        track_ids = [t["id"] for t in unique_tracks]
        features_map = get_audio_features(track_ids)

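        # Save to database
        # NOTE: genre_label lives only on the in-memory dict; the tracks table
        # has no genre column, so add one if the label must be persisted
        for track in unique_tracks:
            track["genre_label"] = genre
            features = features_map.get(track["id"])
            save_track_with_features(conn, track, features)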

        dataset_summary[genre] = {
            "tracks_collected": len(unique_tracks),
            "with_audio_features": sum(1 for t in unique_tracks
                                       if t["id"] in features_map),
        }
        print(f"  {genre}: {len(unique_tracks)} tracks, "
              f"{dataset_summary[genre]['with_audio_features']} with features")

        time.sleep(1.0)  # Respect rate limits between genres

    conn.close()
    return dataset_summary

def build_playlist_dataset(playlist_id: str,
                            db_path: str = "playlist_dataset.db") -> int:
    """Build a complete single-playlist dataset with audio features."""
    conn = init_spotify_db(db_path)

    # Get playlist info
    info = get_playlist_info(playlist_id)
    print(f"Building dataset for: {info['name']} ({info['total_tracks']} tracks)")

    # Get all tracks
    tracks = get_playlist_tracks(playlist_id)

    # Get audio features in batches
    track_ids = [t["id"] for t in tracks]
    features_map = get_audio_features(track_ids)

    # Save everything
    for i, track in enumerate(tracks):
        features = features_map.get(track["id"])
        save_track_with_features(conn, track, features)
        # Record playlist membership
        conn.execute("""
            INSERT OR IGNORE INTO playlist_tracks VALUES (?,?,?,?,?)
        """, (playlist_id, track["id"], i,
              track.get("added_at"), track.get("added_by")))

    conn.commit()
    conn.close()
    print(f"Saved {len(tracks)} tracks with "
          f"{len(features_map)} audio feature records")
    return len(tracks)
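
Both builders lean on get_audio_features to batch the track IDs. If you need to do the batching yourself, a sketch like the one below works. `chunked_unique` is a hypothetical helper, and the default batch size of 100 is my assumption about the batch endpoint's per-request ID limit, so adjust it to whatever your endpoint accepts.

```python
from typing import Iterable, Iterator, List, Optional


def chunked_unique(ids: Iterable[Optional[str]], size: int = 100) -> Iterator[List[str]]:
    """Yield batches of at most `size` IDs, skipping blanks and duplicates.

    Order is preserved; the default size of 100 is an assumed batch limit.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(i for i in ids if i))  # dedupe, keep first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


batches = list(chunked_unique(["a", "b", "a", None, "c"], size=2))
```

Deduplicating before batching matters for playlists, where the same track can appear more than once and would otherwise waste slots in a request.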

15. Spotify Web Playback and Embed APIs {#playback}

For displaying Spotify content in web apps (not for data extraction):

<!-- Embed a track player -->
<iframe
  src="https://open.spotify.com/embed/track/4iV5W9uYEdYUVa79Axb7Rh"
  width="300"
  height="80"
  frameborder="0"
  allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">
</iframe>

<!-- Embed a playlist -->
<iframe
  src="https://open.spotify.com/embed/playlist/37i9dQZF1DXcBWIGoYBM5M"
  width="300"
  height="380"
  frameborder="0"
  allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">
</iframe>
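
If you generate these embeds from Python, the markup can be built from the content type and ID. A minimal sketch, assuming only the two embed types shown above (`track` and `playlist`); `spotify_embed_iframe` is a hypothetical helper name.

```python
from html import escape

# Only the two embed types used in this article; extend if you need others
EMBEDDABLE = {"track", "playlist"}


def spotify_embed_iframe(kind: str, spotify_id: str,
                         width: int = 300, height: int = 380) -> str:
    """Build the iframe markup for a Spotify embed player."""
    if kind not in EMBEDDABLE:
        raise ValueError(f"unsupported embed type: {kind}")
    if not spotify_id.isalnum():
        raise ValueError("Spotify IDs are expected to be alphanumeric")
    src = f"https://open.spotify.com/embed/{kind}/{spotify_id}"
    return (
        f'<iframe src="{escape(src)}" width="{int(width)}" height="{int(height)}" '
        'frameborder="0" '
        'allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">'
        "</iframe>"
    )


track_embed = spotify_embed_iframe("track", "4iV5W9uYEdYUVa79Axb7Rh", height=80)
```

Validating the ID before interpolating it keeps user-supplied values from injecting markup into the page.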

16. Real Use Cases {#use-cases}

Music Mood Analysis

def classify_mood(features: dict) -> str:
    """Classify track mood based on audio features."""
    valence = features.get("valence", 0.5)
    energy = features.get("energy", 0.5)

    if valence > 0.6 and energy > 0.6:
        return "happy_energetic"
    elif valence > 0.6 and energy < 0.4:
        return "happy_calm"
    elif valence < 0.4 and energy > 0.6:
        return "angry_intense"
    elif valence < 0.4 and energy < 0.4:
        return "sad_melancholic"
    else:
        return "neutral"

def analyze_playlist_moods(playlist_id: str) -> dict:
    tracks = get_playlist_tracks(playlist_id)
    track_ids = [t["id"] for t in tracks]
    features_map = get_audio_features(track_ids)

    mood_counts = {}
    for track_id, feat in features_map.items():
        mood = classify_mood(feat)
        mood_counts[mood] = mood_counts.get(mood, 0) + 1

    return mood_counts
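
Raw counts are hard to compare between playlists of different sizes, so it often helps to convert them to shares. The helper below is a small sketch that takes the dict returned by analyze_playlist_moods; `mood_percentages` is a hypothetical name and the sample counts are made up.

```python
def mood_percentages(mood_counts: dict) -> dict:
    """Convert mood counts to percentages, largest share first."""
    total = sum(mood_counts.values())
    if total == 0:
        return {}
    ranked = sorted(mood_counts.items(), key=lambda kv: kv[1], reverse=True)
    return {mood: round(100 * count / total, 1) for mood, count in ranked}


shares = mood_percentages({"neutral": 3, "happy_energetic": 6, "sad_melancholic": 1})
```

Note that classify_mood labels everything with valence or energy between 0.4 and 0.6 as neutral, so a large neutral share may reflect the thresholds rather than the music.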

Genre Popularity Tracking

def track_genre_popularity(genres: list[str],
                            db: sqlite3.Connection) -> dict:
    """Snapshot the average popularity of up to 50 search results per genre.

    Note: `db` is currently unused.
    """
    result = {}
    for genre in genres:
        tracks = search_tracks(f"genre:{genre}", limit=50)
        if tracks:
            # .get guards against flattened track dicts without a popularity key
            avg_pop = sum(t.get("popularity", 0) for t in tracks) / len(tracks)
            result[genre] = {
                "avg_popularity": round(avg_pop, 1),
                "sample_size": len(tracks),
            }
    return result
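
A single call only gives a snapshot; to follow popularity over time, each run's result needs to be stored with a timestamp. The sketch below assumes a hypothetical `genre_popularity` table that is not part of the schema in section 13, and the demo values are illustrative only.

```python
import sqlite3
import time
from typing import Optional


def record_popularity_snapshot(conn: sqlite3.Connection, result: dict,
                               taken_at: Optional[float] = None) -> int:
    """Append one timestamped row per genre; returns the number of rows written."""
    taken_at = time.time() if taken_at is None else taken_at
    rows = [(genre, stats["avg_popularity"], stats["sample_size"], taken_at)
            for genre, stats in result.items()]
    with conn:  # commit on success, roll back on error
        conn.execute("""
            CREATE TABLE IF NOT EXISTS genre_popularity (
                genre TEXT,
                avg_popularity REAL,
                sample_size INTEGER,
                taken_at REAL,
                PRIMARY KEY (genre, taken_at)
            )
        """)
        conn.executemany(
            "INSERT OR REPLACE INTO genre_popularity VALUES (?,?,?,?)", rows)
    return len(rows)


# Demo with made-up numbers and fixed timestamps
snap_db = sqlite3.connect(":memory:")
saved = record_popularity_snapshot(
    snap_db, {"jazz": {"avg_popularity": 41.5, "sample_size": 50}}, taken_at=1.0)
saved += record_popularity_snapshot(
    snap_db, {"jazz": {"avg_popularity": 43.0, "sample_size": 50}}, taken_at=2.0)
history = snap_db.execute(
    "SELECT avg_popularity FROM genre_popularity WHERE genre = ? ORDER BY taken_at",
    ("jazz",)).fetchall()
```

Running this on a schedule (daily or weekly) builds the time series that the averages alone cannot provide.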

17. Common Errors and Fixes {#errors}

Error	Cause	Fix
`401 Unauthorized`	Token expired or invalid	Re-authenticate, check client_id/secret
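`403 Forbidden`	Endpoint requires user auth, or your app has no access to it	Use Authorization Code flow for user data; note that Spotify closed audio features, recommendations, and related artists to new apps in November 2024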
`429 Too Many Requests`	Rate limit exceeded	Check `Retry-After` header, implement backoff
`404 Not Found`	Track/playlist/artist deleted	Remove from tracking list
`track: null` in playlist	Local file or unavailable in market	Filter out null tracks
Empty `audio_features` array	Track has no audio analysis	Filter and handle missing data
Token refresh fails	Invalid refresh token	User must re-authorize
`No active device`	Web Playback SDK issue	Unrelated to data API
Very low `popularity` scores	Recent release, few plays	Normal; scores update weekly
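
The two data-quality rows in the table (null tracks and missing audio features) can be handled with a pair of small filters. This is a sketch under assumptions: `usable_tracks` and `split_features` are hypothetical helpers, playlist items are assumed to be dicts with a `track` key, and local files are assumed to arrive as track objects whose `id` is null.

```python
from typing import Iterable, List, Optional, Tuple


def usable_tracks(items: Iterable[Optional[dict]]) -> List[dict]:
    """Keep only playlist items that carry a real track with a Spotify ID."""
    tracks = []
    for item in items:
        track = (item or {}).get("track")
        if track and track.get("id"):  # skips null tracks and ID-less local files
            tracks.append(track)
    return tracks


def split_features(track_ids: List[str],
                   features: List[Optional[dict]]) -> Tuple[dict, List[str]]:
    """Pair IDs with feature dicts; return (features_by_id, ids_missing_features)."""
    found = {f["id"]: f for f in features if f}
    missing = [tid for tid in track_ids if tid not in found]
    return found, missing


# Demo with hand-written items shaped like API responses
demo_items = [
    {"track": {"id": "t1", "name": "A"}},
    {"track": None},
    {"track": {"id": None, "name": "local.mp3", "is_local": True}},
    None,
]
clean = usable_tracks(demo_items)
found, missing = split_features(["t1", "t2"], [{"id": "t1", "tempo": 120.0}, None])
```

Keeping the list of missing IDs, rather than discarding it, makes it easy to report coverage or retry those tracks later.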

Final Thoughts

Spotify is a rare case where the official API is genuinely better than scraping. Free access, rich metadata, and audio features you can't get anywhere else. The main limitations are:

No actual play counts -- only a 0-100 popularity score updated weekly
No chart positions -- use third-party chart APIs for that
No lyrics -- use Genius API or Musixmatch for lyrics

Where your app has access to it, the audio features endpoint alone is worth the setup time. Together, valence (emotional positivity), danceability, and tempo support the kind of content classification used in recommendation systems, mood-based playlists, music research, and marketing analytics.

If you're building anything music-related -- recommendation engines, genre analysis, mood-based playlist tools, or market research -- start here. Set up the Client Credentials flow, grab your first playlist, and run the audio features on it. You'll immediately see why Spotify's API is the most developer-friendly in the social/media space.
