Extracting Spotify Data in 2026: Playlists, Tracks, Audio Features, and Artist Analytics via the Web API

Spotify is unusual among major platforms -- they actually want developers using their data. The Spotify Web API is free, well-documented, and gives you access to a staggering amount of metadata: 100 million+ tracks, artist profiles, album details, playlist contents, and even audio analysis features like tempo, key, danceability, and acousticness. No scraping required for most use cases.

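The catch? Rate limits are strict, certain data (like actual play counts) is deliberately withheld, and some endpoints have become more restrictive: in November 2024 Spotify closed audio features, audio analysis, recommendations, related artists, and featured and category playlists to new apps and apps still in development mode. Here's a practical guide to getting what you need.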


Table of Contents

  1. Setting Up Authentication
  2. Client Credentials vs. Authorization Code Flow
  3. Extracting Playlist Data and Tracks
  4. Audio Features: The Hidden Gold
  5. Artist Deep Dives: Profiles, Top Tracks, and Albums
  6. Album Data and Track Listings
  7. Search Across the Catalog
  8. New Releases and Category Browsing
  9. Related Artists and Recommendation Seeds
  10. User Data with Authorization Code Flow
  11. Pagination: Handling Large Result Sets
  12. Rate Limits and How to Handle Them
  13. Storing Spotify Data: SQLite Schema
  14. Building Complete Datasets
  15. Spotify Web Playback and Embed APIs
  16. Real Use Cases
  17. Common Errors and Fixes

1. Setting Up Authentication {#auth}

Spotify uses OAuth 2.0. For data extraction (no user-specific data), the Client Credentials flow is simplest -- you get an app token without any user login.

Creating a Spotify App

  1. Go to the Spotify Developer Dashboard and create an app
  2. Set a redirect URI (Client Credentials never uses it; note that Spotify no longer accepts localhost, so use a loopback IP such as http://127.0.0.1:8080)
  3. Note your Client ID and Client Secret
  4. No approval process required for basic API access

import base64
import time

import requests

class SpotifyClient:
    """Spotify API client with automatic token refresh and retry logic."""

    BASE_URL = "https://api.spotify.com/v1"
    TOKEN_URL = "https://accounts.spotify.com/api/token"

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self._token = None
        self._token_expires = 0
        self._request_count = 0

    def _get_token(self) -> str:
        """Get or refresh access token using Client Credentials flow."""
        if self._token and time.time() < self._token_expires - 60:
            return self._token

        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()

        resp = requests.post(
            self.TOKEN_URL,
            headers={
                "Authorization": f"Basic {auth}",
                "Content-Type": "application/x-www-form-urlencoded",
            },
            data={"grant_type": "client_credentials"},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()

        self._token = data["access_token"]
        self._token_expires = time.time() + data["expires_in"]
        return self._token

    def get(self, endpoint: str, params: dict = None,
            retries: int = 3) -> dict:
        """Make authenticated GET request with rate limit handling."""
        url = endpoint if endpoint.startswith("http") else f"{self.BASE_URL}/{endpoint}"
        headers = {"Authorization": f"Bearer {self._get_token()}"}
        self._request_count += 1

        for attempt in range(retries):
            resp = requests.get(url, headers=headers,
                                params=params or {}, timeout=15)

            if resp.status_code == 429:
                retry_after = int(resp.headers.get("Retry-After", 5))
                print(f"Rate limited, waiting {retry_after}s "
                      f"(total requests: {self._request_count})")
                time.sleep(retry_after + 1)
                continue

            if resp.status_code == 401:
                # Token expired mid-session
                self._token = None
                headers["Authorization"] = f"Bearer {self._get_token()}"
                continue

            if resp.status_code == 503:
                time.sleep(2 ** attempt)
                continue

            resp.raise_for_status()
            return resp.json()

        raise Exception(f"Failed after {retries} retries: {url}")

# Initialize client
spotify = SpotifyClient("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")

2. Client Credentials vs. Authorization Code Flow {#auth-flows}

| Feature | Client Credentials | Authorization Code |
|---|---|---|
| User data | No | Yes |
| Public catalog | Yes | Yes |
| Rate limit | Per-app | Per-user + per-app |
| Setup complexity | Low | Medium |
| Use case | Data collection | User integrations |

For data collection, always use Client Credentials. Authorization Code is only needed for accessing user-specific data (saved tracks, listening history, etc.).

Authorization Code Flow (for user data)

import urllib.parse
import secrets

class SpotifyUserClient(SpotifyClient):
    """Extended client supporting user authorization."""

    AUTHORIZE_URL = "https://accounts.spotify.com/authorize"

    def get_auth_url(self, redirect_uri: str,
                     scopes: list[str]) -> tuple[str, str]:
        """Generate authorization URL and state token."""
        state = secrets.token_urlsafe(16)
        params = {
            "client_id": self.client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": " ".join(scopes),
            "state": state,
        }
        url = f"{self.AUTHORIZE_URL}?{urllib.parse.urlencode(params)}"
        return url, state

    def exchange_code_for_token(self, code: str,
                                 redirect_uri: str) -> dict:
        """Exchange authorization code for access + refresh tokens."""
        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()

        resp = requests.post(
            self.TOKEN_URL,
            headers={"Authorization": f"Basic {auth}"},
            data={
                "grant_type": "authorization_code",
                "code": code,
                "redirect_uri": redirect_uri,
            },
            timeout=10,  # avoid hanging forever on a stalled connection
        )
        resp.raise_for_status()
        return resp.json()

    def refresh_user_token(self, refresh_token: str) -> str:
        """Refresh an expired user access token."""
        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()
        resp = requests.post(
            self.TOKEN_URL,
            headers={"Authorization": f"Basic {auth}"},
            data={"grant_type": "refresh_token", "refresh_token": refresh_token},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["access_token"]
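
The class above generates a `state` token but never shows it being checked. Below is a minimal sketch of the callback step, assuming your redirect handler hands you the full callback URL as a string. The helper name `parse_auth_callback` is hypothetical and not part of any Spotify SDK.

```python
import hmac
from urllib.parse import urlparse, parse_qs


def parse_auth_callback(callback_url: str, expected_state: str) -> str:
    """Return the authorization code from a redirect URL, or raise ValueError.

    Hypothetical helper: verifies the `state` value issued by get_auth_url()
    before trusting the `code` parameter.
    """
    query = parse_qs(urlparse(callback_url).query)

    # The user denied access or something else went wrong upstream
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")

    # Constant-time comparison of the returned state against the one we issued
    returned_state = query.get("state", [""])[0]
    if not hmac.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("State mismatch: discard this response")

    codes = query.get("code")
    if not codes:
        raise ValueError("No authorization code in callback URL")
    return codes[0]
```

The returned code is what you would pass to `exchange_code_for_token`, together with the same redirect URI used to build the authorization URL.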

3. Extracting Playlist Data and Tracks {#playlists}

Playlists are the most common target. A single playlist can have up to 10,000 tracks, returned in pages of 100:

def get_playlist_info(playlist_id: str) -> dict:
    """Get playlist metadata."""
    data = spotify.get(f"playlists/{playlist_id}", params={
        "fields": "id,name,description,owner,followers,public,"
                  "snapshot_id,images,tracks.total"
    })
    return {
        "id": data["id"],
        "name": data["name"],
        "description": data.get("description", ""),
        "owner": data["owner"]["display_name"],
        "owner_id": data["owner"]["id"],
        "followers": data.get("followers", {}).get("total", 0),
        "is_public": data.get("public"),
        "total_tracks": data["tracks"]["total"],
        "snapshot_id": data.get("snapshot_id"),
        "image": (data.get("images") or [{}])[0].get("url"),  # images may be null or empty
    }

def get_playlist_tracks(playlist_id: str,
                         market: str = "US") -> list[dict]:
    """Get all tracks from a Spotify playlist."""
    tracks = []
    offset = 0
    limit = 100

    while True:
        data = spotify.get(
            f"playlists/{playlist_id}/tracks",
            params={
                "offset": offset,
                "limit": limit,
                "market": market,
                "fields": ("items(added_at,added_by.id,"
                           "track(id,name,artists,album,duration_ms,"
                           "popularity,explicit,preview_url,"
                           "external_urls,is_local,type)),"
                           "next,total"),
            }
        )

        for item in data.get("items", []):
            track = item.get("track")
            if not track or not track.get("id"):
                # Skip local files and unavailable tracks
                continue

            tracks.append({
                "id": track["id"],
                "name": track["name"],
                "artists": [{"id": a["id"], "name": a["name"]}
                            for a in track.get("artists", [])],
                "artist_names": ", ".join(a["name"]
                                          for a in track.get("artists", [])),
                "album": track.get("album", {}).get("name"),
                "album_id": track.get("album", {}).get("id"),
                "album_type": track.get("album", {}).get("album_type"),
                "release_date": track.get("album", {}).get("release_date"),
                "duration_ms": track["duration_ms"],
                "duration_seconds": track["duration_ms"] / 1000,
                "popularity": track.get("popularity", 0),
                "explicit": track.get("explicit", False),
                "preview_url": track.get("preview_url"),
                "spotify_url": track.get("external_urls", {}).get("spotify"),
                "added_at": item.get("added_at"),
                "added_by": (item.get("added_by") or {}).get("id"),  # added_by may be null
            })

        if not data.get("next"):
            break

        offset += limit
        time.sleep(0.3)  # Polite delay

    return tracks

def get_playlist_full(playlist_id: str) -> dict:
    """Get complete playlist data including all tracks and metadata."""
    info = get_playlist_info(playlist_id)
    tracks = get_playlist_tracks(playlist_id)
    info["tracks"] = tracks
    info["actual_track_count"] = len(tracks)
    return info

# Example: Get Today's Top Hits (Spotify-owned playlists may return 404 for newer apps; any public playlist ID works here)
top_hits = get_playlist_full("37i9dQZF1DXcBWIGoYBM5M")
print(f"Playlist: {top_hits['name']}")
print(f"Followers: {top_hits['followers']:,}")
print(f"Tracks: {len(top_hits['tracks'])}")
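
To get a feel for what "pages of 100" means for a large playlist, here is a small sketch that lists the offsets a full fetch would request and estimates the time spent in polite delays. Both helper names are hypothetical, and the 0.3 second delay simply mirrors the value used in `get_playlist_tracks`.

```python
def playlist_page_offsets(total_tracks: int, page_size: int = 100) -> list[int]:
    """Return the offset of every page needed to cover total_tracks."""
    if total_tracks < 0:
        raise ValueError("total_tracks must be non-negative")
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    return list(range(0, total_tracks, page_size))


def estimate_delay_seconds(total_tracks: int, delay: float = 0.3,
                           page_size: int = 100) -> float:
    """Seconds spent sleeping between pages (network time not included)."""
    pages = len(playlist_page_offsets(total_tracks, page_size))
    # The loop sleeps after every page except the last one
    return max(0, pages - 1) * delay


offsets = playlist_page_offsets(250)
print(offsets)
print(f"{len(playlist_page_offsets(10_000))} requests for a full 10,000-track playlist")
```

A maximum-size playlist therefore costs 100 requests, which is worth knowing before you queue up hundreds of playlists.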

4. Audio Features: The Hidden Gold {#audio-features}

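This is what makes Spotify's API special. The /audio-features endpoint returns machine-analyzed attributes for a track, and a single batch call accepts up to 100 track IDs. Note that this is one of the endpoints Spotify restricted in November 2024, so apps without existing extended access may receive an error instead of data: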

def get_audio_features(track_ids: list[str]) -> dict[str, dict]:
    """Get audio features for tracks (batch of up to 100). Returns dict keyed by track ID."""
    all_features = {}

    for i in range(0, len(track_ids), 100):
        batch = track_ids[i:i+100]
        data = spotify.get("audio-features", params={"ids": ",".join(batch)})

        for feat in data.get("audio_features", []):
            if not feat:
                continue
            all_features[feat["id"]] = {
                "id": feat["id"],
                # Rhythm and energy
                "danceability": feat["danceability"],    # 0.0-1.0: dance-ability
                "energy": feat["energy"],                # 0.0-1.0: intensity
                "tempo": feat["tempo"],                  # BPM
                "time_signature": feat["time_signature"],# beats per bar (3,4,5,6,7)
                "loudness": feat["loudness"],            # dB, typically -60 to 0
                # Tone and mood
                "key": feat["key"],                      # -1=no key, 0=C, 1=C#...11=B
                "mode": feat["mode"],                    # 0=minor, 1=major
                "valence": feat["valence"],              # 0=sad/negative, 1=happy
                "speechiness": feat["speechiness"],      # 0=no speech, 1=all speech
                "acousticness": feat["acousticness"],    # 0=electric, 1=acoustic
                "instrumentalness": feat["instrumentalness"], # >0.5 = likely no vocals
                "liveness": feat["liveness"],            # >0.8 = likely live recording
                # Duration
                "duration_ms": feat["duration_ms"],
            }

        time.sleep(0.5)

    return all_features

# Key legend
KEY_NAMES = {-1: "No key", 0: "C", 1: "C#/Db", 2: "D", 3: "D#/Eb",
             4: "E", 5: "F", 6: "F#/Gb", 7: "G", 8: "G#/Ab",
             9: "A", 10: "A#/Bb", 11: "B"}

def describe_audio_features(features: dict) -> str:
    """Human-readable description of audio features."""
    key = KEY_NAMES.get(features.get("key", -1), "Unknown")
    mode = "major" if features.get("mode") == 1 else "minor"
    return (
        f"Key: {key} {mode} | "
        f"Tempo: {features.get('tempo', 0):.0f} BPM | "
        f"Energy: {features.get('energy', 0):.2f} | "
        f"Danceability: {features.get('danceability', 0):.2f} | "
        f"Valence: {features.get('valence', 0):.2f} | "
        f"Acousticness: {features.get('acousticness', 0):.2f}"
    )

# Usage example
track_ids = ["4iV5W9uYEdYUVa79Axb7Rh", "1301WleyT98MSxVHPZCA6M"]
features = get_audio_features(track_ids)
for tid, feat in features.items():
    print(f"{tid}: {describe_audio_features(feat)}")
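
Once you have the per-track dictionaries, a common next step is to profile a whole playlist. The sketch below averages a handful of numeric features over the dict returned by `get_audio_features`. The function name and the choice of features are assumptions made for illustration.

```python
from statistics import mean

# Features averaged by this sketch; extend the tuple as needed
NUMERIC_FEATURES = ("danceability", "energy", "valence", "tempo", "acousticness")


def summarize_features(features_by_id: dict[str, dict]) -> dict[str, float]:
    """Average selected audio features across all tracks, skipping missing values."""
    rows = list(features_by_id.values())
    summary = {}
    for name in NUMERIC_FEATURES:
        values = [row[name] for row in rows if row.get(name) is not None]
        if values:
            summary[name] = round(mean(values), 3)
    return summary


# Made-up values, shaped like the output of get_audio_features()
sample = {
    "track_a": {"danceability": 0.5, "energy": 0.8, "valence": 0.25,
                "tempo": 120.0, "acousticness": 0.1},
    "track_b": {"danceability": 0.75, "energy": 0.4, "valence": 0.75,
                "tempo": 100.0, "acousticness": None},
}
print(summarize_features(sample))
```

Features that are missing for every track are left out of the summary rather than reported as zero.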

Audio Analysis (More Granular, More Expensive)

For beat-level analysis (individual beats, bars, sections, tatums), use the /audio-analysis endpoint -- but it's one track at a time and significantly slower:

def get_audio_analysis(track_id: str) -> dict:
    """Get detailed audio analysis for a single track."""
    data = spotify.get(f"audio-analysis/{track_id}")
    return {
        "track_id": track_id,
        "duration": data.get("track", {}).get("duration"),
        "tempo": data.get("track", {}).get("tempo"),
        "key": data.get("track", {}).get("key"),
        "time_signature": data.get("track", {}).get("time_signature"),
        "bars": len(data.get("bars", [])),
        "beats": len(data.get("beats", [])),
        "sections": len(data.get("sections", [])),
        "segments": len(data.get("segments", [])),
        "tatums": len(data.get("tatums", [])),
        "sections_data": [
            {
                "start": s["start"],
                "duration": s["duration"],
                "tempo": s["tempo"],
                "key": s["key"],
                "loudness": s["loudness"],
            }
            for s in data.get("sections", [])
        ],
    }

5. Artist Deep Dives: Profiles, Top Tracks, and Albums {#artists}

def get_artist_complete(artist_id: str, market: str = "US") -> dict:
    """Get comprehensive artist data including profile, top tracks, and discography."""
    # Base profile
    artist = spotify.get(f"artists/{artist_id}")

    # Top tracks in market
    top_tracks_data = spotify.get(
        f"artists/{artist_id}/top-tracks",
        params={"market": market}
    )

    # Albums and singles
    albums_data = spotify.get(
        f"artists/{artist_id}/albums",
        params={
            "limit": 50,
            "include_groups": "album,single,compilation",
            "market": market,
        }
    )

    # Related artists
    related_data = spotify.get(f"artists/{artist_id}/related-artists")

    return {
        "id": artist["id"],
        "name": artist["name"],
        "genres": artist.get("genres", []),
        "followers": artist.get("followers", {}).get("total", 0),
        "popularity": artist["popularity"],
        "images": [img["url"] for img in artist.get("images", [])],
        "external_url": artist.get("external_urls", {}).get("spotify"),
        "top_tracks": [
            {
                "id": t["id"],
                "name": t["name"],
                "album": t["album"]["name"],
                "popularity": t["popularity"],
                "preview_url": t.get("preview_url"),
                "duration_ms": t["duration_ms"],
            }
            for t in top_tracks_data.get("tracks", [])
        ],
        "discography": [
            {
                "id": a["id"],
                "name": a["name"],
                "type": a["album_type"],
                "release_date": a["release_date"],
                "total_tracks": a["total_tracks"],
                "image": (a.get("images") or [{}])[0].get("url"),
            }
            for a in albums_data.get("items", [])
        ],
        "related_artists": [
            {
                "id": r["id"],
                "name": r["name"],
                "genres": r.get("genres", []),
                "followers": r.get("followers", {}).get("total", 0),
                "popularity": r["popularity"],
            }
            for r in related_data.get("artists", [])
        ],
    }

def get_multiple_artists(artist_ids: list[str]) -> list[dict]:
    """Batch fetch artist profiles (up to 50 per request)."""
    results = []
    for i in range(0, len(artist_ids), 50):
        batch = artist_ids[i:i+50]
        data = spotify.get("artists", params={"ids": ",".join(batch)})
        for artist in data.get("artists", []):
            if artist:
                results.append({
                    "id": artist["id"],
                    "name": artist["name"],
                    "genres": artist.get("genres", []),
                    "followers": artist.get("followers", {}).get("total", 0),
                    "popularity": artist["popularity"],
                })
        time.sleep(0.3)
    return results

6. Album Data and Track Listings {#albums}

def get_album_complete(album_id: str, market: str = "US") -> dict:
    """Get album data with all tracks."""
    album = spotify.get(f"albums/{album_id}", params={"market": market})

    tracks = []
    page = spotify.get(f"albums/{album_id}/tracks",
                       params={"limit": 50, "market": market})
    while True:
        for t in page.get("items", []):
            tracks.append({
                "id": t["id"],
                "name": t["name"],
                "track_number": t["track_number"],
                "disc_number": t["disc_number"],
                "duration_ms": t["duration_ms"],
                "explicit": t.get("explicit", False),
                "artists": [a["name"] for a in t.get("artists", [])],
                "preview_url": t.get("preview_url"),
            })
        if not page.get("next"):
            break
        page = spotify.get(page["next"])
        time.sleep(0.3)

    return {
        "id": album["id"],
        "name": album["name"],
        "type": album["album_type"],
        "artists": [a["name"] for a in album.get("artists", [])],
        "release_date": album["release_date"],
        "total_tracks": album["total_tracks"],
        "label": album.get("label"),
        "copyright": [c["text"] for c in album.get("copyrights", [])],
        "genres": album.get("genres", []),
        "popularity": album.get("popularity"),
        "image": (album.get("images") or [{}])[0].get("url"),
        "tracks": tracks,
        "external_url": album.get("external_urls", {}).get("spotify"),
    }

def get_albums_batch(album_ids: list[str], market: str = "US") -> list[dict]:
    """Fetch up to 20 albums in a single request."""
    results = []
    for i in range(0, len(album_ids), 20):
        batch = album_ids[i:i+20]
        data = spotify.get("albums", params={
            "ids": ",".join(batch),
            "market": market,
        })
        for album in data.get("albums", []):
            if album:
                results.append({
                    "id": album["id"],
                    "name": album["name"],
                    "release_date": album["release_date"],
                    "total_tracks": album["total_tracks"],
                    "artists": [a["name"] for a in album.get("artists", [])],
                    "popularity": album.get("popularity"),
                })
        time.sleep(0.3)
    return results
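
One detail to watch when storing album data: `release_date` is not always a full date. Spotify returns "1981", "1981-12", or "1981-12-15" depending on how precisely the release is known. Here is a minimal sketch for normalizing those strings, with a hypothetical helper name. Missing month or day values default to 1, and the precision is returned so you can tell a real January 1st from a padded one.

```python
from datetime import date


def parse_release_date(value: str) -> tuple[date, str]:
    """Parse a Spotify release_date string into (date, precision).

    precision is "year", "month", or "day"; missing parts default to 1.
    """
    parts = value.split("-")
    precision = {1: "year", 2: "month", 3: "day"}.get(len(parts))
    if precision is None:
        raise ValueError(f"Unexpected release_date format: {value!r}")

    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    return date(year, month, day), precision


for raw in ("1981", "1981-12", "1981-12-15"):
    parsed, precision = parse_release_date(raw)
    print(raw, "->", parsed.isoformat(), f"({precision})")
```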

7. Search Across the Catalog

def search_spotify(query: str, search_types: list[str] = None,
                   market: str = "US", limit: int = 50) -> dict:
    """Search Spotify catalog. Types: track, artist, album, playlist, show, episode."""
    if search_types is None:
        search_types = ["track"]

    results = {t: [] for t in search_types}
    offset = 0

    while offset < limit:
        batch_size = min(50, limit - offset)
        data = spotify.get("search", params={
            "q": query,
            "type": ",".join(search_types),
            "limit": batch_size,
            "offset": offset,
            "market": market,
        })

        for search_type in search_types:
            items_key = f"{search_type}s"
            items = data.get(items_key, {}).get("items", [])
            results[search_type].extend(items)

        # Check if any type has more results
        has_more = any(
            data.get(f"{t}s", {}).get("next")
            for t in search_types
        )
        if not has_more:
            break

        offset += batch_size
        time.sleep(0.3)

    return results

def search_tracks(query: str, limit: int = 50) -> list[dict]:
    """Search for tracks and return cleaned results."""
    raw = search_spotify(query, ["track"], limit=limit)
    return [
        {
            "id": t["id"],
            "name": t["name"],
            "artists": [a["name"] for a in t.get("artists", [])],
            "album": t.get("album", {}).get("name"),
            "release_date": t.get("album", {}).get("release_date"),
            "popularity": t.get("popularity", 0),
            "duration_ms": t["duration_ms"],
            "explicit": t.get("explicit", False),
            "preview_url": t.get("preview_url"),
        }
        for t in raw["track"]
        if t  # filter out None entries
    ]

def search_by_genre(genre: str, limit: int = 50) -> list[dict]:
    """Search for tracks in a specific genre."""
    return search_tracks(f"genre:{genre}", limit=limit)

def search_artist_discography(artist_name: str) -> dict:
    """Search for an artist and get their full discography."""
    results = search_spotify(artist_name, ["artist"], limit=5)
    artists = results.get("artist", [])
    if not artists:
        return {}

    # Take the most popular result
    artist = max(artists, key=lambda a: a.get("popularity", 0))
    return get_artist_complete(artist["id"])
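
`search_by_genre` above relies on a field filter (`genre:`). The search endpoint accepts several of these, and combining them by hand gets error-prone. The sketch below assembles a query string from keyword arguments. The helper name is hypothetical, and wrapping multi-word values in quotes is a common convention I am assuming here rather than something the code above establishes.

```python
# Field filters accepted in the search "q" parameter
ALLOWED_FILTERS = {"artist", "album", "track", "year", "genre", "isrc", "upc"}


def build_search_query(keywords: str = "", **filters) -> str:
    """Combine free-text keywords with field filters into one query string."""
    parts = [keywords.strip()] if keywords.strip() else []
    for field, value in filters.items():
        if field not in ALLOWED_FILTERS:
            raise ValueError(f"Unsupported search filter: {field}")
        text = str(value).strip()
        # Assumption: quote multi-word values so they stay attached to the field
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{field}:{text}")
    if not parts:
        raise ValueError("Provide keywords or at least one filter")
    return " ".join(parts)


print(build_search_query("remaster", track="Doxy", artist="Miles Davis"))
print(build_search_query(genre="jazz", year="1955-1965"))
```

The resulting string can be passed as the `query` argument of `search_spotify`, since `requests` handles the URL encoding.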

8. New Releases and Category Browsing {#new-releases}

def get_new_releases(country: str = "US",
                     limit: int = 50) -> list[dict]:
    """Get new album releases in a country."""
    all_releases = []
    offset = 0

    while offset < limit:
        batch_size = min(50, limit - offset)
        data = spotify.get("browse/new-releases", params={
            "country": country,
            "limit": batch_size,
            "offset": offset,
        })

        albums = data.get("albums", {})
        for album in albums.get("items", []):
            all_releases.append({
                "id": album["id"],
                "name": album["name"],
                "type": album["album_type"],
                "artists": [a["name"] for a in album.get("artists", [])],
                "release_date": album["release_date"],
                "total_tracks": album["total_tracks"],
                "image": (album.get("images") or [{}])[0].get("url"),
            })

        if not albums.get("next"):
            break
        offset += batch_size
        time.sleep(0.3)

    return all_releases

def get_featured_playlists(country: str = "US",
                            limit: int = 20) -> list[dict]:
    """Get Spotify's editorially featured playlists."""
    data = spotify.get("browse/featured-playlists", params={
        "country": country,
        "limit": limit,
    })
    return [
        {
            "id": p["id"],
            "name": p["name"],
            "description": p.get("description", ""),
            "followers": p.get("followers", {}).get("total"),
            "total_tracks": p.get("tracks", {}).get("total"),
            "image": (p.get("images") or [{}])[0].get("url"),
        }
        for p in data.get("playlists", {}).get("items", [])
        if p
    ]

def get_categories() -> list[dict]:
    """Get Spotify's browse categories."""
    categories = []
    offset = 0
    while True:
        data = spotify.get("browse/categories", params={
            "limit": 50,
            "offset": offset,
            "country": "US",
        })
        items = data.get("categories", {}).get("items", [])
        if not items:
            break
        categories.extend([{"id": c["id"], "name": c["name"]} for c in items])
        if not data.get("categories", {}).get("next"):
            break
        offset += 50
    return categories

def get_category_playlists(category_id: str,
                            limit: int = 20) -> list[dict]:
    """Get playlists for a specific Spotify category."""
    data = spotify.get(f"browse/categories/{category_id}/playlists",
                       params={"limit": limit, "country": "US"})
    playlists = data.get("playlists", {}).get("items", [])
    return [
        {"id": p["id"], "name": p["name"],
         "description": p.get("description", "")}
        for p in playlists if p
    ]

9. Related Artists and Recommendation Seeds

def get_recommendations(seed_artists: list[str] = None,
                          seed_tracks: list[str] = None,
                          seed_genres: list[str] = None,
                          target_features: dict = None,
                          limit: int = 100) -> list[dict]:
    """Get track recommendations based on seeds and audio feature targets."""
    params = {
        "limit": min(limit, 100),
        "market": "US",
    }

    if seed_artists:
        params["seed_artists"] = ",".join(seed_artists[:2])
    if seed_tracks:
        params["seed_tracks"] = ",".join(seed_tracks[:2])
    if seed_genres:
        params["seed_genres"] = ",".join(seed_genres[:1])

    # Audio-feature filters: bare names such as "energy" become "target_energy";
    # keys already prefixed with min_, max_, or target_ pass through unchanged.
    if target_features:
        for key, val in target_features.items():
            if val is None:
                continue
            prefixed = key.startswith(("min_", "max_", "target_"))
            params[key if prefixed else f"target_{key}"] = val

    data = spotify.get("recommendations", params=params)
    return [
        {
            "id": t["id"],
            "name": t["name"],
            "artists": [a["name"] for a in t.get("artists", [])],
            "album": t.get("album", {}).get("name"),
            "popularity": t.get("popularity", 0),
            "duration_ms": t["duration_ms"],
            "preview_url": t.get("preview_url"),
        }
        for t in data.get("tracks", [])
    ]

# Example: Find high-energy dance tracks similar to a seed
recs = get_recommendations(
    seed_genres=["edm"],
    target_features={
        "danceability": 0.9,
        "energy": 0.85,
        "valence": 0.7,
        "min_popularity": 40,
    },
    limit=50
)

def get_available_genre_seeds() -> list[str]:
    """Get all available genre seeds for recommendations."""
    data = spotify.get("recommendations/available-genre-seeds")
    return data.get("genres", [])
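
The recommendations endpoint accepts at most five seeds in total across artists, tracks, and genres, which is why `get_recommendations` trims its lists to 2 + 2 + 1. If you would rather fail loudly than trim silently, a validation step along these lines is one option. The helper name is hypothetical.

```python
from typing import Optional

MAX_SEEDS = 5  # combined limit across artists, tracks, and genres


def build_seed_params(seed_artists: Optional[list[str]] = None,
                      seed_tracks: Optional[list[str]] = None,
                      seed_genres: Optional[list[str]] = None) -> dict[str, str]:
    """Validate seed counts and return the comma-joined query parameters."""
    groups = {
        "seed_artists": seed_artists or [],
        "seed_tracks": seed_tracks or [],
        "seed_genres": seed_genres or [],
    }
    total = sum(len(values) for values in groups.values())
    if total == 0:
        raise ValueError("At least one seed is required")
    if total > MAX_SEEDS:
        raise ValueError(f"Got {total} seeds; the limit is {MAX_SEEDS} in total")
    # Only include the seed types that were actually supplied
    return {name: ",".join(values) for name, values in groups.items() if values}


print(build_seed_params(seed_artists=["artist_id_1", "artist_id_2"],
                        seed_genres=["edm"]))
```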

10. User Data with Authorization Code Flow {#user-data}

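When users authorize your app, you can access their personal Spotify data, as long as the token carries the matching scope (user-top-read for top tracks, user-library-read for saved tracks):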

def get_user_top_tracks(user_token: str,
                         time_range: str = "medium_term",
                         limit: int = 50) -> list[dict]:
    """Get a user's top tracks. time_range: short_term/medium_term/long_term."""
    headers = {"Authorization": f"Bearer {user_token}"}
    all_tracks = []
    offset = 0

    while offset < limit:
        resp = requests.get(
            "https://api.spotify.com/v1/me/top/tracks",
            headers=headers,
            params={
                "time_range": time_range,
                "limit": min(50, limit - offset),
                "offset": offset,
            },
            timeout=15,
        )
        resp.raise_for_status()
        data = resp.json()
        items = data.get("items", [])
        if not items:
            break

        for item in items:
            all_tracks.append({
                "id": item["id"],
                "name": item["name"],
                "artists": [a["name"] for a in item.get("artists", [])],
                "popularity": item.get("popularity"),
            })

        if not data.get("next"):
            break
        offset += 50

    return all_tracks

def get_user_saved_tracks(user_token: str,
                           limit: int = 200) -> list[dict]:
    """Get tracks saved to a user's library."""
    headers = {"Authorization": f"Bearer {user_token}"}
    saved = []
    offset = 0

    while len(saved) < limit:
        resp = requests.get(
            "https://api.spotify.com/v1/me/tracks",
            headers=headers,
            params={"limit": 50, "offset": offset, "market": "US"},
            timeout=15,
        )
        resp.raise_for_status()
        data = resp.json()
        items = data.get("items", [])
        if not items:
            break

        for item in items:
            track = item.get("track")
            if track and track.get("id"):
                saved.append({
                    "id": track["id"],
                    "name": track["name"],
                    "artists": [a["name"] for a in track.get("artists", [])],
                    "added_at": item.get("added_at"),
                })

        if not data.get("next"):
            break
        offset += 50
        time.sleep(0.3)

    return saved[:limit]

11. Pagination: Handling Large Result Sets {#pagination}

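Most Spotify list endpoints paginate with offset and limit, and every page includes a ready-made next URL (null on the last page) that you can follow instead of computing offsets yourself. A few endpoints, such as recently played tracks, use cursors instead: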

def paginate_spotify_endpoint(endpoint: str,
                               params: dict = None,
                               items_key: str = "items",
                               max_items: int = None) -> list:
    """Generic paginator for any Spotify endpoint using offset/limit."""
    all_items = []
    params = dict(params or {})
    params.setdefault("limit", 50)

    while True:
        data = spotify.get(endpoint, params=params)

        # Prefer the caller's items_key, falling back to the standard "items" list
        items = data.get(items_key)
        if not isinstance(items, list):
            items = data.get("items")
        if not isinstance(items, list):
            break

        all_items.extend([i for i in items if i])  # filter None

        if max_items is not None and len(all_items) >= max_items:
            all_items = all_items[:max_items]
            break

        next_url = data.get("next")
        if not next_url:
            break

        # Use the full next URL directly
        endpoint = next_url
        params = {}
        time.sleep(0.3)

    return all_items

# Examples
all_playlist_items = paginate_spotify_endpoint(
    "playlists/37i9dQZF1DXcBWIGoYBM5M/tracks",
    params={"market": "US", "fields": "items(track(id,name)),next"},
    max_items=500
)
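
If the `next`-following logic feels opaque, it helps to run it against canned pages with no network involved. The sketch below is a generator version of the same idea. The `fetch` callable and the fake page data are stand-ins invented for this example; in real use `fetch` would wrap `spotify.get`.

```python
from typing import Callable, Iterator


def iter_pages(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield items from each page, following `next` until it is empty."""
    url = first_url
    seen = set()
    while url:
        if url in seen:
            # Guard against a misbehaving API that points back at an old page
            raise RuntimeError(f"Pagination loop detected at {url}")
        seen.add(url)
        page = fetch(url)
        # Skip null entries, which Spotify uses for unavailable items
        yield from (item for item in page.get("items", []) if item)
        url = page.get("next")


# Canned responses standing in for the API
FAKE_PAGES = {
    "page1": {"items": [{"id": "a"}, {"id": "b"}], "next": "page2"},
    "page2": {"items": [None, {"id": "c"}], "next": None},
}

ids = [item["id"] for item in iter_pages(FAKE_PAGES.__getitem__, "page1")]
print(ids)
```

A generator also lets the caller stop early without fetching pages it does not need.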

12. Rate Limits and How to Handle Them {#rate-limits}

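Spotify's rate limits are per-app, not per-endpoint, and are calculated over a rolling 30-second window. Spotify does not publish the exact threshold; the limiter below defaults to 90 requests per window, a figure based on real-world testing: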

import time
import threading
from collections import deque

class TokenBucketRateLimiter:
    """Sliding-window rate limiter for Spotify API calls (not a true token bucket, despite the name)."""

    def __init__(self, max_requests: int = 90,
                 window_seconds: int = 30):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = deque()
        self._lock = threading.Lock()

    def wait(self):
        with self._lock:
            now = time.time()
            # Remove old requests outside the window
            while self.requests and now - self.requests[0] > self.window:
                self.requests.popleft()

            if len(self.requests) >= self.max_requests:
                sleep_time = self.window - (now - self.requests[0]) + 0.1
                time.sleep(sleep_time)

            self.requests.append(time.time())

rate_limiter = TokenBucketRateLimiter(max_requests=80, window_seconds=30)

# For large-scale collection using multiple app credentials
def create_rotating_client_pool(credentials: list[dict]) -> list[SpotifyClient]:
    """Create multiple clients to distribute rate limits."""
    return [SpotifyClient(c["client_id"], c["client_secret"])
            for c in credentials]
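
The limiter above is proactive. When a 429 still arrives, the response carries a Retry-After header giving the number of seconds to wait. Here is a minimal sketch of a retry wrapper, assuming a response object with status_code and headers attributes (as requests.Response has). retry_delay and call_with_retry are hypothetical names, and the exponential fallback is an assumption for the case where the header is missing or malformed:

```python
import time
from typing import Callable, Optional

def retry_delay(retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # malformed header: fall through to exponential backoff
    return min(cap, base * (2 ** attempt))

def call_with_retry(send: Callable[[], object], max_retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-429 response or retries run out."""
    resp = send()
    for attempt in range(max_retries):
        if resp.status_code != 429:
            break
        sleep(retry_delay(resp.headers.get("Retry-After"), attempt))
        resp = send()
    return resp
```

With requests, send could be something like `lambda: requests.get(url, headers=headers, timeout=10)`. The sleep parameter is injectable only so the wrapper can be exercised without real waiting.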

For large-scale extraction -- mapping entire genres or building recommendation datasets -- the practical path is to rotate across several Spotify app credentials, each drawing on its own rate-limit budget. For supplementary scraping of other music sites (lyrics, chart data, setlist databases), routing through ThorData residential proxies keeps that traffic stable without affecting your Spotify API rate limits.
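
Rotation across such a pool can be as simple as round-robin. This is a minimal sketch, assuming each client is any object you have already built (for example with create_rotating_client_pool above). RoundRobinPool is a hypothetical helper, and the string clients in the demo are stand-ins:

```python
import itertools
import threading

class RoundRobinPool:
    """Hand out clients in a fixed rotation, safely across threads."""

    def __init__(self, clients):
        clients = list(clients)
        if not clients:
            raise ValueError("need at least one client")
        self._cycle = itertools.cycle(clients)
        self._lock = threading.Lock()

    def next_client(self):
        # Guard the shared iterator so concurrent callers do not race on it
        with self._lock:
            return next(self._cycle)

pool = RoundRobinPool(["app_a", "app_b", "app_c"])
print([pool.next_client() for _ in range(4)])
# ['app_a', 'app_b', 'app_c', 'app_a']
```

Pairing each client with its own limiter instance keeps one app's budget from being charged for another's calls.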


13. Storing Spotify Data: SQLite Schema {#storage}

import sqlite3
import json
import time

def init_spotify_db(db_path: str = "spotify.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")

    conn.execute("""
        CREATE TABLE IF NOT EXISTS artists (
            id TEXT PRIMARY KEY,
            name TEXT,
            genres TEXT,
            followers INTEGER,
            popularity INTEGER,
            images TEXT,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS albums (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist_ids TEXT,
            artist_names TEXT,
            release_date TEXT,
            total_tracks INTEGER,
            album_type TEXT,
            label TEXT,
            popularity INTEGER,
            image_url TEXT,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS tracks (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist_ids TEXT,
            artist_names TEXT,
            album_id TEXT,
            album_name TEXT,
            release_date TEXT,
            duration_ms INTEGER,
            popularity INTEGER,
            explicit INTEGER,
            preview_url TEXT,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS audio_features (
            track_id TEXT PRIMARY KEY,
            danceability REAL,
            energy REAL,
            key INTEGER,
            loudness REAL,
            mode INTEGER,
            speechiness REAL,
            acousticness REAL,
            instrumentalness REAL,
            liveness REAL,
            valence REAL,
            tempo REAL,
            time_signature INTEGER,
            duration_ms INTEGER,
            scraped_at REAL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS playlist_tracks (
            playlist_id TEXT,
            track_id TEXT,
            position INTEGER,
            added_at TEXT,
            added_by TEXT,
            PRIMARY KEY (playlist_id, track_id)
        )
    """)

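    # Note: artist_ids holds a JSON array, so this index only helps exact
    # matches on the whole string, not lookups by a single artist
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist_ids)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_album ON tracks(album_id)")
    # audio_features.track_id is the PRIMARY KEY, so SQLite already indexes it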

    conn.commit()
    return conn

def save_track_with_features(conn: sqlite3.Connection,
                               track: dict, features: dict = None):
    """Save a track and its audio features atomically."""
    now = time.time()
    # "album" is either the API's nested object or an already-flattened name
    album = track.get("album")
    album_obj = album if isinstance(album, dict) else {}
    album_name = album if isinstance(album, str) else album_obj.get("name")

    # "with conn" commits on success and rolls back both inserts on error
    with conn:
        conn.execute("""
            INSERT OR REPLACE INTO tracks VALUES (?,?,?,?,?,?,?,?,?,?,?,?)
        """, (
            track["id"], track["name"],
            json.dumps([a["id"] for a in track.get("artists", [])]),
            track.get("artist_names")
                or ", ".join(a.get("name", "") for a in track.get("artists", [])),
            track.get("album_id") or album_obj.get("id"),
            album_name,
            track.get("release_date") or album_obj.get("release_date"),
            track.get("duration_ms"),
            track.get("popularity", 0),
            int(bool(track.get("explicit"))),
            track.get("preview_url"),
            now
        ))
        if features:
            conn.execute("""
                INSERT OR REPLACE INTO audio_features VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
            """, (
                features["id"],
                features.get("danceability"), features.get("energy"),
                features.get("key"), features.get("loudness"),
                features.get("mode"), features.get("speechiness"),
                features.get("acousticness"), features.get("instrumentalness"),
                features.get("liveness"), features.get("valence"),
                features.get("tempo"), features.get("time_signature"),
                features.get("duration_ms"), now
            ))
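
Once tracks and features are stored, reading them back is a single JOIN on the track ID. Here is a minimal sketch, using a cut-down copy of the two tables so it runs on its own. top_tracks_by_feature is a hypothetical helper, and the whitelist exists because the column name is interpolated into the SQL:

```python
import sqlite3

ALLOWED_FEATURES = {"danceability", "energy", "valence", "tempo", "acousticness"}

def top_tracks_by_feature(conn, feature="danceability", limit=10):
    """Return (name, artist_names, value) rows, highest feature value first."""
    if feature not in ALLOWED_FEATURES:
        raise ValueError(f"unsupported feature: {feature}")
    sql = f"""
        SELECT t.name, t.artist_names, af.{feature}
        FROM tracks AS t
        JOIN audio_features AS af ON af.track_id = t.id
        WHERE af.{feature} IS NOT NULL
        ORDER BY af.{feature} DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()

# Demo with simplified tables (only the columns the query touches)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE tracks (id TEXT PRIMARY KEY, name TEXT, artist_names TEXT)")
demo.execute("""CREATE TABLE audio_features (
    track_id TEXT PRIMARY KEY, danceability REAL, energy REAL,
    valence REAL, tempo REAL, acousticness REAL)""")
demo.executemany("INSERT INTO tracks VALUES (?,?,?)",
                 [("a", "Song A", "X"), ("b", "Song B", "Y"), ("c", "Song C", "Z")])
demo.executemany("INSERT INTO audio_features (track_id, danceability) VALUES (?,?)",
                 [("a", 0.4), ("b", 0.9)])
print(top_tracks_by_feature(demo, "danceability", 5))
# [('Song B', 'Y', 0.9), ('Song A', 'X', 0.4)]
```

Song C has no feature row, so the inner join drops it. Use a LEFT JOIN if tracks without features should still appear.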

14. Building Complete Datasets {#datasets}

Here's how to build a full genre dataset with audio features for machine learning or analysis:

def build_genre_dataset(genres: list[str],
                         tracks_per_genre: int = 200,
                         db_path: str = "spotify_genre_dataset.db") -> dict:
    """Build a labeled dataset of tracks by genre with audio features."""
    conn = init_spotify_db(db_path)
    dataset_summary = {}

    for genre in genres:
        print(f"\nCollecting genre: {genre}")
        genre_tracks = []

        # Search for tracks in this genre
        results = search_tracks(f"genre:{genre}", limit=min(tracks_per_genre, 1000))
        genre_tracks.extend(results)

        # Also get playlist tracks for this genre category
        playlists = get_category_playlists(genre, limit=5)
        for pl in playlists[:3]:
            pl_tracks = get_playlist_tracks(pl["id"])
            genre_tracks.extend(pl_tracks[:50])

        # Deduplicate by track ID
        seen = set()
        unique_tracks = []
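        for t in genre_tracks:
            track_id = t.get("id")
            # Skip entries without a Spotify ID (e.g. local files in playlists)
            if track_id and track_id not in seen:
                seen.add(track_id)
                unique_tracks.append(t)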

        unique_tracks = unique_tracks[:tracks_per_genre]

        # Get audio features in batches
        track_ids = [t["id"] for t in unique_tracks]
        features_map = get_audio_features(track_ids)

        # Save to database
        for track in unique_tracks:
            track["genre_label"] = genre
            features = features_map.get(track["id"])
            save_track_with_features(conn, track, features)

        dataset_summary[genre] = {
            "tracks_collected": len(unique_tracks),
            "with_audio_features": sum(1 for t in unique_tracks
                                       if t["id"] in features_map),
        }
        print(f"  {genre}: {len(unique_tracks)} tracks, "
              f"{dataset_summary[genre]['with_audio_features']} with features")

        time.sleep(1.0)  # Respect rate limits between genres

    conn.close()
    return dataset_summary

def build_playlist_dataset(playlist_id: str,
                            db_path: str = "playlist_dataset.db") -> int:
    """Build a complete single-playlist dataset with audio features."""
    conn = init_spotify_db(db_path)

    # Get playlist info
    info = get_playlist_info(playlist_id)
    print(f"Building dataset for: {info['name']} ({info['total_tracks']} tracks)")

    # Get all tracks
    tracks = get_playlist_tracks(playlist_id)

    # Get audio features in batches
    track_ids = [t["id"] for t in tracks]
    features_map = get_audio_features(track_ids)

    # Save everything
    for i, track in enumerate(tracks):
        features = features_map.get(track["id"])
        save_track_with_features(conn, track, features)
        # Record playlist membership
        conn.execute("""
            INSERT OR IGNORE INTO playlist_tracks VALUES (?,?,?,?,?)
        """, (playlist_id, track["id"], i,
              track.get("added_at"), track.get("added_by")))

    conn.commit()
    conn.close()
    print(f"Saved {len(tracks)} tracks with "
          f"{len(features_map)} audio feature records")
    return len(tracks)
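
Note that build_genre_dataset sets track["genre_label"], but the tracks table from section 13 has no column for it, so the label is not written to the database. One way to keep it is a small side table. In this sketch, the track_genres table and both helpers are assumptions, not part of the schema above:

```python
import sqlite3

def init_genre_table(conn: sqlite3.Connection) -> None:
    """Create a many-to-many table linking track IDs to genre labels."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS track_genres (
            track_id TEXT,
            genre TEXT,
            PRIMARY KEY (track_id, genre)
        )
    """)

def save_genre_label(conn: sqlite3.Connection, track_id: str, genre: str) -> None:
    """Record one label; repeated calls for the same pair are ignored."""
    conn.execute("INSERT OR IGNORE INTO track_genres VALUES (?, ?)",
                 (track_id, genre))

demo = sqlite3.connect(":memory:")
init_genre_table(demo)
save_genre_label(demo, "t1", "jazz")
save_genre_label(demo, "t1", "jazz")   # duplicate, ignored
save_genre_label(demo, "t1", "blues")  # one track can carry several labels
demo.commit()
rows = demo.execute(
    "SELECT genre FROM track_genres WHERE track_id = ? ORDER BY genre", ("t1",)
).fetchall()
print(rows)  # [('blues',), ('jazz',)]
```

A separate table, rather than a genre column on tracks, lets the same track appear under more than one genre without being overwritten by INSERT OR REPLACE.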

15. Spotify Web Playback and Embed APIs {#playback}

For displaying Spotify content in web apps (not for data extraction):

<!-- Embed a track player -->
<iframe
  src="https://open.spotify.com/embed/track/4iV5W9uYEdYUVa79Axb7Rh"
  width="300"
  height="80"
  frameborder="0"
  allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">
</iframe>

<!-- Embed a playlist -->
<iframe
  src="https://open.spotify.com/embed/playlist/37i9dQZF1DXcBWIGoYBM5M"
  width="300"
  height="380"
  frameborder="0"
  allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">
</iframe>
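
If you generate these snippets from collected IDs, a small helper avoids hand-editing the markup. This is a sketch that reproduces the iframe attributes shown above. spotify_embed_iframe is a hypothetical name, and it only accepts the two embed types used in this section:

```python
from html import escape

EMBED_KINDS = {"track", "playlist"}

def spotify_embed_iframe(kind: str, spotify_id: str,
                         width: int = 300, height: int = 80) -> str:
    """Build an embed iframe string for a track or playlist ID."""
    if kind not in EMBED_KINDS:
        raise ValueError(f"unsupported embed kind: {kind}")
    src = f"https://open.spotify.com/embed/{kind}/{spotify_id}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}" frameborder="0" '
        'allow="autoplay; clipboard-write; encrypted-media; '
        'fullscreen; picture-in-picture"></iframe>'
    )

print(spotify_embed_iframe("playlist", "37i9dQZF1DXcBWIGoYBM5M", height=380))
```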

16. Real Use Cases {#use-cases}

Music Mood Analysis

def classify_mood(features: dict) -> str:
    """Classify track mood based on audio features."""
    valence = features.get("valence", 0.5)
    energy = features.get("energy", 0.5)

    if valence > 0.6 and energy > 0.6:
        return "happy_energetic"
    elif valence > 0.6 and energy < 0.4:
        return "happy_calm"
    elif valence < 0.4 and energy > 0.6:
        return "angry_intense"
    elif valence < 0.4 and energy < 0.4:
        return "sad_melancholic"
    else:
        return "neutral"

def analyze_playlist_moods(playlist_id: str) -> dict:
    tracks = get_playlist_tracks(playlist_id)
    track_ids = [t["id"] for t in tracks]
    features_map = get_audio_features(track_ids)

    mood_counts = {}
    for track_id, feat in features_map.items():
        mood = classify_mood(feat)
        mood_counts[mood] = mood_counts.get(mood, 0) + 1

    return mood_counts
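
Raw counts are hard to compare between playlists of different sizes, so it can help to convert them to percentages. This is a minimal sketch that takes a dict shaped like the return value of analyze_playlist_moods. mood_shares is a hypothetical helper:

```python
def mood_shares(mood_counts: dict[str, int]) -> dict[str, float]:
    """Convert mood counts to percentages, largest share first."""
    total = sum(mood_counts.values())
    if total == 0:
        return {}
    ranked = sorted(mood_counts.items(), key=lambda kv: kv[1], reverse=True)
    return {mood: round(100 * n / total, 1) for mood, n in ranked}

print(mood_shares({"neutral": 3, "happy_energetic": 6, "sad_melancholic": 1}))
# {'happy_energetic': 60.0, 'neutral': 30.0, 'sad_melancholic': 10.0}
```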

Genre Popularity Tracking

def track_genre_popularity(genres: list[str],
                            db: sqlite3.Connection) -> dict:
    """Track average popularity of tracks across genres."""
    result = {}
    for genre in genres:
        tracks = search_tracks(f"genre:{genre}", limit=50)
        if tracks:
            avg_pop = sum(t["popularity"] for t in tracks) / len(tracks)
            result[genre] = {
                "avg_popularity": round(avg_pop, 1),
                "sample_size": len(tracks),
            }
    return result
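
The function above returns a single point-in-time sample. To follow popularity over time, each run has to be stored with a timestamp. This is a sketch of one way to do that with the SQLite connection. The genre_popularity table and save_popularity_snapshot are assumptions, not part of the schema in section 13:

```python
import sqlite3
import time

def save_popularity_snapshot(db: sqlite3.Connection, result: dict,
                             taken_at: float = None) -> int:
    """Store one timestamped row per genre; returns the number of rows written."""
    taken_at = time.time() if taken_at is None else taken_at
    db.execute("""
        CREATE TABLE IF NOT EXISTS genre_popularity (
            genre TEXT,
            taken_at REAL,
            avg_popularity REAL,
            sample_size INTEGER,
            PRIMARY KEY (genre, taken_at)
        )
    """)
    rows = [(genre, taken_at, stats["avg_popularity"], stats["sample_size"])
            for genre, stats in result.items()]
    db.executemany("INSERT OR REPLACE INTO genre_popularity VALUES (?,?,?,?)", rows)
    db.commit()
    return len(rows)

demo = sqlite3.connect(":memory:")
save_popularity_snapshot(demo, {"jazz": {"avg_popularity": 41.5, "sample_size": 50}}, taken_at=1.0)
save_popularity_snapshot(demo, {"jazz": {"avg_popularity": 43.0, "sample_size": 50}}, taken_at=2.0)
history = demo.execute(
    "SELECT taken_at, avg_popularity FROM genre_popularity "
    "WHERE genre = 'jazz' ORDER BY taken_at"
).fetchall()
print(history)  # [(1.0, 41.5), (2.0, 43.0)]
```

The result argument has the same shape that track_genre_popularity returns, so the two can be chained on a schedule.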

17. Common Errors and Fixes {#errors}

| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Token expired or invalid | Re-authenticate, check client_id/secret |
| 403 Forbidden | Endpoint requires user auth | Use Authorization Code flow, not Client Credentials |
| 429 Too Many Requests | Rate limit exceeded | Check Retry-After header, implement backoff |
| 404 Not Found | Track/playlist/artist deleted | Remove from tracking list |
| track: null in playlist | Local file or unavailable in market | Filter out null tracks |
| Empty audio_features array | Track has no audio analysis | Filter and handle missing data |
| Token refresh fails | Invalid refresh token | User must re-authorize |
| No active device | Web Playback SDK issue | Unrelated to data API |
| Very low popularity scores | Recent release, few plays | Normal; scores update weekly |
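
The two "filter" fixes in the table can be sketched as follows. This assumes the response shapes used earlier in this guide: playlist items that wrap a track object, which may be null, and an audio features list that may contain null entries. Both helper names are hypothetical:

```python
def valid_playlist_tracks(items: list) -> list:
    """Keep only entries whose track object and Spotify ID are present."""
    tracks = []
    for item in items:
        track = (item or {}).get("track")
        # Null tracks and entries without an ID (e.g. local files) are skipped
        if track and track.get("id"):
            tracks.append(track)
    return tracks

def features_by_id(audio_features: list) -> dict:
    """Map track ID to its feature dict, dropping null entries."""
    return {f["id"]: f for f in audio_features if f}

items = [
    {"track": {"id": "a", "name": "A"}},
    {"track": None},
    {"track": {"id": None, "name": "local file"}},
]
print([t["id"] for t in valid_playlist_tracks(items)])
# ['a']
print(features_by_id([{"id": "a", "tempo": 120.0}, None]))
# {'a': {'id': 'a', 'tempo': 120.0}}
```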

Final Thoughts

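Spotify is a rare case where the official API is genuinely better than scraping: free access, rich metadata, and audio features you can't get anywhere else. The main limitations are strict rate limits, endpoints that have become more restrictive over time, and data such as actual play counts that Spotify deliberately withholds.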

The audio features endpoint alone is worth the setup time. Together, valence (emotional positivity), danceability, and tempo support the kind of content classification behind recommendation systems, mood-based playlists, music research, and marketing analytics.

If you're building anything music-related -- recommendation engines, genre analysis, mood-based playlist tools, or market research -- start here. Set up the Client Credentials flow, grab your first playlist, and run the audio features on it. You'll immediately see why Spotify's API is the most developer-friendly in the social/media space.