Extracting Spotify Data in 2026: Playlists, Tracks, Audio Features, and Artist Analytics via the Web API
Spotify is unusual among major platforms -- they actually want developers using their data. The Spotify Web API is free, well-documented, and gives you access to a staggering amount of metadata: 100 million+ tracks, artist profiles, album details, playlist contents, and even audio analysis features like tempo, key, danceability, and acousticness. No scraping required for most use cases.
The catch? Rate limits are strict, some endpoints have become more restrictive, and certain data (like actual play counts) is deliberately withheld. Here's the practical guide to getting everything you need.
Table of Contents
- Setting Up Authentication
- Client Credentials vs. Authorization Code Flow
- Extracting Playlist Data and Tracks
- Audio Features: The Hidden Gold
- Artist Deep Dives: Profiles, Top Tracks, and Albums
- Album Data and Track Listings
- Search Across the Catalog
- New Releases and Category Browsing
- Related Artists and Recommendation Seeds
- User Data with Authorization Code Flow
- Pagination: Handling Large Result Sets
- Rate Limits and How to Handle Them
- Storing Spotify Data: SQLite Schema
- Building Complete Datasets
- Spotify Web Playback and Embed APIs
- Real Use Cases
- Common Errors and Fixes
1. Setting Up Authentication {#auth}
Spotify uses OAuth 2.0. For data extraction (no user-specific data), the Client Credentials flow is simplest -- you get an app token without any user login.
Creating a Spotify App
- Go to the Spotify Developer Dashboard and create an app
- Set a redirect URI (even http://localhost:8080 works for Client Credentials -- it's not used)
- Note your Client ID and Client Secret
- No approval process required for basic API access
import base64
import time
from typing import Optional

import requests


class SpotifyClient:
    """Spotify API client with automatic token refresh and retry logic."""

    BASE_URL = "https://api.spotify.com/v1"
    TOKEN_URL = "https://accounts.spotify.com/api/token"

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self._token = None
        self._token_expires = 0
        self._request_count = 0

    def _get_token(self) -> str:
        """Get or refresh access token using Client Credentials flow."""
        # Reuse the cached token until 60 seconds before it expires
        if self._token and time.time() < self._token_expires - 60:
            return self._token
        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()
        resp = requests.post(
            self.TOKEN_URL,
            headers={
                "Authorization": f"Basic {auth}",
                "Content-Type": "application/x-www-form-urlencoded",
            },
            data={"grant_type": "client_credentials"},
            timeout=10,
        )
        resp.raise_for_status()
        data = resp.json()
        self._token = data["access_token"]
        self._token_expires = time.time() + data["expires_in"]
        return self._token

    def get(self, endpoint: str, params: Optional[dict] = None,
            retries: int = 3) -> dict:
        """Make authenticated GET request with rate limit handling."""
        # Accept either a relative endpoint or a full URL (e.g. a "next" link)
        url = endpoint if endpoint.startswith("http") else f"{self.BASE_URL}/{endpoint}"
        headers = {"Authorization": f"Bearer {self._get_token()}"}
        for attempt in range(retries):
            self._request_count += 1
            resp = requests.get(url, headers=headers,
                                params=params or {}, timeout=15)
            if resp.status_code == 429:
                retry_after = int(resp.headers.get("Retry-After", 5))
                print(f"Rate limited, waiting {retry_after}s "
                      f"(total requests: {self._request_count})")
                time.sleep(retry_after + 1)
                continue
            if resp.status_code == 401:
                # Token expired mid-session
                self._token = None
                headers["Authorization"] = f"Bearer {self._get_token()}"
                continue
            if resp.status_code == 503:
                # Service unavailable: exponential backoff
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return resp.json()
        raise RuntimeError(f"Failed after {retries} retries: {url}")


# Initialize client
spotify = SpotifyClient("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
2. Client Credentials vs. Authorization Code Flow {#auth-flows}
| Feature | Client Credentials | Authorization Code |
|---|---|---|
| User data | No | Yes |
| Public catalog | Yes | Yes |
| Rate limit | Per-app | Per-user + per-app |
| Setup complexity | Low | Medium |
| Use case | Data collection | User integrations |
For data collection, always use Client Credentials. Authorization Code is only needed for accessing user-specific data (saved tracks, listening history, etc.).
Authorization Code Flow (for user data)
import secrets
import urllib.parse


class SpotifyUserClient(SpotifyClient):
    """Extended client supporting user authorization."""

    AUTHORIZE_URL = "https://accounts.spotify.com/authorize"

    def _basic_auth_header(self) -> dict:
        """Build the Basic auth header shared by both token requests."""
        auth = base64.b64encode(
            f"{self.client_id}:{self.client_secret}".encode()
        ).decode()
        return {"Authorization": f"Basic {auth}"}

    def get_auth_url(self, redirect_uri: str,
                     scopes: list[str]) -> tuple[str, str]:
        """Generate authorization URL and state token."""
        state = secrets.token_urlsafe(16)
        params = {
            "client_id": self.client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": " ".join(scopes),
            "state": state,
        }
        url = f"{self.AUTHORIZE_URL}?{urllib.parse.urlencode(params)}"
        return url, state

    def exchange_code_for_token(self, code: str,
                                redirect_uri: str) -> dict:
        """Exchange authorization code for access + refresh tokens."""
        resp = requests.post(
            self.TOKEN_URL,
            headers=self._basic_auth_header(),
            data={
                "grant_type": "authorization_code",
                "code": code,
                "redirect_uri": redirect_uri,
            },
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    def refresh_user_token(self, refresh_token: str) -> str:
        """Refresh an expired user access token."""
        resp = requests.post(
            self.TOKEN_URL,
            headers=self._basic_auth_header(),
            data={"grant_type": "refresh_token", "refresh_token": refresh_token},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["access_token"]
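The state token returned by get_auth_url only protects against CSRF if it is checked when Spotify redirects back to your app. A minimal sketch of that check, assuming the redirect arrives as a full URL string; parse_callback is a hypothetical helper name, not part of any Spotify SDK:

```python
import hmac
from urllib.parse import parse_qs, urlparse


def parse_callback(callback_url: str, expected_state: str) -> str:
    """Return the authorization code from a redirect URL after validating state.

    Hypothetical helper: raises ValueError on a denied request, a state
    mismatch, or a missing code.
    """
    query = parse_qs(urlparse(callback_url).query)
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")
    returned_state = query.get("state", [""])[0]
    # Constant-time comparison on bytes avoids leaking the state via timing
    if not hmac.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("State mismatch: discard this response")
    codes = query.get("code")
    if not codes:
        raise ValueError("No authorization code in callback")
    return codes[0]
```

The returned code is what exchange_code_for_token expects as its first argument.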
3. Extracting Playlist Data and Tracks {#playlists}
Playlists are the most common target. A single playlist can have up to 10,000 tracks, returned in pages of 100:
def get_playlist_info(playlist_id: str) -> dict:
    """Get playlist metadata."""
    data = spotify.get(f"playlists/{playlist_id}", params={
        "fields": "id,name,description,owner,followers,public,"
                  "snapshot_id,images,tracks.total"
    })
    return {
        "id": data["id"],
        "name": data["name"],
        "description": data.get("description", ""),
        "owner": data["owner"]["display_name"],
        "owner_id": data["owner"]["id"],
        "followers": data.get("followers", {}).get("total", 0),
        "is_public": data.get("public"),
        "total_tracks": data["tracks"]["total"],
        "snapshot_id": data.get("snapshot_id"),
        # images may be missing, null, or an empty list
        "image": (data.get("images") or [{}])[0].get("url"),
    }


def get_playlist_tracks(playlist_id: str,
                        market: str = "US") -> list[dict]:
    """Get all tracks from a Spotify playlist."""
    tracks = []
    offset = 0
    limit = 100
    while True:
        data = spotify.get(
            f"playlists/{playlist_id}/tracks",
            params={
                "offset": offset,
                "limit": limit,
                "market": market,
                "fields": ("items(added_at,added_by.id,"
                           "track(id,name,artists,album,duration_ms,"
                           "popularity,explicit,preview_url,"
                           "external_urls,is_local,type)),"
                           "next,total"),
            }
        )
        for item in data.get("items", []):
            track = item.get("track")
            if not track or not track.get("id"):
                # Skip local files and unavailable tracks
                continue
            album = track.get("album") or {}
            tracks.append({
                "id": track["id"],
                "name": track["name"],
                "artists": [{"id": a["id"], "name": a["name"]}
                            for a in track.get("artists", [])],
                "artist_names": ", ".join(a["name"]
                                          for a in track.get("artists", [])),
                "album": album.get("name"),
                "album_id": album.get("id"),
                "album_type": album.get("album_type"),
                "release_date": album.get("release_date"),
                "duration_ms": track["duration_ms"],
                "duration_seconds": track["duration_ms"] / 1000,
                "popularity": track.get("popularity", 0),
                "explicit": track.get("explicit", False),
                "preview_url": track.get("preview_url"),
                "spotify_url": track.get("external_urls", {}).get("spotify"),
                "added_at": item.get("added_at"),
                # added_by can be null on older playlists
                "added_by": (item.get("added_by") or {}).get("id"),
            })
        if not data.get("next"):
            break
        offset += limit
        time.sleep(0.3)  # Polite delay
    return tracks


def get_playlist_full(playlist_id: str) -> dict:
    """Get complete playlist data including all tracks and metadata."""
    info = get_playlist_info(playlist_id)
    tracks = get_playlist_tracks(playlist_id)
    info["tracks"] = tracks
    info["actual_track_count"] = len(tracks)
    return info


# Example: Get Today's Top Hits
top_hits = get_playlist_full("37i9dQZF1DXcBWIGoYBM5M")
print(f"Playlist: {top_hits['name']}")
print(f"Followers: {top_hits['followers']:,}")
print(f"Tracks: {len(top_hits['tracks'])}")
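The functions above take a bare playlist ID, but in practice you usually start from a share link or a spotify: URI. A small sketch for normalizing those inputs; extract_playlist_id is a hypothetical helper, and the 22-character base62 ID length is an assumption based on how Spotify IDs look in practice:

```python
import re

# Assumption: Spotify IDs are 22-character base62 strings
_PLAYLIST_PATTERNS = [
    re.compile(r"^spotify:playlist:([A-Za-z0-9]{22})$"),
    # Share links, optionally with a locale segment such as /intl-de/
    re.compile(r"open\.spotify\.com/(?:[a-z\-]+/)?playlist/([A-Za-z0-9]{22})"),
    re.compile(r"^([A-Za-z0-9]{22})$"),
]


def extract_playlist_id(value: str) -> str:
    """Return the playlist ID from a URI, share URL, or bare ID."""
    value = value.strip()
    for pattern in _PLAYLIST_PATTERNS:
        match = pattern.search(value)
        if match:
            return match.group(1)
    raise ValueError(f"Not a recognizable Spotify playlist reference: {value!r}")
```

With this in place, get_playlist_full(extract_playlist_id(link)) accepts whatever form a user pastes in.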
4. Audio Features: The Hidden Gold {#audio-features}
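This is what makes Spotify's API special. The /audio-features endpoint returns machine-analyzed attributes for nearly every track in the catalog, and you can request up to 100 tracks in a single batch call. One caveat: Spotify's November 2024 Web API changes restricted this endpoint (along with audio analysis, recommendations, and related artists) for newly registered apps, so a 403 here usually points to app access rather than a bug in your code: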
def get_audio_features(track_ids: list[str]) -> dict[str, dict]:
    """Get audio features for tracks (batch of up to 100). Returns dict keyed by track ID."""
    all_features = {}
    for i in range(0, len(track_ids), 100):
        batch = track_ids[i:i+100]
        data = spotify.get("audio-features", params={"ids": ",".join(batch)})
        for feat in data.get("audio_features", []):
            if not feat:
                # Tracks without analysis come back as null entries
                continue
            all_features[feat["id"]] = {
                "id": feat["id"],
                # Rhythm and energy
                "danceability": feat["danceability"],      # 0.0-1.0: dance-ability
                "energy": feat["energy"],                  # 0.0-1.0: intensity
                "tempo": feat["tempo"],                    # BPM
                "time_signature": feat["time_signature"],  # beats per bar (3,4,5,6,7)
                "loudness": feat["loudness"],              # dB, typically -60 to 0
                # Tone and mood
                "key": feat["key"],                        # -1=no key, 0=C, 1=C#...11=B
                "mode": feat["mode"],                      # 0=minor, 1=major
                "valence": feat["valence"],                # 0=sad/negative, 1=happy
                "speechiness": feat["speechiness"],        # 0=no speech, 1=all speech
                "acousticness": feat["acousticness"],      # 0=electric, 1=acoustic
                "instrumentalness": feat["instrumentalness"],  # >0.5 = likely no vocals
                "liveness": feat["liveness"],              # >0.8 = likely live recording
                # Duration
                "duration_ms": feat["duration_ms"],
            }
        time.sleep(0.5)
    return all_features


# Key legend
KEY_NAMES = {-1: "No key", 0: "C", 1: "C#/Db", 2: "D", 3: "D#/Eb",
             4: "E", 5: "F", 6: "F#/Gb", 7: "G", 8: "G#/Ab",
             9: "A", 10: "A#/Bb", 11: "B"}


def describe_audio_features(features: dict) -> str:
    """Human-readable description of audio features."""
    key = KEY_NAMES.get(features.get("key", -1), "Unknown")
    mode = "major" if features.get("mode") == 1 else "minor"
    return (
        f"Key: {key} {mode} | "
        f"Tempo: {features.get('tempo', 0):.0f} BPM | "
        f"Energy: {features.get('energy', 0):.2f} | "
        f"Danceability: {features.get('danceability', 0):.2f} | "
        f"Valence: {features.get('valence', 0):.2f} | "
        f"Acousticness: {features.get('acousticness', 0):.2f}"
    )


# Usage example
track_ids = ["4iV5W9uYEdYUVa79Axb7Rh", "1301WleyT98MSxVHPZCA6M"]
features = get_audio_features(track_ids)
for tid, feat in features.items():
    print(f"{tid}: {describe_audio_features(feat)}")
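Per-track values are most useful once aggregated. As a rough sketch, the dict returned by get_audio_features can be reduced to per-feature averages for a whole playlist; summarize_features and the choice of NUMERIC_FEATURES are illustrative assumptions, not part of the API:

```python
from statistics import mean

# Illustrative subset of features to average (an assumption, adjust freely)
NUMERIC_FEATURES = ("danceability", "energy", "valence", "tempo", "acousticness")


def summarize_features(features_map: dict[str, dict]) -> dict[str, float]:
    """Average selected audio features across all tracks in features_map."""
    summary = {}
    for name in NUMERIC_FEATURES:
        # Skip tracks where a feature is missing or null
        values = [f[name] for f in features_map.values()
                  if f.get(name) is not None]
        if values:
            summary[name] = round(mean(values), 3)
    return summary
```

Calling summarize_features(get_audio_features(track_ids)) gives a compact profile that is easy to compare across playlists.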
Audio Analysis (More Granular, More Expensive)
For beat-level analysis (individual beats, bars, sections, tatums), use the /audio-analysis endpoint -- but it's one track at a time and significantly slower:
def get_audio_analysis(track_id: str) -> dict:
    """Get detailed audio analysis for a single track."""
    data = spotify.get(f"audio-analysis/{track_id}")
    track = data.get("track", {})
    return {
        "track_id": track_id,
        "duration": track.get("duration"),
        "tempo": track.get("tempo"),
        "key": track.get("key"),
        "time_signature": track.get("time_signature"),
        # Counts of each analysis unit
        "bars": len(data.get("bars", [])),
        "beats": len(data.get("beats", [])),
        "sections": len(data.get("sections", [])),
        "segments": len(data.get("segments", [])),
        "tatums": len(data.get("tatums", [])),
        "sections_data": [
            {
                "start": s["start"],
                "duration": s["duration"],
                "tempo": s["tempo"],
                "key": s["key"],
                "loudness": s["loudness"],
            }
            for s in data.get("sections", [])
        ],
    }
5. Artist Deep Dives: Profiles, Top Tracks, and Albums {#artists}
def get_artist_complete(artist_id: str, market: str = "US") -> dict:
    """Get comprehensive artist data including profile, top tracks, and discography."""
    # Base profile
    artist = spotify.get(f"artists/{artist_id}")
    # Top tracks in market
    top_tracks_data = spotify.get(
        f"artists/{artist_id}/top-tracks",
        params={"market": market}
    )
    # Albums and singles
    albums_data = spotify.get(
        f"artists/{artist_id}/albums",
        params={
            "limit": 50,
            "include_groups": "album,single,compilation",
            "market": market,
        }
    )
    # Related artists
    related_data = spotify.get(f"artists/{artist_id}/related-artists")
    return {
        "id": artist["id"],
        "name": artist["name"],
        "genres": artist.get("genres", []),
        "followers": artist.get("followers", {}).get("total", 0),
        "popularity": artist["popularity"],
        "images": [img["url"] for img in artist.get("images") or []],
        "external_url": artist.get("external_urls", {}).get("spotify"),
        "top_tracks": [
            {
                "id": t["id"],
                "name": t["name"],
                "album": t["album"]["name"],
                "popularity": t["popularity"],
                "preview_url": t.get("preview_url"),
                "duration_ms": t["duration_ms"],
            }
            for t in top_tracks_data.get("tracks", [])
        ],
        "discography": [
            {
                "id": a["id"],
                "name": a["name"],
                "type": a["album_type"],
                "release_date": a["release_date"],
                "total_tracks": a["total_tracks"],
                # images may be an empty list
                "image": (a.get("images") or [{}])[0].get("url"),
            }
            for a in albums_data.get("items", [])
        ],
        "related_artists": [
            {
                "id": r["id"],
                "name": r["name"],
                "genres": r.get("genres", []),
                "followers": r.get("followers", {}).get("total", 0),
                "popularity": r["popularity"],
            }
            for r in related_data.get("artists", [])
        ],
    }


def get_multiple_artists(artist_ids: list[str]) -> list[dict]:
    """Batch fetch artist profiles (up to 50 per request)."""
    results = []
    for i in range(0, len(artist_ids), 50):
        batch = artist_ids[i:i+50]
        data = spotify.get("artists", params={"ids": ",".join(batch)})
        for artist in data.get("artists", []):
            if artist:
                results.append({
                    "id": artist["id"],
                    "name": artist["name"],
                    "genres": artist.get("genres", []),
                    "followers": artist.get("followers", {}).get("total", 0),
                    "popularity": artist["popularity"],
                })
        time.sleep(0.3)
    return results
6. Album Data and Track Listings {#albums}
def get_album_complete(album_id: str, market: str = "US") -> dict:
    """Get album data with all tracks."""
    album = spotify.get(f"albums/{album_id}", params={"market": market})
    tracks = []
    page = spotify.get(f"albums/{album_id}/tracks",
                       params={"limit": 50, "market": market})
    while True:
        for t in page.get("items", []):
            tracks.append({
                "id": t["id"],
                "name": t["name"],
                "track_number": t["track_number"],
                "disc_number": t["disc_number"],
                "duration_ms": t["duration_ms"],
                "explicit": t.get("explicit", False),
                "artists": [a["name"] for a in t.get("artists", [])],
                "preview_url": t.get("preview_url"),
            })
        if not page.get("next"):
            break
        # "next" is a full URL, which spotify.get accepts directly
        page = spotify.get(page["next"])
        time.sleep(0.3)
    return {
        "id": album["id"],
        "name": album["name"],
        "type": album["album_type"],
        "artists": [a["name"] for a in album.get("artists", [])],
        "release_date": album["release_date"],
        "total_tracks": album["total_tracks"],
        "label": album.get("label"),
        "copyright": [c["text"] for c in album.get("copyrights", [])],
        "genres": album.get("genres", []),
        "popularity": album.get("popularity"),
        # images may be an empty list
        "image": (album.get("images") or [{}])[0].get("url"),
        "tracks": tracks,
        "external_url": album.get("external_urls", {}).get("spotify"),
    }


def get_albums_batch(album_ids: list[str], market: str = "US") -> list[dict]:
    """Fetch albums in batches of up to 20 per request."""
    results = []
    for i in range(0, len(album_ids), 20):
        batch = album_ids[i:i+20]
        data = spotify.get("albums", params={
            "ids": ",".join(batch),
            "market": market,
        })
        for album in data.get("albums", []):
            if album:
                results.append({
                    "id": album["id"],
                    "name": album["name"],
                    "release_date": album["release_date"],
                    "total_tracks": album["total_tracks"],
                    "artists": [a["name"] for a in album.get("artists", [])],
                    "popularity": album.get("popularity"),
                })
        time.sleep(0.3)
    return results
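One detail worth handling before storing album data: release_date is not always a full date. Album objects carry a release_date_precision field ("year", "month", or "day"), and the string is correspondingly "1981", "1981-12", or "1981-12-15". A minimal sketch that normalizes all three shapes to a date, defaulting missing parts to 1; parse_release_date is a hypothetical helper name:

```python
from datetime import date


def parse_release_date(release_date: str) -> date:
    """Convert a Spotify release_date string of any precision to a date.

    Missing month or day components default to 1, so "1981" becomes
    1981-01-01. Keep release_date_precision alongside if that matters.
    """
    parts = [int(p) for p in release_date.split("-")]
    # Pad year-only or year-month values with 1s
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)
```

This makes release dates sortable and comparable without special-casing old catalog entries that only have a year.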
7. Search Across the Catalog {#search}
def search_spotify(query: str, search_types: list[str] = None,
                   market: str = "US", limit: int = 50) -> dict:
    """Search Spotify catalog. Types: track, artist, album, playlist, show, episode."""
    if search_types is None:
        search_types = ["track"]
    results = {t: [] for t in search_types}
    offset = 0
    while offset < limit:
        batch_size = min(50, limit - offset)
        data = spotify.get("search", params={
            "q": query,
            "type": ",".join(search_types),
            "limit": batch_size,
            "offset": offset,
            "market": market,
        })
        for search_type in search_types:
            # Response keys are pluralized: "tracks", "artists", ...
            items_key = f"{search_type}s"
            items = data.get(items_key, {}).get("items", [])
            results[search_type].extend(items)
        # Check if any type has more results
        has_more = any(
            data.get(f"{t}s", {}).get("next")
            for t in search_types
        )
        if not has_more:
            break
        offset += batch_size
        time.sleep(0.3)
    return results


def search_tracks(query: str, limit: int = 50) -> list[dict]:
    """Search for tracks and return cleaned results."""
    raw = search_spotify(query, ["track"], limit=limit)
    return [
        {
            "id": t["id"],
            "name": t["name"],
            "artists": [a["name"] for a in t.get("artists", [])],
            "album": t.get("album", {}).get("name"),
            "release_date": t.get("album", {}).get("release_date"),
            "popularity": t.get("popularity", 0),
            "duration_ms": t["duration_ms"],
            "explicit": t.get("explicit", False),
            "preview_url": t.get("preview_url"),
        }
        for t in raw["track"]
        if t  # filter out None entries
    ]


def search_by_genre(genre: str, limit: int = 50) -> list[dict]:
    """Search for tracks in a specific genre."""
    return search_tracks(f"genre:{genre}", limit=limit)


def search_artist_discography(artist_name: str) -> dict:
    """Search for an artist and get their full discography."""
    results = search_spotify(artist_name, ["artist"], limit=5)
    artists = results.get("artist", [])
    if not artists:
        return {}
    # Take the most popular result
    artist = max(artists, key=lambda a: a.get("popularity", 0))
    return get_artist_complete(artist["id"])
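The genre: prefix used by search_by_genre is one of several field filters the search endpoint understands (artist, album, track, year, genre, isrc, upc). A small sketch for assembling such queries; build_search_query is a hypothetical helper, and the space-separated field:value format follows the shape of the example in Spotify's search documentation:

```python
# Field filters accepted in the q parameter of the search endpoint
ALLOWED_FILTERS = {"artist", "album", "track", "year", "genre", "isrc", "upc"}


def build_search_query(text: str = "", **filters: str) -> str:
    """Combine free text and field filters into a single q string."""
    parts = [text.strip()] if text.strip() else []
    for field, value in filters.items():
        if field not in ALLOWED_FILTERS:
            raise ValueError(f"Unsupported search filter: {field}")
        parts.append(f"{field}:{str(value).strip()}")
    return " ".join(parts)
```

For example, search_tracks(build_search_query(genre="jazz", year="1959-1965")) narrows a genre search to a range of years.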
8. New Releases and Category Browsing {#new-releases}
def get_new_releases(country: str = "US",
                     limit: int = 50) -> list[dict]:
    """Get new album releases in a country."""
    all_releases = []
    offset = 0
    while offset < limit:
        batch_size = min(50, limit - offset)
        data = spotify.get("browse/new-releases", params={
            "country": country,
            "limit": batch_size,
            "offset": offset,
        })
        albums = data.get("albums", {})
        for album in albums.get("items", []):
            all_releases.append({
                "id": album["id"],
                "name": album["name"],
                "type": album["album_type"],
                "artists": [a["name"] for a in album.get("artists", [])],
                "release_date": album["release_date"],
                "total_tracks": album["total_tracks"],
                # images may be an empty list
                "image": (album.get("images") or [{}])[0].get("url"),
            })
        if not albums.get("next"):
            break
        offset += batch_size
        time.sleep(0.3)
    return all_releases


def get_featured_playlists(country: str = "US",
                           limit: int = 20) -> list[dict]:
    """Get Spotify's editorially featured playlists."""
    data = spotify.get("browse/featured-playlists", params={
        "country": country,
        "limit": limit,
    })
    return [
        {
            "id": p["id"],
            "name": p["name"],
            "description": p.get("description", ""),
            "followers": p.get("followers", {}).get("total"),
            "total_tracks": p.get("tracks", {}).get("total"),
            "image": (p.get("images") or [{}])[0].get("url"),
        }
        for p in data.get("playlists", {}).get("items", [])
        if p
    ]


def get_categories() -> list[dict]:
    """Get Spotify's browse categories."""
    categories = []
    offset = 0
    while True:
        data = spotify.get("browse/categories", params={
            "limit": 50,
            "offset": offset,
            "country": "US",
        })
        items = data.get("categories", {}).get("items", [])
        if not items:
            break
        categories.extend([{"id": c["id"], "name": c["name"]} for c in items])
        if not data.get("categories", {}).get("next"):
            break
        offset += 50
    return categories


def get_category_playlists(category_id: str,
                           limit: int = 20) -> list[dict]:
    """Get playlists for a specific Spotify category."""
    data = spotify.get(f"browse/categories/{category_id}/playlists",
                       params={"limit": limit, "country": "US"})
    playlists = data.get("playlists", {}).get("items", [])
    return [
        {"id": p["id"], "name": p["name"],
         "description": p.get("description", "")}
        for p in playlists if p
    ]
9. Related Artists and Recommendation Seeds {#recommendations}
def get_recommendations(seed_artists: list[str] = None,
                        seed_tracks: list[str] = None,
                        seed_genres: list[str] = None,
                        target_features: dict = None,
                        limit: int = 100) -> list[dict]:
    """Get track recommendations based on seeds and audio feature targets."""
    params = {
        "limit": min(limit, 100),
        "market": "US",
    }
    # The endpoint accepts at most 5 seeds in total, split here as 2 + 2 + 1
    if seed_artists:
        params["seed_artists"] = ",".join(seed_artists[:2])
    if seed_tracks:
        params["seed_tracks"] = ",".join(seed_tracks[:2])
    if seed_genres:
        params["seed_genres"] = ",".join(seed_genres[:1])
    # Target audio features for filtered recommendations, e.g.
    # danceability, energy, valence, tempo, popularity, or explicit
    # min_*/max_* bounds such as min_popularity or max_tempo
    if target_features:
        for key, val in target_features.items():
            if val is None:
                continue
            # Keys already prefixed with min_/max_/target_ pass through unchanged
            if key.startswith(("min_", "max_", "target_")):
                param_key = key
            else:
                param_key = f"target_{key}"
            params[param_key] = val
    data = spotify.get("recommendations", params=params)
    return [
        {
            "id": t["id"],
            "name": t["name"],
            "artists": [a["name"] for a in t.get("artists", [])],
            "album": t.get("album", {}).get("name"),
            "popularity": t.get("popularity", 0),
            "duration_ms": t["duration_ms"],
            "preview_url": t.get("preview_url"),
        }
        for t in data.get("tracks", [])
    ]


# Example: Find high-energy dance tracks similar to a seed
recs = get_recommendations(
    seed_genres=["edm"],
    target_features={
        "danceability": 0.9,
        "energy": 0.85,
        "valence": 0.7,
        "min_popularity": 40,
    },
    limit=50
)


def get_available_genre_seeds() -> list[str]:
    """Get all available genre seeds for recommendations."""
    data = spotify.get("recommendations/available-genre-seeds")
    return data.get("genres", [])
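The fixed 2 + 2 + 1 seed split in get_recommendations leaves slots unused when only one seed type is supplied. One alternative, sketched here under the assumption that the endpoint's limit is five seeds combined, is to fill the five slots round-robin across whichever seed lists are present; cap_seeds is a hypothetical helper, not part of the API:

```python
def cap_seeds(seed_artists=None, seed_tracks=None, seed_genres=None,
              max_total: int = 5) -> dict[str, str]:
    """Pick up to max_total seeds, taking turns across the seed types."""
    pools = {
        "seed_artists": list(seed_artists or []),
        "seed_tracks": list(seed_tracks or []),
        "seed_genres": list(seed_genres or []),
    }
    picked = {key: [] for key in pools}
    remaining = max_total
    # Take one seed from each non-empty pool per round until slots run out
    while remaining > 0 and any(pools.values()):
        for key, values in pools.items():
            if values and remaining > 0:
                picked[key].append(values.pop(0))
                remaining -= 1
    # Return only the parameters that received at least one seed
    return {key: ",".join(values) for key, values in picked.items() if values}
```

The returned dict can be merged straight into the request params, for instance params.update(cap_seeds(seed_genres=["edm", "house", "techno"])).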
10. User Data with Authorization Code Flow {#user-data}
When users authorize your app, you can access their personal Spotify data:
def get_user_top_tracks(user_token: str,
                        time_range: str = "medium_term",
                        limit: int = 50) -> list[dict]:
    """Get a user's top tracks. time_range: short_term/medium_term/long_term."""
    headers = {"Authorization": f"Bearer {user_token}"}
    all_tracks = []
    offset = 0
    while offset < limit:
        resp = requests.get(
            "https://api.spotify.com/v1/me/top/tracks",
            headers=headers,
            params={
                "time_range": time_range,
                "limit": min(50, limit - offset),
                "offset": offset,
            }
        )
        resp.raise_for_status()
        data = resp.json()
        items = data.get("items", [])
        if not items:
            break
        for item in items:
            all_tracks.append({
                "id": item["id"],
                "name": item["name"],
                "artists": [a["name"] for a in item.get("artists", [])],
                "popularity": item.get("popularity"),
            })
        if not data.get("next"):
            break
        offset += 50
    return all_tracks


def get_user_saved_tracks(user_token: str,
                          limit: int = 200) -> list[dict]:
    """Get tracks saved to a user's library."""
    headers = {"Authorization": f"Bearer {user_token}"}
    saved = []
    offset = 0
    while len(saved) < limit:
        resp = requests.get(
            "https://api.spotify.com/v1/me/tracks",
            headers=headers,
            params={"limit": 50, "offset": offset, "market": "US"}
        )
        resp.raise_for_status()
        data = resp.json()
        items = data.get("items", [])
        if not items:
            break
        for item in items:
            track = item.get("track")
            if track and track.get("id"):
                saved.append({
                    "id": track["id"],
                    "name": track["name"],
                    "artists": [a["name"] for a in track.get("artists", [])],
                    "added_at": item.get("added_at"),
                })
        if not data.get("next"):
            break
        offset += 50
        time.sleep(0.3)
    return saved[:limit]
11. Pagination: Handling Large Result Sets {#pagination}
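Most Spotify endpoints paginate with offset and limit and return a ready-made next URL with each page; a few (such as recently played tracks) use cursors instead. Following next works for the offset-based endpoints without tracking offsets yourself: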
def paginate_spotify_endpoint(endpoint: str,
                              params: dict = None,
                              items_key: str = "items",
                              max_items: int = None) -> list:
    """Generic paginator for Spotify endpoints that return a top-level next URL."""
    all_items = []
    params = dict(params or {})
    params.setdefault("limit", 50)
    while True:
        data = spotify.get(endpoint, params=params)
        # Prefer the requested key, fall back to the standard "items" list
        if isinstance(data.get(items_key), list):
            items = data[items_key]
        elif data.get("items"):
            items = data["items"]
        else:
            break
        all_items.extend([i for i in items if i])  # filter None
        if max_items and len(all_items) >= max_items:
            all_items = all_items[:max_items]
            break
        next_url = data.get("next")
        if not next_url:
            break
        # Use the full next URL directly; it already carries offset and limit
        endpoint = next_url
        params = {}
        time.sleep(0.3)
    return all_items


# Examples
all_playlist_items = paginate_spotify_endpoint(
    "playlists/37i9dQZF1DXcBWIGoYBM5M/tracks",
    params={"market": "US", "fields": "items(track(id,name)),next"},
    max_items=500
)
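When a result set is too large to hold in one list, the same next-link loop can be written as a generator. The sketch below takes the fetch function as a parameter so it can be exercised without network access; iter_pages is a hypothetical name, and it assumes each page is a dict with items and next keys as in the responses above:

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch: Callable[[str], dict], first_url: str,
               max_items: Optional[int] = None) -> Iterator[dict]:
    """Yield items one at a time, following "next" links until exhausted."""
    url = first_url
    yielded = 0
    while url:
        page = fetch(url)
        for item in page.get("items", []):
            if item is None:
                continue  # skip null entries
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return
        # A missing or null "next" ends the loop
        url = page.get("next")
```

Because spotify.get accepts both relative endpoints and full URLs, iter_pages(spotify.get, "playlists/<id>/tracks") should work as a drop-in, though without the polite delay between pages.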
12. Rate Limits and How to Handle Them {#rate-limits}
Spotify's rate limits are per-app, not per-endpoint. Based on real-world testing:
- Client Credentials flow: Roughly 100-200 requests per 30 seconds
- 429 responses include a Retry-After header (in seconds)
- Token lifetime: 3600 seconds (1 hour), then needs refresh
- Batch endpoints count as 1 request regardless of IDs included -- always use them
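To make the batching point concrete, here is a small sketch that splits an ID list into batches and counts the requests needed. The batch sizes are the ones used earlier in this guide (100 for audio features, 50 for artists, 20 for albums); chunked and requests_needed are hypothetical helper names:

```python
import math
from typing import Iterator


def chunked(ids: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of ids with at most size elements each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def requests_needed(n_ids: int, batch_size: int) -> int:
    """Number of batch requests required to cover n_ids identifiers."""
    return math.ceil(n_ids / batch_size)
```

For 250 track IDs, requests_needed(250, 100) is 3 batch calls instead of 250 single-track calls.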
import threading
import time
from collections import deque


class SlidingWindowRateLimiter:
    """Sliding window rate limiter for Spotify API."""

    def __init__(self, max_requests: int = 90,
                 window_seconds: int = 30):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = deque()  # timestamps of recent requests
        self._lock = threading.Lock()

    def wait(self):
        """Block until another request fits inside the window."""
        with self._lock:
            now = time.time()
            # Remove old requests outside the window
            while self.requests and now - self.requests[0] > self.window:
                self.requests.popleft()
            if len(self.requests) >= self.max_requests:
                # Sleep until the oldest request leaves the window
                sleep_time = self.window - (now - self.requests[0]) + 0.1
                time.sleep(sleep_time)
            self.requests.append(time.time())


# Call rate_limiter.wait() before each spotify.get(...)
rate_limiter = SlidingWindowRateLimiter(max_requests=80, window_seconds=30)


# For large-scale collection using multiple app credentials
def create_rotating_client_pool(credentials: list[dict]) -> list[SpotifyClient]:
    """Create multiple clients to distribute rate limits."""
    return [SpotifyClient(c["client_id"], c["client_secret"])
            for c in credentials]
For large-scale extraction -- mapping entire genres or building recommendation datasets -- multiple Spotify app credentials rotating under their individual rate limit buckets is the practical path. For any supplementary scraping of music platforms (lyrics sites, chart data, setlist databases), routing through ThorData residential proxies keeps your scraping stable without affecting your Spotify API rate limits.
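Whatever the number of clients, each one still has to wait correctly after a 429. A minimal sketch of the delay calculation: use Retry-After when the header is present and parseable, otherwise fall back to capped exponential backoff with optional jitter. backoff_delay and its default values are illustrative assumptions, not Spotify recommendations:

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0) -> float:
    """Seconds to wait before retry number attempt (0-based)."""
    if retry_after is not None:
        try:
            # Honor the server's hint, plus one second of safety margin
            return max(0.0, float(retry_after)) + 1.0
        except ValueError:
            pass  # malformed header: fall through to exponential backoff
    delay = min(cap, base * (2 ** attempt))
    # Random jitter spreads out retries from parallel workers
    return delay + random.uniform(0, jitter)
```

In the client's retry loop this would replace the fixed waits, for example time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After"))).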
13. Storing Spotify Data: SQLite Schema {#storage}
import json
import sqlite3
import time


def init_spotify_db(db_path: str = "spotify.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS artists (
            id TEXT PRIMARY KEY,
            name TEXT,
            genres TEXT,
            followers INTEGER,
            popularity INTEGER,
            images TEXT,
            scraped_at REAL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS albums (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist_ids TEXT,
            artist_names TEXT,
            release_date TEXT,
            total_tracks INTEGER,
            album_type TEXT,
            label TEXT,
            popularity INTEGER,
            image_url TEXT,
            scraped_at REAL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tracks (
            id TEXT PRIMARY KEY,
            name TEXT,
            artist_ids TEXT,
            artist_names TEXT,
            album_id TEXT,
            album_name TEXT,
            release_date TEXT,
            duration_ms INTEGER,
            popularity INTEGER,
            explicit INTEGER,
            preview_url TEXT,
            scraped_at REAL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS audio_features (
            track_id TEXT PRIMARY KEY,
            danceability REAL,
            energy REAL,
            key INTEGER,
            loudness REAL,
            mode INTEGER,
            speechiness REAL,
            acousticness REAL,
            instrumentalness REAL,
            liveness REAL,
            valence REAL,
            tempo REAL,
            time_signature INTEGER,
            duration_ms INTEGER,
            scraped_at REAL
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS playlist_tracks (
            playlist_id TEXT,
            track_id TEXT,
            position INTEGER,
            added_at TEXT,
            added_by TEXT,
            PRIMARY KEY (playlist_id, track_id)
        )
    """)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_artist ON tracks(artist_ids)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_tracks_album ON tracks(album_id)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_af_track ON audio_features(track_id)")
    conn.commit()
    return conn


def save_track_with_features(conn: sqlite3.Connection,
                             track: dict, features: dict = None):
    """Save a track and its audio features atomically."""
    now = time.time()
    # Artists are dicts in playlist results but plain names in search results
    artists = track.get("artists", [])
    artist_ids = [a["id"] for a in artists
                  if isinstance(a, dict) and a.get("id")]
    artist_names = track.get("artist_names") or ", ".join(
        a.get("name", "") if isinstance(a, dict) else str(a)
        for a in artists
    )
    # Album is either a name string (cleaned dicts) or a raw album object
    album = track.get("album")
    if isinstance(album, dict):
        album_id = track.get("album_id") or album.get("id")
        album_name = album.get("name")
        release_date = track.get("release_date") or album.get("release_date")
    else:
        album_id = track.get("album_id")
        album_name = album
        release_date = track.get("release_date")
    conn.execute("""
        INSERT OR REPLACE INTO tracks VALUES (?,?,?,?,?,?,?,?,?,?,?,?)
    """, (
        track["id"], track["name"],
        json.dumps(artist_ids),
        artist_names,
        album_id,
        album_name,
        release_date,
        track.get("duration_ms"),
        track.get("popularity", 0),
        int(track.get("explicit", False)),
        track.get("preview_url"),
        now
    ))
    if features:
        conn.execute("""
            INSERT OR REPLACE INTO audio_features VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
        """, (
            features["id"],
            features.get("danceability"), features.get("energy"),
            features.get("key"), features.get("loudness"),
            features.get("mode"), features.get("speechiness"),
            features.get("acousticness"), features.get("instrumentalness"),
            features.get("liveness"), features.get("valence"),
            features.get("tempo"), features.get("time_signature"),
            features.get("duration_ms"), now
        ))
    conn.commit()
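Once tracks and features are stored, most analysis comes down to joining the two tables. The sketch below runs such a join against an in-memory database so it works on its own: the tables are trimmed copies of the schema above with only the columns the query touches, the rows are made-up sample data, and find_tracks_by_feature_range is a hypothetical helper name:

```python
import sqlite3


def find_tracks_by_feature_range(conn: sqlite3.Connection,
                                 min_tempo: float, max_tempo: float,
                                 min_energy: float = 0.0) -> list[tuple]:
    """Return (name, artist_names, tempo, energy) rows, most energetic first."""
    return conn.execute("""
        SELECT t.name, t.artist_names, af.tempo, af.energy
        FROM tracks AS t
        JOIN audio_features AS af ON af.track_id = t.id
        WHERE af.tempo BETWEEN ? AND ? AND af.energy >= ?
        ORDER BY af.energy DESC
    """, (min_tempo, max_tempo, min_energy)).fetchall()


# Demo with trimmed tables and invented sample rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id TEXT PRIMARY KEY, name TEXT, artist_names TEXT)")
conn.execute("CREATE TABLE audio_features (track_id TEXT PRIMARY KEY, tempo REAL, energy REAL)")
conn.executemany("INSERT INTO tracks VALUES (?,?,?)", [
    ("t1", "Fast Song", "Artist A"),
    ("t2", "Slow Song", "Artist B"),
    ("t3", "Mid Song", "Artist C"),
])
conn.executemany("INSERT INTO audio_features VALUES (?,?,?)", [
    ("t1", 128.0, 0.9),
    ("t2", 70.0, 0.3),
    ("t3", 122.0, 0.7),
])
demo_rows = find_tracks_by_feature_range(conn, 120, 130, min_energy=0.6)
```

The same function should work unchanged against a database created by init_spotify_db, since it only relies on column names that exist in the full schema.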
14. Building Complete Datasets {#datasets}
Here's how to build a full genre dataset with audio features for machine learning or analysis:
def build_genre_dataset(genres: list[str],
                        tracks_per_genre: int = 200,
                        db_path: str = "spotify_genre_dataset.db") -> dict:
    """Build a labeled dataset of tracks by genre with audio features."""
    conn = init_spotify_db(db_path)
    dataset_summary = {}
    for genre in genres:
        print(f"\nCollecting genre: {genre}")
        genre_tracks = []
        # Search for tracks in this genre
        results = search_tracks(f"genre:{genre}", limit=min(tracks_per_genre, 1000))
        genre_tracks.extend(results)
        # Also get playlist tracks for this genre category
        try:
            playlists = get_category_playlists(genre, limit=5)
        except requests.HTTPError:
            # The genre name is not necessarily a valid browse category ID
            playlists = []
        for pl in playlists[:3]:
            pl_tracks = get_playlist_tracks(pl["id"])
            genre_tracks.extend(pl_tracks[:50])
        # Deduplicate by track ID
        seen = set()
        unique_tracks = []
        for t in genre_tracks:
            if t["id"] not in seen:
                seen.add(t["id"])
                unique_tracks.append(t)
        unique_tracks = unique_tracks[:tracks_per_genre]
        # Get audio features in batches
        track_ids = [t["id"] for t in unique_tracks]
        features_map = get_audio_features(track_ids)
        # Save to database
        for track in unique_tracks:
            track["genre_label"] = genre
            features = features_map.get(track["id"])
            save_track_with_features(conn, track, features)
        dataset_summary[genre] = {
            "tracks_collected": len(unique_tracks),
            "with_audio_features": sum(1 for t in unique_tracks
                                       if t["id"] in features_map),
        }
        print(f"  {genre}: {len(unique_tracks)} tracks, "
              f"{dataset_summary[genre]['with_audio_features']} with features")
        time.sleep(1.0)  # Respect rate limits between genres
    conn.close()
    return dataset_summary


def build_playlist_dataset(playlist_id: str,
                           db_path: str = "playlist_dataset.db") -> int:
    """Build a complete single-playlist dataset with audio features."""
    conn = init_spotify_db(db_path)
    # Get playlist info
    info = get_playlist_info(playlist_id)
    print(f"Building dataset for: {info['name']} ({info['total_tracks']} tracks)")
    # Get all tracks
    tracks = get_playlist_tracks(playlist_id)
    # Get audio features in batches
    track_ids = [t["id"] for t in tracks]
    features_map = get_audio_features(track_ids)
    # Save everything
    for i, track in enumerate(tracks):
        features = features_map.get(track["id"])
        save_track_with_features(conn, track, features)
        # Record playlist membership
        conn.execute("""
            INSERT OR IGNORE INTO playlist_tracks VALUES (?,?,?,?,?)
        """, (playlist_id, track["id"], i,
              track.get("added_at"), track.get("added_by")))
    conn.commit()
    conn.close()
    print(f"Saved {len(tracks)} tracks with "
          f"{len(features_map)} audio feature records")
    return len(tracks)
15. Spotify Web Playback and Embed APIs {#playback}
For displaying Spotify content in web apps (not for data extraction):
<!-- Embed a track player -->
<iframe
src="https://open.spotify.com/embed/track/4iV5W9uYEdYUVa79Axb7Rh"
width="300"
height="80"
frameborder="0"
allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">
</iframe>
<!-- Embed a playlist -->
<iframe
src="https://open.spotify.com/embed/playlist/37i9dQZF1DXcBWIGoYBM5M"
width="300"
height="380"
frameborder="0"
allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture">
</iframe>
16. Real Use Cases {#use-cases}
Music Mood Analysis
def classify_mood(features: dict) -> str:
    """Classify track mood based on audio features."""
    valence = features.get("valence", 0.5)
    energy = features.get("energy", 0.5)
    if valence > 0.6 and energy > 0.6:
        return "happy_energetic"
    elif valence > 0.6 and energy < 0.4:
        return "happy_calm"
    elif valence < 0.4 and energy > 0.6:
        return "angry_intense"
    elif valence < 0.4 and energy < 0.4:
        return "sad_melancholic"
    else:
        return "neutral"


def analyze_playlist_moods(playlist_id: str) -> dict:
    """Count how many tracks in a playlist fall into each mood bucket."""
    tracks = get_playlist_tracks(playlist_id)
    track_ids = [t["id"] for t in tracks]
    features_map = get_audio_features(track_ids)
    mood_counts = {}
    for track_id, feat in features_map.items():
        mood = classify_mood(feat)
        mood_counts[mood] = mood_counts.get(mood, 0) + 1
    return mood_counts
Genre Popularity Tracking
def track_genre_popularity(genres: list[str],
                           db: sqlite3.Connection) -> dict:
    """Track average popularity of tracks across genres."""
    result = {}
    for genre in genres:
        tracks = search_tracks(f"genre:{genre}", limit=50)
        if tracks:
            avg_pop = sum(t["popularity"] for t in tracks) / len(tracks)
            result[genre] = {
                "avg_popularity": round(avg_pop, 1),
                "sample_size": len(tracks),
            }
    return result
17. Common Errors and Fixes {#errors}
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Token expired or invalid | Re-authenticate, check client_id/secret |
| 403 Forbidden | Endpoint requires user auth | Use Authorization Code flow, not Client Credentials |
| 429 Too Many Requests | Rate limit exceeded | Check Retry-After header, implement backoff |
| 404 Not Found | Track/playlist/artist deleted | Remove from tracking list |
| Track: null in playlist | Local file or unavailable in market | Filter out null tracks |
| Empty audio_features array | Track has no audio analysis | Filter and handle missing data |
| Token refresh fails | Invalid refresh token | User must re-authorize |
| No active device | Web Playback SDK issue | Unrelated to data API |
| Very low popularity scores | Recent release, few plays | Normal; scores update weekly |
Final Thoughts
Spotify is a rare case where the official API is genuinely better than scraping. Free access, rich metadata, and audio features you can't get anywhere else. The main limitations are:
- No actual play counts -- only a 0-100 popularity score updated weekly
- No chart positions -- use third-party chart APIs for that
- No lyrics -- use Genius API or Musixmatch for lyrics
The audio features endpoint alone is worth the setup time. valence (emotional positivity), danceability, and tempo together enable sophisticated content classification that powers recommendation systems, mood-based playlists, music research, and marketing analytics.
If you're building anything music-related -- recommendation engines, genre analysis, mood-based playlist tools, or market research -- start here. Set up the Client Credentials flow, grab your first playlist, and run the audio features on it. You'll immediately see why Spotify's API is the most developer-friendly in the social/media space.