How to Scrape Bandcamp Music Data with Python (2026)

Bandcamp doesn't have a public API. They had one years ago, shut it down, and never brought it back. If you want structured data about artists, albums, pricing, or fan activity on Bandcamp, scraping is the only option.

The good news: Bandcamp's HTML is relatively clean. Most data lives in structured JSON-LD embedded in the page source. You don't need a headless browser for most tasks — plain HTTP requests work fine for public pages.

This guide covers scraping artist profiles, album listings, individual tracks, fan data, estimating sales from publicly visible signals, and running a full production pipeline with proxy rotation via ThorData.

Why Scrape Bandcamp?

Bandcamp is one of the last major music platforms where artists retain control and pricing is transparent. That makes it uniquely valuable for music market research, A&R scouting, pricing analysis, and fan-network mapping.

Unlike Spotify or Apple Music, Bandcamp shows real transactional signals: who bought what, and at what price. That data doesn't exist anywhere else.

Setup

pip install requests beautifulsoup4 lxml

Note that sqlite3 ships with Python's standard library, so there's nothing to install for it.

You'll also need a proxy service for any serious data collection. We'll use ThorData residential proxies throughout this guide: residential IPs assigned by real ISPs don't trip Bandcamp's rate limiting the way datacenter IPs do.

Understanding Bandcamp's Page Structure

Before scraping, understand what you're working with. Bandcamp has several page types: artist pages (artistname.bandcamp.com), album pages (.../album/album-slug), track pages (.../track/track-slug), fan profiles (bandcamp.com/username), and the tag-based discovery page (bandcamp.com/discover).

The most valuable insight: Bandcamp embeds JSON-LD structured data in <script type="application/ld+json"> tags on every artist and album page. This gives you machine-readable data without parsing messy HTML. Always try this first before falling back to CSS selectors.
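
A quick way to confirm this before writing any selectors is to fetch a page and inspect the blob's @type. The URL below is a placeholder; substitute any public artist or album page:

import json
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://artistname.bandcamp.com", timeout=20)
soup = BeautifulSoup(resp.text, "lxml")
ld = soup.find("script", {"type": "application/ld+json"})
if ld and ld.string:
    data = json.loads(ld.string)
    print(data.get("@type"))    # e.g. "MusicGroup" on artist pages, "MusicAlbum" on album pages
    print(sorted(data.keys()))  # see which fields are available before writing CSS selectors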

Scraping Artist Pages

Every Bandcamp artist has a page at artistname.bandcamp.com. The page lists their discography and basic info.

import requests
from bs4 import BeautifulSoup
import json
import time
import random
import re
import sqlite3
from pathlib import Path
from datetime import datetime
from typing import Optional, Dict, List, Any

def make_headers(rotate_ua: bool = True) -> Dict[str, str]:
    """Generate realistic browser headers."""
    user_agents = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:127.0) Gecko/20100101 Firefox/127.0",
    ]
    return {
        "User-Agent": random.choice(user_agents) if rotate_ua else user_agents[0],
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "DNT": "1",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
    }


def scrape_artist(artist_slug: str, proxy: Optional[str] = None) -> Optional[Dict]:
    """Scrape artist page and return structured data."""
    url = f"https://{artist_slug}.bandcamp.com"
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=20)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"[ERROR] Failed to fetch {url}: {e}")
        return None

    soup = BeautifulSoup(response.text, "lxml")

    # Try JSON-LD first — cleanest data source
    ld_script = soup.find("script", {"type": "application/ld+json"})
    if ld_script and ld_script.string:
        try:
            data = json.loads(ld_script.string)
            # Normalize JSON-LD structure
            result = {
                "slug": artist_slug,
                "url": url,
                "name": data.get("name") or data.get("foundingLocation", {}).get("name"),
                "description": data.get("description", ""),
                "location": None,
                "albums": [],
                "source": "json-ld",
                "scraped_at": datetime.utcnow().isoformat(),
            }
            # Albums are listed in "album" key as list or single object
            albums_raw = data.get("album", [])
            if isinstance(albums_raw, dict):
                albums_raw = [albums_raw]
            for alb in albums_raw:
                result["albums"].append({
                    "title": alb.get("name"),
                    "url": alb.get("@id") or alb.get("url"),
                    "release_date": alb.get("datePublished"),
                })
            return result
        except json.JSONDecodeError:
            pass  # Fall through to HTML parsing

    # Fallback: parse HTML directly
    artist_name = soup.select_one("#band-name-location .title")
    location = soup.select_one("#band-name-location .location")
    bio_el = soup.select_one("#bio-text")

    albums = []
    for item in soup.select("#music-grid .music-grid-item"):
        link = item.select_one("a")
        title = item.select_one(".title")
        if link and title:
            href = link.get("href", "")
            if href and not href.startswith("http"):
                href = f"https://{artist_slug}.bandcamp.com{href}"
            albums.append({
                "title": title.text.strip(),
                "url": href,
                "release_date": None,
            })

    return {
        "slug": artist_slug,
        "url": url,
        "name": artist_name.text.strip() if artist_name else None,
        "location": location.text.strip() if location else None,
        "description": bio_el.get_text(" ", strip=True)[:500] if bio_el else "",
        "albums": albums,
        "source": "html",
        "scraped_at": datetime.utcnow().isoformat(),
    }
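
A quick smoke test, with a placeholder slug:

artist = scrape_artist("artistname")  # substitute a real artist subdomain
if artist:
    print(f"{artist['name']}: {len(artist['albums'])} albums via {artist['source']}")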

Extracting Album and Track Data

Album pages contain track listings, pricing, release dates, and credits. The JSON-LD data here is especially rich.

def scrape_album(album_url: str, proxy: Optional[str] = None) -> Optional[Dict]:
    """Scrape album page for full track and pricing data."""
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    try:
        response = requests.get(album_url, headers=headers, proxies=proxies, timeout=20)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"[ERROR] Failed to fetch {album_url}: {e}")
        return None

    soup = BeautifulSoup(response.text, "lxml")

    # JSON-LD contains track listing, price, release date
    ld_script = soup.find("script", {"type": "application/ld+json"})
    if ld_script and ld_script.string:
        try:
            data = json.loads(ld_script.string)
        except json.JSONDecodeError:
            data = {}
    else:
        data = {}

    album = {
        "url": album_url,
        "title": data.get("name"),
        "artist": data.get("byArtist", {}).get("name"),
        "release_date": data.get("datePublished"),
        "num_tracks": data.get("numTracks"),
        "description": data.get("description", "")[:500],
        "tracks": [],
        "price": None,
        "tags": [],
        "scraped_at": datetime.utcnow().isoformat(),
    }

    # Extract minimum price from offers
    offers = data.get("offers", {})
    if isinstance(offers, dict):
        album["price"] = {
            "amount": offers.get("price"),
            "currency": offers.get("priceCurrency"),
            "availability": offers.get("availability", "").split("/")[-1],
        }

    # Track listing
    track_list = data.get("track", {})
    if isinstance(track_list, dict):
        for track in track_list.get("itemListElement", []):
            item = track.get("item", {})
            album["tracks"].append({
                "position": track.get("position"),
                "title": item.get("name"),
                "duration": item.get("duration"),  # ISO 8601 duration
                "url": item.get("@id"),
            })

    # Tags from HTML (not in JSON-LD)
    for tag_el in soup.select(".tralbum-tags .tag"):
        album["tags"].append(tag_el.text.strip())

    # Supporter count from HTML
    supporter_text = soup.select_one(".collected-by-header")
    if supporter_text:
        match = re.search(r"([\d,]+)\s+supporter", supporter_text.text)
        if match:
            album["supporter_count"] = int(match.group(1).replace(",", ""))

    # Credits from HTML
    credits_el = soup.select_one("#trackInfo .tralbum-credits")
    if credits_el:
        album["credits"] = credits_el.get_text("\n", strip=True)[:500]

    return album
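
Track durations come back as ISO 8601 duration strings. Here's a small helper for converting them to seconds; it assumes the value arrives either in the standard "PT3M55S" form or a zero-padded "P00H03M55S" variant (re and Optional are already imported above):

def parse_iso_duration(value: Optional[str]) -> Optional[int]:
    """Convert an ISO 8601-style duration ("PT3M55S" or "P00H03M55S") to seconds."""
    if not value:
        return None
    match = re.match(r"^PT?(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$", value)
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds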


def scrape_track(track_url: str, proxy: Optional[str] = None) -> Optional[Dict]:
    """Scrape individual track page."""
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    try:
        response = requests.get(track_url, headers=headers, proxies=proxies, timeout=20)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"[ERROR] {e}")
        return None

    soup = BeautifulSoup(response.text, "lxml")
    ld_script = soup.find("script", {"type": "application/ld+json"})

    if not ld_script or not ld_script.string:
        return None

    try:
        data = json.loads(ld_script.string)
    except json.JSONDecodeError:
        return None

    offers = data.get("offers", {})  # may be a dict or a list in JSON-LD
    return {
        "url": track_url,
        "title": data.get("name"),
        "artist": data.get("byArtist", {}).get("name"),
        "album": data.get("inAlbum", {}).get("name"),
        "duration": data.get("duration"),
        "release_date": data.get("datePublished"),
        "tags": [t.text.strip() for t in soup.select(".tralbum-tags .tag")],
        "price": data.get("offers", {}).get("price"),
        "scraped_at": datetime.utcnow().isoformat(),
    }

Scraping Fan Collections

Bandcamp fan profiles are public by default. Each fan page shows purchased albums and wishlisted items — useful for understanding what music a given community actually buys.

Fan collection pages use an internal API endpoint for pagination. The initial HTML embeds the first batch of items in a data-blob attribute.

def scrape_fan_collection(username: str, proxy: Optional[str] = None) -> Optional[Dict]:
    """Scrape a Bandcamp fan's complete purchased collection."""
    url = f"https://bandcamp.com/{username}"
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=20)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"[ERROR] {e}")
        return None

    soup = BeautifulSoup(response.text, "lxml")

    # Fan data is embedded in a data-blob attribute on #pagedata div
    fan_data_div = soup.find("div", {"id": "pagedata"})
    if not fan_data_div or "data-blob" not in fan_data_div.attrs:
        print(f"[WARN] No pagedata blob found for {username}")
        return None

    try:
        blob = json.loads(fan_data_div["data-blob"])
    except json.JSONDecodeError:
        return None

    fan_data = blob.get("fan_data", {})
    fan_id = fan_data.get("fan_id")
    collection_data = blob.get("collection_data", {})

    items = []
    for item_id, item in collection_data.get("items", {}).items():
        items.append({
            "title": item.get("album_title") or item.get("item_title"),
            "artist": item.get("band_name"),
            "url": item.get("item_url"),
            "item_type": item.get("item_type"),
            "purchased": item.get("purchased"),
            "band_url": item.get("band_url"),
        })

    # Paginate through full collection via internal API
    last_token = collection_data.get("last_token")
    page_num = 1

    while last_token:
        api_url = "https://bandcamp.com/api/fancollection/1/collection_items"
        payload = {
            "fan_id": fan_id,
            "older_than_token": last_token,
            "count": 20,
        }

        try:
            api_resp = requests.post(
                api_url,
                json=payload,
                headers={**headers, "Content-Type": "application/json"},
                proxies=proxies,
                timeout=20,
            )
            api_resp.raise_for_status()
            page = api_resp.json()
        except requests.RequestException as e:
            print(f"[ERROR] Pagination request failed on page {page_num}: {e}")
            break

        page_items = page.get("items", [])
        if not page_items:
            break

        for item in page_items:
            items.append({
                "title": item.get("album_title") or item.get("item_title"),
                "artist": item.get("band_name"),
                "url": item.get("item_url"),
                "item_type": item.get("item_type"),
                "purchased": item.get("purchased"),
            })

        last_token = page.get("last_token")
        page_num += 1
        time.sleep(random.uniform(1.0, 2.0))

    return {
        "username": username,
        "fan_id": fan_id,
        "display_name": fan_data.get("name"),
        "location": fan_data.get("location"),
        "collection_count": len(items),
        "items": items,
        "scraped_at": datetime.utcnow().isoformat(),
    }

Scraping Bandcamp Discovery

The Bandcamp discovery page lets you browse by tag — a goldmine for finding music by genre.

def scrape_discovery(tag: str, pages: int = 5, proxy: Optional[str] = None) -> List[Dict]:
    """Scrape Bandcamp discover page for a given tag."""
    all_items = []
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    for page in range(1, pages + 1):
        url = f"https://bandcamp.com/discover/{tag}"
        params = {
            "g": "all",
            "s": "top",
            "p": page,
            "gn": 0,
            "f": "digital",
            "lo": "false",
            "hi_ab": "false",
        }

        try:
            resp = requests.get(url, params=params, headers=headers, proxies=proxies, timeout=20)
            resp.raise_for_status()
        except requests.RequestException as e:
            print(f"[ERROR] Discovery page {page} failed: {e}")
            break

        soup = BeautifulSoup(resp.text, "lxml")

        # Discovery items are in .discover-result elements
        for item in soup.select(".discover-result"):
            title_el = item.select_one(".result-title")
            artist_el = item.select_one(".result-info-inner .subhead a")
            link_el = item.select_one("a.art")
            price_el = item.select_one(".result-info .price")

            if title_el:
                all_items.append({
                    "title": title_el.text.strip(),
                    "artist": artist_el.text.strip() if artist_el else None,
                    "url": link_el.get("href") if link_el else None,
                    "price": price_el.text.strip() if price_el else None,
                    "tag": tag,
                    "page": page,
                })

        time.sleep(random.uniform(2.0, 4.0))

    return all_items
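
Example run, using "ambient" as an arbitrary tag:

releases = scrape_discovery("ambient", pages=2)
print(f"Found {len(releases)} releases")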

Estimating Sales from Public Data

Bandcamp doesn't publish sales numbers, but you can estimate them from publicly visible signals — supporter counts, pricing, and how long the album has been available.

def estimate_album_popularity(album_url: str, proxy: Optional[str] = None) -> Dict:
    """Estimate sales and popularity metrics for an album."""
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    try:
        response = requests.get(album_url, headers=headers, proxies=proxies, timeout=20)
        response.raise_for_status()
    except requests.RequestException as e:
        return {"error": str(e), "url": album_url}

    soup = BeautifulSoup(response.text, "lxml")

    # Visible supporter/buyer count
    supporter_text = soup.select_one(".collected-by-header")
    supporter_count = 0
    if supporter_text:
        match = re.search(r"([\d,]+)\s+supporter", supporter_text.text)
        if match:
            supporter_count = int(match.group(1).replace(",", ""))
    else:
        # Count visible thumb images
        thumbs = soup.select(".buy-thumbs .thumb-row a")
        supporter_count = len(thumbs)

    # Recent purchases feed (visible on the page)
    recent_purchases = soup.select(".purchaseinfo .buyer")
    recent_count = len(recent_purchases)

    # Tags for genre context
    tags = [t.text.strip() for t in soup.select(".tralbum-tags .tag")]

    # Release date for velocity calculation
    ld_script = soup.find("script", {"type": "application/ld+json"})
    release_date = None
    min_price = None
    if ld_script and ld_script.string:
        try:
            data = json.loads(ld_script.string)
            release_date = data.get("datePublished")
            offers = data.get("offers", {})
            if isinstance(offers, dict):
                min_price = offers.get("price")
        except json.JSONDecodeError:
            pass

    # Estimate: not every buyer shows as supporter
    # Typical ratio is 1 visible : 1.5-3 actual purchasers
    low_estimate = supporter_count
    high_estimate = int(supporter_count * 2.5)

    # If we have a price, estimate revenue range
    revenue_low = round(float(min_price) * low_estimate, 2) if min_price and float(min_price or 0) > 0 else None
    revenue_high = round(float(min_price) * high_estimate, 2) if min_price and float(min_price or 0) > 0 else None

    return {
        "url": album_url,
        "visible_supporters": supporter_count,
        "recent_purchases_visible": recent_count,
        "tags": tags,
        "release_date": release_date,
        "min_price": min_price,
        "estimated_sales_low": low_estimate,
        "estimated_sales_high": high_estimate,
        "estimated_revenue_low": revenue_low,
        "estimated_revenue_high": revenue_high,
    }
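
Example call, with a placeholder album URL:

est = estimate_album_popularity("https://artistname.bandcamp.com/album/example-album")
print(est.get("visible_supporters"), "visible supporters; estimated revenue:",
      est.get("estimated_revenue_low"), "to", est.get("estimated_revenue_high"))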

Anti-Detection Techniques

Bandcamp's anti-bot protection is moderate. Public pages won't serve CAPTCHAs the way Google or Cloudflare-fronted sites do, but aggressive scraping will still get your IP temporarily blocked. Here's the playbook:

1. Realistic Request Timing

Random delays between requests are essential. Humans don't make requests at perfectly even intervals.

def polite_sleep(min_s: float = 1.5, max_s: float = 3.5):
    """Sleep for a human-like random duration."""
    duration = random.uniform(min_s, max_s)
    time.sleep(duration)


def scrape_with_backoff(url: str, proxy: Optional[str] = None, max_retries: int = 5) -> Optional[requests.Response]:
    """Fetch URL with exponential backoff on rate limit responses."""
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None

    for attempt in range(max_retries):
        try:
            resp = requests.get(url, headers=headers, proxies=proxies, timeout=20)

            if resp.status_code == 200:
                return resp
            elif resp.status_code == 429:
                wait = (2 ** attempt) * 5 + random.uniform(0, 5)
                print(f"[RATE LIMIT] Waiting {wait:.1f}s before retry {attempt + 1}/{max_retries}")
                time.sleep(wait)
            elif resp.status_code in (403, 503):
                wait = (2 ** attempt) * 10
                print(f"[BLOCKED] Status {resp.status_code}, waiting {wait:.0f}s")
                time.sleep(wait)
            else:
                print(f"[ERROR] Status {resp.status_code} for {url}")
                return None

        except requests.Timeout:
            print(f"[TIMEOUT] Attempt {attempt + 1} timed out")
            time.sleep(5 * (attempt + 1))
        except requests.RequestException as e:
            print(f"[ERROR] Request failed: {e}")
            return None

    print(f"[FAIL] All {max_retries} attempts exhausted for {url}")
    return None

2. User-Agent Rotation

Bandcamp validates User-Agent headers. Rotating between 3-5 different current browser strings is sufficient for most collection tasks.

3. Session Reuse with Cooldowns

Using a requests.Session for related requests (artist page plus all of its albums) looks more realistic than opening a new connection for every request. But reset sessions after 15-20 requests; a sketch of that recycling logic follows the helper below.

def create_session(proxy: Optional[str] = None) -> requests.Session:
    """Create a configured session with realistic headers."""
    session = requests.Session()
    session.headers.update(make_headers(rotate_ua=False))  # Consistent UA within a session
    if proxy:
        session.proxies = {"http": proxy, "https": proxy}
    return session
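
To actually enforce the reset, a small wrapper can count requests and rebuild the session once it crosses a threshold. This is a sketch of the cooldown idea described above; the 15-request cap is the low end of the range:

class RecyclingSession:
    """Wraps create_session() and rebuilds the session after max_requests calls."""

    def __init__(self, proxy: Optional[str] = None, max_requests: int = 15):
        self.proxy = proxy
        self.max_requests = max_requests
        self.count = 0
        self.session = create_session(proxy)

    def get(self, url: str, **kwargs) -> requests.Response:
        if self.count >= self.max_requests:
            self.session.close()
            self.session = create_session(self.proxy)  # fresh cookies, same UA policy
            self.count = 0
            polite_sleep(3.0, 6.0)  # cooldown before the fresh session starts
        self.count += 1
        return self.session.get(url, timeout=20, **kwargs)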

ThorData Proxy Integration

For any serious Bandcamp data collection — more than a few hundred pages — you need rotating residential proxies. Bandcamp blocks datacenter IP ranges (AWS, GCP, DigitalOcean) fairly aggressively.

ThorData provides residential proxies that route through real ISP-assigned IP addresses. Here's how to integrate:

class ThorDataProxyPool:
    """Rotating proxy pool using ThorData residential proxies."""

    def __init__(self, username: str, password: str, host: str = "gate.thordata.com", port: int = 9000):
        self.username = username
        self.password = password
        self.host = host
        self.port = port

    def get_proxy(self, country: Optional[str] = None, session_id: Optional[str] = None) -> str:
        """Get a proxy URL, optionally geo-targeted."""
        user = self.username
        if country:
            user = f"{self.username}-country-{country.upper()}"
        if session_id:
            user = f"{user}-session-{session_id}"
        return f"http://{user}:{self.password}@{self.host}:{self.port}"

    def get_rotating_proxy(self, country: Optional[str] = None) -> str:
        """Get a fresh rotating proxy (new IP each request)."""
        return self.get_proxy(country=country)

    def get_sticky_proxy(self, session_id: str, country: Optional[str] = None) -> str:
        """Get a sticky proxy (same IP for the session duration)."""
        return self.get_proxy(country=country, session_id=session_id)


def scrape_with_thordata(artist_slugs: List[str], thordata_user: str, thordata_pass: str) -> List[Dict]:
    """Scrape multiple artists using ThorData rotating proxies."""
    pool = ThorDataProxyPool(thordata_user, thordata_pass)
    results = []

    for slug in artist_slugs:
        # Fresh proxy for each artist
        proxy = pool.get_rotating_proxy()

        artist = scrape_artist(slug, proxy=proxy)
        if not artist:
            continue

        results.append(artist)
        print(f"Scraped {slug}: {len(artist.get('albums', []))} albums")

        # Now get each album with the same session proxy for consistency
        session_id = f"artist-{slug[:10]}"
        session_proxy = pool.get_sticky_proxy(session_id)

        for album_info in artist.get("albums", []):
            album_url = album_info.get("url")
            if not album_url:
                continue

            album = scrape_album(album_url, proxy=session_proxy)
            if album:
                artist.setdefault("album_details", []).append(album)
                print(f"  Album: {album.get('title')} — {len(album.get('tracks', []))} tracks")

            polite_sleep(1.5, 3.0)

        polite_sleep(3.0, 6.0)  # Longer pause between artists

    return results

Pagination Handling

Bandcamp artist pages don't paginate (all albums shown on one page), but the fan collection API and the discovery page do.

def paginate_fan_wishlist(fan_id: int, username: str, proxy: Optional[str] = None) -> List[Dict]:
    """Paginate through a fan's full wishlist."""
    headers = make_headers()
    proxies = {"http": proxy, "https": proxy} if proxy else None
    items = []
    cursor = None
    page = 1

    while True:
        api_url = "https://bandcamp.com/api/fancollection/1/wishlist_items"
        payload = {"fan_id": fan_id, "count": 20}
        if cursor:
            payload["older_than_token"] = cursor

        try:
            resp = requests.post(
                api_url,
                json=payload,
                headers={**headers, "Content-Type": "application/json"},
                proxies=proxies,
                timeout=20,
            )
            resp.raise_for_status()
            data = resp.json()
        except requests.RequestException as e:
            print(f"[ERROR] Wishlist page {page} failed: {e}")
            break

        page_items = data.get("items", [])
        if not page_items:
            break

        items.extend(page_items)
        cursor = data.get("last_token")
        print(f"  Wishlist page {page}: {len(page_items)} items (total: {len(items)})")

        if not cursor:
            break

        page += 1
        time.sleep(random.uniform(1.0, 2.0))

    return items

Data Storage

SQLite is the right choice for Bandcamp data — structured enough to query, simple enough to set up.

def init_database(db_path: str = "bandcamp.db") -> sqlite3.Connection:
    """Initialize the Bandcamp data database."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # Better concurrent access

    conn.executescript("""
        CREATE TABLE IF NOT EXISTS artists (
            slug TEXT PRIMARY KEY,
            name TEXT,
            location TEXT,
            description TEXT,
            url TEXT,
            album_count INTEGER DEFAULT 0,
            scraped_at TEXT,
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS albums (
            url TEXT PRIMARY KEY,
            artist_slug TEXT,
            title TEXT,
            release_date TEXT,
            num_tracks INTEGER,
            min_price REAL,
            currency TEXT,
            supporter_count INTEGER,
            tags TEXT,  -- JSON array stored as text
            scraped_at TEXT,
            FOREIGN KEY (artist_slug) REFERENCES artists(slug)
        );

        CREATE TABLE IF NOT EXISTS tracks (
            url TEXT PRIMARY KEY,
            album_url TEXT,
            title TEXT,
            position INTEGER,
            duration TEXT,
            scraped_at TEXT,
            FOREIGN KEY (album_url) REFERENCES albums(url)
        );

        CREATE TABLE IF NOT EXISTS fan_collections (
            username TEXT PRIMARY KEY,
            fan_id INTEGER,
            display_name TEXT,
            location TEXT,
            collection_count INTEGER,
            scraped_at TEXT
        );

        CREATE TABLE IF NOT EXISTS collection_items (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            username TEXT,
            title TEXT,
            artist TEXT,
            item_url TEXT,
            item_type TEXT,
            purchased TEXT,
            FOREIGN KEY (username) REFERENCES fan_collections(username)
        );

        CREATE INDEX IF NOT EXISTS idx_albums_artist ON albums(artist_slug);
        CREATE INDEX IF NOT EXISTS idx_tracks_album ON tracks(album_url);
        CREATE INDEX IF NOT EXISTS idx_items_username ON collection_items(username);
    """)

    conn.commit()
    return conn


def save_artist(conn: sqlite3.Connection, artist: Dict):
    """Save artist data to database."""
    conn.execute(
        """INSERT OR REPLACE INTO artists (slug, name, location, description, url, album_count, scraped_at)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (
            artist.get("slug"),
            artist.get("name"),
            artist.get("location"),
            artist.get("description"),
            artist.get("url"),
            len(artist.get("albums", [])),
            artist.get("scraped_at"),
        )
    )
    conn.commit()


def save_album(conn: sqlite3.Connection, album: Dict, artist_slug: str):
    """Save album data including tracks."""
    price_info = album.get("price", {}) or {}
    conn.execute(
        """INSERT OR REPLACE INTO albums
           (url, artist_slug, title, release_date, num_tracks, min_price, currency, supporter_count, tags, scraped_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (
            album.get("url"),
            artist_slug,
            album.get("title"),
            album.get("release_date"),
            album.get("num_tracks"),
            price_info.get("amount"),
            price_info.get("currency"),
            album.get("supporter_count"),
            json.dumps(album.get("tags", [])),
            album.get("scraped_at"),
        )
    )

    for track in album.get("tracks", []):
        conn.execute(
            """INSERT OR REPLACE INTO tracks (url, album_url, title, position, duration, scraped_at)
               VALUES (?, ?, ?, ?, ?, ?)""",
            (
                track.get("url"),
                album.get("url"),
                track.get("title"),
                track.get("position"),
                track.get("duration"),
                album.get("scraped_at"),
            )
        )

    conn.commit()


def query_top_albums_by_supporters(conn: sqlite3.Connection, limit: int = 20) -> List[Dict]:
    """Get albums ranked by supporter count."""
    rows = conn.execute(
        """SELECT a.title, ar.name, a.supporter_count, a.min_price, a.tags, a.url
           FROM albums a
           JOIN artists ar ON a.artist_slug = ar.slug
           WHERE a.supporter_count IS NOT NULL
           ORDER BY a.supporter_count DESC
           LIMIT ?""",
        (limit,)
    ).fetchall()

    return [
        {
            "title": r[0],
            "artist": r[1],
            "supporters": r[2],
            "min_price": r[3],
            "tags": json.loads(r[4]) if r[4] else [],
            "url": r[5],
        }
        for r in rows
    ]
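
Once a crawl has populated the database, ranking is two lines:

conn = init_database()
for album in query_top_albums_by_supporters(conn, limit=10):
    print(f'{album["supporters"]:>6}  {album["title"]} by {album["artist"]}')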

Full Production Pipeline

Putting it all together — a complete pipeline that scrapes artist discographies, stores results, and handles failures gracefully:

def run_discography_pipeline(
    artist_slugs: List[str],
    db_path: str = "bandcamp.db",
    proxy: Optional[str] = None,
    max_albums_per_artist: int = 50,
):
    """Complete pipeline: artists → albums → tracks → database."""
    conn = init_database(db_path)
    stats = {"artists": 0, "albums": 0, "tracks": 0, "errors": 0}

    for slug in artist_slugs:
        print(f"\n[ARTIST] {slug}")

        # Check if already scraped recently
        existing = conn.execute(
            "SELECT scraped_at FROM artists WHERE slug = ?", (slug,)
        ).fetchone()
        if existing:
            # Skip if scraped within 7 days
            from datetime import timedelta  # datetime class is already imported at the top
            scraped = datetime.fromisoformat(existing[0])
            if datetime.utcnow() - scraped < timedelta(days=7):
                print(f"  Skipping — scraped {existing[0][:10]}")
                continue

        artist = scrape_artist(slug, proxy=proxy)
        if not artist:
            stats["errors"] += 1
            continue

        save_artist(conn, artist)
        stats["artists"] += 1
        print(f"  {artist.get('name')} — {len(artist.get('albums', []))} albums")

        albums_scraped = 0
        for album_info in artist.get("albums", [])[:max_albums_per_artist]:
            album_url = album_info.get("url")
            if not album_url:
                continue

            # Ensure absolute URL
            if not album_url.startswith("http"):
                album_url = f"https://{slug}.bandcamp.com{album_url}"

            # Check if already have this album
            existing_album = conn.execute(
                "SELECT url FROM albums WHERE url = ?", (album_url,)
            ).fetchone()
            if existing_album:
                print(f"    [SKIP] {album_info.get('title')}")
                continue

            album = scrape_album(album_url, proxy=proxy)
            if album:
                save_album(conn, album, slug)
                stats["albums"] += 1
                stats["tracks"] += len(album.get("tracks", []))
                print(f"    {album.get('title')} — {len(album.get('tracks', []))} tracks")
                albums_scraped += 1

            polite_sleep(2.0, 4.0)

        polite_sleep(5.0, 10.0)  # Longer pause between artists

    conn.close()
    print(f"\nDone: {stats['artists']} artists, {stats['albums']} albums, {stats['tracks']} tracks, {stats['errors']} errors")
    return stats


# Real-world use: scrape a genre's top artists
if __name__ == "__main__":
    # Example artists in lo-fi/indie scene
    artists = [
        "bedepartment",
        "godhatesnoone",
        "yseultofficial",
        "lapfrancesita",
    ]

    # With ThorData proxy (replace with your credentials)
    # proxy = "http://YOUR_USER:[email protected]:9000"
    proxy = None  # Swap in the ThorData proxy string above for production runs

    run_discography_pipeline(artists, proxy=proxy)

Real-World Use Cases

Music Trend Tracker: Monitor which tags are gaining supporters fastest over time. Pull discovery page data weekly, track supporter velocity per album, and surface emerging sounds before they hit mainstream media coverage.

A&R Intelligence Tool: Scrape new releases in target genres, filter by fan engagement (supporters > N, recent purchases visible), and surface artists worth signing before they get too expensive.

Pricing Strategy Research: Collect album prices across genres to understand what fans pay by style. Bandcamp's "name your price" model means you see both minimum prices and what fans actually choose to pay (partially visible from supporter data).

Fan Network Mapping: Scrape fan collections in a niche community to build recommendation graphs. "Fans who bought Album A also bought Album B" reveals taste clusters that traditional collaborative filtering misses (a query sketch follows below).

Label Portfolio Monitoring: Track all artists on a given label (identifiable by shared domain patterns), monitor their release activity, and alert on sales spikes.
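
As a concrete example, the fan-network idea maps directly onto the collection_items table built earlier. A sketch of the co-purchase query, where item_url is whatever album you want to pivot on:

def co_purchased_with(conn: sqlite3.Connection, item_url: str, limit: int = 10) -> List[tuple]:
    """Items most often owned by the same fans who own item_url."""
    return conn.execute(
        """SELECT b.title, b.artist, COUNT(*) AS shared_fans
           FROM collection_items a
           JOIN collection_items b
             ON a.username = b.username AND b.item_url != a.item_url
           WHERE a.item_url = ?
           GROUP BY b.item_url
           ORDER BY shared_fans DESC
           LIMIT ?""",
        (item_url, limit),
    ).fetchall()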

Bandcamp's Terms of Service prohibit automated access. This code is for educational purposes. Don't use it to replicate Bandcamp's catalog, violate copyright, or harm artists. If you're building something that surfaces Bandcamp data, consider whether there's a way to do it that supports the artists rather than just extracting value from them.

Public pages contain publicly visible data, but scale and intent matter for legal risk. Scrape responsibly, cache aggressively, and don't re-fetch data you already have.
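
Caching can be as simple as writing each fetched page to disk, keyed by a hash of its URL. A minimal sketch (the cache directory name is arbitrary), reusing scrape_with_backoff from earlier:

import hashlib

CACHE_DIR = Path("bandcamp_cache")
CACHE_DIR.mkdir(exist_ok=True)

def fetch_cached(url: str, proxy: Optional[str] = None) -> Optional[str]:
    """Return page HTML, checking the local cache before hitting the network."""
    key = hashlib.sha256(url.encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.html"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    resp = scrape_with_backoff(url, proxy=proxy)
    if resp is None:
        return None
    cache_file.write_text(resp.text, encoding="utf-8")
    return resp.text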

Use ThorData's residential proxy service to distribute requests naturally across different IP addresses and avoid putting load on Bandcamp's servers from a single source — this is both more respectful and more reliable.