How to Scrape Eventbrite Events in 2026: Listings, Prices & Organizer Data
Eventbrite is the biggest public events platform on the web. Millions of listings — concerts, conferences, workshops, fundraisers — all with structured data attached: ticket prices, venue details, organizer history, category tags, and sometimes public attendance numbers. If you're building an events aggregator, doing market research on a niche, tracking what's happening in a city over time, or monitoring competitors' events — Eventbrite is an obvious data source.
The good news: Eventbrite has a real API with decent documentation. The bad news: the API doesn't expose everything. Attendee counts shown publicly on event pages, organizer event histories, and sold-out signals are only available by scraping the actual HTML. So you need both approaches.
This guide covers the complete workflow: API setup, event search with pagination, ticket price extraction, organizer analytics, HTML scraping for what the API misses, anti-bot handling, proxy configuration, and data storage with SQLite.
What Data You Can Get
Through the API and direct scraping combined, you can collect:
From the API: - Event name, description, start/end datetime, status, URL, category, subcategory - Ticket class names, costs, currency, availability status, min/max per order, quantity sold - Venue name, address, city, country, GPS coordinates - Organizer name, description, website, follower count - Currency and event format (online vs in-person)
From the HTML (API doesn't expose these): - The "X people are going" public attendance count - Sold-out status before the API reflects it - Event tags and keywords visible on the public page - Organizer's complete historical event list - Real-time availability signals
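Since every event ends up with fields from both sources, it's worth deciding the merge policy up front: treat the API record as authoritative and let page data fill only the gaps. A minimal sketch (field names mirror the lists above; the merge policy is one reasonable choice, not anything Eventbrite prescribes):

```python
def merge_event_record(api_rec: dict, page_rec: dict) -> dict:
    """Merge scraped page fields into the API record.
    API values win on conflicts; page data only fills fields
    the API left empty."""
    merged = dict(api_rec)
    for key, value in page_rec.items():
        # Only fill genuinely missing values (None, empty string, empty list)
        if merged.get(key) in (None, "", []):
            merged[key] = value
    return merged
```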
Eventbrite API v3
You need an OAuth token. Go to eventbrite.com/platform and create an app — you get a private token immediately. No approval process for read-only access.
Rate limits:
- 2000 requests per hour (generous)
- Max 50 items per page
- Responses include Retry-After header on 429s
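A small client-side pacer keeps you under that hourly cap without thinking about it on every call. This is a generic sketch, not Eventbrite-specific:

```python
import time

class Throttle:
    """Guarantee at least `min_interval` seconds between calls.
    3600 s / 2000 requests = 1.8 s minimum; 2.0 s leaves headroom."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        # Sleep only for whatever remains of the interval
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each API request; work done between requests counts toward the interval, so you never sleep longer than needed.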
pip install requests beautifulsoup4
Base URL: https://www.eventbriteapi.com/v3/
Key endpoints:
GET /events/search/ - search events by location, keyword, category, date
GET /events/{id}/ - single event details
GET /events/{id}/ticket_classes/ - all ticket types and prices for an event
GET /events/{id}/attendees/ - attendee list (requires organizer token)
GET /venues/{id}/ - venue detail
GET /organizers/{id}/ - organizer profile
GET /organizers/{id}/events/ - all events by an organizer
GET /users/me/ - your own profile (for validation)
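Before writing the full client, it's worth a quick smoke test of the token against `/users/me/`. A tiny helper that assembles the request pieces (the URL shape follows the endpoint list above; `build_request` is a name chosen here for illustration):

```python
def build_request(
    endpoint: str,
    token: str,
    base: str = "https://www.eventbriteapi.com/v3",
) -> tuple:
    """Return (url, headers) for a v3 call. The token goes in a
    Bearer Authorization header; v3 paths end with a trailing slash."""
    url = f"{base}/{endpoint.strip('/')}/"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers
```

Usage: `url, headers = build_request("users/me", token)` then `requests.get(url, headers=headers)` should return 200 with your profile if the token is valid.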
Full Python Client
import requests
import time
import random
import re
import json
import sqlite3
import logging
from datetime import datetime
from typing import Optional
from bs4 import BeautifulSoup
from pathlib import Path
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s"
)
logger = logging.getLogger(__name__)
EB_TOKEN = "YOUR_PRIVATE_TOKEN"
EB_BASE = "https://www.eventbriteapi.com/v3"
EB_HEADERS = {"Authorization": f"Bearer {EB_TOKEN}"}
def api_get(
endpoint: str,
    params: Optional[dict] = None,
max_retries: int = 4,
) -> Optional[dict]:
"""
Eventbrite API GET request with rate limit handling.
endpoint: path after /v3/ (e.g., 'events/search/')
"""
url = f"{EB_BASE}/{endpoint.lstrip('/')}"
for attempt in range(max_retries):
try:
resp = requests.get(url, headers=EB_HEADERS, params=params, timeout=20)
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 429:
wait = int(resp.headers.get("retry-after", 60))
logger.warning(f"Rate limited, waiting {wait}s")
time.sleep(wait)
continue
elif resp.status_code == 401:
logger.error("Unauthorized — check API token")
return None
elif resp.status_code == 404:
logger.debug(f"Not found: {endpoint}")
return None
elif resp.status_code == 403:
logger.warning(f"Forbidden: {endpoint}")
return None
else:
logger.warning(f"HTTP {resp.status_code}: {endpoint}")
time.sleep(2 ** attempt)
except requests.Timeout:
wait = 2 ** attempt + 2
logger.warning(f"Timeout, retry in {wait}s (attempt {attempt+1})")
time.sleep(wait)
except requests.RequestException as e:
logger.error(f"Request error: {e}")
time.sleep(5)
logger.error(f"Failed after {max_retries} attempts: {endpoint}")
return None
Event Search with Full Pagination
The API returns up to 50 results per page (page_size caps at 50). Use the page parameter and check pagination.has_more_items to walk through the full result set:
def search_events(
location: str = None,
keyword: str = "",
max_pages: int = 20,
start_date_min: str = None,
start_date_max: str = None,
categories: str = None,
formats: str = None,
is_free: bool = None,
sort_by: str = "date",
) -> list:
"""
Search Eventbrite events with pagination.
location: city name or lat,long string
start_date_min/max: ISO 8601 format "2026-01-01T00:00:00Z"
categories: comma-separated category IDs
formats: comma-separated format IDs (e.g., "1" = seminar, "11" = conference)
sort_by: date, sales, relevance, modified, published, id
"""
results = []
page = 1
while page <= max_pages:
params = {
"q": keyword,
"expand": "organizer,venue,ticket_classes,format,category",
"page": page,
"page_size": 50,
"sort_by": sort_by,
"status": "live",
}
if location:
params["location.address"] = location
params["location.within"] = "50km" # radius
if start_date_min:
params["start_date.range_start"] = start_date_min
if start_date_max:
params["start_date.range_end"] = start_date_max
if categories:
params["categories"] = categories
if formats:
params["formats"] = formats
if is_free is not None:
params["price"] = "free" if is_free else "paid"
data = api_get("events/search/", params=params)
if not data:
logger.warning(f"No data on page {page}")
break
events = data.get("events", [])
for event in events:
results.append(extract_event(event))
pagination = data.get("pagination", {})
page_count = pagination.get("page_count", 1)
object_count = pagination.get("object_count", 0)
logger.info(
f"Page {page}/{page_count}: "
f"{len(events)} events "
f"(total: {object_count})"
)
if not pagination.get("has_more_items", False):
break
page += 1
time.sleep(0.6)
return results
def extract_event(event: dict) -> dict:
"""Extract fields from an Eventbrite API event object."""
venue = event.get("venue") or {}
organizer = event.get("organizer") or {}
ticket_classes = event.get("ticket_classes") or []
category = event.get("category") or {}
event_format = event.get("format") or {}
address = venue.get("address") or {}
# Calculate min/max ticket prices
prices = []
free_available = False
for tc in ticket_classes:
if tc.get("free"):
free_available = True
else:
cost = tc.get("cost") or {}
val = cost.get("major_value")
if val:
try:
prices.append(float(val))
except (ValueError, TypeError):
pass
# Ticket availability
available_count = sum(
(tc.get("quantity_total", 0) or 0) - (tc.get("quantity_sold", 0) or 0)
for tc in ticket_classes
)
return {
"id": event.get("id"),
"name": (event.get("name") or {}).get("text"),
"description": ((event.get("description") or {}).get("text") or "")[:500],
"url": event.get("url"),
"start": (event.get("start") or {}).get("local"),
"end": (event.get("end") or {}).get("local"),
"timezone": (event.get("start") or {}).get("timezone"),
"status": event.get("status"),
"is_free": event.get("is_free", False) or free_available,
"is_online": event.get("online_event", False),
"currency": event.get("currency"),
"capacity": event.get("capacity"),
"category_name": category.get("name"),
"category_id": category.get("id"),
"format_name": event_format.get("name"),
"organizer_name": organizer.get("name"),
"organizer_id": organizer.get("id"),
"organizer_url": organizer.get("url"),
"venue_name": venue.get("name"),
"venue_address": address.get("localized_address_display"),
"venue_city": address.get("city"),
"venue_country": address.get("country"),
"venue_lat": address.get("latitude"),
"venue_lng": address.get("longitude"),
"min_ticket_price": min(prices) if prices else None,
"max_ticket_price": max(prices) if prices else None,
"ticket_count": len(ticket_classes),
"tickets_available": available_count,
}
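One pattern in extract_event worth calling out: Eventbrite returns explicit JSON nulls for absent objects, so chained `.get()` calls need the `or {}` guard. A self-contained illustration with a trimmed payload:

```python
# A trimmed API payload with explicit nulls, as Eventbrite returns them
event = {
    "name": None,
    "venue": None,
    "start": {"local": "2026-03-01T19:00:00", "timezone": None},
}

# .get() alone is not enough here: the "name" key exists with value None,
# so event.get("name", {}).get("text") would raise AttributeError.
name = (event.get("name") or {}).get("text")
venue_city = ((event.get("venue") or {}).get("address") or {}).get("city")
start_local = (event.get("start") or {}).get("local")
```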
Fetching Ticket Details
def get_ticket_classes(event_id: str) -> list:
"""Fetch all ticket types and prices for an event."""
data = api_get(f"events/{event_id}/ticket_classes/")
if not data:
return []
tickets = []
for tc in data.get("ticket_classes", []):
cost = tc.get("cost") or {}
display_price = tc.get("display_price")
fee = tc.get("fee") or {}
tickets.append({
"name": tc.get("name"),
"description": (tc.get("description") or "")[:300],
"free": tc.get("free", False),
"price": cost.get("major_value"),
"fee_value": fee.get("major_value"),
"display_price": display_price,
"currency": cost.get("currency"),
"available": tc.get("on_sale_status") == "AVAILABLE",
"on_sale_status": tc.get("on_sale_status"),
"quantity_total": tc.get("quantity_total"),
"quantity_sold": tc.get("quantity_sold"),
"minimum_quantity": tc.get("minimum_quantity"),
"maximum_quantity": tc.get("maximum_quantity"),
"sales_start": tc.get("sales_start"),
"sales_end": tc.get("sales_end"),
"hidden": tc.get("hidden", False),
})
return tickets
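If you only need headline numbers rather than the full ticket rows, the per-event roll-up is a few lines. A sketch assuming the dict shape produced by get_ticket_classes above (`summarize_tickets` is a hypothetical helper, not an Eventbrite API call):

```python
def summarize_tickets(tickets: list) -> dict:
    """Roll a list of ticket-class dicts into summary stats:
    price range, sell-through rate, and free-tier presence."""
    paid = [float(t["price"]) for t in tickets if not t.get("free") and t.get("price")]
    total = sum(t.get("quantity_total") or 0 for t in tickets)
    sold = sum(t.get("quantity_sold") or 0 for t in tickets)
    return {
        "min_price": min(paid) if paid else None,
        "max_price": max(paid) if paid else None,
        "sell_through": round(sold / total, 3) if total else None,
        "has_free_tier": any(t.get("free") for t in tickets),
    }
```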
def get_organizer_events(
organizer_id: str,
max_pages: int = 10,
include_past: bool = True,
) -> list:
"""Fetch all events by an organizer (past and upcoming)."""
all_events = []
page = 1
status_filter = "all" if include_past else "live"
while page <= max_pages:
data = api_get(
f"organizers/{organizer_id}/events/",
params={
"status": status_filter,
"page": page,
"page_size": 50,
"expand": "ticket_classes,venue",
}
)
if not data:
break
events = data.get("events", [])
if not events:
break
for e in events:
all_events.append(extract_event(e))
if not data.get("pagination", {}).get("has_more_items", False):
break
page += 1
time.sleep(0.5)
return all_events
def get_organizer_profile(organizer_id: str) -> Optional[dict]:
"""Fetch organizer profile data."""
data = api_get(f"organizers/{organizer_id}/")
if not data:
return None
return {
"id": data.get("id"),
"name": data.get("name"),
"description": (data.get("description") or {}).get("text", "")[:500],
"url": data.get("url"),
"website": data.get("website"),
"facebook": data.get("facebook"),
"twitter": data.get("twitter"),
"instagram": data.get("instagram"),
"num_past_events": data.get("num_past_events"),
"num_future_events": data.get("num_future_events"),
}
Scraping What the API Misses
The API doesn't reliably surface two key signals: the public "X people are going" count shown on event pages, and sold-out status before the API catches up. The scraper below pulls those, plus page tags and organizer follower counts while it's there:
SCRAPE_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/126.0.0.0 Safari/537.36"
),
"Accept-Language": "en-US,en;q=0.9",
"Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
}
def make_scrape_session(proxy_url: str = None) -> requests.Session:
"""Create a session with browser-like headers."""
session = requests.Session()
session.headers.update({
**SCRAPE_HEADERS,
"User-Agent": random.choice([
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
])
})
if proxy_url:
session.proxies = {"http": proxy_url, "https": proxy_url}
return session
def scrape_event_page(
event_url: str,
session: requests.Session,
) -> dict:
"""
Scrape public event page for fields the API doesn't expose.
Returns enrichment data to merge with API records.
"""
try:
resp = session.get(event_url, timeout=20)
resp.raise_for_status()
except requests.RequestException as e:
logger.debug(f"Failed to scrape {event_url}: {e}")
return {}
soup = BeautifulSoup(resp.text, "html.parser")
result = {}
    # Attendee count, shown as "X people are going" or similar
    import re
    attendee_count = None
    for text_node in soup.find_all(string=True):
        text = text_node.strip()
        if "people are going" in text.lower() or "attendees" in text.lower():
            match = re.search(r"([\d,]+)", text)
            if match:
                try:
                    attendee_count = int(match.group(1).replace(",", ""))
                    break
                except ValueError:
                    pass
    result["attendee_count"] = attendee_count
# Sold out signal — check multiple selectors as Eventbrite tests layouts
sold_out_signals = [
soup.find(string=lambda t: t and "Sold Out" in t),
soup.select_one(".ticket-status-sold-out"),
soup.select_one("[data-testid='sold-out']"),
]
result["sold_out"] = any(s is not None for s in sold_out_signals)
# Event tags
tags = []
for tag_el in soup.select(".tags-item a, [data-spec='tags'] a, .event-tags a"):
text = tag_el.get_text(strip=True)
if text:
tags.append(text)
result["page_tags"] = tags
# Organizer follower count (sometimes visible on page)
import re
for el in soup.select("span, div"):
text = el.get_text(strip=True)
if re.search(r"[\d,]+ follower", text, re.I):
match = re.search(r"([\d,]+)", text)
if match:
result["organizer_followers"] = int(match.group(1).replace(",", ""))
break
# Related events count
related = soup.select(".related-events__item, [data-testid='related-events'] li")
result["related_events_count"] = len(related)
return result
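The attendee-count parsing above is the most fragile part of the scraper, so it helps to isolate the same logic into a standalone, testable function (`parse_attendee_count` is a name chosen here, not an Eventbrite term; the phrasings it matches are the ones observed on event pages):

```python
import re

def parse_attendee_count(text: str):
    """Pull the number out of strings like '1,234 people are going'
    or '52 attendees'. Returns None when no count is present."""
    if not re.search(r"people are going|attendees", text, re.I):
        return None
    match = re.search(r"([\d,]+)", text)
    if not match:
        return None
    try:
        return int(match.group(1).replace(",", ""))
    except ValueError:
        return None
```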
def enrich_events_with_scraping(
events: list,
session: requests.Session,
delay_range: tuple = (1.5, 4.0),
max_enrich: int = None,
) -> list:
"""
Enrich API event records with data scraped from pages.
Modifies events in-place, returns enriched list.
"""
target = events[:max_enrich] if max_enrich else events
enriched = 0
for i, event in enumerate(target):
url = event.get("url")
if not url:
continue
page_data = scrape_event_page(url, session)
if page_data:
event.update(page_data)
enriched += 1
# Randomized human-like delay
time.sleep(random.uniform(*delay_range))
if (i + 1) % 20 == 0:
logger.info(f"Enriched {i+1}/{len(target)} events ({enriched} successful)")
logger.info(f"Enrichment complete: {enriched}/{len(target)} events enriched")
return events
Anti-Bot Handling
The API side is easy: 2,000 requests per hour is generous. Space calls roughly two seconds apart (3600 s / 2,000 = 1.8 s minimum) and you'll stay under the cap; shorter sleeps like 0.5 s are fine for bursts of a few hundred calls but would blow past the limit if sustained for a full hour.
The web pages are different. Eventbrite runs Cloudflare on public pages. Direct requests calls with a default user agent get blocked with 403s or CAPTCHA pages. Session cookies from a browser help, but they expire.
When scraping event pages at any volume, rotating residential proxies are the practical solution. ThorData residential proxies don't trigger Cloudflare as easily as datacenter IPs:
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
def get_proxy_url(country: str = "US", session_id: str = None) -> str:
"""Build ThorData proxy URL."""
user = THORDATA_USER
if country:
user += f"-country-{country}"
if session_id:
user += f"-session-{session_id}"
return f"http://{user}:{THORDATA_PASS}@proxy.thordata.com:9000"
def scrape_events_with_rotation(
events: list,
max_per_session: int = 25,
) -> list:
"""
Scrape event pages with periodic IP rotation.
Creates new session every max_per_session pages.
"""
session_num = 0
proxy = get_proxy_url(session_id=f"eb-{session_num}")
session = make_scrape_session(proxy_url=proxy)
enriched_events = []
for i, event in enumerate(events):
# Rotate proxy every N pages
if i > 0 and i % max_per_session == 0:
session.close()
session_num += 1
proxy = get_proxy_url(session_id=f"eb-{session_num}")
session = make_scrape_session(proxy_url=proxy)
logger.info(f"Rotated to session {session_num}")
url = event.get("url")
if url:
enrichment = scrape_event_page(url, session)
event.update(enrichment)
enriched_events.append(event)
time.sleep(random.uniform(1.5, 4.0))
session.close()
return enriched_events
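Rotating on a fixed page count works, but rotating on evidence of a block is better. A cheap heuristic detector you can check after each response before deciding to rotate (the marker strings are illustrative guesses; Cloudflare challenge pages vary by configuration):

```python
BLOCK_MARKERS = ("access denied", "cf-challenge", "captcha")

def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic block detector: Cloudflare challenges usually arrive
    as 403/429/503, but can also be a 200 containing challenge markup."""
    if status_code in (403, 429, 503):
        return True
    lowered = body[:2000].lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

When this returns True, discard the response, rotate to a fresh proxy session, and retry the URL once before giving up on it.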
Data Storage with SQLite
def init_database(db_path: str = "eventbrite.db") -> sqlite3.Connection:
"""Initialize SQLite schema for Eventbrite data."""
conn = sqlite3.connect(db_path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS events (
id TEXT PRIMARY KEY,
name TEXT,
description TEXT,
url TEXT,
start_dt TEXT,
end_dt TEXT,
timezone TEXT,
status TEXT,
is_free INTEGER DEFAULT 0,
is_online INTEGER DEFAULT 0,
currency TEXT,
capacity INTEGER,
category_name TEXT,
category_id TEXT,
format_name TEXT,
organizer_name TEXT,
organizer_id TEXT,
organizer_url TEXT,
venue_name TEXT,
venue_address TEXT,
venue_city TEXT,
venue_country TEXT,
venue_lat REAL,
venue_lng REAL,
min_ticket_price REAL,
max_ticket_price REAL,
ticket_count INTEGER DEFAULT 0,
tickets_available INTEGER DEFAULT 0,
attendee_count INTEGER,
sold_out INTEGER DEFAULT 0,
page_tags TEXT,
organizer_followers INTEGER,
search_location TEXT,
search_keyword TEXT,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_city ON events(venue_city);
CREATE INDEX IF NOT EXISTS idx_category ON events(category_name);
CREATE INDEX IF NOT EXISTS idx_organizer ON events(organizer_id);
CREATE INDEX IF NOT EXISTS idx_start ON events(start_dt);
CREATE INDEX IF NOT EXISTS idx_free ON events(is_free);
CREATE TABLE IF NOT EXISTS ticket_classes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_id TEXT,
name TEXT,
free INTEGER DEFAULT 0,
price REAL,
currency TEXT,
on_sale_status TEXT,
quantity_total INTEGER,
quantity_sold INTEGER,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_tc_event ON ticket_classes(event_id);
CREATE TABLE IF NOT EXISTS organizers (
id TEXT PRIMARY KEY,
name TEXT,
description TEXT,
url TEXT,
website TEXT,
num_past_events INTEGER,
num_future_events INTEGER,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
""")
conn.commit()
return conn
def save_events(
conn: sqlite3.Connection,
events: list,
search_location: str = "",
search_keyword: str = "",
) -> int:
"""Save event records."""
saved = 0
for event in events:
try:
conn.execute("""
INSERT OR REPLACE INTO events
(id, name, description, url, start_dt, end_dt, timezone,
status, is_free, is_online, currency, capacity,
category_name, category_id, format_name, organizer_name,
organizer_id, organizer_url, venue_name, venue_address,
venue_city, venue_country, venue_lat, venue_lng,
min_ticket_price, max_ticket_price, ticket_count,
tickets_available, attendee_count, sold_out, page_tags,
organizer_followers, search_location, search_keyword)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
event.get("id"), event.get("name"), event.get("description"),
event.get("url"), event.get("start"), event.get("end"),
event.get("timezone"), event.get("status"),
int(event.get("is_free", False)),
int(event.get("is_online", False)),
event.get("currency"), event.get("capacity"),
event.get("category_name"), event.get("category_id"),
event.get("format_name"), event.get("organizer_name"),
event.get("organizer_id"), event.get("organizer_url"),
event.get("venue_name"), event.get("venue_address"),
event.get("venue_city"), event.get("venue_country"),
event.get("venue_lat"), event.get("venue_lng"),
event.get("min_ticket_price"), event.get("max_ticket_price"),
event.get("ticket_count"), event.get("tickets_available"),
event.get("attendee_count"),
int(event.get("sold_out", False)),
json.dumps(event.get("page_tags", [])),
event.get("organizer_followers"),
search_location, search_keyword,
))
saved += 1
except sqlite3.Error as e:
logger.error(f"DB error saving event {event.get('id')}: {e}")
conn.commit()
return saved
def save_ticket_classes(conn: sqlite3.Connection, event_id: str, tickets: list) -> None:
    """Save ticket class records for an event, replacing any prior snapshot."""
    # INSERT OR REPLACE can't dedupe here because the autoincrement id
    # is never supplied, so clear this event's old rows first.
    conn.execute("DELETE FROM ticket_classes WHERE event_id = ?", (event_id,))
    for tc in tickets:
        try:
            conn.execute("""
                INSERT INTO ticket_classes
                    (event_id, name, free, price, currency, on_sale_status,
                     quantity_total, quantity_sold)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                event_id, tc.get("name"), int(tc.get("free", False)),
                tc.get("price"), tc.get("currency"),
                tc.get("on_sale_status"),
                tc.get("quantity_total"), tc.get("quantity_sold"),
            ))
        except sqlite3.Error as e:
            logger.error(f"DB error saving ticket class: {e}")
    conn.commit()
Analytics Queries
def get_category_stats(conn: sqlite3.Connection, city: str = None) -> list:
"""Get event statistics by category."""
where = "WHERE venue_city = ?" if city else ""
params = (city,) if city else ()
cursor = conn.execute(f"""
SELECT
category_name,
COUNT(*) as event_count,
AVG(CASE WHEN is_free = 0 THEN min_ticket_price END) as avg_paid_price,
SUM(CASE WHEN is_free = 1 THEN 1 ELSE 0 END) as free_count,
AVG(attendee_count) as avg_attendees,
SUM(CASE WHEN sold_out = 1 THEN 1 ELSE 0 END) as sold_out_count
FROM events
{where}
GROUP BY category_name
HAVING event_count >= 5
ORDER BY event_count DESC
""", params)
return [dict(zip([d[0] for d in cursor.description], row)) for row in cursor.fetchall()]
def get_price_distribution(conn: sqlite3.Connection, city: str = None) -> dict:
"""Get ticket price distribution stats."""
where = "WHERE is_free = 0 AND min_ticket_price IS NOT NULL"
if city:
where += " AND venue_city = ?"
params = (city,) if city else ()
cursor = conn.execute(f"""
SELECT
COUNT(*) as count,
AVG(min_ticket_price) as avg,
MIN(min_ticket_price) as min,
MAX(min_ticket_price) as max
FROM events {where}
""", params)
row = cursor.fetchone()
return {
"count": row[0],
"avg_min_price": round(row[1] or 0, 2),
"min_price": round(row[2] or 0, 2),
"max_price": round(row[3] or 0, 2),
}
def get_top_organizers(conn: sqlite3.Connection, limit: int = 20) -> list:
"""Get organizers with most events."""
cursor = conn.execute("""
SELECT
organizer_name,
organizer_id,
COUNT(*) as event_count,
AVG(attendee_count) as avg_attendees,
SUM(CASE WHEN is_free = 1 THEN 1 ELSE 0 END) as free_events,
SUM(CASE WHEN sold_out = 1 THEN 1 ELSE 0 END) as sold_out_events
FROM events
WHERE organizer_name IS NOT NULL
GROUP BY organizer_id
ORDER BY event_count DESC
LIMIT ?
""", (limit,))
return [dict(zip([d[0] for d in cursor.description], row)) for row in cursor.fetchall()]
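Both query helpers above rebuild dicts from cursor.description; sqlite3's built-in Row factory does the same with less ceremony. A drop-in alternative:

```python
import sqlite3

def query_dicts(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Run a query and return rows as plain dicts, using sqlite3.Row
    instead of zipping cursor.description by hand."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(sql, params).fetchall()
    return [dict(r) for r in rows]
```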
Complete Pipeline
def run_city_events_pipeline(
city: str,
keyword: str = "",
max_pages: int = 20,
enrich_with_scraping: bool = True,
proxy_url: str = None,
db_path: str = "eventbrite.db",
) -> None:
"""
Complete Eventbrite data collection pipeline for a city.
"""
conn = init_database(db_path)
# 1. Collect events via API
logger.info(f"Collecting events for: {city} keyword='{keyword}'")
events = search_events(
location=city,
keyword=keyword,
max_pages=max_pages,
)
logger.info(f"API returned {len(events)} events")
# 2. Fetch ticket details for each event
for i, event in enumerate(events):
event_id = event.get("id")
if event_id:
tickets = get_ticket_classes(event_id)
if tickets:
save_ticket_classes(conn, event_id, tickets)
# Update event with ticket stats
sold_qty = sum(tc.get("quantity_sold") or 0 for tc in tickets)
total_qty = sum(tc.get("quantity_total") or 0 for tc in tickets)
event["tickets_sold"] = sold_qty
event["tickets_total"] = total_qty
if (i + 1) % 50 == 0:
logger.info(f"Ticket fetch progress: {i+1}/{len(events)}")
time.sleep(1.0)
# 3. Enrich with HTML scraping (optional)
if enrich_with_scraping:
logger.info("Enriching with page scraping...")
session = make_scrape_session(proxy_url=proxy_url)
events = enrich_events_with_scraping(events, session, max_enrich=200)
session.close()
# 4. Save all events
saved = save_events(conn, events, search_location=city, search_keyword=keyword)
logger.info(f"Saved {saved}/{len(events)} events")
# 5. Print summary
print(f"\n=== {city} Events Summary ===")
category_stats = get_category_stats(conn, city=city)
print("\nTop Categories:")
for cat in category_stats[:10]:
print(
f" {(cat['category_name'] or 'Unknown'):<25} "
f"{cat['event_count']:>5} events "
f"avg ${cat['avg_paid_price'] or 0:.0f}"
)
price_stats = get_price_distribution(conn, city=city)
print(f"\nPaid Event Pricing:")
print(f" Avg min price: ${price_stats['avg_min_price']:.2f}")
print(f" Range: ${price_stats['min_price']:.2f} - ${price_stats['max_price']:.2f}")
conn.close()
if __name__ == "__main__":
    run_city_events_pipeline(
        city="New York",
        keyword="",
        max_pages=20,
        enrich_with_scraping=True,
        proxy_url=get_proxy_url(country="US"),  # pass None to scrape without a proxy
        db_path="nyc_events.db",
    )
What Changes Going Forward
Eventbrite's HTML structure shifts occasionally. The attendee count selector has moved between redesigns — treat API data as primary and HTML scraping as best-effort enrichment. If a page scrape fails, keep the API record and move on; don't block the pipeline on it.
The API itself is stable. The v3 endpoints have been consistent since 2020 with only additive changes. Rate limits are generous (2000/hour) and the schema is well-documented.
For building events intelligence tools, the combination of API + selective HTML scraping gives you: - Complete geographic coverage (any city Eventbrite operates in) - Full ticket pricing and availability data - Organizer performance tracking over time - Attendance signals for demand forecasting - Category-level market analysis
The quantity_sold field on ticket classes is particularly underappreciated: it tells you how many tickets have moved on paid events, which is a useful demand signal well before events sell out.
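Once ticket classes are stored, that signal is one GROUP BY away. A sketch against the ticket_classes table defined earlier:

```python
import sqlite3

def sell_through_by_event(conn: sqlite3.Connection) -> list:
    """Fraction of known inventory sold per event, highest first.
    Only rows with a known quantity_total contribute."""
    cur = conn.execute("""
        SELECT event_id,
               SUM(quantity_sold) * 1.0 / SUM(quantity_total) AS sell_through
        FROM ticket_classes
        WHERE quantity_total > 0
        GROUP BY event_id
        ORDER BY sell_through DESC
    """)
    return cur.fetchall()
```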