Scraping Airbnb Listings with Playwright and API Interception (2026)
Airbnb is one of the more interesting scraping targets in 2026. The site has no public API, but its frontend loads rich JSON from internal endpoints — pricing calendars, review threads, host profiles, availability windows — all of it. The problem is that Airbnb has invested heavily in bot detection. Direct HTTP requests get flagged fast. The practical solution is to drive a real browser with Playwright and intercept the API responses as the page loads them naturally.
This guide walks through that exact approach: async Playwright, request interception, and how to structure what you capture into something useful.
What Data Is Available
Airbnb exposes more structured data than most people realize. Through browser interception you can collect:
- Property listings — name, type, location coordinates, photo URLs, amenity tags, bedroom/bathroom counts, superhost status
- Pricing — nightly rates by date, cleaning fees, service fees, total for a given stay
- Availability calendar — which dates are blocked, which are available, minimum-stay rules
- Reviews — individual review text, per-category star ratings (accuracy, cleanliness, communication, location, value), reviewer profiles
- Host profiles — join date, review count, response rate, response time, languages spoken, other listings
- Search results metadata — pagination cursors, total result count, map bounds
The calendar and review data in particular are difficult to scrape by parsing HTML — Airbnb renders them via JavaScript after page load. API interception sidesteps that entirely.
Anti-Bot Landscape
Before writing any code, understand what you're up against:
Cloudflare. Airbnb runs behind Cloudflare with bot score evaluation on nearly every route. This catches datacenter IPs, unusual request timing, and certain TLS patterns instantly.
Kasada / Shape Security. Airbnb has used Shape Security (now part of F5) for behavioral fingerprinting at the application layer. This runs inside the browser JavaScript and monitors mouse movements, keyboard cadence, scroll behavior, and event timing. Headless browsers without behavioral simulation get flagged.
TLS fingerprinting. The TLS handshake your HTTP client presents identifies your tool. Python's requests and httpx have recognizable TLS fingerprints that differ from real Chrome or Firefox. Playwright running actual Chromium sidesteps this because the browser handles the TLS layer.
Device fingerprinting. Canvas, WebGL, AudioContext, screen resolution, installed fonts — all of these are probed by Airbnb's JavaScript to build a device fingerprint. Default Playwright (without stealth patches) has known fingerprint values that get detected.
Rate limiting. Aggressive rate limiting kicks in well before you'd notice by eye. Rotating IPs and pacing requests are non-negotiable for any volume.
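Pacing is worth building in from the start. A small helper that combines a base delay, random jitter, and exponential backoff after failures keeps request timing irregular — the constants here are illustrative, not calibrated against Airbnb's actual thresholds:

```python
import random

def next_delay(base: float = 8.0, jitter: float = 0.5,
               failures: int = 0, cap: float = 120.0) -> float:
    """Delay in seconds: base +/- a jitter fraction, doubled per recent failure."""
    backoff = base * (2 ** failures)            # exponential backoff on errors
    delay = backoff * random.uniform(1 - jitter, 1 + jitter)
    return min(delay, cap)                      # never wait longer than the cap
```

Call it between requests with a running failure count, resetting the count on each success.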
Setting Up
Install dependencies:
pip install playwright playwright-stealth httpx beautifulsoup4
playwright install chromium
Base Browser Setup
import asyncio
import json
import random
import sqlite3
from datetime import datetime
from playwright.async_api import async_playwright, Page, BrowserContext
from playwright_stealth import stealth_async
STEALTH_INIT_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined, configurable: true });
Object.defineProperty(navigator, 'plugins', {
get: () => [
{ name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
{ name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai', description: '' },
{ name: 'Native Client', filename: 'internal-nacl-plugin', description: '' },
]
});
window.chrome = { runtime: {}, loadTimes: function() {}, csi: function() {}, app: {} };
"""
async def create_browser(proxy_url: str | None = None, headless: bool = True):
"""Create a hardened Playwright browser context."""
p = await async_playwright().start()
launch_args = [
"--no-sandbox",
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
"--disable-infobars",
"--window-size=1440,900",
"--disable-extensions",
]
browser = await p.chromium.launch(
headless=headless,
args=launch_args,
proxy={"server": proxy_url} if proxy_url else None,
)
context = await browser.new_context(
viewport={"width": 1440, "height": 900},
user_agent=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/124.0.0.0 Safari/537.36"
),
locale="en-US",
timezone_id="America/New_York",
color_scheme="light",
device_scale_factor=1.0,
)
# Inject stealth scripts on every new page
await context.add_init_script(STEALTH_INIT_SCRIPT)
return p, browser, context
async def open_page(context: BrowserContext) -> Page:
"""Open a new page with stealth applied."""
page = await context.new_page()
await stealth_async(page)
return page
Intercepting API Responses
Airbnb's internal GraphQL API responds to recognizable URL patterns. The key is to register response handlers before navigation starts:
class AirbnbResponseCollector:
"""Collects and categorizes Airbnb API responses during page navigation."""
def __init__(self):
self.search_results = []
self.calendar_data = {}
self.reviews = []
self.listing_details = {}
self.errors = []
async def handle_response(self, response):
url = response.url
if response.status != 200:
return
# Skip non-JSON responses
content_type = response.headers.get("content-type", "")
if "json" not in content_type and "javascript" not in content_type:
return
try:
body = await response.json()
except Exception:
return
# Search results endpoint
if "ExploreSearch" in url or "StaysSearch" in url:
self._parse_search_results(body)
# Calendar availability
elif "CalendarMonths" in url or "PdpAvailabilityCalendar" in url:
self._parse_calendar(body)
# Reviews
elif "PdpReviews" in url or "StaysPdpReviews" in url:
self._parse_reviews(body)
# Listing details
elif "StaysPdpSections" in url or "PdpPlatformSections" in url:
self._parse_listing_details(body)
def _parse_search_results(self, body: dict):
"""Extract listing summaries from search response."""
try:
# Navigate the nested structure
data = (body.get("data", {})
.get("presentation", {})
.get("staysSearch", {})
.get("results", {})
.get("searchResults", []))
for item in data:
listing = item.get("listing", {})
pricing = item.get("pricingQuote", {})
if not listing:
continue
self.search_results.append({
"id": listing.get("id"),
"name": listing.get("name"),
"city": listing.get("city"),
"state": listing.get("state"),
"country": listing.get("country"),
"lat": listing.get("lat"),
"lng": listing.get("lng"),
"room_type": listing.get("roomTypeCategory"),
"person_capacity": listing.get("personCapacity"),
"bedrooms": listing.get("bedroomLabel"),
"bathrooms": listing.get("bathroomLabel"),
"beds": listing.get("bedLabel"),
"avg_rating": listing.get("avgRating"),
"reviews_count": listing.get("reviewsCount"),
"is_superhost": listing.get("isSuperhost", False),
"amenities": listing.get("amenityIds", []),
"photos": [p.get("picture") for p in listing.get("contextualPictures", [])[:3]],
"price_formatted": (pricing.get("price", {})
.get("total", {})
.get("amountFormatted")),
"price_per_night": (pricing.get("structuredStayDisplayPrice", {})
.get("primaryLine", {})
.get("accessibilityLabel")),
})
except Exception as e:
self.errors.append(f"Search parse error: {e}")
def _parse_calendar(self, body: dict):
"""Extract availability calendar data."""
try:
months = (body.get("data", {})
.get("merlinProductDetailsPlatformRequest", {})
.get("pdpAvailabilityCalendar", {})
.get("calendarMonths", []))
if not months:
# Try alternate path
months = body.get("calendar_months", [])
for month_data in months:
for day in month_data.get("days", []):
date = day.get("calendarDate") or day.get("date")
if date:
self.calendar_data[date] = {
"available": day.get("available", False),
"price": day.get("price", {}).get("localPriceFormatted"),
"min_nights": day.get("minNights"),
"available_for_checkin": day.get("availableForCheckin", day.get("available", False)),
}
except Exception as e:
self.errors.append(f"Calendar parse error: {e}")
def _parse_reviews(self, body: dict):
"""Extract individual reviews."""
try:
reviews_data = (body.get("data", {})
.get("merlinProductDetailsPlatformRequest", {})
.get("pdpReviewsData", {})
.get("reviews", []))
for r in reviews_data:
self.reviews.append({
"id": r.get("id"),
"date": r.get("localizedDate"),
"comments": r.get("comments"),
"rating": r.get("rating"),
"reviewer_name": r.get("reviewer", {}).get("firstName"),
"reviewer_id": r.get("reviewer", {}).get("id"),
"language": r.get("language"),
"response": r.get("response"),
})
except Exception as e:
self.errors.append(f"Reviews parse error: {e}")
def _parse_listing_details(self, body: dict):
"""Extract full listing details from PDP sections."""
try:
sections = (body.get("data", {})
.get("presentation", {})
.get("stayProductDetailPage", {})
.get("sections", {})
.get("sections", []))
for section in sections:
section_type = section.get("sectionId", "")
if "OVERVIEW" in section_type:
data = section.get("section", {})
self.listing_details["overview"] = {
"title": data.get("name"),
"description": data.get("description"),
"highlights": [h.get("headline") for h in data.get("highlights", [])],
}
elif "AMENITIES" in section_type:
amenities = section.get("section", {}).get("seeAllAmenitiesGroups", [])
all_amenities = []
for group in amenities:
for amenity in group.get("amenities", []):
all_amenities.append({
"title": amenity.get("title"),
"available": amenity.get("available", True),
"icon": amenity.get("icon"),
})
self.listing_details["amenities"] = all_amenities
elif "HOST_PROFILE" in section_type:
host = section.get("section", {})
self.listing_details["host"] = {
"name": host.get("title"),
"member_since": host.get("subtitle"),
"response_rate": host.get("responseRate"),
"response_time": host.get("responseTime"),
"is_superhost": host.get("isSuperhost", False),
"highlights": [h.get("headline") for h in host.get("highlights", [])],
}
except Exception as e:
self.errors.append(f"Listing details parse error: {e}")
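The deep chains of .get() calls in the collector are easy to get wrong by one key. One way to keep them readable is a small dig helper that walks nested dicts and returns a default as soon as any level is missing — a refactoring sketch, not something the parsers above require:

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts; return default as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return default
        obj = obj.get(key, default)
        if obj is default:
            return default
    return obj

# The search-results path from _parse_search_results, rewritten:
# results = dig(body, "data", "presentation", "staysSearch",
#               "results", "searchResults", default=[])
```

The same helper covers the calendar, reviews, and PDP-sections paths with one call each.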
Scraping Search Results
async def scrape_search(
location: str,
checkin: str,
checkout: str,
guests: int = 2,
    proxy_url: str | None = None,
max_pages: int = 3,
) -> list[dict]:
"""
Scrape Airbnb search results for a location and date range.
Args:
location: City or neighborhood name
checkin: Check-in date (YYYY-MM-DD)
checkout: Check-out date (YYYY-MM-DD)
guests: Number of guests
proxy_url: Residential proxy URL
max_pages: How many result pages to scrape (20 listings each)
"""
all_listings = []
p, browser, context = await create_browser(proxy_url)
collector = AirbnbResponseCollector()
try:
page = await open_page(context)
page.on("response", collector.handle_response)
# Build search URL
location_slug = location.replace(" ", "-")
url = (
f"https://www.airbnb.com/s/{location_slug}/homes"
f"?checkin={checkin}&checkout={checkout}"
f"&adults={guests}&source=structured_search_input_header"
)
await page.goto(url, wait_until="networkidle", timeout=60000)
await page.wait_for_timeout(3000)
# Collect first page results
all_listings.extend(collector.search_results.copy())
collector.search_results.clear()
# Navigate to additional pages
for page_num in range(2, max_pages + 1):
next_btn = await page.query_selector("a[aria-label='Next']")
if not next_btn:
break
await next_btn.click()
await page.wait_for_load_state("networkidle")
await page.wait_for_timeout(3000)
all_listings.extend(collector.search_results.copy())
collector.search_results.clear()
# Simulate reading between pages
await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.3)")
await asyncio.sleep(random.uniform(2, 4))
finally:
await browser.close()
await p.stop()
return all_listings
# Usage
listings = asyncio.run(scrape_search(
location="New York",
checkin="2026-08-01",
checkout="2026-08-07",
guests=2,
proxy_url="http://user:[email protected]:9000",
max_pages=3,
))
print(f"Found {len(listings)} listings")
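Paginated results can occasionally repeat a listing across pages, so deduplicating by listing id before further processing is worth the few extra lines — an order-preserving sketch:

```python
def dedupe_listings(listings: list[dict]) -> list[dict]:
    """Drop duplicate listings by id, keeping the first occurrence."""
    seen: set = set()
    unique = []
    for listing in listings:
        lid = listing.get("id")
        if lid is None or lid in seen:
            continue
        seen.add(lid)
        unique.append(listing)
    return unique
```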
Scraping Listing Details, Calendar, and Reviews
For full data on individual properties:
async def scrape_listing(
listing_id: str,
    proxy_url: str | None = None,
) -> dict:
"""
Scrape complete data for a single Airbnb listing.
Returns details, availability calendar, and reviews.
"""
p, browser, context = await create_browser(proxy_url)
collector = AirbnbResponseCollector()
result = {"listing_id": listing_id}
try:
page = await open_page(context)
page.on("response", collector.handle_response)
url = f"https://www.airbnb.com/rooms/{listing_id}"
await page.goto(url, wait_until="domcontentloaded", timeout=60000)
await page.wait_for_timeout(2000)
# Scroll to trigger lazy-loaded sections (reviews, calendar)
await human_scroll(page, steps=8, target_pct=0.4)
await page.wait_for_timeout(2000)
await human_scroll(page, steps=8, target_pct=0.7)
await page.wait_for_timeout(2000)
await human_scroll(page, steps=8, target_pct=0.95)
await page.wait_for_timeout(3000)
# Wait for reviews section to load
try:
await page.wait_for_selector("[data-section-id='REVIEWS_DEFAULT']", timeout=8000)
except Exception:
pass
# Scroll back up to trigger any remaining sections
await page.evaluate("window.scrollTo(0, 0)")
await page.wait_for_timeout(1500)
result["details"] = collector.listing_details
result["calendar"] = collector.calendar_data
result["reviews"] = collector.reviews
# Extract basic data from HTML as fallback
html = await page.content()
result["html_fallback"] = extract_listing_html_fallback(html)
finally:
await browser.close()
await p.stop()
return result
async def human_scroll(page: Page, steps: int = 10, target_pct: float = 1.0):
"""Simulate human-like scrolling behavior."""
current_pct = 0
step_size = target_pct / steps
for _ in range(steps):
current_pct += step_size + random.uniform(-0.02, 0.02)
current_pct = max(0, min(1.0, current_pct))
await page.evaluate(f"window.scrollTo(0, document.body.scrollHeight * {current_pct})")
await asyncio.sleep(random.uniform(0.1, 0.5))
def extract_listing_html_fallback(html: str) -> dict:
"""Extract basic listing data from HTML when API interception misses data."""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
result = {}
# Title
title_el = soup.select_one("h1")
result["title"] = title_el.get_text(strip=True) if title_el else None
# JSON-LD structured data
for script in soup.select("script[type='application/ld+json']"):
try:
data = json.loads(script.string)
if data.get("@type") == "LodgingBusiness":
result["name"] = data.get("name")
result["description"] = data.get("description")
result["address"] = data.get("address", {})
result["images"] = data.get("image", [])
rating = data.get("aggregateRating", {})
result["rating"] = rating.get("ratingValue")
result["review_count"] = rating.get("reviewCount")
break
except Exception:
continue
return result
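If you'd rather avoid the bs4 dependency for the fallback, the JSON-LD blocks can also be pulled out with the standard library. A rougher sketch using html.parser — it assumes the script bodies are plain JSON, which can break on exotic markup:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_ldjson = False
        self.blocks: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ldjson = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ldjson = False

    def handle_data(self, data):
        if self._in_ldjson:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # ignore whitespace chunks and malformed blocks

def extract_jsonld(html: str) -> list[dict]:
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks
```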
Proxy Configuration
A headless browser on a datacenter IP will get blocked by Cloudflare before the first API response arrives. Residential proxies are required.
ThorData's residential proxy network rotates per request automatically and supports city-level targeting. Airbnb rates vary depending on where the search appears to originate from — so if you need location-specific pricing, set the proxy geo accordingly.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9000
def get_proxy(country: str = "US", city: str | None = None) -> str:
"""Build a ThorData proxy URL with optional geo-targeting."""
user = f"{THORDATA_USER}_country-{country}"
if city:
user += f"_city-{city.replace(' ', '')}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
# For New York searches, use a New York IP to get accurate local pricing
proxy_ny = get_proxy(country="US", city="NewYork")
proxy_la = get_proxy(country="US", city="LosAngeles")
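One gotcha with proxy URLs built by string formatting: passwords containing characters like @, :, or / break the URL. Percent-encoding the credentials with urllib.parse avoids this — a sketch independent of any particular provider (the hostnames and credentials are placeholders):

```python
from urllib.parse import quote

def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Build an http proxy URL with percent-encoded credentials."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```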
Data Storage
def init_db(path: str = "airbnb.db") -> sqlite3.Connection:
"""Initialize SQLite database for Airbnb data."""
conn = sqlite3.connect(path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS listings (
id TEXT PRIMARY KEY,
name TEXT,
city TEXT,
state TEXT,
country TEXT,
lat REAL,
lng REAL,
room_type TEXT,
person_capacity INTEGER,
bedrooms TEXT,
bathrooms TEXT,
beds TEXT,
avg_rating REAL,
reviews_count INTEGER,
is_superhost INTEGER DEFAULT 0,
price_per_night TEXT,
price_formatted TEXT,
photos TEXT,
amenities TEXT,
raw_data TEXT,
scraped_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS availability (
listing_id TEXT NOT NULL,
date TEXT NOT NULL,
available INTEGER DEFAULT 0,
price TEXT,
min_nights INTEGER,
scraped_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (listing_id, date)
);
CREATE TABLE IF NOT EXISTS reviews (
id TEXT PRIMARY KEY,
listing_id TEXT NOT NULL,
reviewer_name TEXT,
reviewer_id TEXT,
date TEXT,
rating INTEGER,
comments TEXT,
language TEXT,
response TEXT,
scraped_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_listings_city ON listings(city);
CREATE INDEX IF NOT EXISTS idx_availability_listing ON availability(listing_id);
CREATE INDEX IF NOT EXISTS idx_reviews_listing ON reviews(listing_id);
""")
conn.commit()
return conn
def save_listing(conn: sqlite3.Connection, listing: dict):
"""Save a listing to the database."""
conn.execute("""
INSERT OR REPLACE INTO listings
(id, name, city, state, country, lat, lng, room_type, person_capacity,
bedrooms, bathrooms, beds, avg_rating, reviews_count, is_superhost,
price_per_night, price_formatted, photos, amenities, raw_data)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
listing.get("id"),
listing.get("name"),
listing.get("city"),
listing.get("state"),
listing.get("country"),
listing.get("lat"),
listing.get("lng"),
listing.get("room_type"),
listing.get("person_capacity"),
listing.get("bedrooms"),
listing.get("bathrooms"),
listing.get("beds"),
listing.get("avg_rating"),
listing.get("reviews_count"),
1 if listing.get("is_superhost") else 0,
listing.get("price_per_night"),
listing.get("price_formatted"),
json.dumps(listing.get("photos", [])),
json.dumps(listing.get("amenities", [])),
json.dumps(listing),
))
conn.commit()
def save_availability(conn: sqlite3.Connection, listing_id: str, calendar: dict):
"""Save availability calendar data."""
rows = [
(listing_id, date, 1 if data["available"] else 0, data.get("price"), data.get("min_nights"))
for date, data in calendar.items()
]
conn.executemany(
"INSERT OR REPLACE INTO availability (listing_id, date, available, price, min_nights) VALUES (?, ?, ?, ?, ?)",
rows,
)
conn.commit()
def save_reviews(conn: sqlite3.Connection, listing_id: str, reviews: list[dict]):
"""Save listing reviews."""
for r in reviews:
if not r.get("id"):
continue
conn.execute("""
INSERT OR IGNORE INTO reviews
(id, listing_id, reviewer_name, reviewer_id, date, rating, comments, language, response)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
r.get("id"), listing_id, r.get("reviewer_name"), r.get("reviewer_id"),
r.get("date"), r.get("rating"), r.get("comments"),
r.get("language"), r.get("response"),
))
conn.commit()
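For large review batches, the per-row execute loop above works, but executemany is faster and tidier. A sketch of the same insert in batch form — the column list matches the reviews table defined earlier:

```python
import sqlite3

def save_reviews_batch(conn: sqlite3.Connection, listing_id: str, reviews: list[dict]):
    """Batch-insert reviews, skipping rows without an id or already present."""
    rows = [
        (r["id"], listing_id, r.get("reviewer_name"), r.get("reviewer_id"),
         r.get("date"), r.get("rating"), r.get("comments"),
         r.get("language"), r.get("response"))
        for r in reviews if r.get("id")
    ]
    conn.executemany(
        """INSERT OR IGNORE INTO reviews
           (id, listing_id, reviewer_name, reviewer_id, date, rating,
            comments, language, response)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        rows,
    )
    conn.commit()
```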
Rate Limiting and Anti-Detection
async def scrape_listings_pipeline(
listing_ids: list[str],
proxy_url: str,
db_path: str = "airbnb.db",
delay_min: float = 5.0,
delay_max: float = 15.0,
) -> dict:
"""
Scrape multiple listings with rate limiting and error handling.
Saves to SQLite as it goes — safe to interrupt and resume.
"""
conn = init_db(db_path)
stats = {"success": 0, "error": 0, "skipped": 0}
# Check which listings are already scraped
existing = set(
r[0] for r in conn.execute(
"SELECT id FROM listings WHERE scraped_at > datetime('now', '-24 hours')"
).fetchall()
)
for i, listing_id in enumerate(listing_ids):
if listing_id in existing:
stats["skipped"] += 1
continue
print(f"[{i+1}/{len(listing_ids)}] Scraping listing {listing_id}...")
try:
data = await scrape_listing(listing_id, proxy_url=proxy_url)
if data.get("html_fallback"):
                # Persist the HTML-fallback summary; intercepted details stay in data["details"]
fallback = data["html_fallback"]
save_listing(conn, {
"id": listing_id,
"name": fallback.get("name") or fallback.get("title"),
**fallback,
})
if data.get("calendar"):
save_availability(conn, listing_id, data["calendar"])
if data.get("reviews"):
save_reviews(conn, listing_id, data["reviews"])
stats["success"] += 1
print(f" OK: {len(data.get('reviews', []))} reviews, {len(data.get('calendar', {}))} calendar days")
except Exception as e:
stats["error"] += 1
print(f" Error: {e}")
# Delay between listings
delay = random.uniform(delay_min, delay_max)
print(f" Waiting {delay:.1f}s...")
await asyncio.sleep(delay)
conn.close()
return stats
# Usage
async def main():
listing_ids = ["1234567", "2345678", "3456789"]
stats = await scrape_listings_pipeline(
listing_ids,
proxy_url="http://user:[email protected]:9000",
delay_min=8.0,
delay_max=20.0,
)
print(f"Done: {stats}")
asyncio.run(main())
Analyzing Airbnb Data
Once you have data in SQLite, you can run analytics:
def analyze_market(conn: sqlite3.Connection, city: str) -> dict:
"""Analyze Airbnb market data for a city."""
# Price distribution
prices = conn.execute("""
SELECT price_per_night
FROM listings
WHERE city = ? AND price_per_night IS NOT NULL
""", (city,)).fetchall()
# Parse prices (they come as strings like "$125/night")
import re
price_values = []
for (price_str,) in prices:
match = re.search(r'\$?([\d,]+)', str(price_str))
if match:
price_values.append(float(match.group(1).replace(",", "")))
# Availability rates
availability = conn.execute("""
SELECT
listing_id,
COUNT(*) as total_days,
SUM(available) as available_days,
ROUND(CAST(SUM(available) AS REAL) / COUNT(*) * 100, 1) as availability_pct
FROM availability
WHERE date BETWEEN date('now') AND date('now', '+60 days')
GROUP BY listing_id
""").fetchall()
# Top-rated superhosts
superhosts = conn.execute("""
SELECT name, avg_rating, reviews_count, price_per_night
FROM listings
WHERE city = ? AND is_superhost = 1
ORDER BY reviews_count DESC
LIMIT 10
""", (city,)).fetchall()
return {
"city": city,
"total_listings": len(prices),
"avg_price": sum(price_values) / len(price_values) if price_values else 0,
"median_price": sorted(price_values)[len(price_values) // 2] if price_values else 0,
"min_price": min(price_values) if price_values else 0,
"max_price": max(price_values) if price_values else 0,
"avg_availability_pct": (
sum(r[3] for r in availability) / len(availability)
if availability else 0
),
"top_superhosts": [
{"name": r[0], "rating": r[1], "reviews": r[2], "price": r[3]}
for r in superhosts
],
}
Legal Note
Airbnb's Terms of Service prohibit automated scraping. Courts in the US have issued mixed rulings on whether scraping publicly visible data in violation of a site's terms creates legal liability. The hiQ v. LinkedIn line of cases suggests that scraping public data is generally not a Computer Fraud and Abuse Act violation, but the law is still unsettled. Check your jurisdiction, use data responsibly, and do not scrape at a scale that disrupts Airbnb's infrastructure.
Key Takeaways
- Playwright with stealth patches is more reliable than direct HTTP requests for Airbnb because it handles TLS and basic fingerprinting automatically.
- Intercepting API responses via page.on("response", ...) captures clean JSON without HTML parsing or CSS selector maintenance.
- Airbnb loads different data types from different endpoints — search results, calendar, and reviews each have distinct URL patterns to filter on.
- Scroll simulation is necessary to trigger lazy-loaded content like review sections.
- Residential proxies are not optional for production volume — datacenter IPs get blocked at the Cloudflare layer before any page content loads. ThorData's residential proxies with city-level targeting plug directly into Playwright's context configuration.
- Always store raw API responses alongside parsed data so structural changes in Airbnb's API don't force a re-crawl.
- Build resume capability into your pipeline — scraping a listing takes 15-30 seconds with proper pacing, so large datasets take time.