Scraping Airbnb Listings with Playwright and API Interception (2026)
Airbnb is one of the more interesting scraping targets in 2026. The site has no public API, but its frontend loads rich JSON from internal endpoints — pricing calendars, review threads, host profiles, availability windows — all of it. The problem is that Airbnb has invested heavily in bot detection. Direct HTTP requests get flagged fast. The practical solution is to drive a real browser with Playwright and intercept the API responses as the page loads them naturally.
This guide walks through that exact approach: async Playwright, request interception, and how to structure what you capture into something useful.
What Data Is Available
Airbnb exposes more structured data than most people realize. Through browser interception you can collect:
- Property listings — name, type, location coordinates, photo URLs, amenity tags, bedroom/bathroom counts, superhost status
- Pricing — nightly rates by date, cleaning fees, service fees, total for a given stay
- Availability calendar — which dates are blocked, which are available, minimum-stay rules
- Reviews — individual review text, per-category star ratings (accuracy, cleanliness, communication, location, value), reviewer profiles
- Host profiles — join date, review count, response rate, response time, languages spoken, other listings
- Search results metadata — pagination cursors, total result count, map bounds
The calendar and review data in particular are difficult to scrape by parsing HTML — Airbnb renders them via JavaScript after page load. API interception sidesteps that entirely.
Anti-Bot Landscape
Before writing any code, understand what you're up against:
Cloudflare. Airbnb runs behind Cloudflare with bot score evaluation on nearly every route. This catches datacenter IPs, unusual request timing, and certain TLS patterns instantly.
Kasada / Shape Security. Airbnb has used Shape Security (now part of F5) for behavioral fingerprinting at the application layer. This runs inside the browser JavaScript and monitors mouse movements, keyboard cadence, scroll behavior, and event timing. Headless browsers without behavioral simulation get flagged.
TLS fingerprinting. The TLS handshake your HTTP client presents identifies your tool. Python's requests and httpx have recognizable TLS fingerprints that differ from real Chrome or Firefox. Playwright running actual Chromium sidesteps this because the browser handles the TLS layer.
Device fingerprinting. Canvas, WebGL, AudioContext, screen resolution, installed fonts — all of these are probed by Airbnb's JavaScript to build a device fingerprint. Default Playwright (without stealth patches) has known fingerprint values that get detected.
Rate limiting. Aggressive rate limiting kicks in well before you'd notice by eye. Rotating IPs and pacing requests are non-negotiable for any volume.
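Pacing is worth building in from the start. A small helper that combines a base delay, random jitter, and exponential backoff after failures keeps request timing irregular — the constants here are illustrative, not calibrated against Airbnb's actual thresholds:

```python
import random

def next_delay(base: float = 8.0, jitter: float = 0.5,
               failures: int = 0, cap: float = 120.0) -> float:
    """Delay in seconds: base +/- a jitter fraction, doubled per recent failure."""
    backoff = base * (2 ** failures)            # exponential backoff on errors
    delay = backoff * random.uniform(1 - jitter, 1 + jitter)
    return min(delay, cap)                      # never wait longer than the cap
```

Call it between requests with a running failure count, resetting the count on each success.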
Setting Up
Install dependencies:
pip install playwright playwright-stealth httpx beautifulsoup4
playwright install chromium
Base Browser Setup
import asyncio
import json
import random
import sqlite3
from datetime import datetime
from playwright.async_api import async_playwright, Page, BrowserContext
from playwright_stealth import stealth_async
STEALTH_INIT_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined, configurable: true });
Object.defineProperty(navigator, 'plugins', {
get: () => [
{ name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer', description: 'Portable Document Format' },
{ name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai', description: '' },
{ name: 'Native Client', filename: 'internal-nacl-plugin', description: '' },
]
});
window.chrome = { runtime: {}, loadTimes: function() {}, csi: function() {}, app: {} };
"""
async def create_browser(proxy_url: str | None = None, headless: bool = True):
"""Create a hardened Playwright browser context."""
p = await async_playwright().start()
launch_args = [
"--no-sandbox",
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
"--disable-infobars",
"--window-size=1440,900",
"--disable-extensions",
]
browser = await p.chromium.launch(
headless=headless,
args=launch_args,
proxy={"server": proxy_url} if proxy_url else None,
)
context = await browser.new_context(
viewport={"width": 1440, "height": 900},
user_agent=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/124.0.0.0 Safari/537.36"
),
locale="en-US",
timezone_id="America/New_York",
color_scheme="light",
device_scale_factor=1.0,
)
# Inject stealth scripts on every new page
await context.add_init_script(STEALTH_INIT_SCRIPT)
return p, browser, context
async def open_page(context: BrowserContext) -> Page:
"""Open a new page with stealth applied."""
page = await context.new_page()
await stealth_async(page)
return page
Intercepting API Responses
Airbnb's internal GraphQL API responds to recognizable URL patterns. The key is to register response handlers before navigation starts:
class AirbnbResponseCollector:
"""Collects and categorizes Airbnb API responses during page navigation."""
def __init__(self):
self.search_results = []
self.calendar_data = {}
self.reviews = []
self.listing_details = {}
self.errors = []
async def handle_response(self, response):
url = response.url
if response.status != 200:
return
# Skip non-JSON responses
content_type = response.headers.get("content-type", "")
if "json" not in content_type and "javascript" not in content_type:
return
try:
body = await response.json()
except Exception:
return
# Search results endpoint
if "ExploreSearch" in url or "StaysSearch" in url:
self._parse_search_results(body)
# Calendar availability
elif "CalendarMonths" in url or "PdpAvailabilityCalendar" in url:
self._parse_calendar(body)
# Reviews
elif "PdpReviews" in url or "StaysPdpReviews" in url:
self._parse_reviews(body)
# Listing details
elif "StaysPdpSections" in url or "PdpPlatformSections" in url:
self._parse_listing_details(body)
def _parse_search_results(self, body: dict):
"""Extract listing summaries from search response."""
try:
# Navigate the nested structure
data = (body.get("data", {})
.get("presentation", {})
.get("staysSearch", {})
.get("results", {})
.get("searchResults", []))
for item in data:
listing = item.get("listing", {})
pricing = item.get("pricingQuote", {})
if not listing:
continue
self.search_results.append({
"id": listing.get("id"),
"name": listing.get("name"),
"city": listing.get("city"),
"state": listing.get("state"),
"country": listing.get("country"),
"lat": listing.get("lat"),
"lng": listing.get("lng"),
"room_type": listing.get("roomTypeCategory"),
"person_capacity": listing.get("personCapacity"),
"bedrooms": listing.get("bedroomLabel"),
"bathrooms": listing.get("bathroomLabel"),
"beds": listing.get("bedLabel"),
"avg_rating": listing.get("avgRating"),
"reviews_count": listing.get("reviewsCount"),
"is_superhost": listing.get("isSuperhost", False),
"amenities": listing.get("amenityIds", []),
"photos": [p.get("picture") for p in listing.get("contextualPictures", [])[:3]],
"price_formatted": (pricing.get("price", {})
.get("total", {})
.get("amountFormatted")),
"price_per_night": (pricing.get("structuredStayDisplayPrice", {})
.get("primaryLine", {})
.get("accessibilityLabel")),
})
except Exception as e:
self.errors.append(f"Search parse error: {e}")
def _parse_calendar(self, body: dict):
"""Extract availability calendar data."""
try:
months = (body.get("data", {})
.get("merlinProductDetailsPlatformRequest", {})
.get("pdpAvailabilityCalendar", {})
.get("calendarMonths", []))
if not months:
# Try alternate path
months = body.get("calendar_months", [])
for month_data in months:
for day in month_data.get("days", []):
date = day.get("calendarDate") or day.get("date")
if date:
self.calendar_data[date] = {
"available": day.get("available", False),
"price": day.get("price", {}).get("localPriceFormatted"),
"min_nights": day.get("minNights"),
"available_for_checkin": day.get("availableForCheckin", day.get("available", False)),
}
except Exception as e:
self.errors.append(f"Calendar parse error: {e}")
def _parse_reviews(self, body: dict):
"""Extract individual reviews."""
try:
reviews_data = (body.get("data", {})
.get("merlinProductDetailsPlatformRequest", {})
.get("pdpReviewsData", {})
.get("reviews", []))
for r in reviews_data:
self.reviews.append({
"id": r.get("id"),
"date": r.get("localizedDate"),
"comments": r.get("comments"),
"rating": r.get("rating"),
"reviewer_name": r.get("reviewer", {}).get("firstName"),
"reviewer_id": r.get("reviewer", {}).get("id"),
"language": r.get("language"),
"response": r.get("response"),
})
except Exception as e:
self.errors.append(f"Reviews parse error: {e}")
def _parse_listing_details(self, body: dict):
"""Extract full listing details from PDP sections."""
try:
sections = (body.get("data", {})
.get("presentation", {})
.get("stayProductDetailPage", {})
.get("sections", {})
.get("sections", []))
for section in sections:
section_type = section.get("sectionId", "")
if "OVERVIEW" in section_type:
data = section.get("section", {})
self.listing_details["overview"] = {
"title": data.get("name"),
"description": data.get("description"),
"highlights": [h.get("headline") for h in data.get("highlights", [])],
}
elif "AMENITIES" in section_type:
amenities = section.get("section", {}).get("seeAllAmenitiesGroups", [])
all_amenities = []
for group in amenities:
for amenity in group.get("amenities", []):
all_amenities.append({
"title": amenity.get("title"),
"available": amenity.get("available", True),
"icon": amenity.get("icon"),
})
self.listing_details["amenities"] = all_amenities
elif "HOST_PROFILE" in section_type:
host = section.get("section", {})
self.listing_details["host"] = {
"name": host.get("title"),
"member_since": host.get("subtitle"),
"response_rate": host.get("responseRate"),
"response_time": host.get("responseTime"),
"is_superhost": host.get("isSuperhost", False),
"highlights": [h.get("headline") for h in host.get("highlights", [])],
}
except Exception as e:
self.errors.append(f"Listing details parse error: {e}")
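The deep chains of .get() calls in the collector are easy to get wrong by one key. One way to keep them readable is a small dig helper that walks nested dicts and returns a default as soon as any level is missing — a refactoring sketch, not something the parsers above require:

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts; return default as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return default
        obj = obj.get(key, default)
        if obj is default:
            return default
    return obj

# The search-results path from _parse_search_results, rewritten:
# results = dig(body, "data", "presentation", "staysSearch",
#               "results", "searchResults", default=[])
```

The same helper covers the calendar, reviews, and PDP-sections paths with one call each.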
Scraping Search Results
async def scrape_search(
location: str,
checkin: str,
checkout: str,
guests: int = 2,
    proxy_url: str | None = None,
max_pages: int = 3,
) -> list[dict]:
"""
Scrape Airbnb search results for a location and date range.
Args:
location: City or neighborhood name
checkin: Check-in date (YYYY-MM-DD)
checkout: Check-out date (YYYY-MM-DD)
guests: Number of guests
proxy_url: Residential proxy URL
max_pages: How many result pages to scrape (20 listings each)
"""
all_listings = []
p, browser, context = await create_browser(proxy_url)
collector = AirbnbResponseCollector()
try:
page = await open_page(context)
page.on("response", collector.handle_response)
# Build search URL
location_slug = location.replace(" ", "-")
url = (
f"https://www.airbnb.com/s/{location_slug}/homes"
f"?checkin={checkin}&checkout={checkout}"
f"&adults={guests}&source=structured_search_input_header"
)
await page.goto(url, wait_until="networkidle", timeout=60000)
await page.wait_for_timeout(3000)
# Collect first page results
all_listings.extend(collector.search_results.copy())
collector.search_results.clear()
# Navigate to additional pages
for page_num in range(2, max_pages + 1):
next_btn = await page.query_selector("a[aria-label='Next']")
if not next_btn:
break
await next_btn.click()
await page.wait_for_load_state("networkidle")
await page.wait_for_timeout(3000)
all_listings.extend(collector.search_results.copy())
collector.search_results.clear()
# Simulate reading between pages
await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.3)")
await asyncio.sleep(random.uniform(2, 4))
finally:
await browser.close()
await p.stop()
return all_listings
# Usage
listings = asyncio.run(scrape_search(
location="New York",
checkin="2026-08-01",
checkout="2026-08-07",
guests=2,
proxy_url="http://user:[email protected]:9000",
max_pages=3,
))
print(f"Found {len(listings)} listings")
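Paginated results can occasionally repeat a listing across pages, so deduplicating by listing id before further processing is worth the few extra lines — an order-preserving sketch:

```python
def dedupe_listings(listings: list[dict]) -> list[dict]:
    """Drop duplicate listings by id, keeping the first occurrence."""
    seen: set = set()
    unique = []
    for listing in listings:
        lid = listing.get("id")
        if lid is None or lid in seen:
            continue
        seen.add(lid)
        unique.append(listing)
    return unique
```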
Scraping Listing Details, Calendar, and Reviews
For full data on individual properties:
async def scrape_listing(
listing_id: str,
    proxy_url: str | None = None,
) -> dict:
"""
Scrape complete data for a single Airbnb listing.
Returns details, availability calendar, and reviews.
"""
p, browser, context = await create_browser(proxy_url)
collector = AirbnbResponseCollector()
result = {"listing_id": listing_id}
try:
page = await open_page(context)
page.on("response", collector.handle_response)
url = f"https://www.airbnb.com/rooms/{listing_id}"
await page.goto(url, wait_until="domcontentloaded", timeout=60000)
await page.wait_for_timeout(2000)
# Scroll to trigger lazy-loaded sections (reviews, calendar)
await human_scroll(page, steps=8, target_pct=0.4)
await page.wait_for_timeout(2000)
await human_scroll(page, steps=8, target_pct=0.7)
await page.wait_for_timeout(2000)
await human_scroll(page, steps=8, target_pct=0.95)
await page.wait_for_timeout(3000)
# Wait for reviews section to load
try:
await page.wait_for_selector("[data-section-id='REVIEWS_DEFAULT']", timeout=8000)
except Exception:
pass
# Scroll back up to trigger any remaining sections
await page.evaluate("window.scrollTo(0, 0)")
await page.wait_for_timeout(1500)
result["details"] = collector.listing_details
result["calendar"] = collector.calendar_data
result["reviews"] = collector.reviews
# Extract basic data from HTML as fallback
html = await page.content()
result["html_fallback"] = extract_listing_html_fallback(html)
finally:
await browser.close()
await p.stop()
return result
async def human_scroll(page: Page, steps: int = 10, target_pct: float = 1.0):
"""Simulate human-like scrolling behavior."""
current_pct = 0
step_size = target_pct / steps
for _ in range(steps):
current_pct += step_size + random.uniform(-0.02, 0.02)
current_pct = max(0, min(1.0, current_pct))
await page.evaluate(f"window.scrollTo(0, document.body.scrollHeight * {current_pct})")
await asyncio.sleep(random.uniform(0.1, 0.5))
def extract_listing_html_fallback(html: str) -> dict:
"""Extract basic listing data from HTML when API interception misses data."""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
result = {}
# Title
title_el = soup.select_one("h1")
result["title"] = title_el.get_text(strip=True) if title_el else None
# JSON-LD structured data
for script in soup.select("script[type='application/ld+json']"):
try:
data = json.loads(script.string)
if data.get("@type") == "LodgingBusiness":
result["name"] = data.get("name")
result["description"] = data.get("description")
result["address"] = data.get("address", {})
result["images"] = data.get("image", [])
rating = data.get("aggregateRating", {})
result["rating"] = rating.get("ratingValue")
result["review_count"] = rating.get("reviewCount")
break
except Exception:
continue
return result
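If you'd rather avoid the bs4 dependency for the fallback, the JSON-LD blocks can also be pulled out with the standard library. A rougher sketch using html.parser — it assumes the script bodies are plain JSON, which can break on exotic markup:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_ldjson = False
        self.blocks: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ldjson = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ldjson = False

    def handle_data(self, data):
        if self._in_ldjson:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # ignore whitespace chunks and malformed blocks

def extract_jsonld(html: str) -> list[dict]:
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks
```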
Proxy Configuration
A headless browser on a datacenter IP will get blocked by Cloudflare before the first API response arrives. Residential proxies are required.
ThorData's residential proxy network rotates per request automatically and supports city-level targeting. Airbnb rates vary depending on where the search appears to originate from — so if you need location-specific pricing, set the proxy geo accordingly.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9000
def get_proxy(country: str = "US", city: str | None = None) -> str:
"""Build a ThorData proxy URL with optional geo-targeting."""
user = f"{THORDATA_USER}_country-{country}"
if city:
user += f"_city-{city.replace(' ', '')}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
# For New York searches, use a New York IP to get accurate local pricing
proxy_ny = get_proxy(country="US", city="NewYork")
proxy_la = get_proxy(country="US", city="LosAngeles")
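One gotcha with proxy URLs built by string formatting: passwords containing characters like @, :, or / break the URL. Percent-encoding the credentials with urllib.parse avoids this — a sketch independent of any particular provider (the hostnames and credentials are placeholders):

```python
from urllib.parse import quote

def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Build an http proxy URL with percent-encoded credentials."""
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
```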
Data Storage
def init_db(path: str = "airbnb.db") -> sqlite3.Connection:
"""Initialize SQLite database for Airbnb data."""
conn = sqlite3.connect(path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS listings (
id TEXT PRIMARY KEY,
name TEXT,
city TEXT,
state TEXT,
country TEXT,
lat REAL,
lng REAL,
room_type TEXT,
person_capacity INTEGER,
bedrooms TEXT,
bathrooms TEXT,
beds TEXT,
avg_rating REAL,
reviews_count INTEGER,
is_superhost INTEGER DEFAULT 0,
price_per_night TEXT,
price_formatted TEXT,
photos TEXT,
amenities TEXT,
raw_data TEXT,
scraped_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS availability (
listing_id TEXT NOT NULL,
date TEXT NOT NULL,
available INTEGER DEFAULT 0,
price TEXT,
min_nights INTEGER,
scraped_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (listing_id, date)
);
CREATE TABLE IF NOT EXISTS reviews (
id TEXT PRIMARY KEY,
listing_id TEXT NOT NULL,
reviewer_name TEXT,
reviewer_id TEXT,
date TEXT,
rating INTEGER,
comments TEXT,
language TEXT,
response TEXT,
scraped_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_listings_city ON listings(city);
CREATE INDEX IF NOT EXISTS idx_availability_listing ON availability(listing_id);
CREATE INDEX IF NOT EXISTS idx_reviews_listing ON reviews(listing_id);
""")
conn.commit()
return conn
def save_listing(conn: sqlite3.Connection, listing: dict):
"""Save a listing to the database."""
conn.execute("""
INSERT OR REPLACE INTO listings
(id, name, city, state, country, lat, lng, room_type, person_capacity,
bedrooms, bathrooms, beds, avg_rating, reviews_count, is_superhost,
price_per_night, price_formatted, photos, amenities, raw_data)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
listing.get("id"),
listing.get("name"),
listing.get("city"),
listing.get("state"),
listing.get("country"),
listing.get("lat"),
listing.get("lng"),
listing.get("room_type"),
listing.get("person_capacity"),
listing.get("bedrooms"),
listing.get("bathrooms"),
listing.get("beds"),
listing.get("avg_rating"),
listing.get("reviews_count"),
1 if listing.get("is_superhost") else 0,
listing.get("price_per_night"),
listing.get("price_formatted"),
json.dumps(listing.get("photos", [])),
json.dumps(listing.get("amenities", [])),
json.dumps(listing),
))
conn.commit()
def save_availability(conn: sqlite3.Connection, listing_id: str, calendar: dict):
"""Save availability calendar data."""
rows = [
(listing_id, date, 1 if data["available"] else 0, data.get("price"), data.get("min_nights"))
for date, data in calendar.items()
]
conn.executemany(
"INSERT OR REPLACE INTO availability (listing_id, date, available, price, min_nights) VALUES (?, ?, ?, ?, ?)",
rows,
)
conn.commit()
def save_reviews(conn: sqlite3.Connection, listing_id: str, reviews: list[dict]):
"""Save listing reviews."""
for r in reviews:
if not r.get("id"):
continue
conn.execute("""
INSERT OR IGNORE INTO reviews
(id, listing_id, reviewer_name, reviewer_id, date, rating, comments, language, response)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
r.get("id"), listing_id, r.get("reviewer_name"), r.get("reviewer_id"),
r.get("date"), r.get("rating"), r.get("comments"),
r.get("language"), r.get("response"),
))
conn.commit()
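For large review batches, the per-row execute loop above works, but executemany is faster and tidier. A sketch of the same insert in batch form — the column list matches the reviews table defined earlier:

```python
import sqlite3

def save_reviews_batch(conn: sqlite3.Connection, listing_id: str, reviews: list[dict]):
    """Batch-insert reviews, skipping rows without an id or already present."""
    rows = [
        (r["id"], listing_id, r.get("reviewer_name"), r.get("reviewer_id"),
         r.get("date"), r.get("rating"), r.get("comments"),
         r.get("language"), r.get("response"))
        for r in reviews if r.get("id")
    ]
    conn.executemany(
        """INSERT OR IGNORE INTO reviews
           (id, listing_id, reviewer_name, reviewer_id, date, rating,
            comments, language, response)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        rows,
    )
    conn.commit()
```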
Rate Limiting and Anti-Detection
async def scrape_listings_pipeline(
listing_ids: list[str],
proxy_url: str,
db_path: str = "airbnb.db",
delay_min: float = 5.0,
delay_max: float = 15.0,
) -> dict:
"""
Scrape multiple listings with rate limiting and error handling.
Saves to SQLite as it goes — safe to interrupt and resume.
"""
conn = init_db(db_path)
stats = {"success": 0, "error": 0, "skipped": 0}
# Check which listings are already scraped
existing = set(
r[0] for r in conn.execute(
"SELECT id FROM listings WHERE scraped_at > datetime('now', '-24 hours')"
).fetchall()
)
for i, listing_id in enumerate(listing_ids):
if listing_id in existing:
stats["skipped"] += 1
continue
print(f"[{i+1}/{len(listing_ids)}] Scraping listing {listing_id}...")
try:
data = await scrape_listing(listing_id, proxy_url=proxy_url)
if data.get("html_fallback"):
                # Persist the HTML-fallback summary; intercepted details stay in data["details"]
fallback = data["html_fallback"]
save_listing(conn, {
"id": listing_id,
"name": fallback.get("name") or fallback.get("title"),
**fallback,
})
if data.get("calendar"):
save_availability(conn, listing_id, data["calendar"])
if data.get("reviews"):
save_reviews(conn, listing_id, data["reviews"])
stats["success"] += 1
print(f" OK: {len(data.get('reviews', []))} reviews, {len(data.get('calendar', {}))} calendar days")
except Exception as e:
stats["error"] += 1
print(f" Error: {e}")
# Delay between listings
delay = random.uniform(delay_min, delay_max)
print(f" Waiting {delay:.1f}s...")
await asyncio.sleep(delay)
conn.close()
return stats
# Usage
async def main():
listing_ids = ["1234567", "2345678", "3456789"]
stats = await scrape_listings_pipeline(
listing_ids,
proxy_url="http://user:[email protected]:9000",
delay_min=8.0,
delay_max=20.0,
)
print(f"Done: {stats}")
asyncio.run(main())
Analyzing Airbnb Data
Once you have data in SQLite, you can run analytics:
def analyze_market(conn: sqlite3.Connection, city: str) -> dict:
"""Analyze Airbnb market data for a city."""
# Price distribution
prices = conn.execute("""
SELECT price_per_night
FROM listings
WHERE city = ? AND price_per_night IS NOT NULL
""", (city,)).fetchall()
# Parse prices (they come as strings like "$125/night")
import re
price_values = []
for (price_str,) in prices:
match = re.search(r'\$?([\d,]+)', str(price_str))
if match:
price_values.append(float(match.group(1).replace(",", "")))
# Availability rates
availability = conn.execute("""
SELECT
listing_id,
COUNT(*) as total_days,
SUM(available) as available_days,
ROUND(CAST(SUM(available) AS REAL) / COUNT(*) * 100, 1) as availability_pct
FROM availability
WHERE date BETWEEN date('now') AND date('now', '+60 days')
GROUP BY listing_id
""").fetchall()
# Top-rated superhosts
superhosts = conn.execute("""
SELECT name, avg_rating, reviews_count, price_per_night
FROM listings
WHERE city = ? AND is_superhost = 1
ORDER BY reviews_count DESC
LIMIT 10
""", (city,)).fetchall()
return {
"city": city,
"total_listings": len(prices),
"avg_price": sum(price_values) / len(price_values) if price_values else 0,
"median_price": sorted(price_values)[len(price_values) // 2] if price_values else 0,
"min_price": min(price_values) if price_values else 0,
"max_price": max(price_values) if price_values else 0,
"avg_availability_pct": (
sum(r[3] for r in availability) / len(availability)
if availability else 0
),
"top_superhosts": [
{"name": r[0], "rating": r[1], "reviews": r[2], "price": r[3]}
for r in superhosts
],
}
Legal Note
Airbnb's Terms of Service prohibit automated scraping. Courts in the US have issued mixed rulings on whether scraping publicly visible data in violation of a site's terms creates legal liability. The hiQ v. LinkedIn line of cases suggests that scraping public data is generally not a Computer Fraud and Abuse Act violation, but the law is still unsettled. Check your jurisdiction, use data responsibly, and do not scrape at a scale that disrupts Airbnb's infrastructure.
Key Takeaways
- Playwright with stealth patches is more reliable than direct HTTP requests for Airbnb because it handles TLS and basic fingerprinting automatically.
- Intercepting API responses via page.on("response", ...) captures clean JSON without HTML parsing or CSS selector maintenance.
- Airbnb loads different data types from different endpoints — search results, calendar, and reviews each have distinct URL patterns to filter on.
- Scroll simulation is necessary to trigger lazy-loaded content like review sections.
- Residential proxies are not optional for production volume — datacenter IPs get blocked at the Cloudflare layer before any page content loads. ThorData's residential proxies with city-level targeting plug directly into Playwright's context configuration.
- Always store raw API responses alongside parsed data so structural changes in Airbnb's API don't force a re-crawl.
- Build resume capability into your pipeline — scraping a listing takes 15-30 seconds with proper pacing, so large datasets take time.