Scraping Booking.com Hotel Prices and Availability in 2026 with Playwright
Hotel price data is one of the most valuable scraping targets. Revenue managers use it to undercut competitors. Travel startups use it to build comparison engines. Researchers use it to study dynamic pricing algorithms.
Booking.com is the largest source, with over 28 million listings across 200+ countries. The data is publicly visible on the page. The challenge is that almost nothing is in the initial HTML: prices load dynamically based on your check-in/check-out dates, guest count, and, crucially, your location. Plain HTTP requests therefore won't cut it. You need a real browser.
Why Playwright
Booking.com runs heavy JavaScript that renders prices client-side. It also uses sophisticated bot detection that flags headless browsers. Playwright handles both:
- Full browser rendering (Chromium, Firefox, or WebKit)
- Built-in stealth capabilities when configured correctly
- Network interception to capture API responses directly
- Geolocation spoofing for location-dependent pricing
pip install playwright
playwright install chromium
Basic Hotel Search Scraper
import asyncio
import json
from playwright.async_api import async_playwright
from datetime import date, timedelta
from urllib.parse import urlencode
async def search_hotels(
destination: str,
checkin: str,
checkout: str,
adults: int = 2,
rooms: int = 1,
) -> list[dict]:
"""
Search Booking.com for hotels and return listing data.
Dates in YYYY-MM-DD format.
"""
params = {
"ss": destination,
"checkin": checkin,
"checkout": checkout,
"group_adults": adults,
"no_rooms": rooms,
"selected_currency": "USD",
}
url = f"https://www.booking.com/searchresults.html?{urlencode(params)}"
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=[
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
locale="en-US",
geolocation={"latitude": 40.7128, "longitude": -74.0060},
permissions=["geolocation"],
)
page = await context.new_page()
# Block images and fonts to speed up loading
await page.route("**/*.{png,jpg,jpeg,gif,svg,woff,woff2}",
lambda route: route.abort())
await page.goto(url, wait_until="domcontentloaded")
# Close cookie banner if present
try:
await page.click('[id="onetrust-accept-btn-handler"]', timeout=3000)
except Exception:
pass
# Wait for price elements to render
await page.wait_for_selector('[data-testid="price-and-discounted-price"]',
timeout=15000)
# Extract hotel data from the page
hotels = await page.evaluate("""
() => {
const cards = document.querySelectorAll('[data-testid="property-card"]');
return Array.from(cards).map(card => {
const nameEl = card.querySelector('[data-testid="title"]');
const priceEl = card.querySelector('[data-testid="price-and-discounted-price"]');
const ratingEl = card.querySelector('[data-testid="review-score"]');
const locationEl = card.querySelector('[data-testid="distance"]');
const linkEl = card.querySelector('a[data-testid="title-link"]');
const priceText = priceEl ? priceEl.innerText.replace(/[^0-9]/g, '') : null;
const ratingText = ratingEl ? ratingEl.innerText : '';
const ratingMatch = ratingText.match(/([\d.]+)/);
return {
name: nameEl ? nameEl.innerText.trim() : null,
price_usd: priceText ? parseInt(priceText) : null,
rating: ratingMatch ? parseFloat(ratingMatch[1]) : null,
review_text: ratingText.trim(),
distance: locationEl ? locationEl.innerText.trim() : null,
url: linkEl ? linkEl.href.split('?')[0] : null,
};
}).filter(h => h.name && h.price_usd);
}
""")
await browser.close()
return hotels
# Search for hotels in Barcelona
checkin = (date.today() + timedelta(days=30)).isoformat()
checkout = (date.today() + timedelta(days=33)).isoformat()
hotels = asyncio.run(search_hotels("Barcelona, Spain", checkin, checkout))
for h in hotels[:10]:
rating = f"{h['rating']}/10" if h['rating'] else "N/A"
print(f"${h['price_usd']:>4} | {rating:>7} | {h['name']}")
Scraping Individual Hotel Pages
The search results give you a summary. For room-level pricing and availability, you need the hotel detail page:
async def scrape_hotel_details(hotel_url: str, checkin: str, checkout: str) -> dict:
"""Scrape room types, prices, and amenities from a hotel page."""
url = f"{hotel_url}?checkin={checkin}&checkout={checkout}&selected_currency=USD"
async with async_playwright() as p:
browser = await p.chromium.launch(headless=True, args=[
"--disable-blink-features=AutomationControlled",
])
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
)
page = await context.new_page()
# Intercept the availability API call for cleaner data
api_data = {}
async def capture_api(response):
if "roomrates" in response.url or "availability" in response.url:
try:
api_data["rooms"] = await response.json()
except Exception:
pass
page.on("response", capture_api)
await page.goto(url, wait_until="networkidle")
# Extract from DOM as fallback
rooms = await page.evaluate("""
() => {
const rows = document.querySelectorAll('table.hprt-table tr');
const results = [];
for (const row of rows) {
const typeEl = row.querySelector('.hprt-roomtype-icon-link');
const priceEl = row.querySelector('.prco-valign-middle-helper');
const capacityEl = row.querySelector('.hprt-occupancy-occupancy-info');
if (!typeEl || !priceEl) continue;
const priceText = priceEl.innerText.replace(/[^0-9]/g, '');
results.push({
room_type: typeEl.innerText.trim(),
price_per_night: priceText ? parseInt(priceText) : null,
max_guests: capacityEl ? capacityEl.innerText.trim() : null,
});
}
return results;
}
""")
# Get review summary
review_score = await page.evaluate("""
() => {
const el = document.querySelector('[data-testid="review-score-component"]');
return el ? el.innerText.trim() : null;
}
""")
await browser.close()
return {
"url": hotel_url,
"rooms": rooms,
"review_score": review_score,
"api_data": api_data.get("rooms"),
}
Handling Anti-Bot Detection
Booking.com uses a combination of Akamai Bot Manager and their own detection. Here's what specifically catches scrapers:
Browser fingerprinting: They check navigator.webdriver, plugin count, WebGL renderer, and canvas fingerprints. Playwright's default Chromium reports navigator.webdriver as true unless you patch it.
Rate limiting: More than ~30 searches per minute from one IP triggers a CAPTCHA. More than ~100 triggers a temporary block.
Session behavior: They track whether you actually behave like a user — do you scroll? Do you click on results? Do you have cookies from a previous visit?
The mitigation stack that works:
async def create_stealth_context(playwright):
"""Create a browser context that passes Booking.com's bot detection."""
browser = await playwright.chromium.launch(
headless=True,
args=[
"--disable-blink-features=AutomationControlled",
"--disable-features=IsolateOrigins,site-per-process",
"--disable-dev-shm-usage",
]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
locale="en-US",
timezone_id="America/New_York",
)
# Patch webdriver flag and add realistic browser properties
await context.add_init_script("""
Object.defineProperty(navigator, 'webdriver', { get: () => false });
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
window.chrome = { runtime: {} };
""")
return browser, context
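The session-behavior signals mentioned above (scrolling, pausing, interacting) can be partially simulated. The sketch below is an assumption-laden helper, not a guaranteed bypass: `scroll_plan` is pure Python that splits a page into irregular scroll offsets, and `act_human` (a hypothetical name) replays them with reading pauses.

```python
import random

def scroll_plan(page_height: int, viewport: int = 1080, jitter: float = 0.4) -> list[int]:
    """Split a page into irregular scroll offsets, the way a human skims."""
    offsets, pos = [], 0
    while pos < page_height:
        pos = min(pos + int(viewport * random.uniform(1 - jitter, 1.0)), page_height)
        offsets.append(pos)
    return offsets

async def act_human(page):
    """Scroll through the page in steps with randomized pauses (assumed helper)."""
    height = await page.evaluate("document.body.scrollHeight")
    for y in scroll_plan(height):
        await page.evaluate(f"window.scrollTo(0, {y})")
        await page.wait_for_timeout(random.randint(500, 1500))
```

Call `await act_human(page)` after navigation and before extraction, so the session records scroll events instead of an instant scrape.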
For IP rotation, the math is simple: if you need to scrape 500 hotels and the rate limit is ~30/minute/IP, you either wait 17 minutes on one IP or use 17 IPs and finish in a minute. Residential proxies are necessary here — Booking.com blocks all major datacenter ranges. ThorData's rotating residential proxies support city-level targeting, which matters for Booking.com since the prices shown vary by the requester's apparent location.
# Using proxy with Playwright
context = await browser.new_context(
proxy={"server": "http://proxy.thordata.com:9000",
"username": "user", "password": "pass"},
# ... other options
)
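To stay under the per-IP threshold programmatically, a simple sliding-window throttle helps. A minimal sketch; the default of 25/minute is a safety margin under the ~30/minute estimate above, not a published limit:

```python
import asyncio
import time

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds (sliding window)."""
    def __init__(self, rate: int = 25, per: float = 60.0):
        self.rate, self.per = rate, per
        self.calls: list[float] = []

    async def wait(self):
        now = time.monotonic()
        # Drop timestamps that fell out of the window, then sleep if at the cap
        self.calls = [t for t in self.calls if now - t < self.per]
        if len(self.calls) >= self.rate:
            await asyncio.sleep(self.per - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `await limiter.wait()` before each search; pair one limiter per proxy if you rotate IPs, since the limit applies per IP, not per process.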
Price Monitoring Over Time
The real value is tracking prices across days. Hotels use dynamic pricing — rates change based on demand, day of week, and how far out the booking is.
import sqlite3
from datetime import datetime
def store_price_snapshot(hotels: list[dict], destination: str,
checkin: str, db_path: str = "hotel_prices.db"):
"""Store a price snapshot for historical tracking."""
conn = sqlite3.connect(db_path)
# Schema matches the bulk collector's table below so both writers share it
conn.execute("""
CREATE TABLE IF NOT EXISTS prices (
hotel_name TEXT, city TEXT, checkin_date TEXT,
price_usd INTEGER, rating REAL, url TEXT,
scraped_at TEXT
)
""")
now = datetime.now().isoformat()
for h in hotels:
conn.execute(
"INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?, ?)",
(h["name"], destination, checkin, h["price_usd"],
h["rating"], h.get("url"), now)
)
conn.commit()
conn.close()
def get_price_trends(hotel_name: str, db_path: str = "hotel_prices.db") -> list:
"""Get price history for a specific hotel."""
conn = sqlite3.connect(db_path)
rows = conn.execute("""
SELECT checkin_date, price_usd, scraped_at
FROM prices WHERE hotel_name = ?
ORDER BY scraped_at
""", (hotel_name,)).fetchall()
conn.close()
return rows
Run the search scraper daily via cron. After a week you'll have enough data to see pricing patterns — when prices spike, how far in advance to book, and which hotels are consistently cheaper than their competitors.
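Once a few days of snapshots exist, the `(checkin_date, price_usd, scraped_at)` rows that `get_price_trends` returns can be collapsed into a summary. A small sketch, assuming rows come back ordered by `scraped_at` as in the query above:

```python
def summarize_trend(rows: list[tuple]) -> dict:
    """Summarize (checkin_date, price_usd, scraped_at) rows into a trend."""
    prices = [r[1] for r in rows if r[1] is not None]
    if not prices:
        return {"samples": 0}
    first, last = prices[0], prices[-1]
    return {
        "samples": len(prices),
        "min": min(prices),
        "max": max(prices),
        "latest": last,
        # Positive means the price rose since the first snapshot
        "change_pct": round((last - first) / first * 100, 1),
    }
```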
Scraping Reviews
Hotel reviews are on the detail page but paginated. Booking.com loads them via XHR, which you can intercept:
async def scrape_reviews(hotel_url: str, max_pages: int = 5) -> list[dict]:
"""Scrape hotel reviews by intercepting the review API calls."""
reviews = []
async with async_playwright() as p:
browser, context = await create_stealth_context(p)
page = await context.new_page()
async def capture_reviews(response):
if "review_list" in response.url or "reviews" in response.url:
try:
data = await response.json()
# Structure varies — handle common formats
if isinstance(data, dict) and "result" in data:
for r in data["result"]:
reviews.append({
"score": r.get("average_score"),
"title": r.get("title"),
"pros": r.get("pros"),
"cons": r.get("cons"),
"date": r.get("date"),
"traveler_type": r.get("travel_purpose"),
})
except Exception:
pass
page.on("response", capture_reviews)
await page.goto(f"{hotel_url}#tab-reviews", wait_until="networkidle")
for _ in range(max_pages - 1):
next_btn = await page.query_selector('[data-testid="reviews-pagination-next"]')
if not next_btn:
break
try:
await next_btn.click()
await page.wait_for_timeout(2000)
except Exception:
break
await browser.close()
return reviews
Legal and Ethical Considerations
Booking.com's terms of service prohibit scraping. That said, the data is publicly visible without a login. US courts have generally held that scraping publicly available data does not violate the CFAA (hiQ v. LinkedIn is the usual reference, though that litigation ultimately settled). In the EU, the main legal constraint is GDPR, which covers personal data such as reviewer names rather than facts like prices and availability.
The pragmatic approach: don't scrape so aggressively that you impact their service. Use delays. Don't republish their content verbatim. Use the data for analysis, price comparison, or research, not to clone their listings. Hotel names, prices, and ratings are factual data, which copyright does not protect.
Bulk Hotel Data Collection
For price monitoring across hundreds of properties, use an async approach with controlled concurrency:
import asyncio
import sqlite3
import random
from datetime import date, timedelta
from playwright.async_api import async_playwright
async def collect_city_hotels(
city: str,
checkin_offset_days: int = 30,
num_nights: int = 2,
max_hotels: int = 50,
proxy_config: dict = None,
) -> list[dict]:
"""
Collect all hotels for a city with pricing.
Returns list of hotel dicts with prices and ratings.
"""
checkin = (date.today() + timedelta(days=checkin_offset_days)).isoformat()
checkout = (date.today() + timedelta(days=checkin_offset_days + num_nights)).isoformat()
hotels = await search_hotels(city, checkin, checkout)
return hotels[:max_hotels]
async def run_city_comparison(
cities: list,
checkin_offset: int = 30,
db_path: str = "hotel_prices.db",
):
"""Compare hotel prices across multiple cities."""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS prices (
hotel_name TEXT, city TEXT, checkin_date TEXT,
price_usd INTEGER, rating REAL, url TEXT,
scraped_at TEXT
)
""")
from datetime import datetime
now = datetime.now().isoformat()
checkin = (date.today() + timedelta(days=checkin_offset)).isoformat()
for city in cities:
print(f"Collecting: {city}")
try:
hotels = await collect_city_hotels(city, checkin_offset_days=checkin_offset)
for h in hotels:
conn.execute(
"INSERT INTO prices VALUES (?,?,?,?,?,?,?)",
(h["name"], city, checkin, h["price_usd"],
h["rating"], h.get("url"), now)
)
conn.commit()
print(f" Saved {len(hotels)} hotels")
except Exception as e:
print(f" Error: {e}")
await asyncio.sleep(random.uniform(10, 20))
conn.close()
# Compare weekend prices in European capitals
CITIES = [
"Paris, France", "London, UK", "Amsterdam, Netherlands",
"Barcelona, Spain", "Rome, Italy", "Berlin, Germany",
"Prague, Czech Republic", "Lisbon, Portugal"
]
asyncio.run(run_city_comparison(CITIES, checkin_offset=45))
Intercepting the Availability API
Booking.com makes internal API calls when loading hotel availability. These can be intercepted for cleaner data:
import asyncio
import json
from playwright.async_api import async_playwright
async def intercept_availability_api(
hotel_url: str,
checkin: str,
checkout: str,
) -> dict:
"""
Intercept Booking.com's internal availability API calls.
Returns cleaner structured data than DOM parsing.
"""
api_responses = []
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
)
page = await context.new_page()
async def capture_response(response):
url = response.url
# Capture availability, room rate, and property data endpoints
if any(keyword in url for keyword in [
"availabilityCalendar", "roomRates", "propertyInfo",
"getReviewsDetails", "accommodations"
]):
try:
body = await response.json()
api_responses.append({
"endpoint": url.split("?")[0].split("/")[-1],
"url": url,
"data": body,
})
except Exception:
pass
page.on("response", capture_response)
full_url = f"{hotel_url}?checkin={checkin}&checkout={checkout}&selected_currency=USD"
await page.goto(full_url, wait_until="networkidle", timeout=40000)
await asyncio.sleep(5) # Wait for all async API calls to complete
await browser.close()
# Parse the captured responses
result = {"url": hotel_url, "checkin": checkin, "checkout": checkout}
for resp in api_responses:
endpoint = resp["endpoint"]
data = resp["data"]
if "roomRates" in endpoint or "accommodations" in endpoint:
# Extract room pricing
rooms = []
if isinstance(data, dict):
# Handle various response formats
room_list = (
data.get("result", {}).get("room_types", []) or
data.get("rooms", []) or
data.get("data", {}).get("roomTypes", [])
)
for room in room_list:
rooms.append({
"name": room.get("name") or room.get("room_type_name"),
"price_usd": room.get("price", {}).get("total") or room.get("rate"),
"max_occupancy": room.get("max_occupancy") or room.get("maxOccupancy"),
})
result["rooms"] = rooms
elif "propertyInfo" in endpoint:
result["property_details"] = data
elif "getReviewsDetails" in endpoint or "reviews" in endpoint:
result["review_data"] = data
return result
Price Calendar Scraping
Booking.com shows a price calendar for flexible dates. This reveals the cheapest time to visit:
async def scrape_price_calendar(
hotel_url: str,
months_ahead: int = 3,
) -> dict:
"""
Scrape Booking.com's price calendar for a hotel.
Shows cheapest available rates for each day.
"""
calendar_data = {}
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
)
page = await context.new_page()
async def capture_calendar(response):
if "availabilityCalendar" in response.url:
try:
data = await response.json()
if isinstance(data, dict) and "result" in data:
for entry in data["result"]:
cal_date = entry.get("checkin")
price = entry.get("avg_round_price") or entry.get("price")
if cal_date and price:
calendar_data[cal_date] = price
except Exception:
pass
page.on("response", capture_calendar)
# Navigate to flexible dates view
url = f"{hotel_url}?flexible_dates=1"
await page.goto(url, wait_until="networkidle", timeout=40000)
await asyncio.sleep(3)
# Click "flexible dates" option if available
try:
flex_btn = await page.query_selector('[data-testid="flexible-search-button"]')
if flex_btn:
await flex_btn.click()
await asyncio.sleep(3)
except Exception:
pass
await browser.close()
return {
"url": hotel_url,
"calendar": dict(sorted(calendar_data.items())),
"cheapest_date": min(calendar_data, key=calendar_data.get) if calendar_data else None,
"cheapest_price": min(calendar_data.values()) if calendar_data else None,
}
Monitoring Price Drops
Build a price alert system that notifies you when prices drop below a threshold:
import sqlite3
from datetime import datetime
def check_price_drops(
db_path: str = "hotel_prices.db",
drop_threshold_pct: float = 15.0,
) -> list:
"""
Compare latest prices against 7-day baseline.
Returns hotels where price dropped more than threshold.
"""
conn = sqlite3.connect(db_path)
# Get latest and 7-day-ago prices for each hotel/date combo
drops = conn.execute("""
WITH recent AS (
SELECT hotel_name, city, checkin_date, price_usd,
ROW_NUMBER() OVER (PARTITION BY hotel_name, checkin_date
ORDER BY scraped_at DESC) as rn
FROM prices
),
baseline AS (
SELECT hotel_name, checkin_date,
AVG(price_usd) as avg_price_7d
FROM prices
WHERE scraped_at < datetime('now', '-6 days')
GROUP BY hotel_name, checkin_date
)
SELECT r.hotel_name, r.city, r.checkin_date,
r.price_usd as current_price,
b.avg_price_7d as baseline_price,
ROUND((b.avg_price_7d - r.price_usd) / b.avg_price_7d * 100, 1) as drop_pct
FROM recent r
JOIN baseline b ON r.hotel_name = b.hotel_name
AND r.checkin_date = b.checkin_date
WHERE r.rn = 1
AND b.avg_price_7d > 0
AND (b.avg_price_7d - r.price_usd) / b.avg_price_7d * 100 >= ?
ORDER BY drop_pct DESC
""", (drop_threshold_pct,)).fetchall()
conn.close()
return drops
drops = check_price_drops(drop_threshold_pct=20.0)
for d in drops:
print(f"{d[0]} ({d[1]}) - {d[2]}: ${d[3]} (was ${d[4]:.0f}, drop {d[5]}%)")
Proxy Configuration Details
Booking.com's anti-bot stack (Akamai plus its own systems) is particularly sensitive to IP geolocation. Displayed prices vary by the visitor's country: a user in the US sees USD prices, while a user in Germany sees EUR prices, sometimes with different availability.
ThorData's residential proxies support geo-targeting, which matters for:
- Currency consistency: always request from a US IP to get USD prices for comparison
- Availability accuracy: some hotels show different room types by visitor region
- Bot detection bypass: Akamai's reputation scores are much better for residential IPs
# Country-targeted proxy config for consistent USD pricing
PROXY_USD = {
"server": "http://proxy.thordata.com:9000",
"username": "username-country-us", # US IP for USD prices
"password": "your_password",
}
# European pricing research
PROXY_EUR = {
"server": "http://proxy.thordata.com:9000",
"username": "username-country-de", # German IP for EUR prices
"password": "your_password",
}
Complete Monitoring Pipeline
async def run_hotel_monitor(
destinations: list,
checkin_offsets: list = [30, 60, 90],
db_path: str = "hotel_prices.db",
):
"""
Full hotel price monitoring pipeline.
Collects prices for multiple destinations at multiple future dates.
Run daily via cron for trend analysis.
"""
results = {}
for destination in destinations:
results[destination] = {}
for offset in checkin_offsets:
checkin = (date.today() + timedelta(days=offset)).isoformat()
checkout = (date.today() + timedelta(days=offset + 2)).isoformat()
print(f" {destination}: {checkin} ({offset} days out)")
hotels = await search_hotels(destination, checkin, checkout)
if hotels:
store_price_snapshot(hotels, destination, checkin, db_path)
results[destination][checkin] = {
"count": len(hotels),
"min_price": min(h["price_usd"] for h in hotels),
"avg_price": sum(h["price_usd"] for h in hotels) / len(hotels),
}
await asyncio.sleep(random.uniform(15, 30))
return results
DESTINATIONS = ["Barcelona, Spain", "Lisbon, Portugal", "Prague, Czech Republic"]
asyncio.run(run_hotel_monitor(DESTINATIONS))
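To run the pipeline daily as the docstring suggests, a crontab entry like this works. The paths and the `monitor.py` filename are placeholders for your environment:

```shell
# Run the hotel monitor every day at 06:15 local time; append output to a log
15 6 * * * cd /path/to/project && /usr/bin/python3 monitor.py >> monitor.log 2>&1
```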
Data Fields Reference
Here are the fields you can reliably extract from Booking.com as of 2026:
Search results page:
- name - Hotel name
- price_usd - Nightly rate in USD (varies by currency/geo)
- rating - Review score (0-10 scale)
- review_count - Number of reviews
- distance - Distance from city center or search point
- url - Direct link to hotel page
Hotel detail page (individual scrape):
- rooms - Array of room types with prices and occupancy
- review_score - Detailed review breakdown (cleanliness, location, etc.)
- amenities - Pool, gym, WiFi, parking, breakfast
- check_in_out - Check-in/check-out times
- cancellation_policy - Free cancellation or non-refundable
Review data (API intercept):
- score - Numeric rating
- title - Review headline
- pros - Positive comments
- cons - Negative comments
- date - Review date
- traveler_type - Business, couple, family, solo
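For downstream code, it helps to pin the search-result fields to a schema. A sketch using TypedDict; the field names mirror the reference above, everything except the name is Optional in practice because extraction can fail, and `validate` is a hypothetical helper, not part of any scraper above:

```python
from typing import Optional, TypedDict

class HotelListing(TypedDict):
    """One row from the search-results scraper."""
    name: str
    price_usd: Optional[int]
    rating: Optional[float]
    review_count: Optional[int]
    distance: Optional[str]
    url: Optional[str]

def validate(listing: dict) -> bool:
    """Cheap sanity check before a listing goes into the database."""
    return (
        bool(listing.get("name"))
        and isinstance(listing.get("price_usd"), int)
        and listing["price_usd"] > 0
    )
```

Running records through `validate` before the SQLite insert keeps nulls and parse failures out of the price history.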