Scraping Booking.com Hotel Data (2026)
Booking.com is protected by DataDome — one of the more aggressive bot-detection systems deployed at scale. It combines TLS fingerprinting, behavioral analysis, device fingerprinting, and IP reputation. A plain requests.get call returns a 403 or a DataDome challenge page within seconds.
This guide covers what actually works: URL construction, intercepting internal JSON endpoints, Playwright stealth automation, ThorData residential proxy integration, pagination handling, and a complete data storage pipeline.
Why Scrape Booking.com?
Booking.com lists 28+ million accommodations across 228 countries and territories, updating prices thousands of times per day. Use cases:
- Price comparison engines: Build or power hotel comparison tools with live Booking.com pricing
- Travel market research: Analyze pricing patterns by destination, season, property type, and star rating
- Revenue management consulting: Track competitor pricing for specific hotels or markets
- Review intelligence: Aggregate guest feedback for hospitality quality benchmarking
- Availability monitoring: Track booking windows — how far in advance rooms sell out by property type and market
- Affiliate marketing optimization: Identify high-demand destinations and optimize content for peak booking periods
URL Construction and the Search Endpoint
Booking.com's search results page embeds structured JSON in the HTML and also fires internal API calls you can intercept. Start with URL construction — the parameters are well-understood and stable:
https://www.booking.com/searchresults.html?ss=Barcelona&checkin=2026-06-01&checkout=2026-06-05&group_adults=2&no_rooms=1&selected_currency=USD
Key parameters:
- ss — destination (city, landmark, or property name)
- checkin / checkout — ISO dates (YYYY-MM-DD)
- group_adults — number of guests
- no_rooms — number of rooms
- selected_currency — force currency to avoid price inconsistencies
- offset — pagination, increments by 25 (offset=0, offset=25, offset=50)
- rows — results per page, max 25 for the search grid
- nflt — filter parameter (stars, property type, amenities)
import asyncio
import json
import time
import random
import sqlite3
import re
from datetime import datetime, timedelta
from typing import Optional, Dict, List, Any
from urllib.parse import urlencode, urljoin
BASE_SEARCH_URL = "https://www.booking.com/searchresults.html"
def build_search_url(
city: str,
checkin: str,
checkout: str,
adults: int = 2,
rooms: int = 1,
page: int = 0,
currency: str = "USD",
min_stars: Optional[int] = None,
) -> str:
"""Build a paginated Booking.com search URL."""
params = {
"ss": city,
"checkin": checkin,
"checkout": checkout,
"group_adults": adults,
"no_rooms": rooms,
"selected_currency": currency,
"offset": page * 25,
"rows": 25,
}
    if min_stars:
        # Booking encodes star filters as nflt=class%3D3 (3-star minimum).
        # Pass the raw "class=3" value and let urlencode percent-encode it;
        # pre-encoding here would double-encode the parameter.
        params["nflt"] = f"class={min_stars}"
return f"{BASE_SEARCH_URL}?{urlencode(params)}"
def build_property_url(
hotel_name_slug: str,
country_code: str,
checkin: str,
checkout: str,
adults: int = 2,
) -> str:
"""Build a Booking.com property page URL with dates.
Include dates so prices/availability render correctly.
"""
return (
f"https://www.booking.com/hotel/{country_code}/{hotel_name_slug}.html"
f"?checkin={checkin}&checkout={checkout}&group_adults={adults}&no_rooms=1"
)
The Unofficial Search JSON Endpoint
Booking.com's search page makes a background call to populate the map view. This endpoint returns clean JSON with hotel IDs, coordinates, prices, and ratings.
Add ajax=1 to the search URL to trigger the JSON response:
from curl_cffi import requests as cffi_requests
HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept": "application/json, text/html, */*",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Referer": "https://www.booking.com/",
"X-Requested-With": "XMLHttpRequest",
}
def try_ajax_endpoint(
city: str,
checkin: str,
checkout: str,
page: int = 0,
proxy: Optional[str] = None,
) -> Optional[Dict]:
"""Attempt the Booking.com internal AJAX endpoint.
This works intermittently from residential IPs. Requires
proper TLS fingerprinting via curl_cffi.
"""
params = {
"ss": city,
"checkin": checkin,
"checkout": checkout,
"group_adults": 2,
"no_rooms": 1,
"selected_currency": "USD",
"offset": page * 25,
"ajax": 1,
}
proxies = {"http": proxy, "https": proxy} if proxy else None
try:
session = cffi_requests.Session(impersonate="chrome124")
if proxies:
session.proxies = proxies
resp = session.get(
BASE_SEARCH_URL,
params=params,
headers=HEADERS,
timeout=30,
)
if resp.status_code == 200:
try:
data = resp.json()
if "results" in data:
print(f" AJAX endpoint success: {len(data.get('results', []))} hotels")
return data
else:
print(" AJAX returned non-results JSON (Datadome challenge likely)")
return None
except json.JSONDecodeError:
print(" Non-JSON response — Datadome challenge served")
return None
else:
print(f" AJAX blocked: HTTP {resp.status_code}")
return None
except Exception as e:
print(f" AJAX error: {e}")
return None
Playwright Stealth: The Reliable Path
DataDome injects JavaScript that runs device fingerprinting — canvas entropy, WebGL renderer strings, audio context, navigator properties. Playwright with stealth patches passes most of these checks.
The most reliable approach: intercept the network responses rather than parsing HTML. Booking.com's frontend fires the search AJAX call automatically when the page loads — you capture the exact JSON the browser receives.
from playwright.async_api import async_playwright, BrowserContext
STEALTH_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
window.chrome = { runtime: {} };
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function(param) {
if (param === 37445) return 'Intel Inc.';
if (param === 37446) return 'Intel Iris OpenGL Engine';
return getParameter.call(this, param);
};
const getParameterWebGL2 = WebGL2RenderingContext.prototype.getParameter;
WebGL2RenderingContext.prototype.getParameter = function(param) {
if (param === 37445) return 'Intel Inc.';
if (param === 37446) return 'Intel Iris OpenGL Engine';
return getParameterWebGL2.call(this, param);
};
"""
async def scrape_booking_playwright(
city: str,
checkin: str,
checkout: str,
pages: int = 3,
proxy_server: Optional[str] = None,
adults: int = 2,
currency: str = "USD",
) -> List[Dict]:
"""Scrape Booking.com search results via Playwright with network interception."""
all_hotels = []
async with async_playwright() as p:
launch_opts = {
"headless": True,
"args": [
"--no-sandbox",
"--disable-blink-features=AutomationControlled",
"--disable-infobars",
"--disable-extensions",
],
}
if proxy_server:
launch_opts["proxy"] = {"server": proxy_server}
browser = await p.chromium.launch(**launch_opts)
context = await browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
viewport={"width": 1366, "height": 768},
locale="en-US",
timezone_id="America/New_York",
extra_http_headers={
"Accept-Language": "en-US,en;q=0.9",
},
)
await context.add_init_script(STEALTH_SCRIPT)
page = await context.new_page()
for pg in range(pages):
page_hotels = []
# Intercept the search AJAX response
async def intercept_response(response):
url = response.url
if "searchresults.html" in url and ("ajax=1" in url or "src=searchresults" in url):
try:
body = await response.json()
if "results" in body:
page_hotels.extend(body["results"])
except Exception:
pass
page.on("response", intercept_response)
# Build search URL for this page
search_url = build_search_url(city, checkin, checkout, adults=adults, page=pg, currency=currency)
await page.goto(search_url, wait_until="networkidle", timeout=60000)
await page.wait_for_timeout(2500) # Allow lazy requests to complete
if page_hotels:
all_hotels.extend(page_hotels)
print(f" Page {pg + 1}: {len(page_hotels)} hotels (via network intercept)")
else:
# Fallback: parse DOM
dom_hotels = await _parse_hotel_cards_dom(page)
all_hotels.extend(dom_hotels)
print(f" Page {pg + 1}: {len(dom_hotels)} hotels (via DOM parsing)")
page.remove_listener("response", intercept_response)
if pg < pages - 1:
await asyncio.sleep(random.uniform(3.0, 6.0))
await browser.close()
return all_hotels
async def _parse_hotel_cards_dom(page) -> List[Dict]:
"""Parse hotel cards from page DOM as fallback."""
hotels = []
cards = await page.query_selector_all('[data-testid="property-card"]')
for card in cards:
try:
name_el = await card.query_selector('[data-testid="title"]')
price_el = await card.query_selector('[data-testid="price-and-discounted-price"]')
score_el = await card.query_selector('[data-testid="review-score"]')
link_el = await card.query_selector('a[data-testid="title-link"]')
location_el = await card.query_selector('[data-testid="address"]')
hotels.append({
"hotel_name": await name_el.inner_text() if name_el else "",
"price_display": await price_el.inner_text() if price_el else "",
"review_score_text": await score_el.inner_text() if score_el else "",
"url": await link_el.get_attribute("href") if link_el else "",
"address": await location_el.inner_text() if location_el else "",
"source": "dom",
})
except Exception:
continue
return hotels
Extracting Fields from HTML (Fallback Parser)
For cases where you can get the HTML but can't intercept AJAX:
from bs4 import BeautifulSoup
def parse_hotel_cards(html: str) -> List[Dict]:
"""Parse hotel property cards from Booking.com search results HTML."""
soup = BeautifulSoup(html, "html.parser")
hotels = []
for card in soup.select('[data-testid="property-card"]'):
name_el = card.select_one('[data-testid="title"]')
price_el = card.select_one('[data-testid="price-and-discounted-price"]')
score_el = card.select_one('[data-testid="review-score"] div:first-child')
count_el = card.select_one('[data-testid="review-score"] div:last-child')
address_el = card.select_one('[data-testid="address"]')
stars_el = card.select_one('[data-testid="rating-stars"]')
distance_el = card.select_one('[data-testid="distance"]')
link_el = card.select_one('a[data-testid="title-link"]')
# Extract numeric price if possible
price_raw = price_el.get_text(strip=True) if price_el else ""
price_numeric = None
        price_match = re.search(r"([\d,]+(?:\.\d+)?)", price_raw)
        if price_match:
            try:
                price_numeric = float(price_match.group(1).replace(",", ""))
except ValueError:
pass
# Extract numeric score
score_raw = score_el.get_text(strip=True) if score_el else ""
score_numeric = None
try:
score_numeric = float(score_raw)
except ValueError:
pass
# Extract review count
count_text = count_el.get_text(strip=True) if count_el else ""
review_count = None
count_match = re.search(r"([\d,]+)", count_text)
if count_match:
try:
review_count = int(count_match.group(1).replace(",", ""))
except ValueError:
pass
hotels.append({
"name": name_el.get_text(strip=True) if name_el else None,
"price_display": price_raw,
"price_usd": price_numeric,
"review_score": score_numeric,
"review_count": review_count,
"address": address_el.get_text(strip=True) if address_el else None,
"star_rating": _count_stars(stars_el),
"distance": distance_el.get_text(strip=True) if distance_el else None,
"url": link_el.get("href") if link_el else None,
"source": "html",
})
return hotels
def _count_stars(el) -> Optional[int]:
"""Count star rating from stars element."""
if not el:
return None
    # Booking renders stars as individual SVG icons or star-classed spans.
    # Note: find_all() takes tag names, not CSS selectors — use select()
    # for the attribute-based fallback.
    stars = el.find_all("svg") or el.select("[class*='star']")
return len(stars) if stars else None
Individual Property Data
For full property detail — amenities, room types, full review text — you need the property page. Always include checkin/checkout dates — without them, prices won't render.
async def scrape_property_detail(
context: BrowserContext,
hotel_url: str,
checkin: str,
checkout: str,
) -> Dict:
"""Scrape full property detail page."""
# Ensure dates are in the URL
if "checkin=" not in hotel_url:
sep = "&" if "?" in hotel_url else "?"
hotel_url = f"{hotel_url}{sep}checkin={checkin}&checkout={checkout}&group_adults=2&no_rooms=1"
page = await context.new_page()
await page.goto(hotel_url, wait_until="networkidle", timeout=60000)
await page.wait_for_timeout(2000)
data = await page.evaluate("""
() => {
const get = (sel, attr) => {
const el = document.querySelector(sel);
return el ? (attr ? el.getAttribute(attr) : el.innerText.trim()) : null;
};
const getAll = (sel) => Array.from(document.querySelectorAll(sel)).map(e => e.innerText.trim());
return {
name: get('h2.pp-header__title') || get('[data-testid="property-header"] h2'),
address: get('.hp_address_subtitle, [data-testid="property-header__address"]'),
description: get('#property_description_content, .hp-desc-highlighted'),
review_score: parseFloat(get('.d10a6220b4, [data-testid="review-score-right-component"] .a3b8729ab1') || '0') || null,
review_count: parseInt((get('.d935416c47, [data-testid="review-score-right-component"] .d8eab2cf7f') || '0').replace(/[^\d]/g, '')) || 0,
star_rating: document.querySelectorAll('.b_star_icon, .hp_hotel_star').length || null,
facilities: getAll('.facilityIcon, .hp_facilities li').slice(0, 30),
room_types: Array.from(document.querySelectorAll('.hprt-table tbody tr')).slice(0, 10).map(row => {
const type = row.querySelector('.hprt-roomtype-icon-link');
const price = row.querySelector('.prco-valign-middle-helper, .bui-price-display__value');
return { type: type ? type.innerText.trim() : '', price: price ? price.innerText.trim() : '' };
}).filter(r => r.type),
latitude: parseFloat(document.querySelector('[data-atlas-latlng]')?.getAttribute('data-atlas-latlng')?.split(',')[0]) || null,
longitude: parseFloat(document.querySelector('[data-atlas-latlng]')?.getAttribute('data-atlas-latlng')?.split(',')[1]) || null,
};
}
""")
# Also extract JSON-LD structured data
json_ld = await page.evaluate("""
() => {
const scripts = document.querySelectorAll('script[type="application/ld+json"]');
for (const s of scripts) {
try {
const d = JSON.parse(s.textContent);
if (d['@type'] === 'Hotel' || d['@type'] === 'LodgingBusiness') return d;
} catch(e) {}
}
return null;
}
""")
if json_ld:
data["aggregate_rating"] = json_ld.get("aggregateRating", {})
data["price_range"] = json_ld.get("priceRange")
data["amenities_from_schema"] = [
a.get("name") for a in json_ld.get("amenityFeature", [])
if isinstance(a, dict)
][:20]
await page.close()
return data
ThorData Proxy Integration
DataDome maintains a real-time IP reputation database. All datacenter CIDR ranges — AWS, GCP, Azure, Hetzner, DigitalOcean — are flagged as high-risk. Requests from those IPs hit the challenge wall before any content loads.
Residential proxies route traffic through real ISP-assigned addresses. ThorData has a residential pool with city-level geo-targeting — useful because Booking.com localizes prices based on your apparent location. Scraping from a US residential IP while targeting European hotels shows different rates than European users see. Use geo-targeted proxies matching your target market.
class ThorDataProxyPool:
"""ThorData residential proxy pool for Booking.com scraping."""
def __init__(self, username: str, password: str):
self.username = username
self.password = password
self.host = "gate.thordata.com"
self.port = 9000
def get_proxy(
self,
country: str = "US",
city: Optional[str] = None,
session_id: Optional[str] = None,
) -> str:
"""Get proxy URL with geo-targeting options."""
user = f"{self.username}-country-{country.upper()}"
if city:
user = f"{user}-city-{city.lower()}"
if session_id:
user = f"{user}-session-{session_id}"
return f"http://{user}:{self.password}@{self.host}:{self.port}"
def get_rotating(self, country: str = "US") -> str:
"""Per-request IP rotation."""
return self.get_proxy(country)
def get_sticky(self, session_id: str, country: str = "US") -> str:
"""Sticky session — same IP for 2-5 min of browsing."""
return self.get_proxy(country, session_id=session_id)
def get_european_proxy(self) -> str:
"""Get European IP for European hotel pricing."""
country = random.choice(["DE", "FR", "GB", "NL", "ES"])
return self.get_proxy(country)
Pagination Handling
Booking.com paginates search results in increments of 25, with the offset parameter controlling position.
async def scrape_full_search(
city: str,
checkin: str,
checkout: str,
proxy_pool: Optional[ThorDataProxyPool] = None,
max_pages: int = 10,
adults: int = 2,
) -> List[Dict]:
"""Scrape all pages of Booking.com search results."""
all_hotels = []
for page in range(max_pages):
print(f"\n [PAGE {page + 1}/{max_pages}]")
        # Fresh proxy for each page to avoid session tracking. Match proxy
        # geography to the target market (e.g. call get_european_proxy()
        # when searching European destinations); a city-name substring test
        # can't detect that reliably, so the default is plain rotation.
        proxy = proxy_pool.get_rotating() if proxy_pool else None
page_hotels = await scrape_booking_playwright(
city, checkin, checkout, pages=1,
proxy_server=proxy,
adults=adults,
)
if not page_hotels:
print(f" No results on page {page + 1} — stopping")
break
all_hotels.extend(page_hotels)
print(f" Total so far: {len(all_hotels)} hotels")
await asyncio.sleep(random.uniform(4.0, 8.0))
# Deduplicate by hotel_id or hotel_name
seen = set()
unique_hotels = []
for hotel in all_hotels:
key = hotel.get("hotel_id") or hotel.get("hotel_name") or hotel.get("name")
if key and key not in seen:
seen.add(key)
unique_hotels.append(hotel)
return unique_hotels
Data Storage
def init_database(db_path: str = "booking_hotels.db") -> sqlite3.Connection:
"""Initialize the Booking.com data database."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.executescript("""
CREATE TABLE IF NOT EXISTS hotels (
hotel_id INTEGER,
hotel_name TEXT,
city TEXT,
address TEXT,
star_rating INTEGER,
review_score REAL,
review_count INTEGER,
latitude REAL,
longitude REAL,
url TEXT,
scraped_at TEXT,
PRIMARY KEY (hotel_id, city)
);
CREATE TABLE IF NOT EXISTS price_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id INTEGER,
hotel_name TEXT,
city TEXT,
checkin TEXT,
checkout TEXT,
min_price REAL,
currency TEXT,
is_free_cancellable INTEGER,
snapshot_date TEXT
);
CREATE TABLE IF NOT EXISTS room_types (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id INTEGER,
room_type TEXT,
price_display TEXT,
checkin TEXT,
checkout TEXT,
scraped_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_price_hotel ON price_snapshots(hotel_id, checkin);
CREATE INDEX IF NOT EXISTS idx_hotels_city ON hotels(city);
""")
conn.commit()
return conn
def save_hotel(conn: sqlite3.Connection, hotel: Dict, city: str):
"""Save hotel data and price snapshot."""
hotel_id = hotel.get("hotel_id")
hotel_name = hotel.get("hotel_name") or hotel.get("name", "")
if hotel_id:
conn.execute(
"""INSERT OR REPLACE INTO hotels
(hotel_id, hotel_name, city, address, star_rating, review_score,
review_count, latitude, longitude, url, scraped_at)
VALUES (?,?,?,?,?,?,?,?,?,?,?)""",
(
hotel_id, hotel_name, city,
hotel.get("address"),
hotel.get("class") or hotel.get("star_rating"),
hotel.get("review_score"),
hotel.get("review_nr") or hotel.get("review_count"),
hotel.get("latitude"),
hotel.get("longitude"),
hotel.get("url"),
datetime.utcnow().isoformat(),
)
)
# Price snapshot
if hotel.get("min_total_price") or hotel.get("price_usd"):
conn.execute(
"""INSERT INTO price_snapshots
(hotel_id, hotel_name, city, checkin, checkout, min_price, currency,
is_free_cancellable, snapshot_date)
VALUES (?,?,?,?,?,?,?,?,?)""",
(
hotel_id, hotel_name, city,
hotel.get("checkin", ""),
hotel.get("checkout", ""),
hotel.get("min_total_price") or hotel.get("price_usd"),
hotel.get("currency_code", "USD"),
int(hotel.get("is_free_cancellable", 0)),
datetime.utcnow().date().isoformat(),
)
)
conn.commit()
def get_price_trend(
conn: sqlite3.Connection,
hotel_id: int,
checkin: str,
) -> List[Dict]:
"""Get price history for a hotel on a specific checkin date."""
rows = conn.execute(
"""SELECT min_price, currency, snapshot_date
FROM price_snapshots
WHERE hotel_id = ? AND checkin = ?
ORDER BY snapshot_date ASC""",
(hotel_id, checkin)
).fetchall()
return [{"price": r[0], "currency": r[1], "date": r[2]} for r in rows]
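As a quick sanity check, the same trend query can be exercised against an in-memory database seeded with hypothetical snapshot rows (hotel id, prices, and dates below are made up):

```python
import sqlite3

# Throwaway in-memory DB with the price_snapshots schema and two
# hypothetical snapshots for one hotel/checkin pair, inserted out of order.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        hotel_id INTEGER, hotel_name TEXT, city TEXT,
        checkin TEXT, checkout TEXT, min_price REAL, currency TEXT,
        is_free_cancellable INTEGER, snapshot_date TEXT
    )
""")
conn.executemany(
    """INSERT INTO price_snapshots
       (hotel_id, hotel_name, city, checkin, checkout, min_price,
        currency, is_free_cancellable, snapshot_date)
       VALUES (?,?,?,?,?,?,?,?,?)""",
    [
        (111, "Hotel Example", "Barcelona", "2026-06-01", "2026-06-05",
         210.0, "USD", 1, "2026-04-01"),
        (111, "Hotel Example", "Barcelona", "2026-06-01", "2026-06-05",
         180.0, "USD", 1, "2026-03-01"),
    ],
)
# Same SELECT used by get_price_trend: snapshots in chronological order.
trend = conn.execute(
    """SELECT min_price, currency, snapshot_date FROM price_snapshots
       WHERE hotel_id = ? AND checkin = ?
       ORDER BY snapshot_date ASC""",
    (111, "2026-06-01"),
).fetchall()
print(trend[0])  # earliest snapshot first: (180.0, 'USD', '2026-03-01')
```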
Complete Production Pipeline
async def run_booking_pipeline(
destinations: List[Dict], # [{"city": "Barcelona", "checkin": "...", "checkout": "..."}]
db_path: str = "booking_hotels.db",
proxy_pool: Optional[ThorDataProxyPool] = None,
max_pages: int = 5,
) -> Dict:
"""Full pipeline: search → detail → database."""
conn = init_database(db_path)
stats = {"destinations": 0, "hotels_found": 0, "hotels_saved": 0, "errors": 0}
for dest in destinations:
city = dest["city"]
checkin = dest["checkin"]
checkout = dest["checkout"]
print(f"\n[{city}] {checkin} to {checkout}")
# Try AJAX endpoint first (fast, no JS overhead)
proxy = proxy_pool.get_european_proxy() if proxy_pool else None
ajax_data = try_ajax_endpoint(city, checkin, checkout, proxy=proxy)
if ajax_data and ajax_data.get("results"):
hotels = ajax_data["results"]
# Add checkin/checkout to each hotel for storage
for h in hotels:
h["checkin"] = checkin
h["checkout"] = checkout
else:
# Fall back to Playwright
print(" AJAX failed, using Playwright...")
hotels = await scrape_full_search(
city, checkin, checkout,
proxy_pool=proxy_pool,
max_pages=max_pages,
)
for h in hotels:
h["checkin"] = checkin
h["checkout"] = checkout
stats["hotels_found"] += len(hotels)
print(f" Found {len(hotels)} hotels")
for hotel in hotels:
try:
save_hotel(conn, hotel, city)
stats["hotels_saved"] += 1
except Exception as e:
print(f" [ERROR] Save failed: {e}")
stats["errors"] += 1
stats["destinations"] += 1
await asyncio.sleep(random.uniform(10.0, 20.0))
conn.close()
print(f"\nPipeline complete: {stats}")
return stats
# Example usage
async def main():
DESTINATIONS = [
{"city": "Barcelona", "checkin": "2026-06-01", "checkout": "2026-06-05"},
{"city": "Amsterdam", "checkin": "2026-07-01", "checkout": "2026-07-04"},
{"city": "Rome", "checkin": "2026-08-15", "checkout": "2026-08-18"},
]
# pool = ThorDataProxyPool("YOUR_USER", "YOUR_PASS")
# results = await run_booking_pipeline(DESTINATIONS, proxy_pool=pool)
results = await run_booking_pipeline(DESTINATIONS)
print(results)
if __name__ == "__main__":
    asyncio.run(main())
Rate Limiting and Behavioral Patterns
Even with residential proxies, Booking.com tracks behavioral patterns:
- More than ~30 search requests per hour from one IP triggers soft blocking
- Requests completing faster than a human could read the page look synthetic
- Identical search parameters repeated in sequence are flagged
- Sessions that never click on results (just search and leave) are suspicious
Add random delays (3-8 seconds between requests), randomize user agents between sessions, and vary your search parameters. Rotating sessions — new browser context per 10-15 requests — helps reset fingerprint state.
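Those pacing rules can be packaged into a small helper — a sketch with illustrative thresholds that a scraping loop consults between requests:

```python
import asyncio
import random

class RequestPacer:
    """Randomized delays plus a counter that signals context rotation.

    Thresholds are illustrative: 3-8 s between requests, fresh browser
    context every ~12 requests.
    """

    def __init__(self, min_delay: float = 3.0, max_delay: float = 8.0,
                 rotate_every: int = 12):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.rotate_every = rotate_every
        self.request_count = 0

    async def wait(self) -> None:
        # Sleep a human-ish random interval, then count the request.
        await asyncio.sleep(random.uniform(self.min_delay, self.max_delay))
        self.request_count += 1

    def should_rotate(self) -> bool:
        # True once enough requests went through the current context.
        return self.request_count >= self.rotate_every

    def reset(self) -> None:
        # Call after tearing down the old context and opening a new one.
        self.request_count = 0
```

Inside the Playwright loop: `await pacer.wait()` before each `page.goto`, and when `should_rotate()` returns True, close the context, rebuild it with a new user agent and proxy, then call `pacer.reset()`.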
Per-request rotation works for search result pages. For individual property pages where you're simulating browsing through room options, sticky sessions (same IP for 2-5 minutes) work better and are more realistic.
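A sketch of that split, reusing the gateway username format from the ThorData section (the host, port, and `-session-` syntax mirror the pool class above and are assumptions about the provider's gateway): rotate on every search request, but mint one session id per property and reuse it for that property's page views.

```python
import random
import string
from typing import Optional

GATEWAY = "gate.thordata.com:9000"  # assumed gateway host/port

def sticky_session_id(length: int = 8) -> str:
    """Random alphanumeric id naming one sticky proxy session."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))

def proxy_url(username: str, password: str, country: str = "US",
              session_id: Optional[str] = None) -> str:
    user = f"{username}-country-{country.upper()}"
    if session_id:
        user = f"{user}-session-{session_id}"  # same IP for the session's lifetime
    return f"http://{user}:{password}@{GATEWAY}"

# Search pages: no session id, so each request exits via a fresh IP.
search_proxy = proxy_url("USER", "PASS", "ES")

# Property pages: one sticky id per hotel, shared by all its page views.
session = sticky_session_id()
detail_proxy = proxy_url("USER", "PASS", "ES", session_id=session)
```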
ThorData's residential proxy network with their geo-targeting feature makes this straightforward — use European IPs for European hotel searches to see the same prices local users see.
What You Can't Get Without Accounts
Booking.com's review API returns full review text but requires an authenticated session to paginate past the first page. Aggregate scores and total review counts are freely available; individual review text at scale requires either logged-in session scraping or the official Affiliate API.
For most use cases — price monitoring, availability tracking, competitive analysis — the unauthenticated search data is sufficient and covers the most valuable data points.