Scraping Booking.com Hotel Prices and Availability in 2026 with Playwright
Hotel price data is one of the most valuable scraping targets. Revenue managers use it to undercut competitors. Travel startups use it to build comparison engines. Researchers use it to study dynamic pricing algorithms.
Booking.com is the largest source, with over 28 million listings across 200+ countries. The data is publicly visible on the page. The challenge is that almost nothing is in the initial HTML: prices load dynamically based on your check-in/check-out dates, guest count, and, crucially, your location. Plain HTTP requests therefore won't cut it. You need a real browser.
Why Playwright
Booking.com runs heavy JavaScript that renders prices client-side. It also uses sophisticated bot detection that flags headless browsers. Playwright handles both:
- Full browser rendering (Chromium, Firefox, or WebKit)
- Built-in stealth capabilities when configured correctly
- Network interception to capture API responses directly
- Geolocation spoofing for location-dependent pricing
pip install playwright
playwright install chromium
Basic Hotel Search Scraper
import asyncio
import json
from playwright.async_api import async_playwright
from datetime import date, timedelta
from urllib.parse import urlencode
async def search_hotels(
destination: str,
checkin: str,
checkout: str,
adults: int = 2,
rooms: int = 1,
) -> list[dict]:
"""
Search Booking.com for hotels and return listing data.
Dates in YYYY-MM-DD format.
"""
params = {
"ss": destination,
"checkin": checkin,
"checkout": checkout,
"group_adults": adults,
"no_rooms": rooms,
"selected_currency": "USD",
}
url = f"https://www.booking.com/searchresults.html?{urlencode(params)}"
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=[
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
locale="en-US",
geolocation={"latitude": 40.7128, "longitude": -74.0060},
permissions=["geolocation"],
)
page = await context.new_page()
# Block images and fonts to speed up loading
await page.route("**/*.{png,jpg,jpeg,gif,svg,woff,woff2}",
lambda route: route.abort())
await page.goto(url, wait_until="domcontentloaded")
# Close cookie banner if present
try:
await page.click('[id="onetrust-accept-btn-handler"]', timeout=3000)
except Exception:
pass
# Wait for price elements to render
await page.wait_for_selector('[data-testid="price-and-discounted-price"]',
timeout=15000)
# Extract hotel data from the page
hotels = await page.evaluate("""
() => {
const cards = document.querySelectorAll('[data-testid="property-card"]');
return Array.from(cards).map(card => {
const nameEl = card.querySelector('[data-testid="title"]');
const priceEl = card.querySelector('[data-testid="price-and-discounted-price"]');
const ratingEl = card.querySelector('[data-testid="review-score"]');
const locationEl = card.querySelector('[data-testid="distance"]');
const linkEl = card.querySelector('a[data-testid="title-link"]');
const priceText = priceEl ? priceEl.innerText.replace(/[^0-9]/g, '') : null;
const ratingText = ratingEl ? ratingEl.innerText : '';
const ratingMatch = ratingText.match(/([\d.]+)/);
return {
name: nameEl ? nameEl.innerText.trim() : null,
price_usd: priceText ? parseInt(priceText) : null,
rating: ratingMatch ? parseFloat(ratingMatch[1]) : null,
review_text: ratingText.trim(),
distance: locationEl ? locationEl.innerText.trim() : null,
url: linkEl ? linkEl.href.split('?')[0] : null,
};
}).filter(h => h.name && h.price_usd);
}
""")
await browser.close()
return hotels
# Search for hotels in Barcelona
checkin = (date.today() + timedelta(days=30)).isoformat()
checkout = (date.today() + timedelta(days=33)).isoformat()
hotels = asyncio.run(search_hotels("Barcelona, Spain", checkin, checkout))
for h in hotels[:10]:
rating = f"{h['rating']}/10" if h['rating'] else "N/A"
print(f"${h['price_usd']:>4} | {rating:>7} | {h['name']}")
Scraping Individual Hotel Pages
The search results give you a summary. For room-level pricing and availability, you need the hotel detail page:
async def scrape_hotel_details(hotel_url: str, checkin: str, checkout: str) -> dict:
"""Scrape room types, prices, and amenities from a hotel page."""
url = f"{hotel_url}?checkin={checkin}&checkout={checkout}&selected_currency=USD"
async with async_playwright() as p:
browser = await p.chromium.launch(headless=True, args=[
"--disable-blink-features=AutomationControlled",
])
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
)
page = await context.new_page()
# Intercept the availability API call for cleaner data
api_data = {}
async def capture_api(response):
if "roomrates" in response.url or "availability" in response.url:
try:
api_data["rooms"] = await response.json()
except Exception:
pass
page.on("response", capture_api)
await page.goto(url, wait_until="networkidle")
# Extract from DOM as fallback
rooms = await page.evaluate("""
() => {
const rows = document.querySelectorAll('table.hprt-table tr');
const results = [];
for (const row of rows) {
const typeEl = row.querySelector('.hprt-roomtype-icon-link');
const priceEl = row.querySelector('.prco-valign-middle-helper');
const capacityEl = row.querySelector('.hprt-occupancy-occupancy-info');
if (!typeEl || !priceEl) continue;
const priceText = priceEl.innerText.replace(/[^0-9]/g, '');
results.push({
room_type: typeEl.innerText.trim(),
price_per_night: priceText ? parseInt(priceText) : null,
max_guests: capacityEl ? capacityEl.innerText.trim() : null,
});
}
return results;
}
""")
# Get review summary
review_score = await page.evaluate("""
() => {
const el = document.querySelector('[data-testid="review-score-component"]');
return el ? el.innerText.trim() : null;
}
""")
await browser.close()
return {
"url": hotel_url,
"rooms": rooms,
"review_score": review_score,
"api_data": api_data.get("rooms"),
}
Handling Anti-Bot Detection
Booking.com uses a combination of Akamai Bot Manager and their own detection. Here's what specifically catches scrapers:
Browser fingerprinting: They check navigator.webdriver, plugin count, WebGL renderer, and canvas fingerprints. Playwright's default Chromium reports navigator.webdriver as true unless you patch it.
Rate limiting: More than ~30 searches per minute from one IP triggers a CAPTCHA. More than ~100 triggers a temporary block.
Session behavior: They track whether you actually behave like a user — do you scroll? Do you click on results? Do you have cookies from a previous visit?
The mitigation stack that works:
async def create_stealth_context(playwright):
"""Create a browser context that passes Booking.com's bot detection."""
browser = await playwright.chromium.launch(
headless=True,
args=[
"--disable-blink-features=AutomationControlled",
"--disable-features=IsolateOrigins,site-per-process",
"--disable-dev-shm-usage",
]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
locale="en-US",
timezone_id="America/New_York",
)
# Patch webdriver flag and add realistic browser properties
await context.add_init_script("""
Object.defineProperty(navigator, 'webdriver', { get: () => false });
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
window.chrome = { runtime: {} };
""")
return browser, context
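The session-behavior signals mentioned above (scrolling, pausing, interacting) can be partially simulated. The sketch below is an assumption-laden helper, not a guaranteed bypass: `scroll_plan` is pure Python that splits a page into irregular scroll offsets, and `act_human` (a hypothetical name) replays them with reading pauses.

```python
import random

def scroll_plan(page_height: int, viewport: int = 1080, jitter: float = 0.4) -> list[int]:
    """Split a page into irregular scroll offsets, the way a human skims."""
    offsets, pos = [], 0
    while pos < page_height:
        pos = min(pos + int(viewport * random.uniform(1 - jitter, 1.0)), page_height)
        offsets.append(pos)
    return offsets

async def act_human(page):
    """Scroll through the page in steps with randomized pauses (assumed helper)."""
    height = await page.evaluate("document.body.scrollHeight")
    for y in scroll_plan(height):
        await page.evaluate(f"window.scrollTo(0, {y})")
        await page.wait_for_timeout(random.randint(500, 1500))
```

Call `await act_human(page)` after navigation and before extraction, so the session records scroll events instead of an instant scrape.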
For IP rotation, the math is simple: if you need to scrape 500 hotels and the rate limit is ~30/minute/IP, you either wait 17 minutes on one IP or use 17 IPs and finish in a minute. Residential proxies are necessary here — Booking.com blocks all major datacenter ranges. ThorData's rotating residential proxies support city-level targeting, which matters for Booking.com since the prices shown vary by the requester's apparent location.
# Using proxy with Playwright
context = await browser.new_context(
proxy={"server": "http://proxy.thordata.com:9000",
"username": "user", "password": "pass"},
# ... other options
)
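To stay under the per-IP threshold programmatically, a simple sliding-window throttle helps. A minimal sketch; the default of 25/minute is a safety margin under the ~30/minute estimate above, not a published limit:

```python
import asyncio
import time

class RateLimiter:
    """Allow at most `rate` calls per `per` seconds (sliding window)."""
    def __init__(self, rate: int = 25, per: float = 60.0):
        self.rate, self.per = rate, per
        self.calls: list[float] = []

    async def wait(self):
        now = time.monotonic()
        # Drop timestamps that fell out of the window, then sleep if at the cap
        self.calls = [t for t in self.calls if now - t < self.per]
        if len(self.calls) >= self.rate:
            await asyncio.sleep(self.per - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `await limiter.wait()` before each search; pair one limiter per proxy if you rotate IPs, since the limit applies per IP, not per process.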
Price Monitoring Over Time
The real value is tracking prices across days. Hotels use dynamic pricing — rates change based on demand, day of week, and how far out the booking is.
import sqlite3
from datetime import datetime
def store_price_snapshot(hotels: list[dict], destination: str,
checkin: str, db_path: str = "hotel_prices.db"):
"""Store a price snapshot for historical tracking."""
conn = sqlite3.connect(db_path)
# Schema matches the bulk collector's table below so both writers share it
conn.execute("""
CREATE TABLE IF NOT EXISTS prices (
hotel_name TEXT, city TEXT, checkin_date TEXT,
price_usd INTEGER, rating REAL, url TEXT,
scraped_at TEXT
)
""")
now = datetime.now().isoformat()
for h in hotels:
conn.execute(
"INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?, ?)",
(h["name"], destination, checkin, h["price_usd"],
h["rating"], h.get("url"), now)
)
conn.commit()
conn.close()
def get_price_trends(hotel_name: str, db_path: str = "hotel_prices.db") -> list:
"""Get price history for a specific hotel."""
conn = sqlite3.connect(db_path)
rows = conn.execute("""
SELECT checkin_date, price_usd, scraped_at
FROM prices WHERE hotel_name = ?
ORDER BY scraped_at
""", (hotel_name,)).fetchall()
conn.close()
return rows
Run the search scraper daily via cron. After a week you'll have enough data to see pricing patterns — when prices spike, how far in advance to book, and which hotels are consistently cheaper than their competitors.
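Once a few days of snapshots exist, the `(checkin_date, price_usd, scraped_at)` rows that `get_price_trends` returns can be collapsed into a summary. A small sketch, assuming rows come back ordered by `scraped_at` as in the query above:

```python
def summarize_trend(rows: list[tuple]) -> dict:
    """Summarize (checkin_date, price_usd, scraped_at) rows into a trend."""
    prices = [r[1] for r in rows if r[1] is not None]
    if not prices:
        return {"samples": 0}
    first, last = prices[0], prices[-1]
    return {
        "samples": len(prices),
        "min": min(prices),
        "max": max(prices),
        "latest": last,
        # Positive means the price rose since the first snapshot
        "change_pct": round((last - first) / first * 100, 1),
    }
```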
Scraping Reviews
Hotel reviews are on the detail page but paginated. Booking.com loads them via XHR, which you can intercept:
async def scrape_reviews(hotel_url: str, max_pages: int = 5) -> list[dict]:
"""Scrape hotel reviews by intercepting the review API calls."""
reviews = []
async with async_playwright() as p:
browser, context = await create_stealth_context(p)
page = await context.new_page()
async def capture_reviews(response):
if "review_list" in response.url or "reviews" in response.url:
try:
data = await response.json()
# Structure varies — handle common formats
if isinstance(data, dict) and "result" in data:
for r in data["result"]:
reviews.append({
"score": r.get("average_score"),
"title": r.get("title"),
"pros": r.get("pros"),
"cons": r.get("cons"),
"date": r.get("date"),
"traveler_type": r.get("travel_purpose"),
})
except Exception:
pass
page.on("response", capture_reviews)
await page.goto(f"{hotel_url}#tab-reviews", wait_until="networkidle")
for _ in range(max_pages - 1):
next_btn = await page.query_selector('[data-testid="reviews-pagination-next"]')
if not next_btn:
break
try:
await next_btn.click()
await page.wait_for_timeout(2000)
except Exception:
break
await browser.close()
return reviews
Legal and Ethical Considerations
Booking.com's terms of service prohibit scraping. That said, the data is publicly visible without a login. US courts have generally held that scraping publicly available data does not violate the CFAA (hiQ v. LinkedIn is the usual reference, though that litigation ultimately settled). In the EU, the main legal constraint is GDPR, which covers personal data such as reviewer names rather than facts like prices and availability.
The pragmatic approach: don't scrape so aggressively that you impact their service. Use delays. Don't republish their content verbatim. Use the data for analysis, price comparison, or research, not to clone their listings. Hotel names, prices, and ratings are factual data, which copyright does not protect.
Bulk Hotel Data Collection
For price monitoring across hundreds of properties, use an async approach with controlled concurrency:
import asyncio
import sqlite3
import random
from datetime import date, timedelta
from playwright.async_api import async_playwright
async def collect_city_hotels(
city: str,
checkin_offset_days: int = 30,
num_nights: int = 2,
max_hotels: int = 50,
proxy_config: dict = None,
) -> list[dict]:
"""
Collect all hotels for a city with pricing.
Returns list of hotel dicts with prices and ratings.
"""
checkin = (date.today() + timedelta(days=checkin_offset_days)).isoformat()
checkout = (date.today() + timedelta(days=checkin_offset_days + num_nights)).isoformat()
hotels = await search_hotels(city, checkin, checkout)
return hotels[:max_hotels]
async def run_city_comparison(
cities: list,
checkin_offset: int = 30,
db_path: str = "hotel_prices.db",
):
"""Compare hotel prices across multiple cities."""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS prices (
hotel_name TEXT, city TEXT, checkin_date TEXT,
price_usd INTEGER, rating REAL, url TEXT,
scraped_at TEXT
)
""")
from datetime import datetime
now = datetime.now().isoformat()
checkin = (date.today() + timedelta(days=checkin_offset)).isoformat()
for city in cities:
print(f"Collecting: {city}")
try:
hotels = await collect_city_hotels(city, checkin_offset_days=checkin_offset)
for h in hotels:
conn.execute(
"INSERT INTO prices VALUES (?,?,?,?,?,?,?)",
(h["name"], city, checkin, h["price_usd"],
h["rating"], h.get("url"), now)
)
conn.commit()
print(f" Saved {len(hotels)} hotels")
except Exception as e:
print(f" Error: {e}")
await asyncio.sleep(random.uniform(10, 20))
conn.close()
# Compare weekend prices in European capitals
CITIES = [
"Paris, France", "London, UK", "Amsterdam, Netherlands",
"Barcelona, Spain", "Rome, Italy", "Berlin, Germany",
"Prague, Czech Republic", "Lisbon, Portugal"
]
asyncio.run(run_city_comparison(CITIES, checkin_offset=45))
Intercepting the Availability API
Booking.com makes internal API calls when loading hotel availability. These can be intercepted for cleaner data:
import asyncio
import json
from playwright.async_api import async_playwright
async def intercept_availability_api(
hotel_url: str,
checkin: str,
checkout: str,
) -> dict:
"""
Intercept Booking.com's internal availability API calls.
Returns cleaner structured data than DOM parsing.
"""
api_responses = []
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
)
page = await context.new_page()
async def capture_response(response):
url = response.url
# Capture availability, room rate, and property data endpoints
if any(keyword in url for keyword in [
"availabilityCalendar", "roomRates", "propertyInfo",
"getReviewsDetails", "accommodations"
]):
try:
body = await response.json()
api_responses.append({
"endpoint": url.split("?")[0].split("/")[-1],
"url": url,
"data": body,
})
except Exception:
pass
page.on("response", capture_response)
full_url = f"{hotel_url}?checkin={checkin}&checkout={checkout}&selected_currency=USD"
await page.goto(full_url, wait_until="networkidle", timeout=40000)
await asyncio.sleep(5) # Wait for all async API calls to complete
await browser.close()
# Parse the captured responses
result = {"url": hotel_url, "checkin": checkin, "checkout": checkout}
for resp in api_responses:
endpoint = resp["endpoint"]
data = resp["data"]
if "roomRates" in endpoint or "accommodations" in endpoint:
# Extract room pricing
rooms = []
if isinstance(data, dict):
# Handle various response formats
room_list = (
data.get("result", {}).get("room_types", []) or
data.get("rooms", []) or
data.get("data", {}).get("roomTypes", [])
)
for room in room_list:
rooms.append({
"name": room.get("name") or room.get("room_type_name"),
"price_usd": room.get("price", {}).get("total") or room.get("rate"),
"max_occupancy": room.get("max_occupancy") or room.get("maxOccupancy"),
})
result["rooms"] = rooms
elif "propertyInfo" in endpoint:
result["property_details"] = data
elif "getReviewsDetails" in endpoint or "reviews" in endpoint:
result["review_data"] = data
return result
Price Calendar Scraping
Booking.com shows a price calendar for flexible dates. This reveals the cheapest time to visit:
async def scrape_price_calendar(
hotel_url: str,
months_ahead: int = 3,
) -> dict:
"""
Scrape Booking.com's price calendar for a hotel.
Shows cheapest available rates for each day.
"""
calendar_data = {}
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"]
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
)
page = await context.new_page()
async def capture_calendar(response):
if "availabilityCalendar" in response.url:
try:
data = await response.json()
if isinstance(data, dict) and "result" in data:
for entry in data["result"]:
cal_date = entry.get("checkin")
price = entry.get("avg_round_price") or entry.get("price")
if cal_date and price:
calendar_data[cal_date] = price
except Exception:
pass
page.on("response", capture_calendar)
# Navigate to flexible dates view
url = f"{hotel_url}?flexible_dates=1"
await page.goto(url, wait_until="networkidle", timeout=40000)
await asyncio.sleep(3)
# Click "flexible dates" option if available
try:
flex_btn = await page.query_selector('[data-testid="flexible-search-button"]')
if flex_btn:
await flex_btn.click()
await asyncio.sleep(3)
except Exception:
pass
await browser.close()
return {
"url": hotel_url,
"calendar": dict(sorted(calendar_data.items())),
"cheapest_date": min(calendar_data, key=calendar_data.get) if calendar_data else None,
"cheapest_price": min(calendar_data.values()) if calendar_data else None,
}
Monitoring Price Drops
Build a price alert system that notifies you when prices drop below a threshold:
import sqlite3
from datetime import datetime
def check_price_drops(
db_path: str = "hotel_prices.db",
drop_threshold_pct: float = 15.0,
) -> list:
"""
Compare latest prices against 7-day baseline.
Returns hotels where price dropped more than threshold.
"""
conn = sqlite3.connect(db_path)
# Get latest and 7-day-ago prices for each hotel/date combo
drops = conn.execute("""
WITH recent AS (
SELECT hotel_name, city, checkin_date, price_usd,
ROW_NUMBER() OVER (PARTITION BY hotel_name, checkin_date
ORDER BY scraped_at DESC) as rn
FROM prices
),
baseline AS (
SELECT hotel_name, checkin_date,
AVG(price_usd) as avg_price_7d
FROM prices
WHERE scraped_at < datetime('now', '-6 days')
GROUP BY hotel_name, checkin_date
)
SELECT r.hotel_name, r.city, r.checkin_date,
r.price_usd as current_price,
b.avg_price_7d as baseline_price,
ROUND((b.avg_price_7d - r.price_usd) / b.avg_price_7d * 100, 1) as drop_pct
FROM recent r
JOIN baseline b ON r.hotel_name = b.hotel_name
AND r.checkin_date = b.checkin_date
WHERE r.rn = 1
AND b.avg_price_7d > 0
AND (b.avg_price_7d - r.price_usd) / b.avg_price_7d * 100 >= ?
ORDER BY drop_pct DESC
""", (drop_threshold_pct,)).fetchall()
conn.close()
return drops
drops = check_price_drops(drop_threshold_pct=20.0)
for d in drops:
print(f"{d[0]} ({d[1]}) - {d[2]}: ${d[3]} (was ${d[4]:.0f}, drop {d[5]}%)")
Proxy Configuration Details
Booking.com's anti-bot stack (Akamai plus its own systems) is particularly sensitive to IP geolocation. Displayed prices vary by the visitor's country: a user in the US sees USD prices, while a user in Germany sees EUR prices, sometimes with different availability.
ThorData's residential proxies support geo-targeting, which matters for:
- Currency consistency: always request from a US IP to get USD prices for comparison
- Availability accuracy: some hotels show different room types by visitor region
- Bot detection bypass: Akamai's reputation scores are much better for residential IPs
# Country-targeted proxy config for consistent USD pricing
PROXY_USD = {
"server": "http://proxy.thordata.com:9000",
"username": "username-country-us", # US IP for USD prices
"password": "your_password",
}
# European pricing research
PROXY_EUR = {
"server": "http://proxy.thordata.com:9000",
"username": "username-country-de", # German IP for EUR prices
"password": "your_password",
}
Complete Monitoring Pipeline
async def run_hotel_monitor(
destinations: list,
checkin_offsets: list = [30, 60, 90],
db_path: str = "hotel_prices.db",
):
"""
Full hotel price monitoring pipeline.
Collects prices for multiple destinations at multiple future dates.
Run daily via cron for trend analysis.
"""
results = {}
for destination in destinations:
results[destination] = {}
for offset in checkin_offsets:
checkin = (date.today() + timedelta(days=offset)).isoformat()
checkout = (date.today() + timedelta(days=offset + 2)).isoformat()
print(f" {destination}: {checkin} ({offset} days out)")
hotels = await search_hotels(destination, checkin, checkout)
if hotels:
store_price_snapshot(hotels, destination, checkin, db_path)
results[destination][checkin] = {
"count": len(hotels),
"min_price": min(h["price_usd"] for h in hotels),
"avg_price": sum(h["price_usd"] for h in hotels) / len(hotels),
}
await asyncio.sleep(random.uniform(15, 30))
return results
DESTINATIONS = ["Barcelona, Spain", "Lisbon, Portugal", "Prague, Czech Republic"]
asyncio.run(run_hotel_monitor(DESTINATIONS))
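To run the pipeline daily as the docstring suggests, a crontab entry like this works. The paths and the `monitor.py` filename are placeholders for your environment:

```shell
# Run the hotel monitor every day at 06:15 local time; append output to a log
15 6 * * * cd /path/to/project && /usr/bin/python3 monitor.py >> monitor.log 2>&1
```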
Data Fields Reference
Here are the fields you can reliably extract from Booking.com as of 2026:
Search results page:
- name - Hotel name
- price_usd - Nightly rate in USD (varies by currency/geo)
- rating - Review score (0-10 scale)
- review_count - Number of reviews
- distance - Distance from city center or search point
- url - Direct link to hotel page
Hotel detail page (individual scrape):
- rooms - Array of room types with prices and occupancy
- review_score - Detailed review breakdown (cleanliness, location, etc.)
- amenities - Pool, gym, WiFi, parking, breakfast
- check_in_out - Check-in/check-out times
- cancellation_policy - Free cancellation or non-refundable
Review data (API intercept):
- score - Numeric rating
- title - Review headline
- pros - Positive comments
- cons - Negative comments
- date - Review date
- traveler_type - Business, couple, family, solo
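For downstream code, it helps to pin the search-result fields to a schema. A sketch using TypedDict; the field names mirror the reference above, everything except the name is Optional in practice because extraction can fail, and `validate` is a hypothetical helper, not part of any scraper above:

```python
from typing import Optional, TypedDict

class HotelListing(TypedDict):
    """One row from the search-results scraper."""
    name: str
    price_usd: Optional[int]
    rating: Optional[float]
    review_count: Optional[int]
    distance: Optional[str]
    url: Optional[str]

def validate(listing: dict) -> bool:
    """Cheap sanity check before a listing goes into the database."""
    return (
        bool(listing.get("name"))
        and isinstance(listing.get("price_usd"), int)
        and listing["price_usd"] > 0
    )
```

Running records through `validate` before the SQLite insert keeps nulls and parse failures out of the price history.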