Scraping Booking.com Hotel Data (2026)
Booking.com is protected by DataDome — one of the more aggressive bot-detection systems deployed at scale. It combines TLS fingerprinting, behavioral analysis, device fingerprinting, and IP reputation. A plain requests.get call returns a 403 or a DataDome challenge page within seconds.
This guide covers what actually works: URL construction, intercepting internal JSON endpoints, Playwright stealth automation, ThorData residential proxy integration, pagination handling, and a complete data storage pipeline.
Why Scrape Booking.com?
Booking.com lists 28+ million accommodations across 228 countries and territories, updating prices thousands of times per day. Use cases:
- Price comparison engines: Build or power hotel comparison tools with live Booking.com pricing
- Travel market research: Analyze pricing patterns by destination, season, property type, and star rating
- Revenue management consulting: Track competitor pricing for specific hotels or markets
- Review intelligence: Aggregate guest feedback for hospitality quality benchmarking
- Availability monitoring: Track booking windows — how far in advance rooms sell out by property type and market
- Affiliate marketing optimization: Identify high-demand destinations and optimize content for peak booking periods
URL Construction and the Search Endpoint
Booking.com's search results page embeds structured JSON in the HTML and also fires internal API calls you can intercept. Start with URL construction — the parameters are well-understood and stable:
https://www.booking.com/searchresults.html?ss=Barcelona&checkin=2026-06-01&checkout=2026-06-05&group_adults=2&no_rooms=1&selected_currency=USD
Key parameters:
- ss — destination (city, landmark, or property name)
- checkin / checkout — ISO dates (YYYY-MM-DD)
- group_adults — number of guests
- no_rooms — number of rooms
- selected_currency — force currency to avoid price inconsistencies
- offset — pagination, increments by 25 (offset=0, offset=25, offset=50)
- rows — results per page, max 25 for the search grid
- nflt — filter parameter (stars, property type, amenities)
import asyncio
import json
import time
import random
import sqlite3
import re
from datetime import datetime, timedelta
from typing import Optional, Dict, List, Any
from urllib.parse import urlencode, urljoin
BASE_SEARCH_URL = "https://www.booking.com/searchresults.html"
def build_search_url(
city: str,
checkin: str,
checkout: str,
adults: int = 2,
rooms: int = 1,
page: int = 0,
currency: str = "USD",
min_stars: Optional[int] = None,
) -> str:
"""Build a paginated Booking.com search URL."""
params = {
"ss": city,
"checkin": checkin,
"checkout": checkout,
"group_adults": adults,
"no_rooms": rooms,
"selected_currency": currency,
"offset": page * 25,
"rows": 25,
}
    if min_stars:
        # Booking encodes star filters as nflt=class%3D3 (3-star minimum).
        # Pass the raw "class=3" value and let urlencode percent-encode it;
        # pre-encoding here would double-encode the parameter.
        params["nflt"] = f"class={min_stars}"
return f"{BASE_SEARCH_URL}?{urlencode(params)}"
def build_property_url(
hotel_name_slug: str,
country_code: str,
checkin: str,
checkout: str,
adults: int = 2,
) -> str:
"""Build a Booking.com property page URL with dates.
Include dates so prices/availability render correctly.
"""
return (
f"https://www.booking.com/hotel/{country_code}/{hotel_name_slug}.html"
f"?checkin={checkin}&checkout={checkout}&group_adults={adults}&no_rooms=1"
)
The Unofficial Search JSON Endpoint
Booking.com's search page makes a background call to populate the map view. This endpoint returns clean JSON with hotel IDs, coordinates, prices, and ratings.
Add ajax=1 to the search URL to trigger the JSON response:
from curl_cffi import requests as cffi_requests
HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept": "application/json, text/html, */*",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Referer": "https://www.booking.com/",
"X-Requested-With": "XMLHttpRequest",
}
def try_ajax_endpoint(
city: str,
checkin: str,
checkout: str,
page: int = 0,
proxy: Optional[str] = None,
) -> Optional[Dict]:
"""Attempt the Booking.com internal AJAX endpoint.
This works intermittently from residential IPs. Requires
proper TLS fingerprinting via curl_cffi.
"""
params = {
"ss": city,
"checkin": checkin,
"checkout": checkout,
"group_adults": 2,
"no_rooms": 1,
"selected_currency": "USD",
"offset": page * 25,
"ajax": 1,
}
proxies = {"http": proxy, "https": proxy} if proxy else None
try:
session = cffi_requests.Session(impersonate="chrome124")
if proxies:
session.proxies = proxies
resp = session.get(
BASE_SEARCH_URL,
params=params,
headers=HEADERS,
timeout=30,
)
if resp.status_code == 200:
try:
data = resp.json()
if "results" in data:
print(f" AJAX endpoint success: {len(data.get('results', []))} hotels")
return data
else:
print(" AJAX returned non-results JSON (Datadome challenge likely)")
return None
except json.JSONDecodeError:
print(" Non-JSON response — Datadome challenge served")
return None
else:
print(f" AJAX blocked: HTTP {resp.status_code}")
return None
except Exception as e:
print(f" AJAX error: {e}")
return None
Playwright Stealth: The Reliable Path
DataDome injects JavaScript that runs device fingerprinting — canvas entropy, WebGL renderer strings, audio context, navigator properties. Playwright with stealth patches passes most of these checks.
The most reliable approach: intercept the network responses rather than parsing HTML. Booking.com's frontend fires the search AJAX call automatically when the page loads — you capture the exact JSON the browser receives.
from playwright.async_api import async_playwright, BrowserContext
STEALTH_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3, 4, 5] });
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
window.chrome = { runtime: {} };
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function(param) {
if (param === 37445) return 'Intel Inc.';
if (param === 37446) return 'Intel Iris OpenGL Engine';
return getParameter.call(this, param);
};
const getParameterWebGL2 = WebGL2RenderingContext.prototype.getParameter;
WebGL2RenderingContext.prototype.getParameter = function(param) {
if (param === 37445) return 'Intel Inc.';
if (param === 37446) return 'Intel Iris OpenGL Engine';
return getParameterWebGL2.call(this, param);
};
"""
async def scrape_booking_playwright(
city: str,
checkin: str,
checkout: str,
pages: int = 3,
proxy_server: Optional[str] = None,
adults: int = 2,
currency: str = "USD",
) -> List[Dict]:
"""Scrape Booking.com search results via Playwright with network interception."""
all_hotels = []
async with async_playwright() as p:
launch_opts = {
"headless": True,
"args": [
"--no-sandbox",
"--disable-blink-features=AutomationControlled",
"--disable-infobars",
"--disable-extensions",
],
}
if proxy_server:
launch_opts["proxy"] = {"server": proxy_server}
browser = await p.chromium.launch(**launch_opts)
context = await browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
viewport={"width": 1366, "height": 768},
locale="en-US",
timezone_id="America/New_York",
extra_http_headers={
"Accept-Language": "en-US,en;q=0.9",
},
)
await context.add_init_script(STEALTH_SCRIPT)
page = await context.new_page()
for pg in range(pages):
page_hotels = []
# Intercept the search AJAX response
async def intercept_response(response):
url = response.url
if "searchresults.html" in url and ("ajax=1" in url or "src=searchresults" in url):
try:
body = await response.json()
if "results" in body:
page_hotels.extend(body["results"])
except Exception:
pass
page.on("response", intercept_response)
# Build search URL for this page
search_url = build_search_url(city, checkin, checkout, adults=adults, page=pg, currency=currency)
await page.goto(search_url, wait_until="networkidle", timeout=60000)
await page.wait_for_timeout(2500) # Allow lazy requests to complete
if page_hotels:
all_hotels.extend(page_hotels)
print(f" Page {pg + 1}: {len(page_hotels)} hotels (via network intercept)")
else:
# Fallback: parse DOM
dom_hotels = await _parse_hotel_cards_dom(page)
all_hotels.extend(dom_hotels)
print(f" Page {pg + 1}: {len(dom_hotels)} hotels (via DOM parsing)")
page.remove_listener("response", intercept_response)
if pg < pages - 1:
await asyncio.sleep(random.uniform(3.0, 6.0))
await browser.close()
return all_hotels
async def _parse_hotel_cards_dom(page) -> List[Dict]:
"""Parse hotel cards from page DOM as fallback."""
hotels = []
cards = await page.query_selector_all('[data-testid="property-card"]')
for card in cards:
try:
name_el = await card.query_selector('[data-testid="title"]')
price_el = await card.query_selector('[data-testid="price-and-discounted-price"]')
score_el = await card.query_selector('[data-testid="review-score"]')
link_el = await card.query_selector('a[data-testid="title-link"]')
location_el = await card.query_selector('[data-testid="address"]')
hotels.append({
"hotel_name": await name_el.inner_text() if name_el else "",
"price_display": await price_el.inner_text() if price_el else "",
"review_score_text": await score_el.inner_text() if score_el else "",
"url": await link_el.get_attribute("href") if link_el else "",
"address": await location_el.inner_text() if location_el else "",
"source": "dom",
})
except Exception:
continue
return hotels
Extracting Fields from HTML (Fallback Parser)
For cases where you can get the HTML but can't intercept AJAX:
from bs4 import BeautifulSoup
def parse_hotel_cards(html: str) -> List[Dict]:
"""Parse hotel property cards from Booking.com search results HTML."""
soup = BeautifulSoup(html, "html.parser")
hotels = []
for card in soup.select('[data-testid="property-card"]'):
name_el = card.select_one('[data-testid="title"]')
price_el = card.select_one('[data-testid="price-and-discounted-price"]')
score_el = card.select_one('[data-testid="review-score"] div:first-child')
count_el = card.select_one('[data-testid="review-score"] div:last-child')
address_el = card.select_one('[data-testid="address"]')
stars_el = card.select_one('[data-testid="rating-stars"]')
distance_el = card.select_one('[data-testid="distance"]')
link_el = card.select_one('a[data-testid="title-link"]')
# Extract numeric price if possible
price_raw = price_el.get_text(strip=True) if price_el else ""
price_numeric = None
        price_match = re.search(r"([\d,]+(?:\.\d+)?)", price_raw)
        if price_match:
            try:
                price_numeric = float(price_match.group(1).replace(",", ""))
except ValueError:
pass
# Extract numeric score
score_raw = score_el.get_text(strip=True) if score_el else ""
score_numeric = None
try:
score_numeric = float(score_raw)
except ValueError:
pass
# Extract review count
count_text = count_el.get_text(strip=True) if count_el else ""
review_count = None
count_match = re.search(r"([\d,]+)", count_text)
if count_match:
try:
review_count = int(count_match.group(1).replace(",", ""))
except ValueError:
pass
hotels.append({
"name": name_el.get_text(strip=True) if name_el else None,
"price_display": price_raw,
"price_usd": price_numeric,
"review_score": score_numeric,
"review_count": review_count,
"address": address_el.get_text(strip=True) if address_el else None,
"star_rating": _count_stars(stars_el),
"distance": distance_el.get_text(strip=True) if distance_el else None,
"url": link_el.get("href") if link_el else None,
"source": "html",
})
return hotels
def _count_stars(el) -> Optional[int]:
"""Count star rating from stars element."""
if not el:
return None
    # Booking renders stars as individual SVG icons or star-classed spans.
    # Note: find_all() takes tag names, not CSS selectors — use select()
    # for the attribute-based fallback.
    stars = el.find_all("svg") or el.select("[class*='star']")
return len(stars) if stars else None
Individual Property Data
For full property detail — amenities, room types, full review text — you need the property page. Always include checkin/checkout dates — without them, prices won't render.
async def scrape_property_detail(
context: BrowserContext,
hotel_url: str,
checkin: str,
checkout: str,
) -> Dict:
"""Scrape full property detail page."""
# Ensure dates are in the URL
if "checkin=" not in hotel_url:
sep = "&" if "?" in hotel_url else "?"
hotel_url = f"{hotel_url}{sep}checkin={checkin}&checkout={checkout}&group_adults=2&no_rooms=1"
page = await context.new_page()
await page.goto(hotel_url, wait_until="networkidle", timeout=60000)
await page.wait_for_timeout(2000)
data = await page.evaluate("""
() => {
const get = (sel, attr) => {
const el = document.querySelector(sel);
return el ? (attr ? el.getAttribute(attr) : el.innerText.trim()) : null;
};
const getAll = (sel) => Array.from(document.querySelectorAll(sel)).map(e => e.innerText.trim());
return {
name: get('h2.pp-header__title') || get('[data-testid="property-header"] h2'),
address: get('.hp_address_subtitle, [data-testid="property-header__address"]'),
description: get('#property_description_content, .hp-desc-highlighted'),
review_score: parseFloat(get('.d10a6220b4, [data-testid="review-score-right-component"] .a3b8729ab1') || '0') || null,
review_count: parseInt((get('.d935416c47, [data-testid="review-score-right-component"] .d8eab2cf7f') || '0').replace(/[^\d]/g, '')) || 0,
star_rating: document.querySelectorAll('.b_star_icon, .hp_hotel_star').length || null,
facilities: getAll('.facilityIcon, .hp_facilities li').slice(0, 30),
room_types: Array.from(document.querySelectorAll('.hprt-table tbody tr')).slice(0, 10).map(row => {
const type = row.querySelector('.hprt-roomtype-icon-link');
const price = row.querySelector('.prco-valign-middle-helper, .bui-price-display__value');
return { type: type ? type.innerText.trim() : '', price: price ? price.innerText.trim() : '' };
}).filter(r => r.type),
latitude: parseFloat(document.querySelector('[data-atlas-latlng]')?.getAttribute('data-atlas-latlng')?.split(',')[0]) || null,
longitude: parseFloat(document.querySelector('[data-atlas-latlng]')?.getAttribute('data-atlas-latlng')?.split(',')[1]) || null,
};
}
""")
# Also extract JSON-LD structured data
json_ld = await page.evaluate("""
() => {
const scripts = document.querySelectorAll('script[type="application/ld+json"]');
for (const s of scripts) {
try {
const d = JSON.parse(s.textContent);
if (d['@type'] === 'Hotel' || d['@type'] === 'LodgingBusiness') return d;
} catch(e) {}
}
return null;
}
""")
if json_ld:
data["aggregate_rating"] = json_ld.get("aggregateRating", {})
data["price_range"] = json_ld.get("priceRange")
data["amenities_from_schema"] = [
a.get("name") for a in json_ld.get("amenityFeature", [])
if isinstance(a, dict)
][:20]
await page.close()
return data
ThorData Proxy Integration
DataDome maintains a real-time IP reputation database. All datacenter CIDR ranges — AWS, GCP, Azure, Hetzner, DigitalOcean — are flagged as high-risk. Requests from those IPs hit the challenge wall before any content loads.
Residential proxies route traffic through real ISP-assigned addresses. ThorData has a residential pool with city-level geo-targeting — useful because Booking.com localizes prices based on your apparent location. Scraping from a US residential IP while targeting European hotels shows different rates than European users see. Use geo-targeted proxies matching your target market.
class ThorDataProxyPool:
"""ThorData residential proxy pool for Booking.com scraping."""
def __init__(self, username: str, password: str):
self.username = username
self.password = password
self.host = "gate.thordata.com"
self.port = 9000
def get_proxy(
self,
country: str = "US",
city: Optional[str] = None,
session_id: Optional[str] = None,
) -> str:
"""Get proxy URL with geo-targeting options."""
user = f"{self.username}-country-{country.upper()}"
if city:
user = f"{user}-city-{city.lower()}"
if session_id:
user = f"{user}-session-{session_id}"
return f"http://{user}:{self.password}@{self.host}:{self.port}"
def get_rotating(self, country: str = "US") -> str:
"""Per-request IP rotation."""
return self.get_proxy(country)
def get_sticky(self, session_id: str, country: str = "US") -> str:
"""Sticky session — same IP for 2-5 min of browsing."""
return self.get_proxy(country, session_id=session_id)
def get_european_proxy(self) -> str:
"""Get European IP for European hotel pricing."""
country = random.choice(["DE", "FR", "GB", "NL", "ES"])
return self.get_proxy(country)
Pagination Handling
Booking.com paginates search results in increments of 25, with the offset parameter controlling position.
async def scrape_full_search(
city: str,
checkin: str,
checkout: str,
proxy_pool: Optional[ThorDataProxyPool] = None,
max_pages: int = 10,
adults: int = 2,
) -> List[Dict]:
"""Scrape all pages of Booking.com search results."""
all_hotels = []
for page in range(max_pages):
print(f"\n [PAGE {page + 1}/{max_pages}]")
        # Fresh proxy for each page to avoid session tracking. Match proxy
        # geography to the target market (e.g. call get_european_proxy()
        # when searching European destinations); a city-name substring test
        # can't detect that reliably, so the default is plain rotation.
        proxy = proxy_pool.get_rotating() if proxy_pool else None
page_hotels = await scrape_booking_playwright(
city, checkin, checkout, pages=1,
proxy_server=proxy,
adults=adults,
)
if not page_hotels:
print(f" No results on page {page + 1} — stopping")
break
all_hotels.extend(page_hotels)
print(f" Total so far: {len(all_hotels)} hotels")
await asyncio.sleep(random.uniform(4.0, 8.0))
# Deduplicate by hotel_id or hotel_name
seen = set()
unique_hotels = []
for hotel in all_hotels:
key = hotel.get("hotel_id") or hotel.get("hotel_name") or hotel.get("name")
if key and key not in seen:
seen.add(key)
unique_hotels.append(hotel)
return unique_hotels
Data Storage
def init_database(db_path: str = "booking_hotels.db") -> sqlite3.Connection:
"""Initialize the Booking.com data database."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.executescript("""
CREATE TABLE IF NOT EXISTS hotels (
hotel_id INTEGER,
hotel_name TEXT,
city TEXT,
address TEXT,
star_rating INTEGER,
review_score REAL,
review_count INTEGER,
latitude REAL,
longitude REAL,
url TEXT,
scraped_at TEXT,
PRIMARY KEY (hotel_id, city)
);
CREATE TABLE IF NOT EXISTS price_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id INTEGER,
hotel_name TEXT,
city TEXT,
checkin TEXT,
checkout TEXT,
min_price REAL,
currency TEXT,
is_free_cancellable INTEGER,
snapshot_date TEXT
);
CREATE TABLE IF NOT EXISTS room_types (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id INTEGER,
room_type TEXT,
price_display TEXT,
checkin TEXT,
checkout TEXT,
scraped_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_price_hotel ON price_snapshots(hotel_id, checkin);
CREATE INDEX IF NOT EXISTS idx_hotels_city ON hotels(city);
""")
conn.commit()
return conn
def save_hotel(conn: sqlite3.Connection, hotel: Dict, city: str):
"""Save hotel data and price snapshot."""
hotel_id = hotel.get("hotel_id")
hotel_name = hotel.get("hotel_name") or hotel.get("name", "")
if hotel_id:
conn.execute(
"""INSERT OR REPLACE INTO hotels
(hotel_id, hotel_name, city, address, star_rating, review_score,
review_count, latitude, longitude, url, scraped_at)
VALUES (?,?,?,?,?,?,?,?,?,?,?)""",
(
hotel_id, hotel_name, city,
hotel.get("address"),
hotel.get("class") or hotel.get("star_rating"),
hotel.get("review_score"),
hotel.get("review_nr") or hotel.get("review_count"),
hotel.get("latitude"),
hotel.get("longitude"),
hotel.get("url"),
datetime.utcnow().isoformat(),
)
)
# Price snapshot
if hotel.get("min_total_price") or hotel.get("price_usd"):
conn.execute(
"""INSERT INTO price_snapshots
(hotel_id, hotel_name, city, checkin, checkout, min_price, currency,
is_free_cancellable, snapshot_date)
VALUES (?,?,?,?,?,?,?,?,?)""",
(
hotel_id, hotel_name, city,
hotel.get("checkin", ""),
hotel.get("checkout", ""),
hotel.get("min_total_price") or hotel.get("price_usd"),
hotel.get("currency_code", "USD"),
int(hotel.get("is_free_cancellable", 0)),
datetime.utcnow().date().isoformat(),
)
)
conn.commit()
def get_price_trend(
conn: sqlite3.Connection,
hotel_id: int,
checkin: str,
) -> List[Dict]:
"""Get price history for a hotel on a specific checkin date."""
rows = conn.execute(
"""SELECT min_price, currency, snapshot_date
FROM price_snapshots
WHERE hotel_id = ? AND checkin = ?
ORDER BY snapshot_date ASC""",
(hotel_id, checkin)
).fetchall()
return [{"price": r[0], "currency": r[1], "date": r[2]} for r in rows]
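As a quick sanity check, the same trend query can be exercised against an in-memory database seeded with hypothetical snapshot rows (hotel id, prices, and dates below are made up):

```python
import sqlite3

# Throwaway in-memory DB with the price_snapshots schema and two
# hypothetical snapshots for one hotel/checkin pair, inserted out of order.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        hotel_id INTEGER, hotel_name TEXT, city TEXT,
        checkin TEXT, checkout TEXT, min_price REAL, currency TEXT,
        is_free_cancellable INTEGER, snapshot_date TEXT
    )
""")
conn.executemany(
    """INSERT INTO price_snapshots
       (hotel_id, hotel_name, city, checkin, checkout, min_price,
        currency, is_free_cancellable, snapshot_date)
       VALUES (?,?,?,?,?,?,?,?,?)""",
    [
        (111, "Hotel Example", "Barcelona", "2026-06-01", "2026-06-05",
         210.0, "USD", 1, "2026-04-01"),
        (111, "Hotel Example", "Barcelona", "2026-06-01", "2026-06-05",
         180.0, "USD", 1, "2026-03-01"),
    ],
)
# Same SELECT used by get_price_trend: snapshots in chronological order.
trend = conn.execute(
    """SELECT min_price, currency, snapshot_date FROM price_snapshots
       WHERE hotel_id = ? AND checkin = ?
       ORDER BY snapshot_date ASC""",
    (111, "2026-06-01"),
).fetchall()
print(trend[0])  # earliest snapshot first: (180.0, 'USD', '2026-03-01')
```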
Complete Production Pipeline
async def run_booking_pipeline(
destinations: List[Dict], # [{"city": "Barcelona", "checkin": "...", "checkout": "..."}]
db_path: str = "booking_hotels.db",
proxy_pool: Optional[ThorDataProxyPool] = None,
max_pages: int = 5,
) -> Dict:
"""Full pipeline: search → detail → database."""
conn = init_database(db_path)
stats = {"destinations": 0, "hotels_found": 0, "hotels_saved": 0, "errors": 0}
for dest in destinations:
city = dest["city"]
checkin = dest["checkin"]
checkout = dest["checkout"]
print(f"\n[{city}] {checkin} to {checkout}")
# Try AJAX endpoint first (fast, no JS overhead)
proxy = proxy_pool.get_european_proxy() if proxy_pool else None
ajax_data = try_ajax_endpoint(city, checkin, checkout, proxy=proxy)
if ajax_data and ajax_data.get("results"):
hotels = ajax_data["results"]
# Add checkin/checkout to each hotel for storage
for h in hotels:
h["checkin"] = checkin
h["checkout"] = checkout
else:
# Fall back to Playwright
print(" AJAX failed, using Playwright...")
hotels = await scrape_full_search(
city, checkin, checkout,
proxy_pool=proxy_pool,
max_pages=max_pages,
)
for h in hotels:
h["checkin"] = checkin
h["checkout"] = checkout
stats["hotels_found"] += len(hotels)
print(f" Found {len(hotels)} hotels")
for hotel in hotels:
try:
save_hotel(conn, hotel, city)
stats["hotels_saved"] += 1
except Exception as e:
print(f" [ERROR] Save failed: {e}")
stats["errors"] += 1
stats["destinations"] += 1
await asyncio.sleep(random.uniform(10.0, 20.0))
conn.close()
print(f"\nPipeline complete: {stats}")
return stats
# Example usage
async def main():
DESTINATIONS = [
{"city": "Barcelona", "checkin": "2026-06-01", "checkout": "2026-06-05"},
{"city": "Amsterdam", "checkin": "2026-07-01", "checkout": "2026-07-04"},
{"city": "Rome", "checkin": "2026-08-15", "checkout": "2026-08-18"},
]
# pool = ThorDataProxyPool("YOUR_USER", "YOUR_PASS")
# results = await run_booking_pipeline(DESTINATIONS, proxy_pool=pool)
results = await run_booking_pipeline(DESTINATIONS)
print(results)
if __name__ == "__main__":
    asyncio.run(main())
Rate Limiting and Behavioral Patterns
Even with residential proxies, Booking.com tracks behavioral patterns:
- More than ~30 search requests per hour from one IP triggers soft blocking
- Requests completing faster than a human could read the page look synthetic
- Identical search parameters repeated in sequence are flagged
- Sessions that never click on results (just search and leave) are suspicious
Add random delays (3-8 seconds between requests), randomize user agents between sessions, and vary your search parameters. Rotating sessions — new browser context per 10-15 requests — helps reset fingerprint state.
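Those pacing rules can be packaged into a small helper — a sketch with illustrative thresholds that a scraping loop consults between requests:

```python
import asyncio
import random

class RequestPacer:
    """Randomized delays plus a counter that signals context rotation.

    Thresholds are illustrative: 3-8 s between requests, fresh browser
    context every ~12 requests.
    """

    def __init__(self, min_delay: float = 3.0, max_delay: float = 8.0,
                 rotate_every: int = 12):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.rotate_every = rotate_every
        self.request_count = 0

    async def wait(self) -> None:
        # Sleep a human-ish random interval, then count the request.
        await asyncio.sleep(random.uniform(self.min_delay, self.max_delay))
        self.request_count += 1

    def should_rotate(self) -> bool:
        # True once enough requests went through the current context.
        return self.request_count >= self.rotate_every

    def reset(self) -> None:
        # Call after tearing down the old context and opening a new one.
        self.request_count = 0
```

Inside the Playwright loop: `await pacer.wait()` before each `page.goto`, and when `should_rotate()` returns True, close the context, rebuild it with a new user agent and proxy, then call `pacer.reset()`.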
Per-request rotation works for search result pages. For individual property pages where you're simulating browsing through room options, sticky sessions (same IP for 2-5 minutes) work better and are more realistic.
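A sketch of that split, reusing the gateway username format from the ThorData section (the host, port, and `-session-` syntax mirror the pool class above and are assumptions about the provider's gateway): rotate on every search request, but mint one session id per property and reuse it for that property's page views.

```python
import random
import string
from typing import Optional

GATEWAY = "gate.thordata.com:9000"  # assumed gateway host/port

def sticky_session_id(length: int = 8) -> str:
    """Random alphanumeric id naming one sticky proxy session."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))

def proxy_url(username: str, password: str, country: str = "US",
              session_id: Optional[str] = None) -> str:
    user = f"{username}-country-{country.upper()}"
    if session_id:
        user = f"{user}-session-{session_id}"  # same IP for the session's lifetime
    return f"http://{user}:{password}@{GATEWAY}"

# Search pages: no session id, so each request exits via a fresh IP.
search_proxy = proxy_url("USER", "PASS", "ES")

# Property pages: one sticky id per hotel, shared by all its page views.
session = sticky_session_id()
detail_proxy = proxy_url("USER", "PASS", "ES", session_id=session)
```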
ThorData's residential proxy network with their geo-targeting feature makes this straightforward — use European IPs for European hotel searches to see the same prices local users see.
What You Can't Get Without Accounts
Booking.com's review API returns full review text but requires an authenticated session to paginate past the first page. Aggregate scores and total review counts are freely available; individual review text at scale requires either logged-in session scraping or the official Affiliate API.
For most use cases — price monitoring, availability tracking, competitive analysis — the unauthenticated search data is sufficient and covers the most valuable data points.