How to Scrape Price Comparison Sites in 2026: Google Shopping, CamelCamelCamel & Dynamic Pricing
Price data is some of the most valuable structured information on the web. Whether you are building a deal alert tool, tracking competitor pricing, analyzing historical trends, or detecting dynamic pricing across regions, the extraction patterns for Google Shopping, CamelCamelCamel, and major e-commerce sites are stable enough to build production pipelines around — if you handle the anti-bot layer correctly.
This post covers the full stack: JSON-LD extraction from Google Shopping, CamelCamelCamel price history parsing, dynamic pricing detection with Playwright, multi-region comparison, SQLite storage schema, and why geo-diverse proxies are non-negotiable for accurate price comparison.
Why Price Scraping Is Harder Than Most Data Tasks
Price data has properties that make scraping unusually difficult:
Dynamic rendering. Many e-commerce sites compute or update prices client-side after the initial page load. The HTML delivered to a plain HTTP client contains placeholders or stale cached values; the actual prices require JavaScript execution.
Geographic segmentation. The same product can have meaningfully different prices depending on the country the request originates from. Amazon, Expedia, and consumer electronics retailers all implement geo-based pricing. A scraper that ignores this collects incomplete data.
Session state. Many retailers adjust prices based on logged-in state, browsing history, or loyalty program membership. Prices seen by a fresh anonymous session differ from prices seen by a returning customer. Some retailers have been documented showing higher prices to users who previously searched for the same item.
Structural instability. E-commerce page layouts change frequently, especially around sale events. Hard-coded selectors break during high-traffic periods when retailers modify markup for performance or A/B testing.
Active bot detection. Google Shopping, Amazon, and major retailers invest heavily in anti-bot infrastructure. TLS fingerprinting, behavioral biometrics, IP reputation databases, and cookie chain validation are all active.
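A cheap first probe for the dynamic-rendering problem is to check whether server-delivered HTML contains anything that looks like a price at all — if it only contains template placeholders, you know up front that the site needs a real browser. A rough heuristic sketch (the regex covers only a few currency symbols and is an assumption, not a universal pattern):

```python
import re

# Matches e.g. "$19.99", "£1,299", "€9.50" — a deliberately narrow heuristic
PRICE_RE = re.compile(r"[$£€]\s?\d[\d,]*(?:\.\d{2})?")

def static_html_has_price(html: str) -> bool:
    """Return True if server-rendered HTML already contains a plausible price.

    If this returns False for a product page, prices are likely computed
    client-side and extraction will need JavaScript execution (Playwright).
    """
    return bool(PRICE_RE.search(html))
```

Run it against a plain `httpx.get()` response before committing to a browser-based pipeline for a given site.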
What Price Data Is Worth Extracting
Not all price data is equal. The most useful fields per product:
- Current price and currency — the offer price, not the crossed-out "was" price
- Merchant/seller name — for multi-seller marketplace comparisons
- Stock status — out-of-stock items distort price averages
- Price history timestamps — a single snapshot is nearly useless for trend detection
- Geographic price variants — the same product often has a different price in US, UK, and DE storefronts
- Sale or promotional flags — differentiate organic price movement from promotional markdowns
- Shipping cost — listed price without shipping is misleading for comparison
Google Shopping: JSON-LD Extraction
Google Shopping search result pages embed application/ld+json blocks and itemscope microdata. The JSON-LD approach is more reliable than CSS selectors because it is structurally enforced by Google's own schema requirements.
import httpx
import json
import time
from bs4 import BeautifulSoup
from typing import Optional


def extract_google_shopping(
    query: str,
    proxy: Optional[str] = None,
    country: str = "us",
    language: str = "en",
) -> list[dict]:
    """
    Extract product listings from a Google Shopping search.
    Returns list of dicts with name, price, currency, seller, and availability.
    """
    url = "https://www.google.com/search"
    params = {
        "q": query,
        "tbm": "shop",
        "hl": language,
        "gl": country,
        "num": "40",
    }
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept-Language": f"{language}-{country.upper()},{language};q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.google.com/",
        "DNT": "1",
    }
    transport_kwargs = {}
    if proxy:
        transport_kwargs["proxy"] = proxy
    with httpx.Client(
        headers=headers,
        follow_redirects=True,
        timeout=20,
        **transport_kwargs,
    ) as client:
        resp = client.get(url, params=params)
        resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    # Primary: JSON-LD extraction
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except (json.JSONDecodeError, TypeError):
            continue
        if not isinstance(data, dict):
            continue
        item_type = data.get("@type", "")
        # ItemList containing products
        if item_type == "ItemList":
            for item in data.get("itemListElement", []):
                product = item.get("item", {})
                offers = product.get("offers", {})
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                results.append({
                    "name": product.get("name"),
                    "url": product.get("url"),
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "seller": (offers.get("seller") or {}).get("name"),
                    "availability": (offers.get("availability") or "").split("/")[-1],
                    "source": "json-ld-itemlist",
                })
        # Direct Product schema
        elif item_type == "Product":
            offers = data.get("offers", {})
            if isinstance(offers, list):
                for offer in offers:
                    results.append({
                        "name": data.get("name"),
                        "url": data.get("url") or data.get("@id"),
                        "price": offer.get("price"),
                        "currency": offer.get("priceCurrency"),
                        "seller": (offer.get("seller") or {}).get("name"),
                        "availability": (offer.get("availability") or "").split("/")[-1],
                        "source": "json-ld-product",
                    })
            else:
                results.append({
                    "name": data.get("name"),
                    "url": data.get("url") or data.get("@id"),
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "seller": (offers.get("seller") or {}).get("name"),
                    "availability": (offers.get("availability") or "").split("/")[-1],
                    "source": "json-ld-product",
                })
    # Fallback: microdata extraction if JSON-LD was empty
    if not results:
        for item in soup.find_all(itemprop="offers"):
            price_el = item.find(itemprop="price")
            currency_el = item.find(itemprop="priceCurrency")
            name_el = soup.find(itemprop="name")
            if price_el:
                results.append({
                    "name": name_el.get_text(strip=True) if name_el else None,
                    "price": price_el.get("content") or price_el.get_text(strip=True),
                    "currency": (currency_el.get("content") if currency_el else None),
                    "seller": None,
                    "availability": None,
                    "source": "microdata",
                })
    return results


def search_product_prices(
    product_name: str,
    regions: list[tuple[str, str]],
    proxy: Optional[str] = None,
) -> dict:
    """
    Search Google Shopping in multiple regions and compare prices.
    regions: list of (country_code, language) tuples
    Returns dict mapping region to list of results.
    """
    regional_results = {}
    for country, language in regions:
        results = extract_google_shopping(product_name, proxy, country, language)
        regional_results[f"{country}_{language}"] = results
        time.sleep(2.0)  # avoid triggering Google's rate limits
    return regional_results
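With extraction in place, a small ranking helper is useful for surfacing the cheapest offers across regions. A sketch assuming the price field is a plain numeric string as returned above:

```python
def cheapest_offers(results: list[dict], top_n: int = 5) -> list[dict]:
    """Sort extracted offers by numeric price, dropping unparseable entries."""
    priced = []
    for r in results:
        try:
            priced.append({**r, "price_value": float(str(r["price"]).replace(",", ""))})
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric price — skip rather than crash
    priced.sort(key=lambda r: r["price_value"])
    return priced[:top_n]
```

Note this compares raw numbers, so feed it offers from a single currency at a time.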
CamelCamelCamel: Price History Extraction
CamelCamelCamel tracks Amazon price history and exposes chart data through predictable URL structures:
import httpx
import re
import json
from datetime import datetime, timedelta, timezone


def get_camel_price_history(
    asin: str,
    proxy: str | None = None,
    store: str = "com",
) -> dict:
    """
    Extract price history data for an Amazon ASIN from CamelCamelCamel.
    asin: Amazon ASIN (e.g., 'B0CHWRXH8B')
    store: Amazon store suffix ('com', 'co.uk', 'de', etc.)
    Returns dict mapping series name to list of {timestamp_ms, price} dicts.
    """
    # Map Amazon store to CamelCamelCamel domain
    domain_map = {
        "com": "camelcamelcamel.com",
        "co.uk": "uk.camelcamelcamel.com",
        "de": "de.camelcamelcamel.com",
        "co.jp": "jp.camelcamelcamel.com",
        "ca": "ca.camelcamelcamel.com",
        "com.au": "au.camelcamelcamel.com",
        "fr": "fr.camelcamelcamel.com",
        "it": "it.camelcamelcamel.com",
        "es": "es.camelcamelcamel.com",
    }
    domain = domain_map.get(store, "camelcamelcamel.com")
    url = f"https://{domain}/product/{asin}"
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": f"https://{domain}/",
        "Accept": "text/html,application/xhtml+xml",
    }
    client_kwargs = {}
    if proxy:
        client_kwargs["transport"] = httpx.HTTPTransport(proxy=proxy)
    with httpx.Client(
        headers=headers,
        follow_redirects=True,
        timeout=20,
        **client_kwargs,
    ) as client:
        resp = client.get(url)
        resp.raise_for_status()
    # CamelCamelCamel embeds Highcharts data as inline JS
    # Pattern: {"data":[[timestamp_ms, price], ...], "name": "label"}
    chart_data_pattern = re.compile(
        r'\{"data"\s*:\s*(\[\[.*?\]\])\s*,\s*.*?"name"\s*:\s*"([^"]+)"',
        re.DOTALL,
    )
    result = {}
    html = resp.text
    # Try direct data extraction
    for match in chart_data_pattern.finditer(html):
        data_json = match.group(1)
        name = match.group(2)
        try:
            points = json.loads(data_json)
            result[name] = [
                {
                    "timestamp_ms": p[0],
                    "price": p[1],
                    "date": datetime.fromtimestamp(p[0] / 1000, tz=timezone.utc).isoformat()[:10],
                }
                for p in points
                if len(p) == 2 and p[1] is not None
            ]
        except (json.JSONDecodeError, IndexError, TypeError):
            continue
    # If nothing found, try to extract product name at minimum
    if not result:
        title_match = re.search(r"<title>([^<]+)</title>", html)
        if title_match:
            result["_product_title"] = title_match.group(1).strip()
        result["_extraction_failed"] = True
    return result


def extract_price_stats_from_history(history: dict) -> dict:
    """
    Compute summary statistics from CamelCamelCamel price history.
    Returns min, max, avg, current, and 90-day average prices per series.
    """
    stats = {}
    cutoff_90d = (datetime.now(timezone.utc) - timedelta(days=90)).timestamp() * 1000
    for series_name, points in history.items():
        if series_name.startswith("_") or not points:
            continue
        prices = [p["price"] for p in points if p["price"] is not None]
        recent_prices = [
            p["price"] for p in points
            if p["price"] is not None and p["timestamp_ms"] >= cutoff_90d
        ]
        if not prices:
            continue
        stats[series_name] = {
            "min_all_time": min(prices),
            "max_all_time": max(prices),
            "avg_all_time": round(sum(prices) / len(prices), 2),
            "current_price": prices[-1],
            "avg_90_days": round(sum(recent_prices) / len(recent_prices), 2) if recent_prices else None,
            "data_points": len(prices),
            "first_recorded": points[0]["date"],
            "last_recorded": points[-1]["date"],
        }
    return stats
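These statistics feed naturally into a simple deal classifier. A sketch with arbitrary thresholds — tune them to your own tolerance:

```python
def classify_deal(current: float, avg_90d: float, min_all_time: float) -> str:
    """Label a current price relative to its history.

    Thresholds here are arbitrary assumptions: within 5% of the all-time low
    counts as 'all_time_low', 10%+ below the 90-day average as 'good_deal',
    10%+ above it as 'above_average'.
    """
    if current <= min_all_time * 1.05:
        return "all_time_low"
    if avg_90d and current <= avg_90d * 0.90:
        return "good_deal"
    if avg_90d and current > avg_90d * 1.10:
        return "above_average"
    return "typical"
```

A deal-alert tool would call this per series after extract_price_stats_from_history and notify on the first two labels.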
Dynamic Pricing Detection with Playwright
Detect regional price differences by loading the same product from multiple proxy locations simultaneously:
import asyncio
import random
from playwright.async_api import async_playwright


async def get_price_in_context(
    url: str,
    proxy_server: str,
    locale: str = "en-US",
    timezone_id: str = "America/New_York",
) -> dict:
    """
    Load a product page from a specific proxy and extract price.
    Returns dict with price, currency, and any detected variant info.
    """
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled", "--no-sandbox"],
            proxy={"server": proxy_server},
        )
        context = await browser.new_context(
            locale=locale,
            timezone_id=timezone_id,
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1440, "height": 900},
        )
        await context.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
        )
        page = await context.new_page()
        try:
            await page.goto(url, wait_until="domcontentloaded", timeout=30000)
            await asyncio.sleep(random.uniform(1.5, 3.0))
            # Try JSON-LD first (most reliable)
            price_data = await page.evaluate("""
                () => {
                    const scripts = document.querySelectorAll('script[type="application/ld+json"]');
                    for (const s of scripts) {
                        try {
                            const d = JSON.parse(s.textContent);
                            if (d['@type'] === 'Product') {
                                const offers = Array.isArray(d.offers)
                                    ? d.offers[0] : d.offers;
                                if (offers && offers.price) {
                                    return {
                                        price: String(offers.price),
                                        currency: offers.priceCurrency || null,
                                        availability: offers.availability || null,
                                        method: 'json-ld',
                                    };
                                }
                            }
                        } catch (e) {}
                    }
                    return null;
                }
            """)
            # Fallback to common price selectors
            if not price_data:
                price_data = await page.evaluate("""
                    () => {
                        const selectors = [
                            '[data-testid="price"]',
                            '[class*="price-current"]',
                            '[class*="current-price"]',
                            'span[class*="price"]',
                            'meta[itemprop="price"]',
                        ];
                        for (const sel of selectors) {
                            const el = document.querySelector(sel);
                            if (el) {
                                const text = el.getAttribute('content') || el.textContent;
                                const match = text.match(/[\\d,]+\\.?\\d*/);
                                if (match) {
                                    return {
                                        price: match[0].replace(/,/g, ''),
                                        currency: null,
                                        method: 'dom-selector',
                                    };
                                }
                            }
                        }
                        return null;
                    }
                """)
        except Exception as e:
            price_data = {"error": str(e)}
        finally:
            await browser.close()
    return {
        "locale": locale,
        "timezone": timezone_id,
        "proxy": proxy_server[:30] + "...",
        "url": url,
        **(price_data or {"price": None, "currency": None}),
    }


async def detect_dynamic_pricing(
    product_url: str,
    proxy_configs: list[dict],
) -> list[dict]:
    """
    Check the same product URL from multiple geographic locations.
    proxy_configs: list of dicts with 'proxy', 'locale', 'timezone' keys
    Returns list of results, one per proxy config.
    """
    tasks = []
    for cfg in proxy_configs:
        task = get_price_in_context(
            product_url,
            cfg["proxy"],
            cfg.get("locale", "en-US"),
            cfg.get("timezone", "UTC"),
        )
        tasks.append(task)
    # Run all contexts simultaneously for a true snapshot comparison
    results = await asyncio.gather(*tasks, return_exceptions=True)
    clean = []
    for i, r in enumerate(results):
        if isinstance(r, Exception):
            clean.append({"error": str(r), **proxy_configs[i]})
        else:
            clean.append(r)
    return clean


def analyze_price_variance(results: list[dict]) -> dict:
    """
    Analyze dynamic pricing across regional results.
    Returns summary including min/max/spread and whether dynamic pricing is detected.
    """
    prices = []
    for r in results:
        if r.get("price") and not r.get("error"):
            try:
                price_str = (
                    str(r["price"])
                    .replace(",", "")
                    .replace("$", "")
                    .replace("£", "")
                    .replace("€", "")
                    .strip()
                )
                prices.append(float(price_str))
            except (ValueError, TypeError):
                pass
    if len(prices) < 2:
        return {"insufficient_data": True, "results": results}
    spread = max(prices) - min(prices)
    spread_pct = (spread / min(prices)) * 100
    return {
        "min_price": min(prices),
        "max_price": max(prices),
        "spread": round(spread, 2),
        "spread_pct": round(spread_pct, 2),
        "dynamic_pricing_detected": spread_pct > 2.0,
        "result_count": len(prices),
        "results": results,
    }


# Example usage
proxy_configs = [
    {
        "proxy": "http://USER:[email protected]:9000",
        "locale": "en-US",
        "timezone": "America/New_York",
        "label": "US",
    },
    {
        "proxy": "http://USER:[email protected]:9000",
        "locale": "en-GB",
        "timezone": "Europe/London",
        "label": "UK",
    },
    {
        "proxy": "http://USER:[email protected]:9000",
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "label": "DE",
    },
]

# url = "https://www.amazon.com/dp/B0CHWRXH8B"
# regional_data = asyncio.run(detect_dynamic_pricing(url, proxy_configs))
# variance = analyze_price_variance(regional_data)
# print(f"Dynamic pricing: {variance['dynamic_pricing_detected']}, spread: {variance['spread_pct']:.1f}%")
Multi-Retailer Price Comparison
Compare the same product across multiple retailers:
import httpx
import json
import time
from bs4 import BeautifulSoup


def extract_bestbuy_price(product_url: str, proxy: str | None = None) -> dict:
    """Extract price from a Best Buy product page via JSON-LD."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    client_kwargs = {}
    if proxy:
        client_kwargs["transport"] = httpx.HTTPTransport(proxy=proxy)
    with httpx.Client(headers=headers, follow_redirects=True, timeout=20, **client_kwargs) as c:
        resp = c.get(product_url)
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
            if data.get("@type") == "Product":
                offers = data.get("offers", {})
                if isinstance(offers, list):
                    offers = offers[0]
                return {
                    "retailer": "bestbuy",
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "availability": (offers.get("availability") or "").split("/")[-1],
                    "url": product_url,
                }
        except Exception:
            continue
    return {"retailer": "bestbuy", "price": None, "url": product_url, "error": "not_found"}


def compare_retailers(
    product_urls: dict,
    proxy: str | None = None,
    delay: float = 2.0,
) -> list[dict]:
    """
    Fetch prices from multiple retailers for the same product.
    product_urls: dict mapping retailer name to product URL
    """
    results = []
    for retailer, url in product_urls.items():
        try:
            if "bestbuy" in url:
                result = extract_bestbuy_price(url, proxy)
            else:
                result = {"retailer": retailer, "url": url, "price": "manual_check"}
            results.append(result)
            time.sleep(delay)
        except Exception as e:
            results.append({"retailer": retailer, "url": url, "error": str(e)})
    return sorted(
        [r for r in results if r.get("price") and r.get("price") != "manual_check"],
        key=lambda x: float(str(x["price"]).replace(",", "")) if x.get("price") else float("inf"),
    )
SQLite Storage Schema
A minimal schema handling multi-region, multi-source, time-series price data:
import sqlite3
from datetime import datetime, timezone


def init_price_db(db_path: str = "prices.db") -> sqlite3.Connection:
    """Initialize the price tracking database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            asin TEXT,
            retailer TEXT,
            name TEXT,
            category TEXT,
            url TEXT,
            added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (asin, retailer)
        );

        CREATE TABLE IF NOT EXISTS price_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT NOT NULL,
            retailer TEXT NOT NULL,
            region TEXT NOT NULL,
            price REAL,
            currency TEXT DEFAULT 'USD',
            seller TEXT,
            availability TEXT,
            source TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS price_history_camel (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT NOT NULL,
            store TEXT NOT NULL,
            series TEXT NOT NULL,
            price_date TEXT NOT NULL,
            price REAL,
            timestamp_ms INTEGER,
            UNIQUE (asin, store, series, price_date)
        );

        CREATE TABLE IF NOT EXISTS dynamic_pricing_checks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT,
            product_url TEXT,
            checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            min_price REAL,
            max_price REAL,
            spread_pct REAL,
            dynamic_detected INTEGER,
            raw_results TEXT  -- JSON
        );

        CREATE INDEX IF NOT EXISTS idx_snapshots_asin ON price_snapshots(asin, scraped_at);
        CREATE INDEX IF NOT EXISTS idx_snapshots_region ON price_snapshots(asin, region);
        CREATE INDEX IF NOT EXISTS idx_camel_asin ON price_history_camel(asin, store);
        CREATE INDEX IF NOT EXISTS idx_dynamic_asin ON dynamic_pricing_checks(asin);
    """)
    conn.commit()
    return conn


def insert_snapshot(
    conn: sqlite3.Connection,
    asin: str,
    retailer: str,
    region: str,
    price: float,
    currency: str = "USD",
    seller: str | None = None,
    availability: str | None = None,
    source: str = "live_scrape",
):
    """Insert a single price observation."""
    conn.execute("""
        INSERT INTO price_snapshots
            (asin, retailer, region, price, currency, seller, availability, source, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        asin, retailer, region, price, currency, seller, availability, source,
        datetime.now(timezone.utc).isoformat(),
    ))
    conn.commit()


def bulk_insert_camel_history(
    conn: sqlite3.Connection,
    asin: str,
    store: str,
    history: dict,
) -> int:
    """Bulk insert CamelCamelCamel price history into the database."""
    rows = []
    for series, points in history.items():
        if series.startswith("_"):
            continue
        for p in points:
            if p.get("price") is not None:
                rows.append((asin, store, series, p["date"], p["price"], p["timestamp_ms"]))
    conn.executemany("""
        INSERT OR IGNORE INTO price_history_camel
            (asin, store, series, price_date, price, timestamp_ms)
        VALUES (?, ?, ?, ?, ?, ?)
    """, rows)
    conn.commit()
    return len(rows)


def get_price_variance_report(
    conn: sqlite3.Connection,
    asin: str,
    days_back: int = 30,
) -> list:
    """Show price statistics per region for an ASIN over the last N days."""
    return conn.execute("""
        SELECT
            region,
            currency,
            COUNT(*) AS snapshots,
            MIN(price) AS min_price,
            MAX(price) AS max_price,
            ROUND(AVG(price), 2) AS avg_price,
            MIN(scraped_at) AS first_seen,
            MAX(scraped_at) AS last_seen
        FROM price_snapshots
        WHERE asin = ?
          AND scraped_at >= datetime('now', ? || ' days')
          AND price IS NOT NULL
        GROUP BY region, currency
        ORDER BY avg_price ASC
    """, (asin, f"-{days_back}")).fetchall()


def get_price_drop_alerts(
    conn: sqlite3.Connection,
    threshold_pct: float = 10.0,
) -> list:
    """Find ASINs where the latest price is below the 90-day average by threshold."""
    return conn.execute("""
        WITH recent AS (
            SELECT asin, region, AVG(price) AS avg_90d
            FROM price_snapshots
            WHERE scraped_at >= datetime('now', '-90 days')
            GROUP BY asin, region
        ),
        latest AS (
            SELECT asin, region, price, scraped_at,
                   ROW_NUMBER() OVER (PARTITION BY asin, region ORDER BY scraped_at DESC) AS rn
            FROM price_snapshots
        )
        SELECT l.asin, l.region, l.price AS current_price, r.avg_90d,
               ROUND((r.avg_90d - l.price) / r.avg_90d * 100, 1) AS drop_pct
        FROM latest l
        JOIN recent r ON l.asin = r.asin AND l.region = r.region
        WHERE l.rn = 1
          AND r.avg_90d > 0
          AND (r.avg_90d - l.price) / r.avg_90d * 100 >= ?
        ORDER BY drop_pct DESC
    """, (threshold_pct,)).fetchall()
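To sanity-check the alert query's CTE logic, here is a self-contained run against an in-memory database, with the schema reduced to the columns the query touches and synthetic history producing a roughly 16.6% drop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_snapshots (asin TEXT, region TEXT, price REAL, scraped_at TIMESTAMP)"
)
# Three historical observations around 100, then a latest observation at 79
for offset, price in [("-30 days", 100.0), ("-20 days", 101.0),
                      ("-10 days", 99.0), ("-1 hours", 79.0)]:
    conn.execute(
        "INSERT INTO price_snapshots VALUES ('B0TEST', 'US', ?, datetime('now', ?))",
        (price, offset),
    )

alerts = conn.execute("""
    WITH recent AS (
        SELECT asin, region, AVG(price) AS avg_90d
        FROM price_snapshots
        WHERE scraped_at >= datetime('now', '-90 days')
        GROUP BY asin, region
    ),
    latest AS (
        SELECT asin, region, price,
               ROW_NUMBER() OVER (PARTITION BY asin, region ORDER BY scraped_at DESC) AS rn
        FROM price_snapshots
    )
    SELECT l.asin, l.region, l.price, r.avg_90d,
           ROUND((r.avg_90d - l.price) / r.avg_90d * 100, 1) AS drop_pct
    FROM latest l
    JOIN recent r ON l.asin = r.asin AND l.region = r.region
    WHERE l.rn = 1 AND (r.avg_90d - l.price) / r.avg_90d * 100 >= 10.0
""").fetchall()
# avg_90d = 94.75 over the four points, latest = 79.0 → one alert row
```

ROW_NUMBER() requires SQLite 3.25+, which every currently supported CPython bundles.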
Anti-Bot Measures and Proxy Strategy
Google Shopping runs behind Google's in-house bot detection. CamelCamelCamel uses Cloudflare. Major e-commerce sites layer TLS fingerprinting, behavioral analysis, and IP reputation scoring.
Practical countermeasures:
TLS fingerprint matching. httpx with default settings presents a Python TLS handshake, not a Chrome one. curl_cffi impersonates a browser handshake directly, while Playwright drives a real Chromium instance and therefore presents a genuine one. For anything that checks TLS (Amazon, Google Shopping), use one of those two instead of plain httpx.
Request header consistency. Set Accept-Language, Accept-Encoding, and Referer headers consistently. A request with Chrome User-Agent but no Accept-Language header is a red flag.
Retry-After compliance. Hammering a 429 response accelerates your IP into a block list. Read the Retry-After header and honor it.
Proxy type and rotation. For light scraping (personal research, daily price checks), datacenter proxies work on CamelCamelCamel but get blocked on Google Shopping and Amazon within minutes. Residential proxies are required for those targets.
For price comparison specifically, residential proxy rotation is not optional — it is the primary mechanism for accurate multi-region price detection. A German Amazon price is meaningless if the request comes from a US datacenter IP, because Amazon's geo-detection will serve the US price regardless of locale headers.
ThorData maintains geo-segmented residential pools across 190+ countries, which means you can pin requests to specific countries or cities. For dynamic pricing detection, this lets you request the same product URL from a residential IP in New York, London, and Berlin simultaneously and compare responses with confidence that the IP origin is genuine.
import httpx

# httpx client with ThorData proxy
PROXY_URL = "http://USER:[email protected]:9000"

# Country-specific routing (check ThorData dashboard for exact suffix format)
US_PROXY = "http://USER-country-us:[email protected]:9000"
UK_PROXY = "http://USER-country-gb:[email protected]:9000"
DE_PROXY = "http://USER-country-de:[email protected]:9000"

# For httpx
client = httpx.Client(
    transport=httpx.HTTPTransport(proxy=PROXY_URL),
    timeout=20,
)

# For Playwright (per-context)
proxy_config = {
    "server": "http://proxy.thordata.com:9000",
    "username": "USER",
    "password": "PASS",
}
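For rotating through a pool rather than pinning a single exit, a minimal round-robin picker is often all you need — a sketch; a production pool would usually also track per-proxy failure counts and cool-downs:

```python
import itertools
import threading

class ProxyRotator:
    """Thread-safe round-robin iterator over a fixed proxy pool."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next(self) -> str:
        with self._lock:
            return next(self._cycle)
```

Call `rotator.next()` when building each request's transport instead of reusing one PROXY_URL.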
Complete Pipeline Example
An end-to-end run for a single ASIN combining history and live regional prices:
import asyncio

ASIN = "B0CHWRXH8B"
PROXY = "http://USER:[email protected]:9000"

conn = init_price_db("prices.db")

# 1. Pull CamelCamelCamel historical data
history = get_camel_price_history(ASIN, proxy=PROXY, store="com")
inserted = bulk_insert_camel_history(conn, ASIN, "com", history)
print(f"Historical: {inserted} data points stored")

camel_stats = extract_price_stats_from_history(history)
for series, stats in camel_stats.items():
    print(f"  {series}: ${stats['min_all_time']} - ${stats['max_all_time']} "
          f"(avg ${stats['avg_all_time']})")

# 2. Detect current dynamic pricing
proxy_configs = [
    {"proxy": PROXY, "locale": "en-US", "timezone": "America/New_York"},
    {"proxy": PROXY, "locale": "en-GB", "timezone": "Europe/London"},
]
url = f"https://www.amazon.com/dp/{ASIN}"
live = asyncio.run(detect_dynamic_pricing(url, proxy_configs))
variance = analyze_price_variance(live)

# 3. Store live snapshots (region taken from the locale suffix, e.g. "US", "GB")
for r in live:
    if r.get("price") and not r.get("error"):
        try:
            insert_snapshot(
                conn, ASIN, "amazon", r["locale"].split("-")[-1].upper(),
                float(str(r["price"]).replace(",", "")),
                currency=r.get("currency") or "USD",
                source="playwright",
            )
        except (ValueError, TypeError):
            pass

# 4. Report
print(f"\nDynamic pricing detected: {variance.get('dynamic_pricing_detected', False)}")
print(f"Price spread: ${variance.get('spread', 0):.2f} ({variance.get('spread_pct', 0):.1f}%)")
for row in get_price_variance_report(conn, ASIN):
    print(f"  {row[0]:5s} | {row[5]:>8.2f} avg | {row[3]:>8.2f} min | {row[4]:>8.2f} max")
Legal Notes
Google's Terms of Service prohibit automated scraping of search results. Amazon's Conditions of Use prohibit scraping product data. CamelCamelCamel has its own ToS and itself scrapes Amazon. The techniques documented here are for educational purposes — personal research, price alerts for personal use, and academic analysis of pricing patterns. For commercial applications at scale, evaluate official data partnerships (Amazon Product Advertising API, Google Shopping API) alongside the scraping approach.