How to Scrape Price Comparison Sites in 2026: Google Shopping, CamelCamelCamel & Dynamic Pricing
Price data is some of the most valuable structured information on the web. Whether you are building a deal alert tool, tracking competitor pricing, analyzing historical trends, or detecting dynamic pricing across regions, the extraction patterns for Google Shopping, CamelCamelCamel, and major e-commerce sites are stable enough to build production pipelines around — if you handle the anti-bot layer correctly.
This post covers the full stack: JSON-LD extraction from Google Shopping, CamelCamelCamel price history parsing, dynamic pricing detection with Playwright, multi-region comparison, SQLite storage schema, and why geo-diverse proxies are non-negotiable for accurate price comparison.
Why Price Scraping Is Harder Than Most Data Tasks
Price data has properties that make scraping unusually difficult:
Dynamic rendering. Many e-commerce sites compute or update prices client-side after the initial page load. The HTML delivered to a plain HTTP client contains placeholders or stale cached values; the actual prices require JavaScript execution.
Geographic segmentation. The same product can have meaningfully different prices depending on the country the request originates from. Amazon, Expedia, and consumer electronics retailers all implement geo-based pricing. A scraper that ignores this collects incomplete data.
Session state. Many retailers adjust prices based on logged-in state, browsing history, or loyalty program membership. Prices seen by a fresh anonymous session differ from prices seen by a returning customer. Some retailers have been documented showing higher prices to users who previously searched for the same item.
Structural instability. E-commerce page layouts change frequently, especially around sale events. Hard-coded selectors break during high-traffic periods when retailers modify markup for performance or A/B testing.
Active bot detection. Google Shopping, Amazon, and major retailers invest heavily in anti-bot infrastructure. TLS fingerprinting, behavioral biometrics, IP reputation databases, and cookie chain validation are all active.
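A cheap first probe for the dynamic-rendering problem is to check whether server-delivered HTML contains anything that looks like a price at all — if it only contains template placeholders, you know up front that the site needs a real browser. A rough heuristic sketch (the regex covers only a few currency symbols and is an assumption, not a universal pattern):

```python
import re

# Matches e.g. "$19.99", "£1,299", "€9.50" — a deliberately narrow heuristic
PRICE_RE = re.compile(r"[$£€]\s?\d[\d,]*(?:\.\d{2})?")

def static_html_has_price(html: str) -> bool:
    """Return True if server-rendered HTML already contains a plausible price.

    If this returns False for a product page, prices are likely computed
    client-side and extraction will need JavaScript execution (Playwright).
    """
    return bool(PRICE_RE.search(html))
```

Run it against a plain `httpx.get()` response before committing to a browser-based pipeline for a given site.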
What Price Data Is Worth Extracting
Not all price data is equal. The most useful fields per product:
- Current price and currency — the offer price, not the crossed-out "was" price
- Merchant/seller name — for multi-seller marketplace comparisons
- Stock status — out-of-stock items distort price averages
- Price history timestamps — a single snapshot is nearly useless for trend detection
- Geographic price variants — the same product often has a different price in US, UK, and DE storefronts
- Sale or promotional flags — differentiate organic price movement from promotional markdowns
- Shipping cost — listed price without shipping is misleading for comparison
Google Shopping: JSON-LD Extraction
Google Shopping search result pages embed application/ld+json blocks and itemscope microdata. The JSON-LD approach is more reliable than CSS selectors because it is structurally enforced by Google's own schema requirements.
import httpx
import json
import time
from bs4 import BeautifulSoup
from typing import Optional


def extract_google_shopping(
    query: str,
    proxy: Optional[str] = None,
    country: str = "us",
    language: str = "en",
) -> list[dict]:
    """
    Extract product listings from a Google Shopping search.
    Returns list of dicts with name, price, currency, seller, and availability.
    """
    url = "https://www.google.com/search"
    params = {
        "q": query,
        "tbm": "shop",
        "hl": language,
        "gl": country,
        "num": "40",
    }
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept-Language": f"{language}-{country.upper()},{language};q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.google.com/",
        "DNT": "1",
    }
    transport_kwargs = {}
    if proxy:
        transport_kwargs["proxy"] = proxy
    with httpx.Client(
        headers=headers,
        follow_redirects=True,
        timeout=20,
        **transport_kwargs,
    ) as client:
        resp = client.get(url, params=params)
        resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    # Primary: JSON-LD extraction
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except (json.JSONDecodeError, TypeError):
            continue
        if not isinstance(data, dict):
            continue
        item_type = data.get("@type", "")
        # ItemList containing products
        if item_type == "ItemList":
            for item in data.get("itemListElement", []):
                product = item.get("item", {})
                offers = product.get("offers", {})
                if isinstance(offers, list):
                    offers = offers[0] if offers else {}
                results.append({
                    "name": product.get("name"),
                    "url": product.get("url"),
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "seller": (offers.get("seller") or {}).get("name"),
                    "availability": (offers.get("availability") or "").split("/")[-1],
                    "source": "json-ld-itemlist",
                })
        # Direct Product schema
        elif item_type == "Product":
            offers = data.get("offers", {})
            if isinstance(offers, list):
                for offer in offers:
                    results.append({
                        "name": data.get("name"),
                        "url": data.get("url") or data.get("@id"),
                        "price": offer.get("price"),
                        "currency": offer.get("priceCurrency"),
                        "seller": (offer.get("seller") or {}).get("name"),
                        "availability": (offer.get("availability") or "").split("/")[-1],
                        "source": "json-ld-product",
                    })
            else:
                results.append({
                    "name": data.get("name"),
                    "url": data.get("url") or data.get("@id"),
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "seller": (offers.get("seller") or {}).get("name"),
                    "availability": (offers.get("availability") or "").split("/")[-1],
                    "source": "json-ld-product",
                })
    # Fallback: microdata extraction if JSON-LD was empty
    if not results:
        for item in soup.find_all(itemprop="offers"):
            price_el = item.find(itemprop="price")
            currency_el = item.find(itemprop="priceCurrency")
            name_el = soup.find(itemprop="name")
            if price_el:
                results.append({
                    "name": name_el.get_text(strip=True) if name_el else None,
                    "price": price_el.get("content") or price_el.get_text(strip=True),
                    "currency": (currency_el.get("content") if currency_el else None),
                    "seller": None,
                    "availability": None,
                    "source": "microdata",
                })
    return results


def search_product_prices(
    product_name: str,
    regions: list[tuple[str, str]],
    proxy: Optional[str] = None,
) -> dict:
    """
    Search Google Shopping in multiple regions and compare prices.
    regions: list of (country_code, language) tuples
    Returns dict mapping region to list of results.
    """
    regional_results = {}
    for country, language in regions:
        results = extract_google_shopping(product_name, proxy, country, language)
        regional_results[f"{country}_{language}"] = results
        time.sleep(2.0)  # avoid triggering Google's rate limits
    return regional_results
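With extraction in place, a small ranking helper is useful for surfacing the cheapest offers across regions. A sketch assuming the price field is a plain numeric string as returned above:

```python
def cheapest_offers(results: list[dict], top_n: int = 5) -> list[dict]:
    """Sort extracted offers by numeric price, dropping unparseable entries."""
    priced = []
    for r in results:
        try:
            priced.append({**r, "price_value": float(str(r["price"]).replace(",", ""))})
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric price — skip rather than crash
    priced.sort(key=lambda r: r["price_value"])
    return priced[:top_n]
```

Note this compares raw numbers, so feed it offers from a single currency at a time.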
CamelCamelCamel: Price History Extraction
CamelCamelCamel tracks Amazon price history and exposes chart data through predictable URL structures:
import httpx
import re
import json
from datetime import datetime, timedelta, timezone


def get_camel_price_history(
    asin: str,
    proxy: str | None = None,
    store: str = "com",
) -> dict:
    """
    Extract price history data for an Amazon ASIN from CamelCamelCamel.
    asin: Amazon ASIN (e.g., 'B0CHWRXH8B')
    store: Amazon store suffix ('com', 'co.uk', 'de', etc.)
    Returns dict mapping series name to list of {timestamp_ms, price} dicts.
    """
    # Map Amazon store to CamelCamelCamel domain
    domain_map = {
        "com": "camelcamelcamel.com",
        "co.uk": "uk.camelcamelcamel.com",
        "de": "de.camelcamelcamel.com",
        "co.jp": "jp.camelcamelcamel.com",
        "ca": "ca.camelcamelcamel.com",
        "com.au": "au.camelcamelcamel.com",
        "fr": "fr.camelcamelcamel.com",
        "it": "it.camelcamelcamel.com",
        "es": "es.camelcamelcamel.com",
    }
    domain = domain_map.get(store, "camelcamelcamel.com")
    url = f"https://{domain}/product/{asin}"
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": f"https://{domain}/",
        "Accept": "text/html,application/xhtml+xml",
    }
    client_kwargs = {}
    if proxy:
        client_kwargs["transport"] = httpx.HTTPTransport(proxy=proxy)
    with httpx.Client(
        headers=headers,
        follow_redirects=True,
        timeout=20,
        **client_kwargs,
    ) as client:
        resp = client.get(url)
        resp.raise_for_status()
    # CamelCamelCamel embeds Highcharts data as inline JS
    # Pattern: {"data":[[timestamp_ms, price], ...], "name": "label"}
    chart_data_pattern = re.compile(
        r'\{"data"\s*:\s*(\[\[.*?\]\])\s*,\s*.*?"name"\s*:\s*"([^"]+)"',
        re.DOTALL,
    )
    result = {}
    html = resp.text
    # Try direct data extraction
    for match in chart_data_pattern.finditer(html):
        data_json = match.group(1)
        name = match.group(2)
        try:
            points = json.loads(data_json)
            result[name] = [
                {
                    "timestamp_ms": p[0],
                    "price": p[1],
                    "date": datetime.fromtimestamp(p[0] / 1000, tz=timezone.utc).isoformat()[:10],
                }
                for p in points
                if len(p) == 2 and p[1] is not None
            ]
        except (json.JSONDecodeError, IndexError, TypeError):
            continue
    # If nothing found, try to extract product name at minimum
    if not result:
        title_match = re.search(r"<title>([^<]+)</title>", html)
        if title_match:
            result["_product_title"] = title_match.group(1).strip()
        result["_extraction_failed"] = True
    return result


def extract_price_stats_from_history(history: dict) -> dict:
    """
    Compute summary statistics from CamelCamelCamel price history.
    Returns min, max, avg, current, and 90-day average prices per series.
    """
    stats = {}
    cutoff_90d = (datetime.now(timezone.utc) - timedelta(days=90)).timestamp() * 1000
    for series_name, points in history.items():
        if series_name.startswith("_") or not points:
            continue
        prices = [p["price"] for p in points if p["price"] is not None]
        recent_prices = [
            p["price"] for p in points
            if p["price"] is not None and p["timestamp_ms"] >= cutoff_90d
        ]
        if not prices:
            continue
        stats[series_name] = {
            "min_all_time": min(prices),
            "max_all_time": max(prices),
            "avg_all_time": round(sum(prices) / len(prices), 2),
            "current_price": prices[-1],
            "avg_90_days": round(sum(recent_prices) / len(recent_prices), 2) if recent_prices else None,
            "data_points": len(prices),
            "first_recorded": points[0]["date"],
            "last_recorded": points[-1]["date"],
        }
    return stats
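These statistics feed naturally into a simple deal classifier. A sketch with arbitrary thresholds — tune them to your own tolerance:

```python
def classify_deal(current: float, avg_90d: float, min_all_time: float) -> str:
    """Label a current price relative to its history.

    Thresholds here are arbitrary assumptions: within 5% of the all-time low
    counts as 'all_time_low', 10%+ below the 90-day average as 'good_deal',
    10%+ above it as 'above_average'.
    """
    if current <= min_all_time * 1.05:
        return "all_time_low"
    if avg_90d and current <= avg_90d * 0.90:
        return "good_deal"
    if avg_90d and current > avg_90d * 1.10:
        return "above_average"
    return "typical"
```

A deal-alert tool would call this per series after extract_price_stats_from_history and notify on the first two labels.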
Dynamic Pricing Detection with Playwright
Detect regional price differences by loading the same product from multiple proxy locations simultaneously:
import asyncio
import random
from playwright.async_api import async_playwright


async def get_price_in_context(
    url: str,
    proxy_server: str,
    locale: str = "en-US",
    timezone_id: str = "America/New_York",
) -> dict:
    """
    Load a product page from a specific proxy and extract price.
    Returns dict with price, currency, and any detected variant info.
    """
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled", "--no-sandbox"],
            proxy={"server": proxy_server},
        )
        context = await browser.new_context(
            locale=locale,
            timezone_id=timezone_id,
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1440, "height": 900},
        )
        await context.add_init_script(
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
        )
        page = await context.new_page()
        try:
            await page.goto(url, wait_until="domcontentloaded", timeout=30000)
            await asyncio.sleep(random.uniform(1.5, 3.0))
            # Try JSON-LD first (most reliable)
            price_data = await page.evaluate("""
                () => {
                    const scripts = document.querySelectorAll('script[type="application/ld+json"]');
                    for (const s of scripts) {
                        try {
                            const d = JSON.parse(s.textContent);
                            if (d['@type'] === 'Product') {
                                const offers = Array.isArray(d.offers)
                                    ? d.offers[0] : d.offers;
                                if (offers && offers.price) {
                                    return {
                                        price: String(offers.price),
                                        currency: offers.priceCurrency || null,
                                        availability: offers.availability || null,
                                        method: 'json-ld',
                                    };
                                }
                            }
                        } catch (e) {}
                    }
                    return null;
                }
            """)
            # Fallback to common price selectors
            if not price_data:
                price_data = await page.evaluate("""
                    () => {
                        const selectors = [
                            '[data-testid="price"]',
                            '[class*="price-current"]',
                            '[class*="current-price"]',
                            'span[class*="price"]',
                            'meta[itemprop="price"]',
                        ];
                        for (const sel of selectors) {
                            const el = document.querySelector(sel);
                            if (el) {
                                const text = el.getAttribute('content') || el.textContent;
                                const match = text.match(/[\\d,]+\\.?\\d*/);
                                if (match) {
                                    return {
                                        price: match[0].replace(/,/g, ''),
                                        currency: null,
                                        method: 'dom-selector',
                                    };
                                }
                            }
                        }
                        return null;
                    }
                """)
        except Exception as e:
            price_data = {"error": str(e)}
        finally:
            await browser.close()
    return {
        "locale": locale,
        "timezone": timezone_id,
        "proxy": proxy_server[:30] + "...",
        "url": url,
        **(price_data or {"price": None, "currency": None}),
    }


async def detect_dynamic_pricing(
    product_url: str,
    proxy_configs: list[dict],
) -> list[dict]:
    """
    Check the same product URL from multiple geographic locations.
    proxy_configs: list of dicts with 'proxy', 'locale', 'timezone' keys
    Returns list of results, one per proxy config.
    """
    tasks = []
    for cfg in proxy_configs:
        task = get_price_in_context(
            product_url,
            cfg["proxy"],
            cfg.get("locale", "en-US"),
            cfg.get("timezone", "UTC"),
        )
        tasks.append(task)
    # Run all contexts simultaneously for a true snapshot comparison
    results = await asyncio.gather(*tasks, return_exceptions=True)
    clean = []
    for i, r in enumerate(results):
        if isinstance(r, Exception):
            clean.append({"error": str(r), **proxy_configs[i]})
        else:
            clean.append(r)
    return clean


def analyze_price_variance(results: list[dict]) -> dict:
    """
    Analyze dynamic pricing across regional results.
    Returns summary including min/max/spread and whether dynamic pricing is detected.
    """
    prices = []
    for r in results:
        if r.get("price") and not r.get("error"):
            try:
                price_str = (
                    str(r["price"])
                    .replace(",", "")
                    .replace("$", "")
                    .replace("£", "")
                    .replace("€", "")
                    .strip()
                )
                prices.append(float(price_str))
            except (ValueError, TypeError):
                pass
    if len(prices) < 2:
        return {"insufficient_data": True, "results": results}
    spread = max(prices) - min(prices)
    spread_pct = (spread / min(prices)) * 100
    return {
        "min_price": min(prices),
        "max_price": max(prices),
        "spread": round(spread, 2),
        "spread_pct": round(spread_pct, 2),
        "dynamic_pricing_detected": spread_pct > 2.0,
        "result_count": len(prices),
        "results": results,
    }


# Example usage
proxy_configs = [
    {
        "proxy": "http://USER:[email protected]:9000",
        "locale": "en-US",
        "timezone": "America/New_York",
        "label": "US",
    },
    {
        "proxy": "http://USER:[email protected]:9000",
        "locale": "en-GB",
        "timezone": "Europe/London",
        "label": "UK",
    },
    {
        "proxy": "http://USER:[email protected]:9000",
        "locale": "de-DE",
        "timezone": "Europe/Berlin",
        "label": "DE",
    },
]

# url = "https://www.amazon.com/dp/B0CHWRXH8B"
# regional_data = asyncio.run(detect_dynamic_pricing(url, proxy_configs))
# variance = analyze_price_variance(regional_data)
# print(f"Dynamic pricing: {variance['dynamic_pricing_detected']}, spread: {variance['spread_pct']:.1f}%")
Multi-Retailer Price Comparison
Compare the same product across multiple retailers:
import httpx
import json
import time
from bs4 import BeautifulSoup


def extract_bestbuy_price(product_url: str, proxy: str | None = None) -> dict:
    """Extract price from a Best Buy product page via JSON-LD."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    client_kwargs = {}
    if proxy:
        client_kwargs["transport"] = httpx.HTTPTransport(proxy=proxy)
    with httpx.Client(headers=headers, follow_redirects=True, timeout=20, **client_kwargs) as c:
        resp = c.get(product_url)
    soup = BeautifulSoup(resp.text, "html.parser")
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
            if data.get("@type") == "Product":
                offers = data.get("offers", {})
                if isinstance(offers, list):
                    offers = offers[0]
                return {
                    "retailer": "bestbuy",
                    "price": offers.get("price"),
                    "currency": offers.get("priceCurrency"),
                    "availability": (offers.get("availability") or "").split("/")[-1],
                    "url": product_url,
                }
        except Exception:
            continue
    return {"retailer": "bestbuy", "price": None, "url": product_url, "error": "not_found"}


def compare_retailers(
    product_urls: dict,
    proxy: str | None = None,
    delay: float = 2.0,
) -> list[dict]:
    """
    Fetch prices from multiple retailers for the same product.
    product_urls: dict mapping retailer name to product URL
    """
    results = []
    for retailer, url in product_urls.items():
        try:
            if "bestbuy" in url:
                result = extract_bestbuy_price(url, proxy)
            else:
                result = {"retailer": retailer, "url": url, "price": "manual_check"}
            results.append(result)
            time.sleep(delay)
        except Exception as e:
            results.append({"retailer": retailer, "url": url, "error": str(e)})
    return sorted(
        [r for r in results if r.get("price") and r.get("price") != "manual_check"],
        key=lambda x: float(str(x["price"]).replace(",", "")) if x.get("price") else float("inf"),
    )
SQLite Storage Schema
A minimal schema handling multi-region, multi-source, time-series price data:
import sqlite3
from datetime import datetime, timezone


def init_price_db(db_path: str = "prices.db") -> sqlite3.Connection:
    """Initialize the price tracking database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            asin TEXT,
            retailer TEXT,
            name TEXT,
            category TEXT,
            url TEXT,
            added_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (asin, retailer)
        );

        CREATE TABLE IF NOT EXISTS price_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT NOT NULL,
            retailer TEXT NOT NULL,
            region TEXT NOT NULL,
            price REAL,
            currency TEXT DEFAULT 'USD',
            seller TEXT,
            availability TEXT,
            source TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS price_history_camel (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT NOT NULL,
            store TEXT NOT NULL,
            series TEXT NOT NULL,
            price_date TEXT NOT NULL,
            price REAL,
            timestamp_ms INTEGER,
            UNIQUE (asin, store, series, price_date)
        );

        CREATE TABLE IF NOT EXISTS dynamic_pricing_checks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT,
            product_url TEXT,
            checked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            min_price REAL,
            max_price REAL,
            spread_pct REAL,
            dynamic_detected INTEGER,
            raw_results TEXT  -- JSON
        );

        CREATE INDEX IF NOT EXISTS idx_snapshots_asin ON price_snapshots(asin, scraped_at);
        CREATE INDEX IF NOT EXISTS idx_snapshots_region ON price_snapshots(asin, region);
        CREATE INDEX IF NOT EXISTS idx_camel_asin ON price_history_camel(asin, store);
        CREATE INDEX IF NOT EXISTS idx_dynamic_asin ON dynamic_pricing_checks(asin);
    """)
    conn.commit()
    return conn


def insert_snapshot(
    conn: sqlite3.Connection,
    asin: str,
    retailer: str,
    region: str,
    price: float,
    currency: str = "USD",
    seller: str | None = None,
    availability: str | None = None,
    source: str = "live_scrape",
):
    """Insert a single price observation."""
    conn.execute("""
        INSERT INTO price_snapshots
            (asin, retailer, region, price, currency, seller, availability, source, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        asin, retailer, region, price, currency, seller, availability, source,
        datetime.now(timezone.utc).isoformat(),
    ))
    conn.commit()


def bulk_insert_camel_history(
    conn: sqlite3.Connection,
    asin: str,
    store: str,
    history: dict,
) -> int:
    """Bulk insert CamelCamelCamel price history into the database."""
    rows = []
    for series, points in history.items():
        if series.startswith("_"):
            continue
        for p in points:
            if p.get("price") is not None:
                rows.append((asin, store, series, p["date"], p["price"], p["timestamp_ms"]))
    conn.executemany("""
        INSERT OR IGNORE INTO price_history_camel
            (asin, store, series, price_date, price, timestamp_ms)
        VALUES (?, ?, ?, ?, ?, ?)
    """, rows)
    conn.commit()
    return len(rows)


def get_price_variance_report(
    conn: sqlite3.Connection,
    asin: str,
    days_back: int = 30,
) -> list:
    """Show price statistics per region for an ASIN over the last N days."""
    return conn.execute("""
        SELECT
            region,
            currency,
            COUNT(*) AS snapshots,
            MIN(price) AS min_price,
            MAX(price) AS max_price,
            ROUND(AVG(price), 2) AS avg_price,
            MIN(scraped_at) AS first_seen,
            MAX(scraped_at) AS last_seen
        FROM price_snapshots
        WHERE asin = ?
          AND scraped_at >= datetime('now', ? || ' days')
          AND price IS NOT NULL
        GROUP BY region, currency
        ORDER BY avg_price ASC
    """, (asin, f"-{days_back}")).fetchall()


def get_price_drop_alerts(
    conn: sqlite3.Connection,
    threshold_pct: float = 10.0,
) -> list:
    """Find ASINs where the latest price is below the 90-day average by threshold."""
    return conn.execute("""
        WITH recent AS (
            SELECT asin, region, AVG(price) AS avg_90d
            FROM price_snapshots
            WHERE scraped_at >= datetime('now', '-90 days')
            GROUP BY asin, region
        ),
        latest AS (
            SELECT asin, region, price, scraped_at,
                   ROW_NUMBER() OVER (PARTITION BY asin, region ORDER BY scraped_at DESC) AS rn
            FROM price_snapshots
        )
        SELECT l.asin, l.region, l.price AS current_price, r.avg_90d,
               ROUND((r.avg_90d - l.price) / r.avg_90d * 100, 1) AS drop_pct
        FROM latest l
        JOIN recent r ON l.asin = r.asin AND l.region = r.region
        WHERE l.rn = 1
          AND r.avg_90d > 0
          AND (r.avg_90d - l.price) / r.avg_90d * 100 >= ?
        ORDER BY drop_pct DESC
    """, (threshold_pct,)).fetchall()
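To sanity-check the alert query's CTE logic, here is a self-contained run against an in-memory database, with the schema reduced to the columns the query touches and synthetic history producing a roughly 16.6% drop:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_snapshots (asin TEXT, region TEXT, price REAL, scraped_at TIMESTAMP)"
)
# Three historical observations around 100, then a latest observation at 79
for offset, price in [("-30 days", 100.0), ("-20 days", 101.0),
                      ("-10 days", 99.0), ("-1 hours", 79.0)]:
    conn.execute(
        "INSERT INTO price_snapshots VALUES ('B0TEST', 'US', ?, datetime('now', ?))",
        (price, offset),
    )

alerts = conn.execute("""
    WITH recent AS (
        SELECT asin, region, AVG(price) AS avg_90d
        FROM price_snapshots
        WHERE scraped_at >= datetime('now', '-90 days')
        GROUP BY asin, region
    ),
    latest AS (
        SELECT asin, region, price,
               ROW_NUMBER() OVER (PARTITION BY asin, region ORDER BY scraped_at DESC) AS rn
        FROM price_snapshots
    )
    SELECT l.asin, l.region, l.price, r.avg_90d,
           ROUND((r.avg_90d - l.price) / r.avg_90d * 100, 1) AS drop_pct
    FROM latest l
    JOIN recent r ON l.asin = r.asin AND l.region = r.region
    WHERE l.rn = 1 AND (r.avg_90d - l.price) / r.avg_90d * 100 >= 10.0
""").fetchall()
# avg_90d = 94.75 over the four points, latest = 79.0 → one alert row
```

ROW_NUMBER() requires SQLite 3.25+, which every currently supported CPython bundles.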
Anti-Bot Measures and Proxy Strategy
Google Shopping runs behind Google's in-house bot detection. CamelCamelCamel uses Cloudflare. Major e-commerce sites layer TLS fingerprinting, behavioral analysis, and IP reputation scoring.
Practical countermeasures:
TLS fingerprint matching. httpx with default settings presents a Python TLS handshake, not a Chrome one. curl_cffi impersonates a browser handshake directly, while Playwright drives a real Chromium instance and therefore presents a genuine one. For anything that checks TLS (Amazon, Google Shopping), use one of those two instead of plain httpx.
Request header consistency. Set Accept-Language, Accept-Encoding, and Referer headers consistently. A request with Chrome User-Agent but no Accept-Language header is a red flag.
Retry-After compliance. Hammering a 429 response accelerates your IP into a block list. Read the Retry-After header and honor it.
Proxy type and rotation. For light scraping (personal research, daily price checks), datacenter proxies work on CamelCamelCamel but get blocked on Google Shopping and Amazon within minutes. Residential proxies are required for those targets.
For price comparison specifically, residential proxy rotation is not optional — it is the primary mechanism for accurate multi-region price detection. A German Amazon price is meaningless if the request comes from a US datacenter IP, because Amazon's geo-detection will serve the US price regardless of locale headers.
ThorData maintains geo-segmented residential pools across 190+ countries, which means you can pin requests to specific countries or cities. For dynamic pricing detection, this lets you request the same product URL from a residential IP in New York, London, and Berlin simultaneously and compare responses with confidence that the IP origin is genuine.
import httpx

# httpx client with ThorData proxy
PROXY_URL = "http://USER:[email protected]:9000"

# Country-specific routing (check ThorData dashboard for exact suffix format)
US_PROXY = "http://USER-country-us:[email protected]:9000"
UK_PROXY = "http://USER-country-gb:[email protected]:9000"
DE_PROXY = "http://USER-country-de:[email protected]:9000"

# For httpx
client = httpx.Client(
    transport=httpx.HTTPTransport(proxy=PROXY_URL),
    timeout=20,
)

# For Playwright (per-context)
proxy_config = {
    "server": "http://proxy.thordata.com:9000",
    "username": "USER",
    "password": "PASS",
}
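For rotating through a pool rather than pinning a single exit, a minimal round-robin picker is often all you need — a sketch; a production pool would usually also track per-proxy failure counts and cool-downs:

```python
import itertools
import threading

class ProxyRotator:
    """Thread-safe round-robin iterator over a fixed proxy pool."""

    def __init__(self, proxies: list[str]):
        if not proxies:
            raise ValueError("proxy pool must not be empty")
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def next(self) -> str:
        with self._lock:
            return next(self._cycle)
```

Call `rotator.next()` when building each request's transport instead of reusing one PROXY_URL.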
Complete Pipeline Example
An end-to-end run for a single ASIN combining history and live regional prices:
import asyncio

ASIN = "B0CHWRXH8B"
PROXY = "http://USER:[email protected]:9000"

conn = init_price_db("prices.db")

# 1. Pull CamelCamelCamel historical data
history = get_camel_price_history(ASIN, proxy=PROXY, store="com")
inserted = bulk_insert_camel_history(conn, ASIN, "com", history)
print(f"Historical: {inserted} data points stored")

camel_stats = extract_price_stats_from_history(history)
for series, stats in camel_stats.items():
    print(f"  {series}: ${stats['min_all_time']} - ${stats['max_all_time']} "
          f"(avg ${stats['avg_all_time']})")

# 2. Detect current dynamic pricing
proxy_configs = [
    {"proxy": PROXY, "locale": "en-US", "timezone": "America/New_York"},
    {"proxy": PROXY, "locale": "en-GB", "timezone": "Europe/London"},
]
url = f"https://www.amazon.com/dp/{ASIN}"
live = asyncio.run(detect_dynamic_pricing(url, proxy_configs))
variance = analyze_price_variance(live)

# 3. Store live snapshots (region taken from the locale suffix, e.g. "US", "GB")
for r in live:
    if r.get("price") and not r.get("error"):
        try:
            insert_snapshot(
                conn, ASIN, "amazon", r["locale"].split("-")[-1].upper(),
                float(str(r["price"]).replace(",", "")),
                currency=r.get("currency") or "USD",
                source="playwright",
            )
        except (ValueError, TypeError):
            pass

# 4. Report
print(f"\nDynamic pricing detected: {variance.get('dynamic_pricing_detected', False)}")
print(f"Price spread: ${variance.get('spread', 0):.2f} ({variance.get('spread_pct', 0):.1f}%)")
for row in get_price_variance_report(conn, ASIN):
    print(f"  {row[0]:5s} | {row[5]:>8.2f} avg | {row[3]:>8.2f} min | {row[4]:>8.2f} max")
Legal Notes
Google's Terms of Service prohibit automated scraping of search results. Amazon's Conditions of Use prohibit scraping product data. CamelCamelCamel has its own ToS and itself scrapes Amazon. The techniques documented here are for educational purposes — personal research, price alerts for personal use, and academic analysis of pricing patterns. For commercial applications at scale, evaluate official data partnerships (Amazon Product Advertising API, Google Shopping API) alongside the scraping approach.