Scrape Google Shopping Prices with Python: Product Data & Price Comparison (2026)
Google Shopping aggregates product listings from thousands of retailers into one searchable interface. It shows prices, seller ratings, shipping costs, and product specs — all structured data that is useful for price monitoring, competitive analysis, and market research.
The official route is the Google Shopping Content API, but that is designed for merchants uploading their own product feeds, not for extracting competitor pricing data. For actual price comparison scraping, you need to hit Google Shopping search results directly.
Here is how to do it reliably in 2026.
What You Can Extract
From Google Shopping search results:
- Product title and brief description
- Price (current price, original price if discounted)
- Seller name and rating
- Shipping cost and estimated delivery
- Product ratings and review count
- Product image URL
- Sponsored vs. organic listing labels
- Product identifiers (GTIN, MPN, model number)
- Comparison prices across multiple sellers for the same product
- Filter categories: brand, price range, condition (new/used/refurbished)
The Structure of Google Shopping Results
When you search Google Shopping, the URL follows this pattern:
https://www.google.com/search?q=sony+wh-1000xm5&tbm=shop
The tbm=shop parameter tells Google to return Shopping results. Each product card in the response contains the product title, price, seller name, rating, and a link to the product page.
The HTML structure changes periodically, but the data is also embedded as structured JSON in the page source — look for the AF_initDataCallback script tags Google uses to hydrate the page.
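As a rough illustration (the callback name and payload shape are Google internals that change without notice, so treat this as a starting point for inspection, not a stable API), a loose regex can pull the raw payloads out of the page source before any DOM parsing:

```python
import re

def find_data_blobs(html: str) -> list[str]:
    """Pull the raw AF_initDataCallback(...) payloads out of the page source."""
    # Non-greedy DOTALL match: each callback body is one script-embedded blob
    return re.findall(r"AF_initDataCallback\((\{.*?\})\);", html, re.DOTALL)
```

Dump the blobs for a few saved pages and eyeball where titles and prices sit before writing any extraction logic against them.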
Key CSS selectors (as of 2026 — expect these to change periodically):
- .sh-dgr__gr-auto — product grid cards
- .sh-np__click-target — product cards in list view
- .a8Pemb — price element
- .aULzUe — seller/store name
- .Rsc7Yb — rating display
- .QIrs8 — sponsored label
Basic Scraper
import httpx
from selectolax.parser import HTMLParser
import json
import re
import time
import random
HEADERS = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
"Accept-Language": "en-US,en;q=0.9",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Encoding": "gzip, deflate, br",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
}
def is_blocked(html: str) -> bool:
"""Check if Google returned a CAPTCHA or block page."""
blocked_signals = [
"detected unusual traffic",
"captcha",
"sorry/index",
"recaptcha",
"/sorry/",
]
html_lower = html.lower()
return any(signal in html_lower for signal in blocked_signals)
def parse_price(price_str: str) -> int | None:
    """Convert '$1,299.99' to 129999 cents for storage."""
    if not price_str:
        return None
    # Grab the first numeric token so "From $899" and "$899 - $1,299"
    # don't feed float() a string with two decimal points
    match = re.search(r"\d[\d,]*(?:\.\d+)?", price_str)
    if not match:
        return None
    try:
        return int(round(float(match.group(0).replace(",", "")) * 100))
    except ValueError:
        return None
def scrape_google_shopping(
    query: str,
    num_pages: int = 3,
    tbs: str | None = None,
    proxy_url: str | None = None,
) -> list[dict]:
    """
    Scrape product listings from Google Shopping search results.

    query: search term
    num_pages: how many result pages to scrape (~60 results per page)
    tbs: additional filter string (e.g., "vw:l" for list view, "p_ord:p" for price ascending)
    proxy_url: optional proxy for IP rotation
    """
    products = []
    for page in range(num_pages):
        start = page * 60
        params = {
            "q": query,
            "tbm": "shop",
            "start": start,
            "hl": "en",
            "gl": "us",
        }
        # List view (vw:l) exposes more data; append it if the caller's tbs lacks it
        if tbs:
            params["tbs"] = tbs if "vw:l" in tbs else f"{tbs},vw:l"
        else:
            params["tbs"] = "vw:l"
        try:
            # httpx >= 0.26 takes a single proxy URL via `proxy=`; the context
            # manager closes the client even when the request raises
            with httpx.Client(
                headers=HEADERS,
                proxy=proxy_url,
                timeout=20,
                follow_redirects=True,
            ) as client:
                resp = client.get("https://www.google.com/search", params=params)
        except httpx.TimeoutException:
            print(f"Page {page + 1}: timeout")
            continue
        if resp.status_code != 200:
            print(f"Page {page + 1}: HTTP {resp.status_code}")
            continue
        if is_blocked(resp.text):
            print(f"Page {page + 1}: blocked, need fresh proxy/session")
            break
tree = HTMLParser(resp.text)
page_products = []
# Parse product cards from DOM
for selector in [".sh-dgr__gr-auto", ".sh-np__click-target", ".u30d4"]:
cards = tree.css(selector)
if cards:
for card in cards:
product = extract_product_from_card(card)
if product.get("title") and product.get("price_raw"):
page_products.append(product)
if page_products:
break
# Fallback: extract from JSON in page source
if not page_products:
page_products = extract_from_page_json(resp.text, query)
products.extend(page_products)
print(f"Page {page + 1}: {len(page_products)} products (total: {len(products)})")
time.sleep(random.uniform(2, 5))
return products
def extract_product_from_card(card) -> dict:
"""Extract product data from a single result card node."""
product = {}
# Title
for selector in ["h3", ".tAxDx", ".rgHvZc"]:
title_el = card.css_first(selector)
if title_el:
product["title"] = title_el.text(strip=True)
break
# Price
for selector in [".a8Pemb", ".Ib8pOd .a8Pemb", ".T14wmb"]:
price_el = card.css_first(selector)
if price_el:
product["price_raw"] = price_el.text(strip=True)
product["price_cents"] = parse_price(product["price_raw"])
break
# Original price (if discounted)
orig_el = card.css_first(".pPDzDa, .RsH3le")
if orig_el:
product["original_price_raw"] = orig_el.text(strip=True)
product["original_price_cents"] = parse_price(product["original_price_raw"])
# Seller
for selector in [".aULzUe", ".LbUacb", ".E5ocAb"]:
seller_el = card.css_first(selector)
if seller_el:
product["seller"] = seller_el.text(strip=True)
break
# Rating
rating_el = card.css_first(".Rsc7Yb, .INziyb")
if rating_el:
product["rating"] = rating_el.text(strip=True)
# Review count
reviews_el = card.css_first(".kHxwFf, .riHy6e span")
if reviews_el:
text = reviews_el.text(strip=True)
match = re.search(r"([\d,]+)", text)
if match:
product["review_count"] = int(match.group(1).replace(",", ""))
# Shipping
for selector in [".vEjMR", ".XrAfOe", ".hf7bk"]:
shipping_el = card.css_first(selector)
if shipping_el:
product["shipping"] = shipping_el.text(strip=True)
break
# Sponsored flag
sponsored_el = card.css_first(".QIrs8, .mnr-c .eEe0Gc, [aria-label='Sponsored']")
product["sponsored"] = bool(sponsored_el)
# Product URL
link_el = card.css_first("a[href]")
if link_el:
href = link_el.attributes.get("href", "")
if href.startswith("/url"):
# Google redirect URL — extract real URL
url_match = re.search(r"url=([^&]+)", href)
if url_match:
from urllib.parse import unquote
product["url"] = unquote(url_match.group(1))
elif href.startswith("http"):
product["url"] = href
return product
def extract_from_page_json(html: str, query: str) -> list:
"""Extract product data from embedded JSON in page source as fallback."""
products = []
# Look for AF_initDataCallback with product data
pattern = r'AF_initDataCallback\(\{[^}]+data:(\[.*?\])\}\)'
for match in re.finditer(pattern, html, re.DOTALL):
try:
data_str = match.group(1)
# Simple regex extraction for price/title pairs
title_price_matches = re.findall(
r'"([A-Z][^"]{5,80})"[^"]*"\$[\d,]+\.\d{2}"',
data_str
)
for title in title_price_matches[:20]:
products.append({"title": title, "source": "json_extract"})
except Exception:
continue
return products
Handling Google Anti-Bot Detection
Google is aggressive about blocking scrapers. Here is what you are dealing with:
CAPTCHA challenges: After a handful of requests from the same IP, Google redirects to a /sorry/ CAPTCHA page instead of results. Depending on the path, the response can still come back as a 200, so check the HTML body rather than relying on the status code.
Rate limiting: Too many requests too fast from one IP triggers temporary blocks. You cannot count on a clean 429 — Google may simply stop serving real results.
TLS fingerprinting: Google checks the TLS ClientHello fingerprint. Python httpx generates a different TLS fingerprint than Chrome. Advanced detection catches this.
Behavioral analysis: Perfectly timed requests with identical headers look robotic.
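One cheap mitigation for the behavioral signal is to stop sleeping for fixed intervals. A sketch of exponential backoff with full jitter — the base and cap values here are arbitrary starting points, not tuned numbers:

```python
import random
import time

def jittered_backoff(attempt: int, base: float = 3.0, cap: float = 60.0) -> float:
    """Sleep a random amount in [0, min(cap, base * 2**attempt)] and return it."""
    delay = random.uniform(0, min(cap, base * (2 ** attempt)))
    time.sleep(delay)
    return delay
```

Call it with the retry count after each failed or blocked request: successive failures back off harder, while the randomness keeps inter-request timing from looking machine-generated.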
Proxy Rotation Strategy
The single most effective countermeasure is rotating your IP address per request. Residential proxies work best against Google because the IPs come from real ISPs, not datacenter ranges that Google has already flagged.
ThorData provides residential proxy pools with geo-targeting — useful when you need prices for a specific country, since Google Shopping results vary significantly by location.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
def get_shopping_proxy(country="US"):
"""Get a geo-targeted ThorData proxy for Google Shopping."""
return f"http://{THORDATA_USER}-country-{country}:{THORDATA_PASS}@proxy.thordata.com:9000"
def scrape_with_proxy_rotation(
query: str,
country: str = "US",
max_retries: int = 5,
) -> list:
"""Scrape Google Shopping using rotating residential proxies."""
proxy_url = get_shopping_proxy(country=country)
for attempt in range(max_retries):
try:
products = scrape_google_shopping(
query,
num_pages=1,
proxy_url=proxy_url,
)
if products:
return products
print(f"Attempt {attempt + 1}: no products returned, rotating IP...")
proxy_url = get_shopping_proxy(country=country)
time.sleep(random.uniform(3, 8))
except httpx.TimeoutException:
print(f"Attempt {attempt + 1}: timeout, retrying...")
time.sleep(5)
continue
except Exception as e:
print(f"Attempt {attempt + 1}: error {e}")
time.sleep(random.uniform(3, 8))
return []
Price Comparison Tracker
The real value is in tracking prices over time. Here is a SQLite-backed tracker:
import sqlite3
from datetime import datetime
def init_price_db(db_path: str = "prices.db") -> sqlite3.Connection:
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("""
CREATE TABLE IF NOT EXISTS price_checks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
query TEXT NOT NULL,
check_date TEXT NOT NULL
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS prices (
id INTEGER PRIMARY KEY AUTOINCREMENT,
check_id INTEGER,
query TEXT,
title TEXT,
price_raw TEXT,
price_cents INTEGER,
original_price_cents INTEGER,
seller TEXT,
rating TEXT,
review_count INTEGER,
shipping TEXT,
sponsored INTEGER,
url TEXT,
scraped_at TEXT,
FOREIGN KEY (check_id) REFERENCES price_checks(id)
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_query ON prices(query)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_date ON prices(scraped_at)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_cents ON prices(price_cents)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_title ON prices(title)")
conn.commit()
return conn
def save_price_check(conn, query: str, products: list) -> int:
"""Save a complete price check run to the database."""
now = datetime.utcnow().isoformat()
cursor = conn.execute(
"INSERT INTO price_checks (query, check_date) VALUES (?, ?)",
(query, now)
)
check_id = cursor.lastrowid
conn.executemany("""
INSERT INTO prices
(check_id, query, title, price_raw, price_cents, original_price_cents,
seller, rating, review_count, shipping, sponsored, url, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", [
(
check_id,
query,
p.get("title"),
p.get("price_raw"),
p.get("price_cents"),
p.get("original_price_cents"),
p.get("seller"),
p.get("rating"),
p.get("review_count"),
p.get("shipping"),
1 if p.get("sponsored") else 0,
p.get("url"),
now,
)
for p in products
])
conn.commit()
print(f"Saved {len(products)} prices for query '{query}' (check_id={check_id})")
return check_id
def get_price_history(query: str, conn: sqlite3.Connection) -> list:
"""Get min/max/avg price per day for a search query."""
rows = conn.execute("""
SELECT
DATE(scraped_at) as day,
MIN(price_cents) / 100.0 as min_price,
MAX(price_cents) / 100.0 as max_price,
AVG(price_cents) / 100.0 as avg_price,
COUNT(*) as listings,
SUM(CASE WHEN sponsored = 0 THEN 1 ELSE 0 END) as organic_count
FROM prices
WHERE query = ? AND price_cents IS NOT NULL AND price_cents > 0
GROUP BY DATE(scraped_at)
ORDER BY day
""", (query,)).fetchall()
return [
{
"date": r[0],
"min": r[1],
"max": r[2],
"avg": round(r[3], 2),
"listings": r[4],
"organic_count": r[5],
}
for r in rows
]
def find_price_drops(query: str, conn: sqlite3.Connection, threshold_pct: float = 10.0) -> list:
    """Find listings whose lowest recorded price sits well below their historical average."""
    rows = conn.execute("""
        SELECT
            title,
            seller,
            MIN(price_cents) as lowest_price,
            AVG(price_cents) as historical_avg,
            COUNT(DISTINCT DATE(scraped_at)) as days_tracked
        FROM prices
        WHERE query = ?
          AND price_cents IS NOT NULL
          AND price_cents > 0
        GROUP BY title, seller
        HAVING days_tracked >= 3
    """, (query,)).fetchall()
drops = []
for title, seller, current_min, avg, days in rows:
if avg > 0:
drop_pct = (avg - current_min) / avg * 100
if drop_pct >= threshold_pct:
drops.append({
"title": title,
"seller": seller,
"current_price": current_min / 100,
"avg_price": round(avg / 100, 2),
"drop_pct": round(drop_pct, 1),
"days_tracked": days,
})
return sorted(drops, key=lambda x: x["drop_pct"], reverse=True)
# Example: track laptop prices over multiple days
db = init_price_db("laptop_prices.db")
products = scrape_with_proxy_rotation("gaming laptop RTX 4070", country="US")
check_id = save_price_check(db, "gaming laptop RTX 4070", products)
history = get_price_history("gaming laptop RTX 4070", db)
print("\nPrice history:")
for day in history:
print(f" {day['date']}: ${day['min']:.2f} - ${day['max']:.2f} "
f"(avg: ${day['avg']:.2f}, {day['listings']} listings)")
Multi-Product Monitoring Pipeline
For tracking dozens of products automatically:
from pathlib import Path
def run_monitoring_pipeline(
    queries: list[str],
    db_path: str = "price_monitor.db",
    country: str = "US",
    output_json: str | None = None,
):
"""
Run a complete price monitoring cycle for multiple queries.
Saves all results to SQLite and optionally exports to JSON.
"""
conn = init_price_db(db_path)
all_results = {}
for i, query in enumerate(queries):
print(f"\n[{i+1}/{len(queries)}] Scraping: {query}")
products = scrape_with_proxy_rotation(query, country=country)
if products:
check_id = save_price_check(conn, query, products)
all_results[query] = {
"count": len(products),
"min_price": min(
(p["price_cents"] for p in products if p.get("price_cents")),
default=None
),
"max_price": max(
(p["price_cents"] for p in products if p.get("price_cents")),
default=None
),
}
if all_results[query]["min_price"]:
min_p = all_results[query]["min_price"] / 100
max_p = all_results[query]["max_price"] / 100
print(f" Found {len(products)} listings | "
f"${min_p:.2f} - ${max_p:.2f}")
else:
print(f" No products found")
# Wait between queries
if i < len(queries) - 1:
wait = random.uniform(15, 30)
print(f" Waiting {wait:.0f}s before next query...")
time.sleep(wait)
if output_json:
import json
summary = {
"run_date": datetime.utcnow().isoformat(),
"country": country,
"queries": all_results,
}
Path(output_json).write_text(json.dumps(summary, indent=2))
print(f"\nSummary saved to {output_json}")
conn.close()
return all_results
# Monitor consumer electronics prices
queries = [
"sony wh-1000xm5",
"apple airpods pro 2",
"samsung galaxy s25 ultra",
"nvidia rtx 5080",
"macbook pro m4",
]
results = run_monitoring_pipeline(
queries=queries,
db_path="electronics_prices.db",
country="US",
output_json="price_monitor_run.json",
)
Extracting Structured Product Data from JSON-LD
Many retailer product pages linked from Google Shopping include JSON-LD structured data, making extraction reliable even when CSS selectors change.
def extract_product_jsonld(html: str) -> dict:
"""Extract product data from JSON-LD structured data in product pages."""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "lxml")
product = {}
    for script in soup.select('script[type="application/ld+json"]'):
        if not script.string:
            continue
        try:
            data = json.loads(script.string)
            if isinstance(data, dict) and data.get("@type") == "Product":
                product["name"] = data.get("name", "")
                brand = data.get("brand")
                # "brand" may be a plain string or a {"@type": "Brand", "name": ...} object
                product["brand"] = brand.get("name", "") if isinstance(brand, dict) else (brand or "")
product["description"] = data.get("description", "")[:500]
product["sku"] = data.get("sku", "")
product["gtin"] = data.get("gtin13") or data.get("gtin") or ""
offers = data.get("offers", {})
if isinstance(offers, dict):
product["price"] = offers.get("price")
product["currency"] = offers.get("priceCurrency")
product["availability"] = offers.get("availability", "")
product["seller"] = offers.get("seller", {}).get("name", "")
elif isinstance(offers, list) and offers:
prices = [o.get("price") for o in offers if o.get("price")]
if prices:
product["price_min"] = min(prices)
product["price_max"] = max(prices)
product["price"] = prices[0]
agg_rating = data.get("aggregateRating", {})
product["rating"] = agg_rating.get("ratingValue")
product["review_count"] = agg_rating.get("reviewCount")
break
except (json.JSONDecodeError, AttributeError):
continue
return product
Practical Tips
Geo-targeting matters. Google Shopping prices vary dramatically by country. If you are tracking US prices, make sure your proxy exits in the US. ThorData handles country-targeted routing automatically.
Check for consent pages. In the EU, Google shows a cookie consent page that blocks the actual results. Add a check for consent.google.com redirects and handle them by passing the consent cookie.
Use list view. Adding tbs=vw:l to the URL gives you list view instead of grid view. List view contains more data per result — including seller names and shipping info that grid view sometimes hides.
Rate limit yourself. Even with proxies, do not hammer Google. 1 request every 3-5 seconds is reasonable for sustained monitoring.
Validate your data. Google Shopping results include sponsored listings mixed with organic results. Check for the sponsored flag to separate paid from organic results — sponsored listings often have inflated prices from sellers bidding on visibility.
Handle price formats. Prices appear in many formats: "$1,299.99", "$1299", "From $899", "$899.00 - $1,299.00". Your price parser needs to handle all of these, and you should store the raw string alongside the parsed integer.
Monitor selector stability. Google changes Shopping CSS classes regularly. Set up a simple canary check that verifies your scraper is returning expected data, and alert yourself when extraction rates drop significantly.
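The canary can be as simple as checking result volume and the fraction of results with a parsed price — the thresholds below are placeholders to tune against your own baseline:

```python
def extraction_healthy(products: list[dict], min_count: int = 10,
                       min_priced_ratio: float = 0.8) -> bool:
    """False means the scraper is probably broken or blocked: alert and investigate."""
    if len(products) < min_count:
        return False
    # Fraction of results that yielded a parseable price
    priced = sum(1 for p in products if p.get("price_cents"))
    return priced / len(products) >= min_priced_ratio
```

Run it after every scrape and page yourself when it flips to False; a sudden drop in priced results usually means a selector change rather than an empty market.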
Store raw HTML. For debugging, save the raw response HTML for a sample of requests. When selectors break, you need the actual HTML to figure out what changed.
Google Shopping is one of the harder targets to scrape reliably at scale, but the data is worth it for price comparison tools and market pricing analysis. Start with small batches, rotate your infrastructure with ThorData residential proxies, and always check that you are getting real results rather than CAPTCHAs.