
Scrape Google Shopping Prices with Python: Product Data & Price Comparison (2026)


Google Shopping aggregates product listings from thousands of retailers into one searchable interface. It shows prices, seller ratings, shipping costs, and product specs — all structured data that is useful for price monitoring, competitive analysis, and market research.

The official route is the Google Shopping Content API, but that is designed for merchants uploading their own product feeds, not for extracting competitor pricing data. For actual price comparison scraping, you need to hit Google Shopping search results directly.

Here is how to do it reliably in 2026.


What You Can Extract

From Google Shopping search results, you can extract:

- Product title and product page URL
- Current price, plus the original price on discounted listings
- Seller/store name
- Star rating and review count
- Shipping cost
- Whether a listing is sponsored or organic

The Structure of Google Shopping Results

When you search Google Shopping, the URL follows this pattern:

https://www.google.com/search?q=sony+wh-1000xm5&tbm=shop

The tbm=shop parameter tells Google to return Shopping results. Each product card in the response contains the product title, price, seller name, rating, and a link to the product page.
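The same URL can be built programmatically with nothing but the standard library. A quick sketch (build_shopping_url is a helper name used only here):

```python
from urllib.parse import urlencode

def build_shopping_url(query: str, start: int = 0, gl: str = "us", hl: str = "en") -> str:
    """Build a Google Shopping search URL for a query and result offset."""
    params = {"q": query, "tbm": "shop", "start": start, "hl": hl, "gl": gl}
    return "https://www.google.com/search?" + urlencode(params)

print(build_shopping_url("sony wh-1000xm5"))
```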

The HTML structure changes periodically, but the data is also embedded in structured JSON within the page source — look for window.google.kEI and AF_initDataCallback script tags.
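Before committing to DOM selectors, it can help to confirm that a saved page actually contains those embedded payloads. A minimal check (the callback format is undocumented, so treat this as a heuristic):

```python
import re

def count_data_callbacks(html: str) -> int:
    """Count AF_initDataCallback payloads embedded in the page source."""
    return len(re.findall(r"AF_initDataCallback\(", html))

sample = "<script>AF_initDataCallback({key: 'ds:1', data:[1]});</script>"
print(count_data_callbacks(sample))  # 1
```

If this returns zero on a page that visibly shows products, the data is probably being loaded by JavaScript and you will need a browser-based approach for that page type.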

Key CSS selectors (as of 2026 — expect these to change periodically):

- .sh-dgr__gr-auto — product grid cards
- .sh-np__click-target — product cards in list view
- .a8Pemb — price element
- .aULzUe — seller/store name
- .Rsc7Yb — rating display
- .QIrs8 — sponsored label


Basic Scraper

import httpx
from selectolax.parser import HTMLParser
import json
import re
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
}

def is_blocked(html: str) -> bool:
    """Check if Google returned a CAPTCHA or block page."""
    blocked_signals = [
        "detected unusual traffic",
        "captcha",
        "sorry/index",
        "recaptcha",
        "/sorry/",
    ]
    html_lower = html.lower()
    return any(signal in html_lower for signal in blocked_signals)

def parse_price(price_str: str) -> int | None:
    """Convert '$1,299.99' to 129999 cents for storage.

    Returns None for unparseable strings, including ranges like
    '$899.00 - $1,299.00'; store the raw string alongside so nothing is lost.
    """
    if not price_str:
        return None
    cleaned = re.sub(r"[^\d.]", "", price_str.replace(",", ""))
    try:
        # round() guards against float artifacts before truncating to int
        return int(round(float(cleaned) * 100))
    except ValueError:
        return None

def scrape_google_shopping(
    query: str,
    num_pages: int = 3,
    tbs: str | None = None,
    proxy_url: str | None = None,
) -> list:
    """
    Scrape product listings from Google Shopping search results.

    query: search term
    num_pages: how many result pages to scrape (60 results per page)
    tbs: additional filter string (e.g., "vw:l" for list view, "p_ord:p" for price ascending)
    proxy_url: optional proxy for IP rotation
    """
    products = []

    for page in range(num_pages):
        start = page * 60
        params = {
            "q": query,
            "tbm": "shop",
            "start": start,
            "hl": "en",
            "gl": "us",
        }
        if tbs:
            params["tbs"] = tbs

        # List view (vw:l) shows more data per result, so append it if absent
        if "vw:l" not in (tbs or ""):
            params["tbs"] = f"{tbs},vw:l" if tbs else "vw:l"

        try:
            # Context manager ensures the connection closes even on errors;
            # recent httpx takes a single proxy URL via the `proxy` argument
            with httpx.Client(
                headers=HEADERS,
                proxy=proxy_url,
                timeout=20,
                follow_redirects=True,
            ) as client:
                resp = client.get("https://www.google.com/search", params=params)
        except httpx.TimeoutException:
            print(f"Page {page + 1}: timeout")
            continue

        if resp.status_code != 200:
            print(f"Page {page + 1}: HTTP {resp.status_code}")
            continue

        if is_blocked(resp.text):
            print(f"Page {page + 1}: blocked — need fresh proxy/session")
            break

        tree = HTMLParser(resp.text)
        page_products = []

        # Parse product cards from DOM
        for selector in [".sh-dgr__gr-auto", ".sh-np__click-target", ".u30d4"]:
            cards = tree.css(selector)
            if cards:
                for card in cards:
                    product = extract_product_from_card(card)
                    if product.get("title") and product.get("price_raw"):
                        page_products.append(product)
                if page_products:
                    break

        # Fallback: extract from JSON in page source
        if not page_products:
            page_products = extract_from_page_json(resp.text, query)

        products.extend(page_products)
        print(f"Page {page + 1}: {len(page_products)} products (total: {len(products)})")

        time.sleep(random.uniform(2, 5))

    return products

def extract_product_from_card(card) -> dict:
    """Extract product data from a single result card node."""
    product = {}

    # Title
    for selector in ["h3", ".tAxDx", ".rgHvZc"]:
        title_el = card.css_first(selector)
        if title_el:
            product["title"] = title_el.text(strip=True)
            break

    # Price
    for selector in [".a8Pemb", ".Ib8pOd .a8Pemb", ".T14wmb"]:
        price_el = card.css_first(selector)
        if price_el:
            product["price_raw"] = price_el.text(strip=True)
            product["price_cents"] = parse_price(product["price_raw"])
            break

    # Original price (if discounted)
    orig_el = card.css_first(".pPDzDa, .RsH3le")
    if orig_el:
        product["original_price_raw"] = orig_el.text(strip=True)
        product["original_price_cents"] = parse_price(product["original_price_raw"])

    # Seller
    for selector in [".aULzUe", ".LbUacb", ".E5ocAb"]:
        seller_el = card.css_first(selector)
        if seller_el:
            product["seller"] = seller_el.text(strip=True)
            break

    # Rating
    rating_el = card.css_first(".Rsc7Yb, .INziyb")
    if rating_el:
        product["rating"] = rating_el.text(strip=True)

    # Review count
    reviews_el = card.css_first(".kHxwFf, .riHy6e span")
    if reviews_el:
        text = reviews_el.text(strip=True)
        match = re.search(r"([\d,]+)", text)
        if match:
            product["review_count"] = int(match.group(1).replace(",", ""))

    # Shipping
    for selector in [".vEjMR", ".XrAfOe", ".hf7bk"]:
        shipping_el = card.css_first(selector)
        if shipping_el:
            product["shipping"] = shipping_el.text(strip=True)
            break

    # Sponsored flag
    sponsored_el = card.css_first(".QIrs8, .mnr-c .eEe0Gc, [aria-label='Sponsored']")
    product["sponsored"] = bool(sponsored_el)

    # Product URL
    link_el = card.css_first("a[href]")
    if link_el:
        href = link_el.attributes.get("href", "")
        if href.startswith("/url"):
            # Google redirect URL — extract real URL
            url_match = re.search(r"url=([^&]+)", href)
            if url_match:
                from urllib.parse import unquote
                product["url"] = unquote(url_match.group(1))
        elif href.startswith("http"):
            product["url"] = href

    return product

def extract_from_page_json(html: str, query: str) -> list:
    """Extract product data from embedded JSON in page source as fallback."""
    products = []

    # Look for AF_initDataCallback with product data
    pattern = r'AF_initDataCallback\(\{[^}]+data:(\[.*?\])\}\)'
    for match in re.finditer(pattern, html, re.DOTALL):
        try:
            data_str = match.group(1)
            # Simple regex extraction for price/title pairs
            title_price_matches = re.findall(
                r'"([A-Z][^"]{5,80})"[^"]*"\$[\d,]+\.\d{2}"',
                data_str
            )
            for title in title_price_matches[:20]:
                products.append({"title": title, "source": "json_extract"})
        except Exception:
            continue

    return products

Handling Google Anti-Bot Detection

Google is aggressive about blocking scrapers. Here is what you are dealing with:

CAPTCHA challenges: After a handful of requests from the same IP, Google serves a CAPTCHA page instead of results. The challenge often comes back with a 200 status, so check the response body for consent/challenge markers rather than trusting the status code alone.

Rate limiting: Too many requests too fast from one IP triggers temporary blocks. Google does not always return a clean 429; it may simply stop serving real results, so treat empty result pages as a block signal.

TLS fingerprinting: Google checks the TLS ClientHello fingerprint. Python httpx generates a different TLS fingerprint than Chrome, and advanced detection catches this. Libraries such as curl_cffi, which impersonate a real browser's TLS handshake, are a common workaround when this becomes a problem.

Behavioral analysis: Perfectly timed requests with identical headers look robotic.
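One cheap mitigation for that last point is to randomize both the delay and the header set per request. A sketch; the User-Agent strings here are illustrative placeholders, so substitute current, real browser strings:

```python
import random
import time

UA_POOL = [
    # Illustrative desktop browser strings; keep these up to date
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0",
]

def randomized_headers() -> dict:
    """Return a header set with a randomly chosen User-Agent."""
    return {
        "User-Agent": random.choice(UA_POOL),
        "Accept-Language": "en-US,en;q=0.9",
    }

def jittered_sleep(base: float = 3.0, spread: float = 2.0) -> float:
    """Sleep for base +/- spread seconds (floored at 0.5s); return the delay used."""
    delay = max(0.5, base + random.uniform(-spread, spread))
    time.sleep(delay)
    return delay
```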

Proxy Rotation Strategy

The single most effective countermeasure is rotating your IP address per request. Residential proxies work best against Google because the IPs come from real ISPs, not datacenter ranges that Google has already flagged.

ThorData provides residential proxy pools with geo-targeting — useful when you need prices for a specific country, since Google Shopping results vary significantly by location.

THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"

def get_shopping_proxy(country="US"):
    """Get a geo-targeted ThorData proxy for Google Shopping."""
    return f"http://{THORDATA_USER}-country-{country}:{THORDATA_PASS}@proxy.thordata.com:9000"

def scrape_with_proxy_rotation(
    query: str,
    country: str = "US",
    max_retries: int = 5,
) -> list:
    """Scrape Google Shopping using rotating residential proxies."""
    proxy_url = get_shopping_proxy(country=country)

    for attempt in range(max_retries):
        try:
            products = scrape_google_shopping(
                query,
                num_pages=1,
                proxy_url=proxy_url,
            )

            if products:
                return products

            print(f"Attempt {attempt + 1}: no products returned, rotating IP...")
            proxy_url = get_shopping_proxy(country=country)
            time.sleep(random.uniform(3, 8))

        except httpx.TimeoutException:
            print(f"Attempt {attempt + 1}: timeout, retrying...")
            time.sleep(5)
            continue

        except Exception as e:
            print(f"Attempt {attempt + 1}: error {e}")
            time.sleep(random.uniform(3, 8))

    return []

Price Comparison Tracker

The real value is in tracking prices over time. Here is a SQLite-backed tracker:

import sqlite3
from datetime import datetime

def init_price_db(db_path: str = "prices.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")

    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_checks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            query TEXT NOT NULL,
            check_date TEXT NOT NULL
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            check_id INTEGER,
            query TEXT,
            title TEXT,
            price_raw TEXT,
            price_cents INTEGER,
            original_price_cents INTEGER,
            seller TEXT,
            rating TEXT,
            review_count INTEGER,
            shipping TEXT,
            sponsored INTEGER,
            url TEXT,
            scraped_at TEXT,
            FOREIGN KEY (check_id) REFERENCES price_checks(id)
        )
    """)

    conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_query ON prices(query)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_date ON prices(scraped_at)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_cents ON prices(price_cents)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_prices_title ON prices(title)")

    conn.commit()
    return conn

def save_price_check(conn, query: str, products: list) -> int:
    """Save a complete price check run to the database."""
    now = datetime.utcnow().isoformat()

    cursor = conn.execute(
        "INSERT INTO price_checks (query, check_date) VALUES (?, ?)",
        (query, now)
    )
    check_id = cursor.lastrowid

    conn.executemany("""
        INSERT INTO prices
        (check_id, query, title, price_raw, price_cents, original_price_cents,
         seller, rating, review_count, shipping, sponsored, url, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, [
        (
            check_id,
            query,
            p.get("title"),
            p.get("price_raw"),
            p.get("price_cents"),
            p.get("original_price_cents"),
            p.get("seller"),
            p.get("rating"),
            p.get("review_count"),
            p.get("shipping"),
            1 if p.get("sponsored") else 0,
            p.get("url"),
            now,
        )
        for p in products
    ])

    conn.commit()
    print(f"Saved {len(products)} prices for query '{query}' (check_id={check_id})")
    return check_id

def get_price_history(query: str, conn: sqlite3.Connection) -> list:
    """Get min/max/avg price per day for a search query."""
    rows = conn.execute("""
        SELECT
            DATE(scraped_at) as day,
            MIN(price_cents) / 100.0 as min_price,
            MAX(price_cents) / 100.0 as max_price,
            AVG(price_cents) / 100.0 as avg_price,
            COUNT(*) as listings,
            SUM(CASE WHEN sponsored = 0 THEN 1 ELSE 0 END) as organic_count
        FROM prices
        WHERE query = ? AND price_cents IS NOT NULL AND price_cents > 0
        GROUP BY DATE(scraped_at)
        ORDER BY day
    """, (query,)).fetchall()

    return [
        {
            "date": r[0],
            "min": r[1],
            "max": r[2],
            "avg": round(r[3], 2),
            "listings": r[4],
            "organic_count": r[5],
        }
        for r in rows
    ]

def find_price_drops(query: str, conn: sqlite3.Connection, threshold_pct: float = 10.0) -> list:
    """Find products with significant price drops compared to historical average."""
    rows = conn.execute("""
        SELECT
            title,
            seller,
            MIN(price_cents) as current_min,
            AVG(price_cents) as historical_avg,
            COUNT(DISTINCT DATE(scraped_at)) as days_tracked
        FROM prices
        WHERE query = ?
            AND price_cents IS NOT NULL
            AND price_cents > 0
        GROUP BY title
        HAVING days_tracked >= 3
    """, (query,)).fetchall()

    drops = []
    for title, seller, current_min, avg, days in rows:
        if avg > 0:
            drop_pct = (avg - current_min) / avg * 100
            if drop_pct >= threshold_pct:
                drops.append({
                    "title": title,
                    "seller": seller,
                    "current_price": current_min / 100,
                    "avg_price": round(avg / 100, 2),
                    "drop_pct": round(drop_pct, 1),
                    "days_tracked": days,
                })

    return sorted(drops, key=lambda x: x["drop_pct"], reverse=True)

# Example: track laptop prices over multiple days
db = init_price_db("laptop_prices.db")
products = scrape_with_proxy_rotation("gaming laptop RTX 4070", country="US")
check_id = save_price_check(db, "gaming laptop RTX 4070", products)

history = get_price_history("gaming laptop RTX 4070", db)
print("\nPrice history:")
for day in history:
    print(f"  {day['date']}: ${day['min']:.2f} - ${day['max']:.2f} "
          f"(avg: ${day['avg']:.2f}, {day['listings']} listings)")

Multi-Product Monitoring Pipeline

For tracking dozens of products automatically:

from pathlib import Path

def run_monitoring_pipeline(
    queries: list,
    db_path: str = "price_monitor.db",
    country: str = "US",
    output_json: str = None,
):
    """
    Run a complete price monitoring cycle for multiple queries.
    Saves all results to SQLite and optionally exports to JSON.
    """
    conn = init_price_db(db_path)
    all_results = {}

    for i, query in enumerate(queries):
        print(f"\n[{i+1}/{len(queries)}] Scraping: {query}")

        products = scrape_with_proxy_rotation(query, country=country)

        if products:
            check_id = save_price_check(conn, query, products)
            all_results[query] = {
                "count": len(products),
                "min_price": min(
                    (p["price_cents"] for p in products if p.get("price_cents")),
                    default=None
                ),
                "max_price": max(
                    (p["price_cents"] for p in products if p.get("price_cents")),
                    default=None
                ),
            }

            if all_results[query]["min_price"]:
                min_p = all_results[query]["min_price"] / 100
                max_p = all_results[query]["max_price"] / 100
                print(f"  Found {len(products)} listings | "
                      f"${min_p:.2f} - ${max_p:.2f}")
        else:
            print(f"  No products found")

        # Wait between queries
        if i < len(queries) - 1:
            wait = random.uniform(15, 30)
            print(f"  Waiting {wait:.0f}s before next query...")
            time.sleep(wait)

    if output_json:
        import json
        summary = {
            "run_date": datetime.utcnow().isoformat(),
            "country": country,
            "queries": all_results,
        }
        Path(output_json).write_text(json.dumps(summary, indent=2))
        print(f"\nSummary saved to {output_json}")

    conn.close()
    return all_results

# Monitor consumer electronics prices
queries = [
    "sony wh-1000xm5",
    "apple airpods pro 2",
    "samsung galaxy s25 ultra",
    "nvidia rtx 5080",
    "macbook pro m4",
]

results = run_monitoring_pipeline(
    queries=queries,
    db_path="electronics_prices.db",
    country="US",
    output_json="price_monitor_run.json",
)

Extracting Structured Product Data from JSON-LD

Many Google Shopping product pages include JSON-LD structured data, making extraction reliable even when CSS selectors change.

def extract_product_jsonld(html: str) -> dict:
    """Extract product data from JSON-LD structured data in product pages."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "lxml")
    product = {}

    for script in soup.select('script[type="application/ld+json"]'):
        try:
            # script.string can be None for empty tags; coerce to "" so
            # json.loads raises JSONDecodeError instead of TypeError
            data = json.loads(script.string or "")
            if data.get("@type") == "Product":
                product["name"] = data.get("name", "")
                product["brand"] = data.get("brand", {}).get("name", "")
                product["description"] = data.get("description", "")[:500]
                product["sku"] = data.get("sku", "")
                product["gtin"] = data.get("gtin13") or data.get("gtin") or ""

                offers = data.get("offers", {})
                if isinstance(offers, dict):
                    product["price"] = offers.get("price")
                    product["currency"] = offers.get("priceCurrency")
                    product["availability"] = offers.get("availability", "")
                    product["seller"] = offers.get("seller", {}).get("name", "")
                elif isinstance(offers, list) and offers:
                    # JSON-LD prices are often strings; coerce before comparing
                    prices = []
                    for o in offers:
                        try:
                            prices.append(float(o.get("price")))
                        except (TypeError, ValueError):
                            continue
                    if prices:
                        product["price_min"] = min(prices)
                        product["price_max"] = max(prices)
                        product["price"] = prices[0]

                agg_rating = data.get("aggregateRating", {})
                product["rating"] = agg_rating.get("ratingValue")
                product["review_count"] = agg_rating.get("reviewCount")

                break
        except (json.JSONDecodeError, AttributeError):
            continue

    return product

Practical Tips

Geo-targeting matters. Google Shopping prices vary dramatically by country. If you are tracking US prices, make sure your proxy exits in the US. ThorData handles country-targeted routing automatically.

Check for consent pages. In the EU, Google shows a cookie consent page that blocks the actual results. Add a check for consent.google.com redirects and handle them by passing the consent cookie.
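A sketch of that handling. The CONSENT=YES+ value is a historically reported workaround, not a documented API; verify it against a live EU response before relying on it:

```python
def hit_consent_wall(final_url: str, html: str) -> bool:
    """Detect whether the response was redirected to Google's consent page."""
    return "consent.google.com" in final_url or "consent.google.com" in html

# Cookie historically used to skip the EU consent interstitial. Treat this
# as an assumption and confirm it still works before depending on it.
CONSENT_COOKIES = {"CONSENT": "YES+"}

# Usage with httpx (sketch):
# resp = httpx.get(url, cookies=CONSENT_COOKIES, follow_redirects=True)
```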

Use list view. Adding tbs=vw:l to the URL gives you list view instead of grid view. List view contains more data per result — including seller names and shipping info that grid view sometimes hides.

Rate limit yourself. Even with proxies, do not hammer Google. 1 request every 3-5 seconds is reasonable for sustained monitoring.

Validate your data. Google Shopping results include sponsored listings mixed with organic results. Check for the sponsored flag to separate paid from organic results — sponsored listings often have inflated prices from sellers bidding on visibility.
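With the sponsored flag from the scraper above, splitting the two groups and comparing their median prices is straightforward. A sketch over the product dicts the scraper produces:

```python
from statistics import median

def organic_vs_sponsored(products: list) -> dict:
    """Compare median price (in cents) of organic vs sponsored listings."""
    def med(items):
        prices = [p["price_cents"] for p in items if p.get("price_cents")]
        return median(prices) if prices else None

    organic = [p for p in products if not p.get("sponsored")]
    sponsored = [p for p in products if p.get("sponsored")]
    return {
        "organic_count": len(organic),
        "sponsored_count": len(sponsored),
        "organic_median_cents": med(organic),
        "sponsored_median_cents": med(sponsored),
    }
```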

Handle price formats. Prices appear in many formats: "$1,299.99", "$1299", "From $899", "$899.00 - $1,299.00". Your price parser needs to handle all of these, and you should store the raw string alongside the parsed integer.

Monitor selector stability. Google changes Shopping CSS classes regularly. Set up a simple canary check that verifies your scraper is returning expected data, and alert yourself when extraction rates drop significantly.
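A canary can be as simple as tracking the fraction of scraped cards that yielded both a title and a parsed price (the 70% threshold below is an arbitrary starting point; tune it to your baseline):

```python
def extraction_rate(products: list) -> float:
    """Fraction of scraped products with both a title and a parsed price."""
    if not products:
        return 0.0
    ok = sum(1 for p in products if p.get("title") and p.get("price_cents"))
    return ok / len(products)

def canary_check(products: list, min_rate: float = 0.7) -> bool:
    """Return True if extraction looks healthy; otherwise flag it for alerting."""
    rate = extraction_rate(products)
    if rate < min_rate:
        print(f"CANARY: extraction rate {rate:.0%} below {min_rate:.0%}")
        return False
    return True
```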

Store raw HTML. For debugging, save the raw response HTML for a sample of requests. When selectors break, you need the actual HTML to figure out what changed.

Google Shopping is one of the harder targets to scrape reliably at scale, but the data is worth it for price comparison tools and market pricing analysis. Start with small batches, rotate your infrastructure with ThorData residential proxies, and always check that you are getting real results rather than CAPTCHAs.