
Scrape Google Maps Reviews & Business Data with Python and Playwright (2026)


Google Maps is a goldmine for business data — reviews, ratings, operating hours, photos, pricing info, and geolocation coordinates for millions of places. But Google protects it aggressively. Their anti-bot detection in 2026 is some of the most sophisticated on the web. The Maps interface is heavily JavaScript-driven, data loads dynamically through scroll events, and they fingerprint browsers at multiple layers.

This guide covers what actually works: Playwright-based automation for dynamic content, DOM selector strategies, review scrolling, multi-place pipelines, and the proxy infrastructure needed for any meaningful scale.


What You Can Extract

From a Google Maps place listing you can pull:

- Business name, categories, and price level
- Overall rating and total review count
- Individual reviews: reviewer name, star rating, text, date, photo count, and owner responses
- Address, phone number, and website
- Business hours
- Latitude/longitude coordinates


The Google Places API vs. Scraping

Google offers a legitimate Places API. The problems:

- Place Details requests are billed per call, and costs climb fast once you pass the free monthly credit
- The API returns at most five reviews per place — useless for any serious review analysis
- The terms of service restrict how long you can cache most fields and how you can display the data

Scraping gives you everything, for free, but you are fighting their bot detection. For small datasets (under a few hundred places), the scraping approach is fine. For larger pipelines, budget for proxy infrastructure.


Setup

pip install playwright
playwright install chromium

Playwright for Python ships both sync and async APIs. All examples below use the async API with asyncio.


Basic Place Scraper

import asyncio
import json
import re
from playwright.async_api import async_playwright

async def scrape_place(url: str, proxy: dict | None = None) -> dict:
    """Scrape a Google Maps place listing for business data."""
    async with async_playwright() as p:
        launch_args = {
            "headless": True,
            "args": [
                "--disable-blink-features=AutomationControlled",
                "--disable-dev-shm-usage",
                "--no-sandbox",
            ],
        }
        if proxy:
            launch_args["proxy"] = proxy

        browser = await p.chromium.launch(**launch_args)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1280, "height": 900},
            locale="en-US",
            geolocation={"latitude": 40.7128, "longitude": -74.0060},
            permissions=["geolocation"],
        )

        page = await context.new_page()

        # Block images to speed up loading (reviews don't need photos)
        await page.route(
            "**/*.{png,jpg,jpeg,gif,webp,svg}",
            lambda route: route.abort()
        )

        await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(3)

        # Handle EU cookie consent popup
        try:
            accept_btn = await page.query_selector("button[aria-label*='Accept all']")
            if not accept_btn:
                accept_btn = await page.query_selector("form[action*='consent'] button")
            if accept_btn:
                await accept_btn.click()
                await asyncio.sleep(1)
        except Exception:
            pass

        place = {"url": url}

        # Business name
        for selector in ["h1", "h1.DUwDvf", "h1.fontHeadlineLarge"]:
            name_el = await page.query_selector(selector)
            if name_el:
                name = (await name_el.inner_text()).strip()
                if name:
                    place["name"] = name
                    break

        # Overall rating
        rating_el = await page.query_selector("div.F7nice span[aria-hidden='true']")
        if rating_el:
            try:
                place["rating"] = float(await rating_el.inner_text())
            except ValueError:
                pass

        # Review count
        review_el = await page.query_selector("button[aria-label*='reviews']")
        if review_el:
            aria = await review_el.get_attribute("aria-label")
            match = re.search(r"([\d,]+)\s+review", aria or "")
            if match:
                place["review_count"] = int(match.group(1).replace(",", ""))

        # Address
        addr_el = await page.query_selector("button[data-item-id='address'] div.Io6YTe")
        if addr_el:
            place["address"] = (await addr_el.inner_text()).strip()

        # Phone
        phone_el = await page.query_selector("button[data-item-id*='phone'] div.Io6YTe")
        if phone_el:
            place["phone"] = (await phone_el.inner_text()).strip()

        # Website
        website_el = await page.query_selector("a[data-item-id='authority'] div.Io6YTe")
        if website_el:
            place["website"] = (await website_el.inner_text()).strip()

        # Categories
        category_els = await page.query_selector_all("button[jsaction*='category']")
        if not category_els:
            category_els = await page.query_selector_all("span.DkEaL")
        categories = []
        for el in category_els[:5]:
            text = (await el.inner_text()).strip()
            if text:
                categories.append(text)
        if categories:
            place["categories"] = categories

        # Price level
        price_el = await page.query_selector("span.ZDu9vd")
        if price_el:
            place["price_level"] = (await price_el.inner_text()).strip()

        # Business hours
        hours = await extract_hours(page)
        if hours:
            place["hours"] = hours

        # Coordinates from URL
        current_url = page.url
        coord_match = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", current_url)
        if coord_match:
            place["latitude"] = float(coord_match.group(1))
            place["longitude"] = float(coord_match.group(2))

        await browser.close()
        return place

async def extract_hours(page) -> dict:
    """Extract business hours from a Maps page."""
    hours = {}

    # Try the hours table (appears after clicking hours section)
    hours_btn = await page.query_selector("button[data-item-id='oh']")
    if hours_btn:
        await hours_btn.click()
        await asyncio.sleep(1)

    rows = await page.query_selector_all("table.eK4R0e tr, tr.y0skZc")
    for row in rows:
        cells = await row.query_selector_all("td, th")
        if len(cells) >= 2:
            day = (await cells[0].inner_text()).strip()
            hours_text = (await cells[1].inner_text()).strip()
            if day:
                hours[day] = hours_text

    return hours
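The two regexes inside scrape_place — the coordinate pattern and the review-count pattern — are easy to get subtly wrong, and they can be unit-tested without ever launching a browser. Here they are pulled out into standalone helpers (the function names are mine, not part of the scraper above):

```python
import re

def parse_coords(url: str):
    """Extract (lat, lng) from the @lat,lng segment of a Maps URL."""
    m = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", url)
    return (float(m.group(1)), float(m.group(2))) if m else None

def parse_review_count(aria_label: str):
    """Pull the count out of an aria-label like '1,204 reviews'."""
    m = re.search(r"([\d,]+)\s+review", aria_label or "")
    return int(m.group(1).replace(",", "")) if m else None

print(parse_coords("https://www.google.com/maps/place/Joe's+Pizza/@40.7305,-73.9969,17z"))
# → (40.7305, -73.9969)
print(parse_review_count("1,204 reviews"))
# → 1204
```

Keeping parsing logic in small pure functions like this makes it much faster to diagnose whether a failure is a selector change or a regex bug.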

Scrolling Reviews

The reviews section loads lazily — Google shows 3-5 initially and loads more as you scroll. Each scroll triggers an XHR for the next batch.

async def scrape_reviews(page, max_reviews: int = 100) -> list:
    """Scroll through and extract reviews from a Maps place page."""
    # Navigate to Reviews tab
    for selector in [
        "button[aria-label*='Reviews']",
        "button[aria-label*='review']",
        "div[data-tab-index='1']",
    ]:
        reviews_btn = await page.query_selector(selector)
        if reviews_btn:
            await reviews_btn.click()
            await asyncio.sleep(2)
            break

    # Sort by Newest for chronological collection
    sort_btn = await page.query_selector("button[aria-label='Sort reviews'], button[data-value='Sort']")
    if sort_btn:
        await sort_btn.click()
        await asyncio.sleep(1)
        # Select Newest option
        for option_selector in ["li[data-index='1']", "li[role='menuitemradio']:nth-child(2)"]:
            newest = await page.query_selector(option_selector)
            if newest:
                await newest.click()
                await asyncio.sleep(2)
                break

    reviews = []
    processed = 0  # review elements already parsed in earlier rounds
    last_count = 0
    stalled_rounds = 0
    max_stalled = 8

    # Find the scrollable reviews container
    scrollable_selectors = [
        "div.m6QErb.DxyBCb.kA9KIf.dS8AEf",
        "div[jsaction*='scrollable']",
        "div.m6QErb",
    ]
    scrollable = None
    for sel in scrollable_selectors:
        scrollable = await page.query_selector(sel)
        if scrollable:
            break

    while len(reviews) < max_reviews and stalled_rounds < max_stalled:
        # Expand "More" buttons before extracting
        more_btns = await page.query_selector_all("button.w8nwRe, button[aria-label*='See more']")
        for btn in more_btns:
            try:
                await btn.click()
                await asyncio.sleep(0.2)
            except Exception:
                pass

        # Parse only elements not seen in a previous round. Slicing by
        # `processed` (not len(reviews)) avoids re-parsing elements that
        # were skipped for having no reviewer or text.
        review_els = await page.query_selector_all("div.jftiEf, div[data-review-id]")

        for el in review_els[processed:]:
            processed += 1
            review = {}

            name_el = await el.query_selector("div.d4r55, .WNxzHc a")
            if name_el:
                review["reviewer"] = (await name_el.inner_text()).strip()

            stars_el = await el.query_selector("span.kvMYJc, span[aria-label*='star']")
            if stars_el:
                aria = await stars_el.get_attribute("aria-label")
                match = re.search(r"(\d)", aria or "")
                if match:
                    review["stars"] = int(match.group(1))

            text_el = await el.query_selector("span.wiI7pd, div.MyEned span")
            if text_el:
                review["text"] = (await text_el.inner_text()).strip()

            date_el = await el.query_selector("span.rsqaWe, span[class*='date']")
            if date_el:
                review["date"] = (await date_el.inner_text()).strip()

            # Photo count
            photos_el = await el.query_selector("button[aria-label*='photo']")
            if photos_el:
                aria = await photos_el.get_attribute("aria-label")
                match = re.search(r"(\d+)", aria or "")
                if match:
                    review["photo_count"] = int(match.group(1))

            # Owner response
            response_el = await el.query_selector("div.wiI7pd ~ div.wiI7pd")
            if response_el:
                review["owner_response"] = (await response_el.inner_text()).strip()[:200]

            if review.get("reviewer") or review.get("text"):
                reviews.append(review)

        if len(reviews) == last_count:
            stalled_rounds += 1
        else:
            stalled_rounds = 0
            last_count = len(reviews)

        # Scroll down in reviews container
        if scrollable:
            await scrollable.evaluate("el => el.scrollTop = el.scrollHeight")
        else:
            await page.keyboard.press("End")
        await asyncio.sleep(1.5)

    return reviews[:max_reviews]

# Full place scrape with reviews
async def scrape_place_with_reviews(url: str, max_reviews: int = 50, proxy: dict | None = None) -> dict:
    # scrape_place() runs in its own browser session, so the listing is
    # loaded twice: once for place details, once here for review scrolling.
    # Keeping the two scrapers independent is worth the extra page load.
    place = await scrape_place(url, proxy=proxy)

    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled", "--no-sandbox"],
            proxy=proxy,
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            viewport={"width": 1280, "height": 900},
            locale="en-US",
        )
        page = await context.new_page()
        await page.route("**/*.{png,jpg,jpeg,gif,webp}", lambda r: r.abort())

        await page.goto(url, wait_until="domcontentloaded")
        await asyncio.sleep(3)

        place["reviews"] = await scrape_reviews(page, max_reviews=max_reviews)

        await browser.close()
        return place
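Because the reviews pane can re-render mid-scroll, the same review occasionally gets parsed twice. A post-hoc dedupe pass keyed on reviewer, date, and text is cheap insurance (this helper is a suggestion of mine, not part of the scraper above):

```python
def dedupe_reviews(reviews: list) -> list:
    """Drop duplicate reviews while preserving order."""
    seen = set()
    unique = []
    for r in reviews:
        key = (r.get("reviewer"), r.get("date"), r.get("text"))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

rows = [
    {"reviewer": "A", "date": "a week ago", "text": "Great slice"},
    {"reviewer": "A", "date": "a week ago", "text": "Great slice"},
    {"reviewer": "B", "date": "a month ago", "text": "Fine"},
]
print(len(dedupe_reviews(rows)))
# → 2
```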

Searching for Places

To build a dataset, start with a category search and collect all results.

async def search_places(query: str, max_results: int = 20, proxy: dict | None = None) -> list:
    """Search Google Maps and return place URLs."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"],
            proxy=proxy,
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            viewport={"width": 1280, "height": 900},
        )
        page = await context.new_page()

        from urllib.parse import quote  # handles spaces and special characters
        search_url = f"https://www.google.com/maps/search/{quote(query)}"
        await page.goto(search_url, wait_until="domcontentloaded")
        await asyncio.sleep(3)

        urls = set()
        scroll_count = 0

        feed = await page.query_selector("div[role='feed']")

        while len(urls) < max_results and scroll_count < 20:
            # Extract place links
            items = await page.query_selector_all("a[href*='/maps/place/']")
            for item in items:
                href = await item.get_attribute("href")
                if href and "maps/place" in href:
                    # Normalize to canonical URL
                    match = re.search(r"(/maps/place/[^@?]+@[^/]+)", href)
                    if match:
                        canonical = f"https://www.google.com{match.group(1)}"
                        urls.add(canonical)

            if len(urls) >= max_results:
                break

            # Scroll the results feed
            if feed:
                await feed.evaluate("el => el.scrollTop = el.scrollHeight")
            await asyncio.sleep(2)
            scroll_count += 1

        await browser.close()
        return list(urls)[:max_results]

# Find coffee shops in NYC
urls = asyncio.run(search_places("coffee shops Manhattan New York", max_results=20))
for url in urls[:5]:
    print(url)
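The normalization regex in search_places is worth testing on its own: it strips the trailing data blob and keeps the stable /maps/place/...@lat,lng core. Extracted as a standalone helper (name is mine):

```python
import re

def canonicalize_place_url(href: str):
    """Reduce a Maps result link to its canonical /maps/place/ form."""
    m = re.search(r"(/maps/place/[^@?]+@[^/]+)", href)
    return f"https://www.google.com{m.group(1)}" if m else None

href = "https://www.google.com/maps/place/Joe's+Pizza/@40.7305,-73.9969,17z/data=!3m1!4b1"
print(canonicalize_place_url(href))
# → https://www.google.com/maps/place/Joe's+Pizza/@40.7305,-73.9969,17z
```

Canonical URLs matter because the raw hrefs in search results carry session-specific data parameters — deduplicating on the raw URL would silently inflate your dataset.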

Google Anti-Bot Detection

Google Maps uses multiple detection layers:

reCAPTCHA v3 runs silently, scoring every session based on behavioral signals including mouse movement, scroll patterns, typing rhythm, and time on page. Low scores trigger challenges or silently return degraded results.

Browser fingerprinting checks WebGL renderer, canvas fingerprint, screen resolution, installed fonts, navigator properties, and JavaScript timing. Vanilla Playwright gets flagged quickly because the default configuration exposes known automation artifacts.

Request pattern analysis detects automated scrolling (perfectly uniform intervals), high-frequency page loads, and unusual referer/navigation patterns.

IP reputation scoring — datacenter IP ranges (AWS, GCP, Azure, DigitalOcean) are blocked almost immediately on Maps. Google maintains comprehensive IP range blocklists.

Mitigations

Hide Playwright automation signals:

async def create_stealth_context(browser):
    """Create a browser context with automation signals hidden."""
    context = await browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
        viewport={"width": 1366, "height": 768},
        locale="en-US",
        timezone_id="America/New_York",
        geolocation={"latitude": 40.7128, "longitude": -74.0060},
        permissions=["geolocation"],
    )

    # Override navigator.webdriver
    await context.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false,
        });
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en'],
        });
        Object.defineProperty(navigator, 'plugins', {
            get: () => [1, 2, 3, 4, 5],
        });
    """)

    return context

Residential proxy rotation:

Residential proxies are the single most effective countermeasure against Google Maps blocking. ThorData provides 90M+ residential IPs across 190+ countries with per-request rotation. For Maps specifically, geo-targeting is important: searching for businesses in Chicago from a Japanese IP looks suspicious — use IPs from the same city/region as your target businesses.

THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"

def get_proxy(country=None, city=None):
    """Build a geo-targeted ThorData proxy for Maps scraping."""
    user = THORDATA_USER
    if country:
        user = f"{user}-country-{country.upper()}"
    if city:
        user = f"{user}-city-{city}"
    return {
        "server": "http://proxy.thordata.com:9000",
        "username": user,
        "password": THORDATA_PASS,
    }

# Scrape NYC businesses with US/NY residential IP
proxy = get_proxy(country="US")
result = asyncio.run(scrape_place(
    "https://www.google.com/maps/place/Joe's+Pizza/@40.7305,-73.9969,17z",
    proxy=proxy,
))
print(f"{result.get('name')}: {result.get('rating')}/5 ({result.get('review_count')} reviews)")

Rate limiting:

Even with proxies, Google Maps requires slow scraping. 2-3 places per minute is a safe pace. Going faster — even with different proxy IPs — triggers behavioral detection because the request patterns look like automation.

import random

async def scrape_places_batch(urls: list, max_reviews_per_place: int = 30) -> list:
    """Scrape a batch of places with appropriate delays."""
    results = []

    for i, url in enumerate(urls):
        proxy = get_proxy(country="US")

        try:
            place = await scrape_place_with_reviews(
                url,
                max_reviews=max_reviews_per_place,
                proxy=proxy,
            )
            results.append(place)
            print(f"[{i+1}/{len(urls)}] {place.get('name', 'Unknown')}: "
                  f"{place.get('rating', '?')}/5, "
                  f"{len(place.get('reviews', []))} reviews")
        except Exception as e:
            print(f"[{i+1}/{len(urls)}] Failed: {e}")

        # 20-40 seconds between places
        await asyncio.sleep(random.uniform(20, 40))

    return results

Saving to SQLite

import json
import sqlite3
from datetime import datetime

def save_places_to_db(places: list, db_path: str = "maps_data.db"):
    """Save scraped place data and reviews to SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")

    conn.execute("""
        CREATE TABLE IF NOT EXISTS places (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT,
            address TEXT,
            phone TEXT,
            website TEXT,
            rating REAL,
            review_count INTEGER,
            categories TEXT,
            price_level TEXT,
            latitude REAL,
            longitude REAL,
            hours TEXT,
            url TEXT,
            scraped_at TEXT
        )
    """)

    conn.execute("""
        CREATE TABLE IF NOT EXISTS reviews (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            place_id INTEGER,
            place_name TEXT,
            reviewer TEXT,
            stars INTEGER,
            text TEXT,
            date TEXT,
            photo_count INTEGER,
            owner_response TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (place_id) REFERENCES places(id)
        )
    """)

    conn.execute("CREATE INDEX IF NOT EXISTS idx_places_rating ON places(rating)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_reviews_place ON reviews(place_id)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_reviews_stars ON reviews(stars)")

    now = datetime.utcnow().isoformat()

    for place in places:
        cursor = conn.execute("""
            INSERT INTO places
            (name, address, phone, website, rating, review_count,
             categories, price_level, latitude, longitude, hours, url, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            place.get("name"),
            place.get("address"),
            place.get("phone"),
            place.get("website"),
            place.get("rating"),
            place.get("review_count"),
            ",".join(place.get("categories", [])),
            place.get("price_level"),
            place.get("latitude"),
            place.get("longitude"),
            json.dumps(place.get("hours", {})),
            place.get("url"),
            now,
        ))
        place_id = cursor.lastrowid

        for review in place.get("reviews", []):
            conn.execute("""
                INSERT INTO reviews
                (place_id, place_name, reviewer, stars, text, date, photo_count, owner_response)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                place_id,
                place.get("name"),
                review.get("reviewer"),
                review.get("stars"),
                review.get("text"),
                review.get("date"),
                review.get("photo_count"),
                review.get("owner_response"),
            ))

    conn.commit()
    conn.close()
    print(f"Saved {len(places)} places to {db_path}")
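Once the data is in SQLite, analysis is a query away. As one example, here is a lookup for the best-rated places with a meaningful review base (the function name and the 10-review threshold are illustrative choices, not anything prescribed above):

```python
import sqlite3

def top_rated(db_path: str = "maps_data.db", min_reviews: int = 10) -> list:
    """Highest-rated places with at least min_reviews reviews."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("""
        SELECT name, rating, review_count
        FROM places
        WHERE rating IS NOT NULL AND review_count >= ?
        ORDER BY rating DESC, review_count DESC
        LIMIT 10
    """, (min_reviews,)).fetchall()
    conn.close()
    return rows
```

The idx_places_rating index created in save_places_to_db is what keeps the ORDER BY cheap as the table grows.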

Practical Tips

Run in headed mode during development. Google Maps behavior is much easier to debug when you can see what is happening. Set headless=False while building and testing selectors.

Block images via route interception. Reviews do not need photos to load, and blocking image requests cuts page weight by 60-70%, speeding up each scrape significantly.

Handle the EU consent screen. In Europe, Google shows a cookie consent popup that blocks everything until you accept. Detect it by checking for consent.google.com in the URL and click accept.

Cache Place IDs instead of URLs. Google Place IDs are stable identifiers like ChIJmQJIxlVYwokRLgeuocVOGVQ. URLs can change format, but Place IDs persist. Extract the Place ID from the URL if present.
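When a ChIJ-style Place ID is embedded in the URL, a regex can grab it. The token's position varies with URL format, so treat this as a best-effort grep rather than a guarantee — the example URL below is constructed for illustration:

```python
import re

def extract_place_id(url: str):
    """Best-effort extraction of a ChIJ... Place ID from a Maps URL."""
    m = re.search(r"(ChIJ[0-9A-Za-z_-]{20,})", url)
    return m.group(1) if m else None

url = "https://www.google.com/maps/place/data=!4m2!3m1!19sChIJmQJIxlVYwokRLgeuocVOGVQ"
print(extract_place_id(url))
# → ChIJmQJIxlVYwokRLgeuocVOGVQ
```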

Handle "Closed permanently" and redirects. Some places have closed or been rebranded. Check for redirect URLs and "Closed permanently" indicators in the response.

The selectors will break. Google changes Maps class names regularly — sometimes every few weeks. When your selector stops working, open DevTools on a live Maps page and find the updated structure. This is unavoidable with Google products.

Google Maps scraping is a constant cat-and-mouse game, but the data is extremely valuable for local SEO, market research, competitive analysis, and training datasets for geographic AI models. With ThorData residential proxies, careful rate limiting, and stealth Playwright configuration, you can collect comprehensive local business data at scale.