Scrape Google Maps Reviews & Business Data with Python and Playwright (2026)
Google Maps is a goldmine for business data — reviews, ratings, operating hours, photos, pricing info, and geolocation coordinates for millions of places. But Google protects it aggressively. Their anti-bot detection in 2026 is some of the most sophisticated on the web. The Maps interface is heavily JavaScript-driven, data loads dynamically through scroll events, and they fingerprint browsers at multiple layers.
This guide covers what actually works: Playwright-based automation for dynamic content, DOM selector strategies, review scrolling, multi-place pipelines, and the proxy infrastructure needed for any meaningful scale.
What You Can Extract
From a Google Maps place listing:
- Business name, address, phone number, website
- Overall rating and total review count
- Individual reviews — text, star rating, date, reviewer name, helpful vote count, photos
- Business hours by day of week, plus holiday hours
- Popular times data (hourly visit patterns by day)
- Photos and their categories (exterior, interior, food, menu, etc.)
- Price level ($, $$, $$$, $$$$) and business categories
- Plus Code and exact coordinates (from URL)
- Accessibility features, amenities, and service options
The Google Places API vs. Scraping
Google offers a legitimate Places API. The problems:
- Charges $0.017 per request for basic data, $0.02-0.04 for details
- The Places API only returns the 5 most relevant reviews per place — not useful for comprehensive review data
- For 10,000 businesses with reviews, you are looking at $400-700+
Scraping gives you everything, for free, but you are fighting their bot detection. For small datasets (under a few hundred places), the scraping approach is fine. For larger pipelines, budget for proxy infrastructure.
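To put that pricing in concrete terms, here is the back-of-envelope arithmetic behind the estimate (the rates are the per-request figures quoted above; treat them as approximate, since Google adjusts SKU pricing periodically):

```python
def places_api_cost(n_places: int, basic_rate: float = 0.017,
                    details_rate: float = 0.04) -> float:
    """Estimate Places API spend: one basic request plus one details
    request per place, using the per-request rates quoted above."""
    return n_places * (basic_rate + details_rate)

# 10,000 places at both tiers lands inside the $400-700 range
print(f"${places_api_cost(10_000):,.2f}")  # $570.00
```

And that still only buys 5 reviews per place, which is why review-heavy projects end up scraping.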
Setup
pip install playwright selectolax
playwright install chromium
Playwright's Python bindings are async-first. All examples below use asyncio.
Basic Place Scraper
import asyncio
import json
import re
from playwright.async_api import async_playwright
async def scrape_place(url: str, proxy: dict | None = None) -> dict:
"""Scrape a Google Maps place listing for business data."""
async with async_playwright() as p:
launch_args = {
"headless": True,
"args": [
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
"--no-sandbox",
],
}
if proxy:
launch_args["proxy"] = proxy
browser = await p.chromium.launch(**launch_args)
context = await browser.new_context(
user_agent=(
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/125.0.0.0 Safari/537.36"
),
viewport={"width": 1280, "height": 900},
locale="en-US",
geolocation={"latitude": 40.7128, "longitude": -74.0060},
permissions=["geolocation"],
)
page = await context.new_page()
# Block images to speed up loading (reviews don't need photos)
await page.route(
"**/*.{png,jpg,jpeg,gif,webp,svg}",
lambda route: route.abort()
)
await page.goto(url, wait_until="domcontentloaded", timeout=30000)
await asyncio.sleep(3)
# Handle EU cookie consent popup
try:
accept_btn = await page.query_selector("button[aria-label*='Accept all']")
if not accept_btn:
accept_btn = await page.query_selector("form[action*='consent'] button")
if accept_btn:
await accept_btn.click()
await asyncio.sleep(1)
except Exception:
pass
place = {"url": url}
# Business name
for selector in ["h1", "h1.DUwDvf", "h1.fontHeadlineLarge"]:
name_el = await page.query_selector(selector)
if name_el:
name = (await name_el.inner_text()).strip()
if name:
place["name"] = name
break
# Overall rating
rating_el = await page.query_selector("div.F7nice span[aria-hidden='true']")
if rating_el:
try:
place["rating"] = float(await rating_el.inner_text())
except ValueError:
pass
# Review count
review_el = await page.query_selector("button[aria-label*='reviews']")
if review_el:
aria = await review_el.get_attribute("aria-label")
match = re.search(r"([\d,]+)\s+review", aria or "")
if match:
place["review_count"] = int(match.group(1).replace(",", ""))
# Address
addr_el = await page.query_selector("button[data-item-id='address'] div.Io6YTe")
if addr_el:
place["address"] = (await addr_el.inner_text()).strip()
# Phone
phone_el = await page.query_selector("button[data-item-id*='phone'] div.Io6YTe")
if phone_el:
place["phone"] = (await phone_el.inner_text()).strip()
# Website
website_el = await page.query_selector("a[data-item-id='authority'] div.Io6YTe")
if website_el:
place["website"] = (await website_el.inner_text()).strip()
# Categories
category_els = await page.query_selector_all("button[jsaction*='category']")
if not category_els:
category_els = await page.query_selector_all("span.DkEaL")
categories = []
for el in category_els[:5]:
text = (await el.inner_text()).strip()
if text:
categories.append(text)
if categories:
place["categories"] = categories
# Price level
price_el = await page.query_selector("span.ZDu9vd")
if price_el:
place["price_level"] = (await price_el.inner_text()).strip()
# Business hours
hours = await extract_hours(page)
if hours:
place["hours"] = hours
# Coordinates from URL
current_url = page.url
coord_match = re.search(r"@(-?\d+\.\d+),(-?\d+\.\d+)", current_url)
if coord_match:
place["latitude"] = float(coord_match.group(1))
place["longitude"] = float(coord_match.group(2))
await browser.close()
return place
async def extract_hours(page) -> dict:
"""Extract business hours from a Maps page."""
hours = {}
# Try the hours table (appears after clicking hours section)
hours_btn = await page.query_selector("button[data-item-id='oh']")
if hours_btn:
await hours_btn.click()
await asyncio.sleep(1)
rows = await page.query_selector_all("table.eK4R0e tr, tr.y0skZc")
for row in rows:
cells = await row.query_selector_all("td, th")
if len(cells) >= 2:
day = (await cells[0].inner_text()).strip()
hours_text = (await cells[1].inner_text()).strip()
if day:
hours[day] = hours_text
return hours
Scrolling Reviews
The reviews section loads lazily — Google shows 3-5 initially and loads more as you scroll. Each scroll triggers an XHR for the next batch.
async def scrape_reviews(page, max_reviews: int = 100) -> list:
"""Scroll through and extract reviews from a Maps place page."""
# Navigate to Reviews tab
for selector in [
"button[aria-label*='Reviews']",
"button[aria-label*='review']",
"div[data-tab-index='1']",
]:
reviews_btn = await page.query_selector(selector)
if reviews_btn:
await reviews_btn.click()
await asyncio.sleep(2)
break
# Sort by Newest for chronological collection
sort_btn = await page.query_selector("button[aria-label='Sort reviews'], button[data-value='Sort']")
if sort_btn:
await sort_btn.click()
await asyncio.sleep(1)
# Select Newest option
for option_selector in ["li[data-index='1']", "li[role='menuitemradio']:nth-child(2)"]:
newest = await page.query_selector(option_selector)
if newest:
await newest.click()
await asyncio.sleep(2)
break
reviews = []
last_count = 0
stalled_rounds = 0
max_stalled = 8
# Find the scrollable reviews container
scrollable_selectors = [
"div.m6QErb.DxyBCb.kA9KIf.dS8AEf",
"div[jsaction*='scrollable']",
"div.m6QErb",
]
scrollable = None
for sel in scrollable_selectors:
scrollable = await page.query_selector(sel)
if scrollable:
break
while len(reviews) < max_reviews and stalled_rounds < max_stalled:
# Expand "More" buttons before extracting
more_btns = await page.query_selector_all("button.w8nwRe, button[aria-label*='See more']")
for btn in more_btns:
try:
await btn.click()
await asyncio.sleep(0.2)
except Exception:
pass
            # Extract all visible review elements, skipping ones processed in
            # earlier rounds (this slice assumes each element yields one saved review)
            review_els = await page.query_selector_all("div.jftiEf, div[data-review-id]")
            for el in review_els[len(reviews):]:
review = {}
name_el = await el.query_selector("div.d4r55, .WNxzHc a")
if name_el:
review["reviewer"] = (await name_el.inner_text()).strip()
stars_el = await el.query_selector("span.kvMYJc, span[aria-label*='star']")
if stars_el:
aria = await stars_el.get_attribute("aria-label")
match = re.search(r"(\d)", aria or "")
if match:
review["stars"] = int(match.group(1))
text_el = await el.query_selector("span.wiI7pd, div.MyEned span")
if text_el:
review["text"] = (await text_el.inner_text()).strip()
date_el = await el.query_selector("span.rsqaWe, span[class*='date']")
if date_el:
review["date"] = (await date_el.inner_text()).strip()
# Photo count
photos_el = await el.query_selector("button[aria-label*='photo']")
if photos_el:
aria = await photos_el.get_attribute("aria-label")
match = re.search(r"(\d+)", aria or "")
if match:
review["photo_count"] = int(match.group(1))
# Owner response
response_el = await el.query_selector("div.wiI7pd ~ div.wiI7pd")
if response_el:
review["owner_response"] = (await response_el.inner_text()).strip()[:200]
if review.get("reviewer") or review.get("text"):
reviews.append(review)
if len(reviews) == last_count:
stalled_rounds += 1
else:
stalled_rounds = 0
last_count = len(reviews)
# Scroll down in reviews container
if scrollable:
await scrollable.evaluate("el => el.scrollTop = el.scrollHeight")
else:
await page.keyboard.press("End")
await asyncio.sleep(1.5)
return reviews[:max_reviews]
# Full place scrape with reviews
async def scrape_place_with_reviews(url: str, max_reviews: int = 50, proxy: dict | None = None) -> dict:
    # scrape_place runs its own browser session for the business details;
    # the session below is used only for the review scroll
    place = await scrape_place(url, proxy=proxy)
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled", "--no-sandbox"],
            proxy=proxy,
        )
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
            viewport={"width": 1280, "height": 900},
            locale="en-US",
        )
        page = await context.new_page()
        await page.route("**/*.{png,jpg,jpeg,gif,webp}", lambda r: r.abort())
        await page.goto(url, wait_until="domcontentloaded")
        await asyncio.sleep(3)
        place["reviews"] = await scrape_reviews(page, max_reviews=max_reviews)
        await browser.close()
    return place
Searching for Places
To build a dataset, start with a category search and collect all results.
async def search_places(query: str, max_results: int = 20, proxy: dict | None = None) -> list:
"""Search Google Maps and return place URLs."""
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
args=["--disable-blink-features=AutomationControlled"],
proxy=proxy,
)
context = await browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
viewport={"width": 1280, "height": 900},
)
page = await context.new_page()
        from urllib.parse import quote_plus  # handles '&', '#', and unicode in queries
        search_url = f"https://www.google.com/maps/search/{quote_plus(query)}"
await page.goto(search_url, wait_until="domcontentloaded")
await asyncio.sleep(3)
urls = set()
scroll_count = 0
feed = await page.query_selector("div[role='feed']")
while len(urls) < max_results and scroll_count < 20:
# Extract place links
items = await page.query_selector_all("a[href*='/maps/place/']")
for item in items:
href = await item.get_attribute("href")
if href and "maps/place" in href:
# Normalize to canonical URL
match = re.search(r"(/maps/place/[^@?]+@[^/]+)", href)
if match:
canonical = f"https://www.google.com{match.group(1)}"
urls.add(canonical)
if len(urls) >= max_results:
break
# Scroll the results feed
if feed:
await feed.evaluate("el => el.scrollTop = el.scrollHeight")
await asyncio.sleep(2)
scroll_count += 1
await browser.close()
return list(urls)[:max_results]
# Find coffee shops in NYC
urls = asyncio.run(search_places("coffee shops Manhattan New York", max_results=20))
for url in urls[:5]:
print(url)
Google Anti-Bot Detection
Google Maps uses multiple detection layers:
reCAPTCHA v3 runs silently, scoring every session based on behavioral signals including mouse movement, scroll patterns, typing rhythm, and time on page. Low scores trigger challenges or silently return degraded results.
Browser fingerprinting checks WebGL renderer, canvas fingerprint, screen resolution, installed fonts, navigator properties, and JavaScript timing. Vanilla Playwright gets flagged quickly because the default configuration exposes known automation artifacts.
Request pattern analysis detects automated scrolling (perfectly uniform intervals), high-frequency page loads, and unusual referer/navigation patterns.
IP reputation scoring — datacenter IP ranges (AWS, GCP, Azure, DigitalOcean) are blocked almost immediately on Maps. Google maintains comprehensive IP range blocklists.
Mitigations
Hide Playwright automation signals:
async def create_stealth_context(browser):
"""Create a browser context with automation signals hidden."""
context = await browser.new_context(
user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
viewport={"width": 1366, "height": 768},
locale="en-US",
timezone_id="America/New_York",
geolocation={"latitude": 40.7128, "longitude": -74.0060},
permissions=["geolocation"],
)
# Override navigator.webdriver
await context.add_init_script("""
Object.defineProperty(navigator, 'webdriver', {
get: () => false,
});
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en'],
});
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
""")
return context
Residential proxy rotation:
Residential proxies are the single most effective countermeasure against Google Maps blocking. ThorData provides 90M+ residential IPs across 190+ countries with per-request rotation. For Maps specifically, geo-targeting is important: searching for businesses in Chicago from a Japanese IP looks suspicious — use IPs from the same city/region as your target businesses.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
def get_proxy(country=None, city=None):
"""Build a geo-targeted ThorData proxy for Maps scraping."""
user = THORDATA_USER
if country:
user = f"{user}-country-{country.upper()}"
if city:
user = f"{user}-city-{city}"
return {
        "server": "http://proxy.thordata.com:9000",
"username": user,
"password": THORDATA_PASS,
}
# Scrape NYC businesses with US/NY residential IP
proxy = get_proxy(country="US")
result = asyncio.run(scrape_place(
"https://www.google.com/maps/place/Joe's+Pizza/@40.7305,-73.9969,17z",
proxy=proxy,
))
print(f"{result.get('name')}: {result.get('rating')}/5 ({result.get('review_count')} reviews)")
Rate limiting:
Even with proxies, Google Maps requires slow scraping. 2-3 places per minute is a safe pace. Going faster — even with different proxy IPs — triggers behavioral detection because the request patterns look like automation.
import random
async def scrape_places_batch(urls: list, max_reviews_per_place: int = 30) -> list:
"""Scrape a batch of places with appropriate delays."""
results = []
for i, url in enumerate(urls):
proxy = get_proxy(country="US")
try:
place = await scrape_place_with_reviews(
url,
max_reviews=max_reviews_per_place,
proxy=proxy,
)
results.append(place)
print(f"[{i+1}/{len(urls)}] {place.get('name', 'Unknown')}: "
f"{place.get('rating', '?')}/5, "
f"{len(place.get('reviews', []))} reviews")
except Exception as e:
print(f"[{i+1}/{len(urls)}] Failed: {e}")
# 20-40 seconds between places
await asyncio.sleep(random.uniform(20, 40))
return results
Saving to SQLite
import json
import sqlite3
from datetime import datetime
def save_places_to_db(places: list, db_path: str = "maps_data.db"):
"""Save scraped place data and reviews to SQLite."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("""
CREATE TABLE IF NOT EXISTS places (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
address TEXT,
phone TEXT,
website TEXT,
rating REAL,
review_count INTEGER,
categories TEXT,
price_level TEXT,
latitude REAL,
longitude REAL,
hours TEXT,
url TEXT,
scraped_at TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS reviews (
id INTEGER PRIMARY KEY AUTOINCREMENT,
place_id INTEGER,
place_name TEXT,
reviewer TEXT,
stars INTEGER,
text TEXT,
date TEXT,
photo_count INTEGER,
owner_response TEXT,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (place_id) REFERENCES places(id)
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_places_rating ON places(rating)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_reviews_place ON reviews(place_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_reviews_stars ON reviews(stars)")
now = datetime.utcnow().isoformat()
for place in places:
cursor = conn.execute("""
INSERT INTO places
(name, address, phone, website, rating, review_count,
categories, price_level, latitude, longitude, hours, url, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
place.get("name"),
place.get("address"),
place.get("phone"),
place.get("website"),
place.get("rating"),
place.get("review_count"),
",".join(place.get("categories", [])),
place.get("price_level"),
place.get("latitude"),
place.get("longitude"),
json.dumps(place.get("hours", {})),
place.get("url"),
now,
))
place_id = cursor.lastrowid
for review in place.get("reviews", []):
conn.execute("""
INSERT INTO reviews
(place_id, place_name, reviewer, stars, text, date, photo_count, owner_response)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
place_id,
place.get("name"),
review.get("reviewer"),
review.get("stars"),
review.get("text"),
review.get("date"),
review.get("photo_count"),
review.get("owner_response"),
))
conn.commit()
conn.close()
print(f"Saved {len(places)} places to {db_path}")
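Once the data is in SQLite, ordinary SQL does the analysis — for example, average stars and review volume per place (schema column names match the tables above; the sample rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (place_name TEXT, stars INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("Joe's Pizza", 5), ("Joe's Pizza", 4), ("Corner Cafe", 3)],
)
rows = conn.execute(
    "SELECT place_name, ROUND(AVG(stars), 2) AS avg_stars, COUNT(*) AS n "
    "FROM reviews GROUP BY place_name ORDER BY avg_stars DESC"
).fetchall()
print(rows)  # Joe's Pizza averages 4.5 over 2 reviews
```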
Practical Tips
Run in headed mode during development. Google Maps behavior is much easier to debug when you can see what is happening. Set headless=False while building and testing selectors.
Block images via route interception. Reviews do not need photos to load, and blocking image requests cuts page weight by 60-70%, speeding up each scrape significantly.
Handle the EU consent screen. In Europe, Google shows a cookie consent popup that blocks everything until you accept. Detect it by checking for consent.google.com in the URL and click accept.
Cache Place IDs instead of URLs. Google Place IDs are stable identifiers like ChIJmQJIxlVYwokRLgeuocVOGVQ. URLs can change format, but Place IDs persist. Extract the Place ID from the URL if present.
Handle "Closed permanently" and redirects. Some places have closed or been rebranded. Check for redirect URLs and "Closed permanently" indicators in the response.
The selectors will break. Google changes Maps class names regularly — sometimes every few weeks. When your selector stops working, open DevTools on a live Maps page and find the updated structure. This is unavoidable with Google products.
Google Maps scraping is a constant cat-and-mouse game, but the data is extremely valuable for local SEO, market research, competitive analysis, and training datasets for geographic AI models. With ThorData residential proxies, careful rate limiting, and stealth Playwright configuration, you can collect comprehensive local business data at scale.