
How to Scrape Google Search Results With Python (2026 Guide)

March 29, 2026 · 24 min read
Contents
- Introduction
- Why scraping Google is hard
- Environment setup
- Approach 1: Raw HTTP requests
- Approach 2: Headless browser with Playwright
- Approach 3: SERP APIs and managed scrapers
- Anti-detection deep dive
- Proxy rotation: the missing piece
- Output format and data schema
- Parsing SERP features (PAA, snippets, local)
- Rate limiting strategies
- Common errors and fixes
- Real-world use cases
- Comparison table
- What I actually use

Introduction

Google processes over 8.5 billion searches per day. The data contained in those search results -- rankings, snippets, featured answers, People Also Ask boxes, local results, and knowledge panels -- is some of the most valuable web data available. Whether you are building an SEO monitoring tool, conducting competitive research, tracking brand mentions, or feeding data into a market analysis pipeline, programmatic access to Google search results is frequently the starting point.

The problem is that Google really does not want you scraping their results. They offer official APIs, but those APIs return results from a Custom Search Engine that does not match the actual Google SERP. The real search results -- the ones your customers see, the ones that determine whether your SEO strategy is working -- are only available by actually querying google.com and parsing the HTML response.

This guide covers every practical approach to getting Google SERP data in 2026, from the simplest free method that works for a handful of queries to production-grade solutions that can handle thousands of queries per day. I have tested each approach over the past year while building data pipelines, and I will be honest about where each one breaks down. There is no magic solution that gives you unlimited free SERP data -- every approach involves trade-offs between cost, reliability, volume, and maintenance effort.

By the end of this guide, you will know exactly which approach fits your use case, have working Python code for each method, understand the anti-detection techniques that keep your scraper running, and know how to structure the extracted data for analysis.

Why scraping Google is hard

Google's anti-scraping defenses are among the most sophisticated on the web. Unlike most websites, which rely on simple rate limiting, Google layers several detection mechanisms that work together:

- IP reputation and rate limiting: datacenter IP ranges are flagged almost immediately, and even residential IPs get throttled after a burst of queries.
- TLS and HTTP fingerprinting: the TLS handshake (JA3 hash) and header ordering of common HTTP libraries are well known and easy to flag.
- Browser fingerprinting: JavaScript probes check for headless browsers, automation frameworks, and inconsistent environment signals.
- Behavioral analysis: machine-like query cadence and navigation patterns stand out against real user traffic.
- CAPTCHA challenges: once any of the above raises suspicion, Google serves the "unusual traffic" interstitial and blocks further requests from that IP.

The honest truth: there is no free, reliable, zero-maintenance way to scrape Google at scale. Every approach involves cost -- either your time maintaining a scraper, money for proxies or APIs, or both. The question is which cost structure makes sense for your specific use case.

Environment setup

Before you start writing scraping code, set up a clean Python environment with the tools you will need across all approaches:

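A minimal setup might look like this. The package list matches the approaches covered below; the browser download in the last step is only needed for the Playwright approach:

```
python -m venv serp-env
source serp-env/bin/activate    # Windows: serp-env\Scripts\activate

pip install requests beautifulsoup4 curl_cffi playwright apify-client
playwright install chromium     # browser binary for Approach 2
```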

Project structure

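There is no required layout, but keeping each approach in its own module makes it easy to swap them as your volume grows. The names below are illustrative:

```
google-serp-scraper/
├── scrapers/
│   ├── raw_http.py       # Approach 1: requests / curl_cffi
│   ├── browser.py        # Approach 2: Playwright
│   └── apify_serp.py     # Approach 3: managed actor
├── parsing/
│   └── serp_parser.py    # organic results, PAA, snippets
├── proxies.py            # rotation logic
├── output/               # one JSON file per run
└── requirements.txt
```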

Approach 1: Raw HTTP requests (works for small volumes)

The simplest approach: send a GET request with a browser-like User-Agent and parse the HTML response. This works for a small number of queries from a residential IP address.

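A minimal sketch of the raw-HTTP approach. Google's markup changes often, so the `div.g` selector is illustrative rather than stable -- inspect a saved response before relying on it:

```python
# raw_search.py -- Approach 1 sketch; selectors are illustrative, not stable.
import requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

def parse_organic(html: str) -> list[dict]:
    """Extract title/url pairs from result containers (selector is a guess)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.select("div.g"):
        link = block.find("a", href=True)
        title = block.find("h3")
        if link and title:
            results.append({"title": title.get_text(strip=True), "url": link["href"]})
    return results

def google_search(query: str) -> list[dict]:
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": query, "hl": "en", "gl": "us", "num": 10},
        headers=HEADERS,
        timeout=15,
    )
    resp.raise_for_status()
    return parse_organic(resp.text)

if __name__ == "__main__":
    for r in google_search("web scraping tools"):
        print(r["title"], "->", r["url"])
```

Setting `hl` and `gl` explicitly matters: without them, Google localizes results to whatever it infers from your IP.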

Limitations of raw HTTP

- CAPTCHAs arrive fast: expect the "unusual traffic" page after only a handful of queries from a single IP.
- No JavaScript: SERP features that render client-side never appear in the raw HTML, so you can get empty results despite a 200 OK.
- Distinctive TLS fingerprint: Python's requests is identifiable at the TLS layer regardless of what User-Agent you send.
- Fragile parsing: the HTML structure changes constantly, so selectors break without warning.

Legal note: Google's Terms of Service prohibit automated scraping. This article is for educational purposes. For production SERP data needs, consider the official API or managed services covered in Approach 3.

Approach 2: Headless browser with Playwright

A headless browser executes JavaScript, handles cookies properly, and presents a real browser fingerprint. This gets past most basic bot detection and renders SERP features that require JS.

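A sketch of the Playwright approach, assuming `pip install playwright` and `playwright install chromium` have been run. The locale and viewport values are illustrative; the point is to present a consistent desktop profile:

```python
# playwright_serp.py -- Approach 2 sketch.
from urllib.parse import urlencode

def serp_url(query: str, hl: str = "en", gl: str = "us") -> str:
    """Build a google.com search URL with explicit language/country."""
    return "https://www.google.com/search?" + urlencode({"q": query, "hl": hl, "gl": gl})

def fetch_serp_html(query: str) -> str:
    # Imported here so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            locale="en-US",
            viewport={"width": 1366, "height": 768},
        )
        page = context.new_page()
        page.goto(serp_url(query), wait_until="domcontentloaded")
        page.wait_for_timeout(2000)  # let JS-rendered SERP features settle
        html = page.content()
        browser.close()
        return html

if __name__ == "__main__":
    print(len(fetch_serp_html("web scraping tools")), "bytes of SERP HTML")
```

Feed the returned HTML into the same parsing functions you use for Approach 1 -- only the fetching layer differs.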

Why Playwright still fails at scale

Even with stealth patches, headless Chrome is detectable through several signals that are hard to fully mask:

- the navigator.webdriver flag and other automation-specific JavaScript properties
- missing or inconsistent browser features (plugins, codecs, WebGL renderer strings)
- viewport, screen, and font metrics that do not match a real desktop machine
- artifacts left behind by the automation framework's DevTools Protocol connection

Libraries like playwright-stealth or undetected-chromedriver patch many of these, but Google updates their detection regularly. You will spend time maintaining your stealth patches, and some percentage of requests will always fail.

Approach 3: SERP APIs and managed scrapers

If you need reliable SERP data for a product or ongoing project, managed services handle the cat-and-mouse game for you. Here are the main options:

Option A: Google Custom Search JSON API (official)

Google offers a Programmable Search Engine API. It gives you 100 queries per day free, then $5 per 1,000 queries.

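A sketch against the official endpoint. You need an API key from the Google Cloud console and a search engine ID (`cx`) from the Programmable Search Engine console; the placeholder credentials below are not real:

```python
# cse_search.py -- official Custom Search JSON API.
import requests

API_URL = "https://www.googleapis.com/customsearch/v1"

def build_params(query: str, api_key: str, cx: str, start: int = 1) -> dict:
    """The API returns at most 10 results per call; page with `start`."""
    return {"key": api_key, "cx": cx, "q": query, "num": 10, "start": start}

def cse_search(query: str, api_key: str, cx: str) -> list[dict]:
    resp = requests.get(API_URL, params=build_params(query, api_key, cx), timeout=15)
    resp.raise_for_status()
    data = resp.json()
    return [
        {"title": it["title"], "url": it["link"], "snippet": it.get("snippet", "")}
        for it in data.get("items", [])
    ]

if __name__ == "__main__":
    for i, r in enumerate(cse_search("web scraping tools", "YOUR_API_KEY", "YOUR_CX_ID"), 1):
        print(i, r["title"])
```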

The catch: results come from a Custom Search Engine, not the main Google index. They are close but not identical to what users see on google.com. Featured snippets, People Also Ask, and local results are not included. For SEO monitoring, this is usually insufficient because you need to know exactly what the real SERP looks like.

Option B: Apify Google Search Scraper

Apify actors run managed scraper infrastructure. You define search queries, they handle proxies, browser fingerprinting, and CAPTCHA solving. The results match the actual SERP, including featured snippets, PAA boxes, and local results. Pricing is per compute unit used, which maps roughly to per-query cost.

# apify_serp.py
from apify_client import ApifyClient

def search_via_apify(queries: list[str], max_results: int = 10) -> list[dict]:
    """Run Google Search Scraper on Apify."""
    client = ApifyClient("your-apify-api-token")

    run = client.actor("cryptosignals/google-search-scraper").call(
        run_input={
            "queries": queries,
            "maxPagesPerQuery": 1,
            "resultsPerPage": max_results,
            "languageCode": "en",
            "countryCode": "us",
        }
    )

    results = []
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        results.append(item)

    return results

Option C: SerpAPI, ScraperAPI, Bright Data

Several dedicated services specialize in SERP data. They typically charge $50-100/month for a few thousand queries. Each has trade-offs: roughly speaking, SerpAPI returns fully structured JSON for every SERP feature but sits at the premium end of per-search pricing; ScraperAPI is a general-purpose scraping API where Google is one endpoint among many; and Bright Data layers a SERP API on top of its large proxy network, with tooling and pricing aimed at enterprise volumes. Evaluate them against your query volume, which SERP features you need parsed, and whether you want structured JSON or raw HTML.

Anti-detection deep dive

Whether you use raw HTTP or Playwright, these techniques improve your success rate against Google's bot detection:

1. TLS fingerprint rotation

Google checks TLS fingerprints (JA3 hashes) to identify client software. Python's requests library has a distinctive fingerprint. To mitigate this:

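One way to do this is curl_cffi (`pip install curl_cffi`), which impersonates real browser TLS fingerprints instead of presenting the default Python one. The exact set of impersonation targets depends on the installed version, so the profile list here is an assumption to verify against your install:

```python
# tls_fingerprint.py -- rotate TLS fingerprints with curl_cffi.
import random

# Impersonation targets; the available set depends on the curl_cffi version.
PROFILES = ["chrome", "safari"]

def pick_profile() -> str:
    """Rotate the TLS fingerprint alongside the User-Agent, not independently."""
    return random.choice(PROFILES)

def fetch(url: str) -> str:
    # Imported here so pick_profile() works without curl_cffi installed.
    from curl_cffi import requests as cffi_requests

    resp = cffi_requests.get(url, impersonate=pick_profile(), timeout=15)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    print(len(fetch("https://www.google.com/search?q=web+scraping+tools")), "bytes")
```

Keep the TLS profile and User-Agent consistent with each other: a Safari fingerprint sending a Chrome User-Agent is itself a detection signal.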

2. Cookie persistence

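A brand-new cookieless visitor on every request is suspicious; real browsers carry consent and session cookies (SOCS, NID) across visits. A minimal persistence sketch -- the dict round-trip drops cookie metadata like domain and expiry, which is acceptable for this purpose:

```python
# cookies.py -- persist Google cookies between scraper runs.
import json
from pathlib import Path

import requests

COOKIE_FILE = Path("google_cookies.json")

def load_session() -> requests.Session:
    """Restore cookies from the previous run, if any."""
    session = requests.Session()
    if COOKIE_FILE.exists():
        session.cookies.update(json.loads(COOKIE_FILE.read_text()))
    return session

def save_session(session: requests.Session) -> None:
    """Call after a successful batch so the next run reuses the same identity."""
    cookie_dict = requests.utils.dict_from_cookiejar(session.cookies)
    COOKIE_FILE.write_text(json.dumps(cookie_dict))
```

Pair one cookie file with one proxy identity; reusing the same cookies across many exit IPs is another mismatch Google can spot.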

3. Query pattern randomization

# Avoid machine-like patterns
import random

def randomize_query_params(query: str) -> dict:
    """Add natural variation to search parameters."""
    params = {"q": query}

    # Randomly include/exclude optional parameters
    if random.random() > 0.5:
        params["num"] = random.choice([10, 20])
    if random.random() > 0.7:
        params["safe"] = "off"
    if random.random() > 0.6:
        params["hl"] = "en"

    return params

4. Request header completeness

Missing headers are a red flag. A real Chrome browser sends 10+ headers with every request. Your scraper should too:

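A header set close to what desktop Chrome sends on a navigation. The specific version numbers and platform values are illustrative and should track whatever User-Agent you rotate to:

```python
# headers.py -- complete browser-like header set; values are illustrative.
CHROME_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ),
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        "image/avif,image/webp,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",
    # Client hints must agree with the User-Agent above.
    "Sec-Ch-Ua": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    # Fetch metadata for a top-level navigation.
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
}
```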

Proxy rotation: the missing piece

Regardless of which scraping approach you choose, you will hit IP-based rate limits fast without proxies. Here is what you need to know about proxy types and how to use them effectively:

Proxy types compared

Type | Cost | Google Success Rate | Best For
Datacenter | $1-5/GB | 5-15% | Non-Google targets
Residential rotating | $5-15/GB | 60-80% | Google scraping
ISP (static residential) | $15-30/GB | 80-95% | High-value queries
Mobile | $20-40/GB | 90%+ | Maximum stealth

Implementing proxy rotation

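A simple round-robin rotator with failure tracking, so IPs that hit 429s or CAPTCHAs rotate out of the pool. The proxy URLs are placeholders for whatever your provider gives you:

```python
# proxy_rotation.py -- round-robin rotation with per-proxy failure counts.
import itertools
from collections import Counter

class ProxyRotator:
    def __init__(self, proxies: list[str], max_failures: int = 3):
        self._pool = list(proxies)
        self._cycle = itertools.cycle(self._pool)
        self._failures = Counter()
        self._max_failures = max_failures

    def next_proxy(self) -> dict:
        """Return a requests-style proxies dict, skipping burned IPs."""
        for _ in range(len(self._pool)):
            proxy = next(self._cycle)
            if self._failures[proxy] < self._max_failures:
                return {"http": proxy, "https": proxy}
        raise RuntimeError("all proxies burned; wait for cooldown or add more")

    def mark_failure(self, proxies: dict) -> None:
        """Call on a 429 or CAPTCHA page so this IP rotates out."""
        self._failures[proxies["https"]] += 1

# Placeholder endpoints -- substitute your provider's credentials and hosts.
rotator = ProxyRotator([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
])
```

Usage: call `rotator.next_proxy()` before each request and pass the dict to `requests.get(..., proxies=...)`; on a block signal, call `rotator.mark_failure(...)` and retry with the next proxy.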

For proxy providers, I have had good results with ThorData for residential proxy rotation. Their rotating residential pool works well for search engine scraping specifically, and the per-GB pricing is competitive. The geo-targeting feature is particularly useful because Google serves different results based on the requester's location -- if you are monitoring US SERPs, you need US exit IPs.

Tip: Whatever proxy provider you use, test with a small batch first. Rotate user agents alongside IP rotation. Add random delays of 3-10 seconds between requests. Human browsing is irregular -- your scraper should be too.

Output format and data schema

Here is the complete JSON schema for a parsed Google SERP. This covers organic results, featured snippets, People Also Ask, and local results:

{
  "query": "best python web frameworks 2026",
  "search_metadata": {
    "timestamp": "2026-03-29T14:30:00Z",
    "language": "en",
    "country": "us",
    "device": "desktop",
    "total_results_estimate": "About 2,340,000 results"
  },
  "featured_snippet": {
    "type": "paragraph",
    "text": "Django remains the most popular Python web framework in 2026...",
    "source_url": "https://example.com/python-frameworks",
    "source_title": "Top Python Web Frameworks"
  },
  "people_also_ask": [
    "What is the fastest Python web framework?",
    "Is Django still relevant in 2026?",
    "What is the difference between Flask and FastAPI?"
  ],
  "organic_results": [
    {
      "position": 1,
      "title": "Top 10 Python Web Frameworks for 2026",
      "url": "https://example.com/top-python-frameworks",
      "displayed_url": "example.com > python > frameworks",
      "snippet": "Comprehensive comparison of Django, FastAPI, Flask, and more...",
      "date": "Mar 15, 2026",
      "sitelinks": []
    },
    {
      "position": 2,
      "title": "FastAPI vs Django in 2026: Which Should You Choose?",
      "url": "https://blog.example.com/fastapi-vs-django",
      "displayed_url": "blog.example.com",
      "snippet": "A detailed comparison covering performance, ecosystem...",
      "date": "",
      "sitelinks": []
    }
  ],
  "local_results": [],
  "related_searches": [
    "python web framework benchmark 2026",
    "fastapi tutorial beginner",
    "django vs flask performance"
  ]
}

Parsing SERP features (PAA, snippets, local packs)

Modern Google SERPs contain far more than just blue links. Here is how to extract the most important SERP features:

People Also Ask (PAA)

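A sketch for pulling PAA questions out of captured HTML. The `related-question-pair` class has matched PAA items in the past but drifts with frontend deployments, hence the question-shaped-text fallback:

```python
# parse_paa.py -- selector is a starting point, not a stable contract.
from bs4 import BeautifulSoup

def parse_paa(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    questions: list[str] = []
    for el in soup.select("div.related-question-pair"):
        text = el.get_text(strip=True)
        if text.endswith("?") and text not in questions:
            questions.append(text)
    if not questions:
        # Fallback heuristic: short block elements whose entire text is a question.
        for el in soup.find_all(["div", "span"]):
            text = el.get_text(strip=True)
            if 15 < len(text) < 120 and text.endswith("?") and text not in questions:
                questions.append(text)
    return questions
```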

Featured snippets

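Featured-snippet markup changes even more often than organic results. The `xpdopen` class below has marked snippet containers in past SERP layouts; treat it as a placeholder and verify against a saved SERP before relying on it:

```python
# parse_snippet.py -- illustrative selector; inspect captured HTML first.
from typing import Optional

from bs4 import BeautifulSoup

def parse_featured_snippet(html: str) -> Optional[dict]:
    soup = BeautifulSoup(html, "html.parser")
    block = soup.select_one("div.xpdopen")  # placeholder container selector
    if block is None:
        return None
    text_el = block.find("span")
    link = block.find("a", href=True)
    return {
        "text": text_el.get_text(strip=True) if text_el else "",
        "source_url": link["href"] if link else "",
        "source_title": link.get_text(strip=True) if link else "",
    }
```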

Related searches

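Related searches are plain links back into /search, which makes them one of the more robust features to extract without depending on generated class names:

```python
# parse_related.py -- extract related-search queries from /search?q= links.
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

def parse_related_searches(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    related: list[str] = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("/search?"):
            q = parse_qs(urlparse(href).query).get("q", [""])[0]
            if q and q not in related:
                related.append(q)
    return related
```

This also picks up pagination links, which carry the original query; filter out the query you searched for if you want only the suggestions.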

Rate limiting strategies

Google's rate limits vary by IP type and query pattern. Here are the practical thresholds I have observed:

IP Type | Queries Before CAPTCHA | Recovery Time
Datacenter (no proxy) | 2-5 | 4-12 hours
Residential (single IP) | 20-40 per hour | 30-60 minutes
Residential (rotating pool) | 200+ per hour | Per-IP cooldown
Mobile proxy | 50-80 per hour | 15-30 minutes

Best practices for staying under the radar:

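In practice that means: spread queries across the day instead of bursting, randomize delays rather than sleeping a fixed interval, back off hard at the first 429 or CAPTCHA, and keep per-IP counts under the thresholds above. A minimal pacing sketch -- the timing constants are rough values drawn from those observations, not guarantees:

```python
# pacing.py -- jittered delays plus exponential backoff on block signals.
import random
import time

def human_delay(base: float = 3.0, spread: float = 7.0) -> float:
    """Sleep 3-10s with jitter between requests; returns the delay used."""
    delay = base + random.random() * spread
    time.sleep(delay)
    return delay

def backoff_delay(attempt: int, cap: float = 900.0) -> float:
    """Exponential backoff with full jitter after a 429 or CAPTCHA page."""
    return random.uniform(0, min(cap, 30.0 * (2 ** attempt)))
```

Full jitter (a uniform draw up to the exponential ceiling) avoids the thundering-herd effect of many workers retrying on the same schedule.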

Common errors and fixes

HTTP 429 Too Many Requests

Cause: You exceeded Google's rate limit for your IP.

Fix: Switch to a different proxy immediately. The current IP needs 30-60 minutes of cooldown. Increase delays between requests.

CAPTCHA / "Unusual traffic" page

Cause: Google suspects automated access. Can be triggered by rate, fingerprint, or behavioral signals.

Fix: Rotate proxy and user agent. Add stealth patches if using Playwright. Ensure you are sending complete headers including Sec-Fetch-* headers.

Empty results despite 200 OK

Cause: Google served a JavaScript-dependent page. Raw HTTP requests cannot execute JS.

Fix: Switch to Playwright (Approach 2), which can execute the page's JavaScript. If you want to stay on raw HTTP, curl_cffi with browser impersonation sometimes gets served the static HTML variant of the page, but it cannot render JavaScript either.

Results do not match manual search

Cause: Google personalizes results based on location, search history, and language. Your scraper's IP location differs from your manual search location.

Fix: Set gl and hl parameters explicitly. Use a proxy from the same geographic region as your target audience. Add &pws=0 to disable personalization (not always effective).

Selectors break after a few weeks

Cause: Google A/B tests different HTML structures constantly. Class names like .VwiC3b are generated and change during frontend deployments.

Fix: Use multiple fallback selectors. Prefer data attributes ([data-sncf]) over class names where possible. Build a monitoring system that alerts you when extraction rates drop below a threshold.
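The fallback-selector idea can be sketched as a small helper that tries selectors in order of expected stability; the selector list reuses the examples above and is illustrative:

```python
# resilient_extract.py -- try selectors in order so one frontend deployment
# doesn't zero out the pipeline.
from bs4 import BeautifulSoup

SNIPPET_SELECTORS = [
    "[data-sncf]",    # data attributes tend to outlive class names
    "div.VwiC3b",     # generated class, current at time of writing
    "div.IsZvec",     # older generated class, kept as a fallback
]

def extract_first(html: str, selectors: list[str]) -> str:
    """Return text from the first selector that matches, else empty string."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in selectors:
        el = soup.select_one(selector)
        if el is not None:
            return el.get_text(strip=True)
    return ""
```

Log which selector matched on each extraction; a sudden shift in that distribution is your early warning that a deployment changed the markup.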

Real-world use cases

1. SEO rank tracking

The most common use case for SERP scraping. Track where your website ranks for target keywords over time. Compare your positions against competitors. Monitor for ranking drops that need investigation. Key data: position, url, featured_snippet presence.

2. Content gap analysis

Scrape SERPs for keywords in your niche and analyze what types of content rank. Are the top results how-to guides, listicles, tools, or reference docs? What questions appear in PAA? This tells you what content to create. Key data: title, snippet, people_also_ask.

3. SERP feature monitoring

Track which queries trigger featured snippets, knowledge panels, video carousels, or local packs. Changes in SERP features affect click-through rates dramatically. Key data: all SERP feature types, featured_snippet.source_url.

4. Competitor monitoring

Track competitor domains across hundreds of keywords to understand their SEO strategy. Identify keywords where they rank but you do not. Monitor new pages they publish that start ranking. Key data: url, position, displayed_url.

5. Lead generation

Search for business-related queries (e.g., "plumber in Chicago") and extract the URLs and business names from local results and organic listings. Useful for building B2B prospecting lists. Key data: url, title, local_results.

6. Academic research

Researchers study search engine bias, information quality, and algorithmic curation by analyzing SERP composition across different queries and regions. Systematic SERP data collection enables large-scale empirical studies. Key data: full SERP structure including all feature types.

Comparison: what works when

Method | Cost | Volume | Reliability | Maintenance
Raw requests (no proxy) | Free | 20-50/day | Low | High
Raw requests + residential proxies | $5-15/GB | 500-2k/day | Medium | Medium
Playwright + stealth + proxies | $10-20/GB | 200-1k/day | Medium-High | Medium
Google CSE API | $5/1k queries | Unlimited | High | Low
SERP API (SerpAPI, etc.) | $50-100/mo | 5k-50k/mo | High | Low
Apify managed scraper | Pay per result | Unlimited | High | Low

What I actually use

For quick one-off research: raw requests with curl_cffi for TLS impersonation plus a residential proxy. Good enough for grabbing a few pages of results without setting up any infrastructure.

For production pipelines: a managed SERP API. I started by maintaining my own Playwright scraper with proxy rotation and stealth patches, but I was spending more time fixing breakage than building features. Google's detection evolves weekly. The managed services cost money, but they cost less than your time debugging at 2am when your rank tracker stops working.

For ad-hoc data collection where I need actual Google results at moderate scale, Apify's scraper actors hit the sweet spot -- you can customize exactly what data you extract and only pay for successful results.

The general rule: if you are scraping Google fewer than 50 times a day, raw requests with a good user agent and residential proxy are fine. Beyond that, you need a proper proxy rotation setup. Beyond a few hundred queries a day, just pay for a managed service -- your time is worth more than the subscription cost, and the reliability difference is significant.

Key takeaway: Start simple. Use raw requests for small-scale needs. Graduate to Playwright when you need SERP features. Switch to managed APIs when maintenance cost exceeds subscription cost. Do not over-engineer your first version.

Built by Crypto Volume Signal Scanner -- tools for developers who work with web data. See also: Scraping AliExpress Products | LinkedIn Data Without the API | YouTube Stats Without the API

