How to Scrape DoorDash Restaurant Data in 2026 (Menus, Delivery Zones, ETAs)

DoorDash has no public API for third-party developers. If you need restaurant data for market research, price comparison, or delivery analytics, you have to extract it yourself.

The good news: DoorDash's frontend relies on a GraphQL API that returns structured JSON. Once you understand the endpoint, you can pull menus, delivery fees, ETAs, and ratings without parsing HTML.

What Data You Can Extract

DoorDash's internal API exposes restaurant metadata (name, rating, price range, hours, address), full menus with item prices and modifiers, delivery fees and minimum order amounts, estimated delivery times, and DashPass eligibility.

What you won't get without an account: order history, driver data, or real-time driver locations.

Table of Contents

  1. Understanding DoorDash's Architecture
  2. Finding the GraphQL Endpoint
  3. Anti-Bot Defenses and How to Handle Them
  4. Setting Up Your Scraping Environment
  5. Core Scraping Code
  6. Finding Store IDs at Scale
  7. Handling Rate Limits and Blocks
  8. Storing and Analyzing the Data
  9. Building a Multi-City Scraper
  10. Proxy Strategy with ThorData
  11. Playwright-Based Fallback
  12. Real-World Use Cases
  13. Legal Considerations
  14. Performance Optimization

Understanding DoorDash's Architecture {#architecture}

Before writing any code, it helps to understand how DoorDash actually works under the hood. The website is a React single-page application. When you visit a restaurant page, the browser loads a mostly empty HTML shell and then fires off API calls to populate the content.

These API calls go to DoorDash's internal GraphQL endpoint. GraphQL is a query language where the client specifies exactly what fields it wants in the response. This is actually great for scraping: the API returns clean, structured JSON without any HTML parsing.

The key insight is that these API calls use the same endpoint your browser uses. If you replicate the same HTTP requests with the right headers, you get the same data.

DoorDash runs on:

- CloudFront CDN for static assets and some API responses
- Amazon WAF (Web Application Firewall) for bot detection
- Internal rate limiting per IP and per session
- TLS fingerprinting via their CDN layer

Understanding this stack tells you what defenses you need to bypass.


Finding the GraphQL Endpoint {#graphql}

Open DoorDash in Chrome, navigate to any restaurant page, and open DevTools (F12). Go to the Network tab and filter by "Fetch/XHR".

Reload the page. You'll see a flood of POST requests to:

https://www.doordash.com/graphql

Each request carries:

1. An operationName field identifying what it's fetching
2. A variables object with query parameters
3. A query string with the GraphQL query definition

Click on any of these requests to inspect the payload and response. The two operations most useful for restaurant data are getStoreDetails (name, ratings, fees, ETA, hours) and getStoreMenu (the full menu with prices and modifiers).

You can also find the full schema by looking at the __schema introspection queries that DoorDash's own frontend makes. This reveals every available field.
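An introspection request is just another POST to the same endpoint. Here's a minimal sketch of the payload shape — note that many production APIs disable introspection, so confirm in DevTools that DoorDash's frontend actually issues these before relying on it:

```python
# A minimal GraphQL introspection payload. "IntrospectionQuery" is the
# conventional operation name; the fields requested here are part of the
# standard GraphQL introspection schema.
INTROSPECTION_PAYLOAD = {
    "operationName": "IntrospectionQuery",
    "variables": {},
    "query": """
    query IntrospectionQuery {
        __schema {
            queryType { name }
            types { name kind }
        }
    }""",
}

# Sent exactly like any other query:
#   session.post("https://www.doordash.com/graphql", json=INTROSPECTION_PAYLOAD)
```

The response enumerates every type in the schema, which is how you discover fields the frontend never requests.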


Anti-Bot Defenses and How to Handle Them {#anti-bot}

DoorDash runs aggressive bot detection. A naive requests.get() call returns a 403 or redirects to a CAPTCHA page. Here's what you're actually up against:

TLS Fingerprinting

DoorDash checks your TLS handshake characteristics. Standard Python requests and even httpx use a recognizable TLS fingerprint that gets flagged. Real browsers (Chrome, Firefox) negotiate TLS differently — different cipher suites, different extensions, different ALPN protocols.

The fix: use curl-cffi, a Python library that wraps libcurl and lets you impersonate specific browser TLS fingerprints.

from curl_cffi import requests as cffi_requests

session = cffi_requests.Session(impersonate="chrome120")
resp = session.get("https://www.doordash.com/graphql")

This makes your TLS handshake look identical to Chrome 120's.

HTTP/2 Fingerprinting

Beyond TLS, HTTP/2 frames have a fingerprint too — window sizes, header ordering, SETTINGS frames. curl-cffi handles this correctly when you set the impersonation target.

CloudFront Bot Detection

Amazon's WAF analyzes request patterns: request timing, header ordering, missing browser-specific headers. You need to send headers in the right order with the right values.

Rate Limiting

More than ~30-40 requests per minute from a single IP triggers soft blocks (increasing delays) and eventually hard blocks (403s that don't resolve).

Session Tracking

DoorDash sets cookies when you first visit. Subsequent requests without those cookies look bot-like. Always carry cookies across your session.
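To see what "carrying cookies" means at the HTTP level, here's a tiny illustrative helper (not part of the scraper itself) that folds a cookie dict into the Cookie request-header format a browser would send — session objects like curl-cffi's do this for you automatically, and the cookie names below are hypothetical:

```python
def cookie_header(cookies: dict) -> str:
    """Serialize a cookie dict into the Cookie request-header format."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

# Cookies captured from the homepage visit, replayed on the GraphQL request
cookies = {"dd_session_id": "abc123", "dd_device_id": "dev456"}
print(cookie_header(cookies))  # dd_session_id=abc123; dd_device_id=dev456
```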


Setting Up Your Scraping Environment {#setup}

Install the required packages:

pip install curl-cffi httpx requests beautifulsoup4 pandas aiohttp

(sqlite3 and asyncio ship with Python's standard library — they don't need to be installed.)

For the full production setup that handles all DoorDash defenses:

pip install curl-cffi pandas

Basic session setup with TLS impersonation:

from curl_cffi import requests as cffi_requests
import json
import time
import random

# Proxy configuration (residential proxy recommended)
PROXY_URL = "http://USERNAME:PASSWORD@proxy.thordata.com:9000"

# Browser-like headers - these must be in the right order
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Content-Type": "application/json",
    "Referer": "https://www.doordash.com/",
    "Origin": "https://www.doordash.com",
    "x-channel-id": "marketplace",
    "x-client-version": "24.0.0",
    "x-experience-id": "doordash",
    "Sec-Ch-Ua": '"Not/A)Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"macOS"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
}

GRAPHQL_URL = "https://www.doordash.com/graphql"


def create_session():
    """Create a session with Chrome TLS fingerprint."""
    session = cffi_requests.Session(impersonate="chrome120")
    session.headers.update(HEADERS)
    if PROXY_URL:
        session.proxies = {"https": PROXY_URL, "http": PROXY_URL}
    return session


def warm_up_session(session):
    """Visit homepage first to get cookies, like a real browser would."""
    try:
        session.get("https://www.doordash.com/", timeout=15)
        time.sleep(random.uniform(1.5, 3.0))
    except Exception as e:
        print(f"Warmup failed (continuing anyway): {e}")

Core Scraping Code {#core-code}

Fetching Restaurant Details

def get_store_details(session, store_id: int) -> dict:
    """Fetch restaurant metadata: name, rating, delivery info, hours."""
    payload = {
        "operationName": "getStoreDetails",
        "variables": {
            "storeId": store_id,
            "consumerAddressId": None,
            "fetchMenuCategories": False,
        },
        "query": """
        query getStoreDetails($storeId: Int!) {
            storeDetails: storeV2(storeId: $storeId) {
                id
                name
                phoneNumber
                description
                coverImgUrl
                averageRating
                numRatings
                priceRange
                address {
                    street
                    city
                    state
                    zipCode
                    lat
                    lng
                    countryCode
                }
                businessHours {
                    dayOfWeek
                    openTime
                    closeTime
                    isCurrentlyOpen
                }
                deliveryFee
                serviceFee
                minOrderAmount
                estimatedDeliveryTime
                deliveryRadius
                isDashPassEligible
                headerImgUrl
                cuisineType
                tags
                isOpen
                nextOpenTime
                businessType
            }
        }"""
    }

    resp = session.post(GRAPHQL_URL, json=payload, timeout=20)
    resp.raise_for_status()
    data = resp.json()

    if "errors" in data:
        raise ValueError(f"GraphQL errors: {data['errors']}")

    return data.get("data", {}).get("storeDetails", {})


def get_store_menu(session, store_id: int) -> list[dict]:
    """Fetch full menu with categories, items, prices, and modifiers."""
    payload = {
        "operationName": "getStoreMenu",
        "variables": {
            "storeId": store_id,
        },
        "query": """
        query getStoreMenu($storeId: Int!) {
            storeMenu(storeId: $storeId) {
                categories {
                    id
                    name
                    description
                    isPopular
                    items {
                        id
                        name
                        description
                        price
                        originalPrice
                        imageUrl
                        isAvailable
                        isPopular
                        alcoholic
                        portionSizeInfo
                        extras {
                            id
                            name
                            minNumOptions
                            maxNumOptions
                            options {
                                id
                                name
                                price
                                default
                            }
                        }
                    }
                }
            }
        }"""
    }

    resp = session.post(GRAPHQL_URL, json=payload, timeout=20)
    resp.raise_for_status()
    data = resp.json()

    if "errors" in data:
        raise ValueError(f"GraphQL errors: {data['errors']}")

    menu_data = data.get("data", {}).get("storeMenu", {})
    return menu_data.get("categories", [])


def scrape_restaurant(store_id: int) -> dict:
    """Full restaurant scrape: details + menu."""
    session = create_session()
    warm_up_session(session)

    print(f"Scraping store {store_id}...")

    # Get details
    time.sleep(random.uniform(1, 2))
    details = get_store_details(session, store_id)

    # Get menu
    time.sleep(random.uniform(2, 4))
    menu = get_store_menu(session, store_id)

    return {
        "details": details,
        "menu": menu,
        "scraped_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

Usage Example

if __name__ == "__main__":
    store_id = 61092  # Pizza Hut NYC

    result = scrape_restaurant(store_id)
    details = result["details"]
    menu = result["menu"]

    print(f"\nRestaurant: {details.get('name')}")
    print(f"Rating: {details.get('averageRating')} ({details.get('numRatings')} reviews)")
    print(f"Delivery fee: ${(details.get('deliveryFee') or 0) / 100:.2f}")
    print(f"ETA: {details.get('estimatedDeliveryTime')} min")
    print(f"DashPass eligible: {details.get('isDashPassEligible')}")

    print(f"\nMenu ({len(menu)} categories):")
    for category in menu[:3]:  # show first 3 categories
        items = category.get("items", [])
        print(f"\n  {category['name']} ({len(items)} items):")
        for item in items[:5]:  # show first 5 items
            price = (item.get("price") or 0) / 100
            popular = " [Popular]" if item.get("isPopular") else ""
            print(f"    ${price:.2f} — {item['name']}{popular}")
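For downstream analysis it helps to flatten the nested category/item structure into one row per item (e.g. before loading into pandas or a database). A small sketch, using the same field names as the menu query above:

```python
def flatten_menu(menu: list[dict]) -> list[dict]:
    """Flatten nested menu categories into one flat row per item."""
    rows = []
    for category in menu:
        for item in category.get("items", []):
            rows.append({
                "category": category.get("name"),
                "item": item.get("name"),
                "price_usd": (item.get("price") or 0) / 100,  # cents -> dollars
                "is_popular": bool(item.get("isPopular")),
                "is_available": bool(item.get("isAvailable")),
            })
    return rows

# Example with made-up menu data
menu = [{"name": "Pizzas", "items": [
    {"name": "Pepperoni", "price": 1299, "isPopular": True, "isAvailable": True},
]}]
print(flatten_menu(menu))
```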

Finding Store IDs at Scale {#store-ids}

Store IDs appear in DoorDash URLs. doordash.com/store/pizza-hut-new-york-61092/ has store ID 61092. But to build a comprehensive database, you need to discover stores systematically.

Search by Location

def search_stores_by_location(session, lat: float, lng: float,
                               query: str = "", limit: int = 50) -> list[dict]:
    """Discover stores near a geographic coordinate."""
    payload = {
        "operationName": "searchStoresV3",
        "variables": {
            "latitude": lat,
            "longitude": lng,
            "query": query,
            "limit": limit,
            "offset": 0,
            "filters": {
                "sortOrder": "RELEVANCE",
            }
        },
        "query": """
        query searchStoresV3($latitude: Float!, $longitude: Float!,
                             $query: String, $limit: Int, $offset: Int) {
            searchStoresV3(
                latitude: $latitude
                longitude: $longitude
                query: $query
                limit: $limit
                offset: $offset
            ) {
                stores {
                    id
                    name
                    averageRating
                    numRatings
                    estimatedDeliveryTime
                    deliveryFee
                    cuisineType
                    address {
                        city
                        state
                    }
                    isDashPassEligible
                    isOpen
                }
                hasMore
                totalCount
            }
        }"""
    }

    resp = session.post(GRAPHQL_URL, json=payload, timeout=20)
    resp.raise_for_status()
    data = resp.json()
    return data.get("data", {}).get("searchStoresV3", {})


def discover_stores_in_city(city_lat: float, city_lng: float,
                             radius_km: float = 5.0) -> list[int]:
    """
    Grid-search a city to discover all store IDs.
    Uses a grid of GPS coordinates spaced ~2km apart.
    """

    session = create_session()
    warm_up_session(session)

    # Calculate grid steps (approximately 0.018 degrees = 2km)
    step = 0.018
    lat_steps = int(radius_km / 2) + 1
    lng_steps = int(radius_km / 2) + 1

    all_store_ids = set()
    requests_made = 0

    for lat_offset in range(-lat_steps, lat_steps + 1):
        for lng_offset in range(-lng_steps, lng_steps + 1):
            lat = city_lat + (lat_offset * step)
            lng = city_lng + (lng_offset * step)

            try:
                result = search_stores_by_location(session, lat, lng, limit=50)
                stores = result.get("stores", [])

                for store in stores:
                    store_id = store.get("id")
                    if store_id:
                        all_store_ids.add(store_id)

                requests_made += 1
                print(f"Grid point ({lat:.3f}, {lng:.3f}): "
                      f"{len(stores)} stores, total unique: {len(all_store_ids)}")

                # Polite delay between grid points
                time.sleep(random.uniform(2, 5))

            except Exception as e:
                print(f"Failed at ({lat:.3f}, {lng:.3f}): {e}")
                time.sleep(10)  # longer backoff on error

    print(f"\nDiscovered {len(all_store_ids)} unique stores "
          f"in {requests_made} requests")
    return list(all_store_ids)


# Example: discover restaurants in Manhattan
# manhattan_stores = discover_stores_in_city(40.7580, -73.9855, radius_km=5)
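The 0.018-degree step comes from latitude geometry: one degree of latitude spans roughly 111 km everywhere, so a 2 km spacing is about 2/111 ≈ 0.018 degrees. Longitude degrees shrink with latitude, so away from the equator the east-west spacing above is actually tighter than 2 km — a latitude-aware version would look like this:

```python
import math

KM_PER_DEG_LAT = 111.32  # roughly constant at all latitudes

def grid_step_degrees(spacing_km: float, lat: float) -> tuple[float, float]:
    """Degree offsets (lat, lng) for a given km spacing at a given latitude."""
    lat_step = spacing_km / KM_PER_DEG_LAT
    # A degree of longitude spans cos(latitude) * 111.32 km
    lng_step = spacing_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat_step, lng_step

lat_step, lng_step = grid_step_degrees(2.0, 40.758)  # Manhattan
print(round(lat_step, 3))  # 0.018
```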

Pagination Through Search Results

def get_all_stores_for_query(session, lat: float, lng: float,
                              query: str, max_results: int = 500) -> list[dict]:
    """Paginate through search results to get more than 50 stores."""
    all_stores = []
    offset = 0
    limit = 50

    while offset < max_results:
        payload = {
            "operationName": "searchStoresV3",
            "variables": {
                "latitude": lat,
                "longitude": lng,
                "query": query,
                "limit": limit,
                "offset": offset,
            },
            "query": """
            query searchStoresV3($latitude: Float!, $longitude: Float!,
                                 $query: String, $limit: Int, $offset: Int) {
                searchStoresV3(latitude: $latitude, longitude: $longitude,
                               query: $query, limit: $limit, offset: $offset) {
                    stores { id name cuisineType averageRating }
                    hasMore
                }
            }"""
        }

        resp = session.post(GRAPHQL_URL, json=payload, timeout=20)
        data = resp.json()
        result = data.get("data", {}).get("searchStoresV3", {})

        stores = result.get("stores", [])
        all_stores.extend(stores)

        if not result.get("hasMore") or not stores:
            break

        offset += limit
        time.sleep(random.uniform(1, 3))

    return all_stores
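Paginated results can overlap when the ranking shifts between pages, so deduplicate by store ID before scraping each store. A sketch (store names in the example are made up):

```python
def dedupe_stores(stores: list[dict]) -> list[dict]:
    """Keep the first occurrence of each store ID, preserving order."""
    seen = set()
    unique = []
    for store in stores:
        store_id = store.get("id")
        if store_id is not None and store_id not in seen:
            seen.add(store_id)
            unique.append(store)
    return unique

stores = [{"id": 61092, "name": "Pizza Hut"},
          {"id": 61092, "name": "Pizza Hut"},
          {"id": 70001, "name": "Thai Palace"}]
print(len(dedupe_stores(stores)))  # 2
```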

Handling Rate Limits and Blocks {#rate-limits}

Detecting Different Block Types

def check_response(resp) -> str:
    """Classify the response to detect blocks."""
    if resp.status_code == 200:
        data = resp.json()
        if "errors" in data:
            errors = data["errors"]
            error_msg = str(errors)
            if "UNAUTHENTICATED" in error_msg:
                return "auth_required"
            if "RATE_LIMITED" in error_msg:
                return "rate_limited"
            return "graphql_error"
        return "success"

    elif resp.status_code == 403:
        # Check if it's CloudFront or application-level
        server = resp.headers.get("server", "")
        if "CloudFront" in server:
            return "cloudfront_block"
        return "forbidden"

    elif resp.status_code == 429:
        return "rate_limited"

    elif resp.status_code == 503:
        return "service_unavailable"

    return f"http_error_{resp.status_code}"


def robust_graphql_request(session, payload: dict,
                            max_retries: int = 5) -> dict:
    """Make a GraphQL request with exponential backoff and block detection."""
    for attempt in range(max_retries):
        try:
            resp = session.post(GRAPHQL_URL, json=payload, timeout=20)
            status = check_response(resp)

            if status == "success":
                return resp.json()

            elif status == "rate_limited":
                retry_after = int(resp.headers.get("Retry-After", 60))
                wait = max(retry_after, 30 * (2 ** attempt))
                print(f"Rate limited. Waiting {wait}s (attempt {attempt+1})")
                time.sleep(wait)

            elif status == "cloudfront_block":
                # CloudFront block - need fresh IP and session
                wait = 120 * (2 ** attempt)
                print(f"CloudFront block detected. Waiting {wait}s")
                time.sleep(wait)
                session = create_session()  # fresh session with new proxy
                warm_up_session(session)

            elif status == "auth_required":
                # Some endpoints need authentication
                raise PermissionError("Endpoint requires authentication")

            else:
                wait = 10 * (2 ** attempt)
                print(f"Status: {status}. Waiting {wait}s")
                time.sleep(wait)

        except (ConnectionError, TimeoutError) as e:
            wait = 15 * (2 ** attempt)
            print(f"Network error: {e}. Waiting {wait}s")
            time.sleep(wait)

    raise RuntimeError(f"All {max_retries} attempts failed")
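The retry waits above double with each attempt. Here's the schedule worked out as a standalone sketch, with optional jitter so multiple workers don't all retry at the same instant:

```python
import random

def backoff_schedule(base: float, attempts: int, jitter: float = 0.0) -> list[float]:
    """Exponential backoff delays: base * 2**attempt, plus uniform jitter."""
    return [base * (2 ** attempt) + random.uniform(0, jitter)
            for attempt in range(attempts)]

# The rate-limited path above uses a 30s base over 5 attempts:
print(backoff_schedule(30, 5))  # [30.0, 60.0, 120.0, 240.0, 480.0]
```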

Request Pacing

import threading
from collections import deque

class RequestPacer:
    """Enforces a minimum delay between requests with jitter."""

    def __init__(self, min_delay: float = 2.0, max_delay: float = 5.0,
                 burst_limit: int = 10):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.burst_limit = burst_limit
        self.request_times = deque(maxlen=burst_limit)
        self.lock = threading.Lock()

    def wait(self):
        """Block until it's safe to make another request."""
        with self.lock:
            now = time.time()

            # Enforce burst limit: if we've made `burst_limit` requests
            # in the last 60 seconds, wait
            if len(self.request_times) == self.burst_limit:
                oldest = self.request_times[0]
                elapsed = now - oldest
                if elapsed < 60:
                    sleep_time = 60 - elapsed + random.uniform(0, 5)
                    print(f"Burst limit reached. Waiting {sleep_time:.1f}s")
                    time.sleep(sleep_time)

            # Enforce minimum delay from last request
            if self.request_times:
                last = self.request_times[-1]
                min_next = last + self.min_delay
                if time.time() < min_next:
                    delay = min_next - time.time() + random.uniform(0, self.max_delay - self.min_delay)
                    time.sleep(delay)

            self.request_times.append(time.time())


pacer = RequestPacer(min_delay=2, max_delay=5, burst_limit=20)

def paced_scrape_restaurant(store_id: int) -> dict:
    pacer.wait()
    return scrape_restaurant(store_id)

Storing and Analyzing the Data {#storage}

SQLite Schema

import sqlite3

def create_database(db_path: str = "doordash.db"):
    """Create database schema for DoorDash data."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    c.executescript("""
        CREATE TABLE IF NOT EXISTS restaurants (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            phone TEXT,
            description TEXT,
            cuisine_type TEXT,
            price_range INTEGER,
            average_rating REAL,
            num_ratings INTEGER,
            delivery_fee INTEGER,
            service_fee INTEGER,
            min_order_amount INTEGER,
            estimated_delivery_time INTEGER,
            delivery_radius REAL,
            is_dashpass_eligible BOOLEAN,
            is_open BOOLEAN,
            street TEXT,
            city TEXT,
            state TEXT,
            zip_code TEXT,
            lat REAL,
            lng REAL,
            scraped_at TEXT,
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS menu_categories (
            id INTEGER PRIMARY KEY,
            restaurant_id INTEGER REFERENCES restaurants(id),
            name TEXT NOT NULL,
            description TEXT,
            is_popular BOOLEAN,
            sort_order INTEGER
        );

        CREATE TABLE IF NOT EXISTS menu_items (
            id INTEGER PRIMARY KEY,
            category_id INTEGER REFERENCES menu_categories(id),
            restaurant_id INTEGER REFERENCES restaurants(id),
            name TEXT NOT NULL,
            description TEXT,
            price INTEGER,
            original_price INTEGER,
            image_url TEXT,
            is_available BOOLEAN,
            is_popular BOOLEAN,
            is_alcoholic BOOLEAN,
            scraped_at TEXT
        );

        CREATE TABLE IF NOT EXISTS scrape_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            store_id INTEGER,
            status TEXT,
            error_message TEXT,
            scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        CREATE INDEX IF NOT EXISTS idx_restaurants_city ON restaurants(city, state);
        CREATE INDEX IF NOT EXISTS idx_restaurants_cuisine ON restaurants(cuisine_type);
        CREATE INDEX IF NOT EXISTS idx_items_restaurant ON menu_items(restaurant_id);
    """)

    conn.commit()
    return conn


def save_restaurant(conn: sqlite3.Connection, data: dict):
    """Save restaurant and menu to database (upsert)."""
    c = conn.cursor()
    details = data.get("details", {})
    menu = data.get("menu", [])
    scraped_at = data.get("scraped_at")

    address = details.get("address") or {}

    # Upsert restaurant
    c.execute("""
        INSERT INTO restaurants (
            id, name, phone, description, cuisine_type, price_range,
            average_rating, num_ratings, delivery_fee, service_fee,
            min_order_amount, estimated_delivery_time, delivery_radius,
            is_dashpass_eligible, is_open, street, city, state, zip_code,
            lat, lng, scraped_at
        ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            average_rating=excluded.average_rating,
            num_ratings=excluded.num_ratings,
            delivery_fee=excluded.delivery_fee,
            estimated_delivery_time=excluded.estimated_delivery_time,
            is_open=excluded.is_open,
            scraped_at=excluded.scraped_at,
            updated_at=CURRENT_TIMESTAMP
    """, (
        details.get("id"),
        details.get("name"),
        details.get("phoneNumber"),
        details.get("description"),
        details.get("cuisineType"),
        details.get("priceRange"),
        details.get("averageRating"),
        details.get("numRatings"),
        details.get("deliveryFee"),
        details.get("serviceFee"),
        details.get("minOrderAmount"),
        details.get("estimatedDeliveryTime"),
        details.get("deliveryRadius"),
        details.get("isDashPassEligible"),
        details.get("isOpen"),
        address.get("street"),
        address.get("city"),
        address.get("state"),
        address.get("zipCode"),
        address.get("lat"),
        address.get("lng"),
        scraped_at,
    ))

    restaurant_id = details.get("id")

    # Clear existing menu (fresh scrape)
    c.execute("DELETE FROM menu_categories WHERE restaurant_id=?", (restaurant_id,))
    c.execute("DELETE FROM menu_items WHERE restaurant_id=?", (restaurant_id,))

    # Insert menu
    for sort_order, category in enumerate(menu):
        cat_id = category.get("id")
        c.execute("""
            INSERT OR REPLACE INTO menu_categories
                (id, restaurant_id, name, description, is_popular, sort_order)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (
            cat_id, restaurant_id,
            category.get("name"), category.get("description"),
            category.get("isPopular"), sort_order
        ))

        for item in category.get("items", []):
            c.execute("""
                INSERT OR REPLACE INTO menu_items
                    (id, category_id, restaurant_id, name, description,
                     price, original_price, image_url, is_available,
                     is_popular, is_alcoholic, scraped_at)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                item.get("id"), cat_id, restaurant_id,
                item.get("name"), item.get("description"),
                item.get("price"), item.get("originalPrice"),
                item.get("imageUrl"), item.get("isAvailable"),
                item.get("isPopular"), item.get("alcoholic"),
                scraped_at
            ))

    conn.commit()

Analysis Queries

def analyze_market(conn: sqlite3.Connection, city: str):
    """Run market analysis queries on the collected data."""
    c = conn.cursor()

    print(f"\n=== DoorDash Market Analysis: {city} ===\n")

    # Top cuisines
    print("Top 10 cuisines by restaurant count:")
    c.execute("""
        SELECT cuisine_type, COUNT(*) as count,
               AVG(average_rating) as avg_rating,
               AVG(delivery_fee / 100.0) as avg_fee
        FROM restaurants
        WHERE city = ? AND cuisine_type IS NOT NULL
        GROUP BY cuisine_type
        ORDER BY count DESC
        LIMIT 10
    """, (city,))
    for row in c.fetchall():
        print(f"  {row[0]}: {row[1]} restaurants, "
              f"avg rating {row[2]:.2f}, avg fee ${row[3]:.2f}")

    # Price distribution
    print("\nDelivery fee distribution:")
    c.execute("""
        SELECT
            CASE
                WHEN delivery_fee = 0 THEN 'Free'
                WHEN delivery_fee < 200 THEN '$0.01-$1.99'
                WHEN delivery_fee < 400 THEN '$2.00-$3.99'
                WHEN delivery_fee < 600 THEN '$4.00-$5.99'
                ELSE '$6.00+'
            END as fee_range,
            COUNT(*) as count
        FROM restaurants
        WHERE city = ?
        GROUP BY fee_range
        ORDER BY MIN(delivery_fee)
    """, (city,))
    for row in c.fetchall():
        print(f"  {row[0]}: {row[1]} restaurants")

    # Most popular menu items across restaurants
    print("\nMost common popular items across all restaurants:")
    c.execute("""
        SELECT name, COUNT(*) as frequency
        FROM menu_items
        WHERE is_popular = 1
          AND restaurant_id IN (SELECT id FROM restaurants WHERE city = ?)
        GROUP BY LOWER(name)
        ORDER BY frequency DESC
        LIMIT 15
    """, (city,))
    for row in c.fetchall():
        print(f"  '{row[0]}': appears in {row[1]} restaurants")

Building a Multi-City Scraper {#multi-city}

# Major US cities with coordinates
CITIES = {
    "New York": (40.7580, -73.9855),
    "Los Angeles": (34.0522, -118.2437),
    "Chicago": (41.8781, -87.6298),
    "Houston": (29.7604, -95.3698),
    "Phoenix": (33.4484, -112.0740),
    "Philadelphia": (39.9526, -75.1652),
    "San Antonio": (29.4241, -98.4936),
    "San Diego": (32.7157, -117.1611),
    "Dallas": (32.7767, -96.7970),
    "Austin": (30.2672, -97.7431),
}


def scrape_city(city_name: str, lat: float, lng: float,
                db_path: str = "doordash_multi.db") -> dict:
    """Scrape all restaurants in a city."""
    print(f"\nStarting scrape for {city_name}...")

    session = create_session()
    warm_up_session(session)

    # Discover store IDs
    result = search_stores_by_location(session, lat, lng, limit=50)
    stores = result.get("stores", [])
    store_ids = [s["id"] for s in stores if s.get("id")]

    conn = create_database(db_path)
    scraped = 0
    failed = 0
    pacer = RequestPacer(min_delay=3, max_delay=7)

    for store_id in store_ids:
        try:
            pacer.wait()
            data = scrape_restaurant(store_id)
            save_restaurant(conn, data)
            scraped += 1
            print(f"  [{city_name}] Scraped {scraped}/{len(store_ids)}: "
                  f"{data['details'].get('name', store_id)}")

        except Exception as e:
            failed += 1
            c = conn.cursor()
            c.execute("""
                INSERT INTO scrape_log (store_id, status, error_message)
                VALUES (?, 'error', ?)
            """, (store_id, str(e)))
            conn.commit()
            print(f"  [{city_name}] Failed {store_id}: {e}")
            time.sleep(random.uniform(5, 15))

    conn.close()
    return {
        "city": city_name,
        "total": len(store_ids),
        "scraped": scraped,
        "failed": failed,
    }


def run_multi_city_scraper(cities: dict, db_path: str = "doordash_multi.db"):
    """Run the scraper across multiple cities sequentially (be polite)."""
    results = []

    for city_name, (lat, lng) in cities.items():
        result = scrape_city(city_name, lat, lng, db_path)
        results.append(result)

        # Rest between cities
        rest_time = random.uniform(30, 60)
        print(f"Resting {rest_time:.0f}s before next city...")
        time.sleep(rest_time)

    print("\n=== Multi-City Scrape Complete ===")
    for r in results:
        print(f"  {r['city']}: {r['scraped']}/{r['total']} scraped, "
              f"{r['failed']} failed")

    return results


if __name__ == "__main__":
    run_multi_city_scraper(
        {k: v for k, v in list(CITIES.items())[:3]},  # start with 3 cities
        db_path="doordash_multi.db"
    )

Proxy Strategy with ThorData {#proxies}

Residential proxies are essential for DoorDash scraping at scale. Datacenter IPs get flagged almost immediately because DoorDash's WAF recognizes the IP ranges belonging to cloud providers like AWS, GCP, and Azure.

ThorData provides rotating residential proxies sourced from real consumer internet connections. Each request routes through a different household IP, making your traffic indistinguishable from thousands of real DoorDash users in different locations.

Setting Up ThorData Rotation

import threading

class ThorDataProxyPool:
    """
    Manages a rotating pool of ThorData residential proxies.
    ThorData supports sticky sessions (same IP per session) and
    rotating sessions (new IP per request).
    """

    def __init__(self, username: str, password: str,
                 host: str = "proxy.thordata.com", port: int = 9000,
                 rotate_per_request: bool = True):
        self.username = username
        self.password = password
        self.host = host
        self.port = port
        self.rotate_per_request = rotate_per_request
        self._lock = threading.Lock()
        self._request_count = 0

    def get_proxy_url(self, country: str = "US",
                      session_id: str | None = None) -> str:
        """
        Generate a proxy URL.

        For rotating proxies (new IP per request), omit session_id.
        For sticky proxies (same IP per session), provide a session_id.
        """
        with self._lock:
            self._request_count += 1

        if self.rotate_per_request:
            # Each call gets a new IP from ThorData's pool
            user = f"{self.username}-country-{country}"
        else:
            # Sticky session: same IP for every request that shares a
            # session_id; without one, auto-rotate every 10 requests.
            sid = session_id or f"session-{self._request_count // 10}"
            user = f"{self.username}-country-{country}-session-{sid}"

        return f"http://{user}:{self.password}@{self.host}:{self.port}"

    def get_country_specific_proxy(self, country: str) -> str:
        """Get a proxy with a specific country's IP."""
        return self.get_proxy_url(country=country)

    def create_session(self, country: str = "US") -> cffi_requests.Session:
        """Create a curl-cffi session with a ThorData proxy."""
        proxy_url = self.get_proxy_url(country=country)
        session = cffi_requests.Session(impersonate="chrome120")
        session.headers.update(HEADERS)
        session.proxies = {"https": proxy_url, "http": proxy_url}
        return session


# Initialize the proxy pool
# Get your credentials at https://thordata.partnerstack.com/partner/0a0x4nzh
proxy_pool = ThorDataProxyPool(
    username="YOUR_THORDATA_USERNAME",
    password="YOUR_THORDATA_PASSWORD",
    rotate_per_request=True,
)

def scrape_with_rotating_proxies(store_ids: list[int]) -> list[dict]:
    """Scrape multiple stores, rotating proxies between each."""
    results = []
    # One pacer for the whole run; recreating it per store would reset
    # its timing state and defeat the delay logic.
    pacer = RequestPacer(min_delay=2, max_delay=5)

    for store_id in store_ids:
        # New session = new proxy IP from ThorData's pool
        session = proxy_pool.create_session(country="US")
        warm_up_session(session)

        try:
            pacer.wait()

            details = get_store_details(session, store_id)
            menu = get_store_menu(session, store_id)

            results.append({
                "details": details,
                "menu": menu,
                "scraped_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
            })
            print(f"Scraped: {details.get('name', store_id)}")

        except Exception as e:
            print(f"Failed {store_id}: {e}")
            results.append({"store_id": store_id, "error": str(e)})

    return results

Geographic Targeting

When scraping DoorDash for specific cities, use ThorData to send requests from IPs in those cities. DoorDash personalizes delivery times and fees based on IP location, so city-matched proxies give you more accurate local data:

CITY_COUNTRY_MAP = {
    "New York": "US",
    "London": "GB",
    "Toronto": "CA",
    "Sydney": "AU",
    "Berlin": "DE",
}

def scrape_city_with_local_ip(city: str, lat: float, lng: float) -> list[dict]:
    country = CITY_COUNTRY_MAP.get(city, "US")
    session = proxy_pool.create_session(country=country)
    warm_up_session(session)

    result = search_stores_by_location(session, lat, lng, limit=50)
    return result.get("stores", [])

Playwright-Based Fallback {#playwright}

Sometimes DoorDash updates its bot detection and the API approach stops working temporarily. When that happens, Playwright driving a real browser is the fallback:

from playwright.async_api import async_playwright
import asyncio
import json

async def scrape_with_playwright(store_id: int,
                                  proxy_url: str | None = None) -> dict:
    """
    Full browser scrape using Playwright.
    Intercepts network requests to capture GraphQL responses.
    """
    async with async_playwright() as p:
        # Launch with optional proxy
        launch_options = {
            "headless": True,  # set False for debugging
        }
        if proxy_url:
            launch_options["proxy"] = {"server": proxy_url}

        browser = await p.chromium.launch(**launch_options)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/126.0.0.0 Safari/537.36",
            viewport={"width": 1280, "height": 800},
            locale="en-US",
        )

        page = await context.new_page()
        captured_data = {}

        # Intercept GraphQL responses
        async def handle_response(response):
            if "graphql" in response.url and response.status == 200:
                try:
                    body = await response.json()
                    if "data" in body:
                        op = "unknown"
                        # Try to identify the operation
                        req_post_data = response.request.post_data
                        if req_post_data:
                            req_json = json.loads(req_post_data)
                            op = req_json.get("operationName", "unknown")
                        captured_data[op] = body["data"]
                except Exception:
                    pass

        page.on("response", handle_response)

        # Navigate to restaurant page
        url = f"https://www.doordash.com/store/{store_id}/"
        await page.goto(url, wait_until="networkidle", timeout=30000)

        # Wait for menu to load
        try:
            await page.wait_for_selector("[data-testid='menu-category']",
                                         timeout=10000)
        except Exception:
            pass  # Menu selector might have changed

        await browser.close()
        return captured_data


async def run_playwright_scraper(store_ids: list[int]) -> list[dict]:
    results = []
    for store_id in store_ids:
        data = await scrape_with_playwright(store_id)
        results.append({"store_id": store_id, "data": data})
        await asyncio.sleep(random.uniform(3, 7))
    return results


# Run it
# results = asyncio.run(run_playwright_scraper([61092, 12345, 67890]))

Real-World Use Cases {#use-cases}

1. Competitive Price Intelligence

Track how delivery fees and minimum orders change across neighborhoods:

def track_price_changes(db_path: str, store_id: int):
    """Monitor delivery fee changes across stored snapshots."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    c.execute("""
        SELECT scraped_at, delivery_fee, estimated_delivery_time
        FROM restaurants
        WHERE id = ?
        ORDER BY scraped_at DESC
        LIMIT 30
    """, (store_id,))

    history = c.fetchall()
    if len(history) > 1:
        current_fee = history[0][1]
        prev_fee = history[1][1]
        if current_fee != prev_fee:
            change = (current_fee - prev_fee) / 100
            print(f"Store {store_id}: delivery fee changed by ${change:+.2f}")

    conn.close()

2. Restaurant Opening Detection

Find restaurants that just appeared on DoorDash in your city:

def find_new_restaurants(db_path: str, city: str, days_ago: int = 7) -> list:
    """Find restaurants whose earliest snapshot is within the last N days."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    # Filter on the *first* time each store was seen; filtering on the
    # latest scraped_at would match every recently re-scraped restaurant.
    c.execute("""
        SELECT id, name, cuisine_type, average_rating, delivery_fee
        FROM restaurants
        WHERE city = ?
        GROUP BY id
        HAVING date(MIN(scraped_at)) >= date('now', ?)
        ORDER BY MIN(scraped_at) DESC
    """, (city, f"-{days_ago} days"))

    results = c.fetchall()
    conn.close()
    return results

3. Cuisine Gap Analysis

Find underserved cuisine types in a neighborhood:

def find_cuisine_gaps(db_path: str, city: str) -> list:
    """Identify cuisines with few options relative to demand (based on ratings)."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    c.execute("""
        SELECT
            cuisine_type,
            COUNT(*) as count,
            AVG(average_rating) as avg_rating,
            AVG(num_ratings) as avg_reviews,
            -- High avg_reviews relative to count = high demand, low supply
            AVG(num_ratings) / COUNT(*) as demand_supply_ratio
        FROM restaurants
        WHERE city = ?
          AND cuisine_type IS NOT NULL
          AND average_rating >= 4.0
        GROUP BY cuisine_type
        HAVING count < 5  -- few options
        ORDER BY demand_supply_ratio DESC
    """, (city,))

    results = c.fetchall()
    conn.close()
    return results
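
To make the ratio concrete, here is a toy calculation with made-up numbers (not scraped data): a cuisine served by few restaurants that each draw heavy review volume scores far higher than a crowded one.

```python
def demand_supply_ratio(restaurant_count: int, avg_reviews: float) -> float:
    """Same heuristic as the SQL above: average review volume divided
    by how many restaurants serve that cuisine."""
    return avg_reviews / restaurant_count

# Hypothetical city: 3 Ethiopian restaurants averaging 900 reviews each
# versus 25 pizza places averaging 400 reviews each.
ethiopian = demand_supply_ratio(3, 900)   # 300.0 -> likely gap
pizza = demand_supply_ratio(25, 400)      # 16.0  -> saturated
```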

4. Menu Item Price Benchmarking

Compare prices for the same item across multiple restaurants:

def benchmark_item(db_path: str, item_name: str, city: str) -> list:
    """Find the same menu item across restaurants and compare prices."""
    conn = sqlite3.connect(db_path)
    c = conn.cursor()

    c.execute("""
        SELECT
            r.name as restaurant,
            mi.name as item,
            mi.price / 100.0 as price,
            r.average_rating,
            r.delivery_fee / 100.0 as delivery_fee
        FROM menu_items mi
        JOIN restaurants r ON mi.restaurant_id = r.id
        WHERE r.city = ?
          AND LOWER(mi.name) LIKE LOWER(?)
          AND mi.is_available = 1
        ORDER BY mi.price
    """, (city, f"%{item_name}%"))

    results = c.fetchall()
    conn.close()

    if results:
        prices = [r[2] for r in results]
        print(f"\n'{item_name}' prices in {city}:")
        print(f"  Min: ${min(prices):.2f}")
        print(f"  Max: ${max(prices):.2f}")
        print(f"  Avg: ${sum(prices)/len(prices):.2f}")
        print(f"\n  Cheapest option: {results[0][0]} at ${results[0][2]:.2f}")

    return results

Legal Considerations {#legal}

DoorDash's Terms of Service prohibit automated access. Courts have generally treated scraping publicly visible data (restaurant names, menus, prices) for research and analysis as lower-risk, but selling or republishing the data commercially could create real legal exposure.

Key principles to stay on the right side:

  1. Only scrape publicly visible data — nothing behind authentication, no user data, no payment information
  2. Don't overwhelm the server — keep request rates low, respect the spirit of robots.txt
  3. Don't republish raw data commercially — aggregated analysis is safer than raw data dumps
  4. Cache aggressively — scrape each page as infrequently as you need to, not as often as technically possible
  5. Identify yourself — some organizations set a custom User-Agent with contact info for good-faith scrapers
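
Principle 4 can be as simple as a TTL check in front of the scraper. A minimal sketch using a standalone SQLite cache table (the table name, schema, and function names here are illustrative, not the schema from earlier sections):

```python
import sqlite3
import time

def get_cached_or_scrape(conn, store_id, scrape_fn, ttl_hours=24):
    """Return the cached snapshot if it is fresher than ttl_hours;
    otherwise call scrape_fn and store the new result."""
    c = conn.cursor()
    c.execute(
        "SELECT payload, fetched_at FROM cache WHERE store_id = ?",
        (store_id,),
    )
    row = c.fetchone()
    now = time.time()
    if row and now - row[1] < ttl_hours * 3600:
        return row[0], True  # cache hit: no request sent

    payload = scrape_fn(store_id)
    c.execute(
        "INSERT OR REPLACE INTO cache (store_id, payload, fetched_at) "
        "VALUES (?, ?, ?)",
        (store_id, payload, now),
    )
    conn.commit()
    return payload, False  # cache miss: fresh scrape stored
```

With a 24-hour TTL, re-running an analysis twice in a day sends zero extra traffic to DoorDash.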

In the US, the Ninth Circuit's hiQ v. LinkedIn rulings read the Computer Fraud and Abuse Act (CFAA) narrowly: scraping publicly accessible data generally does not count as unauthorized access under the CFAA. The case ultimately settled, though, and this area of law is still evolving.


Performance Optimization {#performance}

Async Concurrent Scraping

For maximum throughput while staying polite:

import asyncio
from curl_cffi.requests import AsyncSession

async def async_scrape_store(session: AsyncSession,
                              store_id: int, semaphore: asyncio.Semaphore) -> dict:
    """Async version for concurrent scraping."""
    async with semaphore:
        await asyncio.sleep(random.uniform(1, 3))  # polite delay

        payload = {
            "operationName": "getStoreDetails",
            "variables": {"storeId": store_id},
            "query": "query getStoreDetails($storeId: Int!) { ... }",
        }

        resp = await session.post(GRAPHQL_URL, json=payload, timeout=20)
        return resp.json()


async def async_batch_scrape(store_ids: list[int],
                              max_concurrent: int = 5) -> list[dict]:
    """Scrape multiple stores concurrently."""
    semaphore = asyncio.Semaphore(max_concurrent)
    proxy_url = proxy_pool.get_proxy_url()

    async with AsyncSession(impersonate="chrome120",
                            proxies={"https": proxy_url}) as session:
        session.headers.update(HEADERS)
        tasks = [async_scrape_store(session, sid, semaphore)
                 for sid in store_ids]
        results = await asyncio.gather(*tasks, return_exceptions=True)

    # Report which stores failed instead of silently dropping them
    ok = []
    for sid, res in zip(store_ids, results):
        if isinstance(res, Exception):
            print(f"Failed {sid}: {res}")
        else:
            ok.append(res)
    return ok


# Scrape 100 stores with max 5 concurrent requests
# results = asyncio.run(async_batch_scrape(store_ids[:100], max_concurrent=5))

Estimated Performance

With proper configuration:

| Setup | Stores/hour | Notes |
| --- | --- | --- |
| Single thread, no proxy | 15-20 | Gets blocked quickly |
| Single thread + residential proxy | 60-80 | Reliable, slow |
| 5 concurrent + rotating proxies | 200-300 | Good balance |
| 10 concurrent + dedicated proxy pool | 400-500 | For large operations |

The limiting factor is almost always proxy cost and rate limit avoidance, not your hardware.
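
To turn that claim into numbers, here is a back-of-envelope estimator. The per-store payload size and per-GB price below are illustrative placeholders, not measured DoorDash traffic or ThorData pricing; measure your own responses and check your provider's rates.

```python
def estimate_monthly_proxy_cost(stores_per_hour: int,
                                hours_per_day: float,
                                kb_per_store: float = 150.0,
                                usd_per_gb: float = 3.0) -> float:
    """Rough monthly bandwidth cost for residential proxies.

    kb_per_store and usd_per_gb are hypothetical defaults.
    """
    stores_per_month = stores_per_hour * hours_per_day * 30
    gb_per_month = stores_per_month * kb_per_store / (1024 * 1024)
    return gb_per_month * usd_per_gb

# 250 stores/hour for 8 hours a day ~= 60,000 store scrapes per month
cost = estimate_monthly_proxy_cost(250, 8)
print(f"~${cost:.2f}/month in proxy bandwidth")
```

Even at the "good balance" tier, bandwidth is tens of dollars a month under these assumptions; concurrency beyond that mostly buys speed, not savings.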


Summary

Scraping DoorDash requires handling several layers of defense: TLS fingerprinting, CloudFront WAF, and rate limiting. The core approach is:

  1. Use curl-cffi with Chrome impersonation for TLS fingerprinting
  2. Route through residential proxies (ThorData) for IP diversity
  3. Warm up each session by visiting the homepage first
  4. Keep request rates low with randomized delays
  5. Store in SQLite for easy analysis

The GraphQL API returns clean JSON, so there's no HTML parsing complexity — the hard part is the bot detection layer, not the data extraction.

For a working proxy setup that handles DoorDash specifically, ThorData's residential network is the most reliable option — their rotating pool ensures each request looks like a different household consumer.