
How to Scrape Etsy Listings in 2026 (Product Data Without the API)


Etsy's official API is a dead end for most use cases. To get real-time pricing, listings, or seller data at scale, you need a different approach. This guide walks through everything: the data available, how Etsy's protection works, a complete scraping toolkit, and practical storage and analysis patterns.

Why the Etsy API Falls Short

The Etsy Open API v3 requires application approval and imposes strict rate limits: 10 requests per second and a 10,000-request daily cap. For product research, price monitoring, or competitive analysis, that's not enough.

The approval process can take weeks. Even after approval, you're limited to reading data from your own shop unless you get elevated permissions. For competitive intelligence — tracking other sellers' prices, monitoring trending products, analyzing review sentiment — the API simply doesn't give you access.

The more practical alternative is scraping Etsy's public listing pages directly.

What Data You Can Get

From a standard Etsy search or product page, you can extract:

- Product title, price, currency
- Rating and review count
- Seller username and shop name
- Shop URL and product URL
- Thumbnail images
- Listing tags and categories
- Shipping costs and estimated delivery
- Number of sales (shown on shop pages)
- Favorited count (hearts)
- Listing creation date and last updated
- Material and dimensions (when sellers include them)
- Processing time

What you can't get without an account: messages, order data, private listings, purchase history.
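The fields the search-result parser below actually returns can be pinned down as a typed record, which keeps downstream code honest about what may be missing. A sketch; the `EtsyListing` name is my own, not anything Etsy defines:

```python
from typing import Optional, TypedDict

class EtsyListing(TypedDict, total=False):
    """One search-result card. Every field can be absent or None
    because Etsy's markup varies between listing types."""
    listing_id: Optional[str]
    title: Optional[str]
    price: Optional[str]        # raw string as shown on the page, e.g. "24.99"
    currency: str               # defaults to "USD" when no symbol is found
    url: Optional[str]
    shop_name: Optional[str]
    rating: Optional[float]
    review_count: Optional[int]
    thumbnail: Optional[str]
```

Annotating parser return types with `list[EtsyListing]` makes it obvious at call sites that `price` is still a display string, not a number.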

The Challenge: Anti-Bot Detection

Etsy uses Cloudflare for bot detection. A raw requests call will get you a 403 or a JS challenge page. You need one of three approaches:

Option 1: Playwright/Selenium — Full browser automation bypasses JS challenges but is slow and resource-heavy. Best for small-scale scraping or when you need to interact with the page (clicking filters, loading more results).

Option 2: curl_cffi — Python library that impersonates Chrome's exact TLS fingerprint. Works well against Cloudflare's TLS-based detection and is faster than running a full browser. Best for medium-scale scraping where speed matters.

Option 3: Residential proxies + stealth headers — Route requests through real residential IPs. Cloudflare can't distinguish these from real users. Combined with proper headers and TLS fingerprinting via curl_cffi, this is the most reliable approach for production use. ThorData's residential proxy network provides rotating IPs with city-level targeting at reasonable cost for this kind of work.
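With a residential pool, rotating the exit IP per request is trivial. A minimal sketch; the pool URLs below are illustrative placeholders, not real ThorData endpoints:

```python
import random

# Hypothetical pool of residential proxy endpoints (replace with your provider's)
PROXY_POOL = [
    "http://user:[email protected]:9000",
    "http://user:[email protected]:9000",
    "http://user:[email protected]:9000",
]

def pick_proxy() -> dict:
    """Return a requests/curl_cffi-style proxies mapping with a random pool member."""
    return {"https": random.choice(PROXY_POOL)}
```

Pass `proxies=pick_proxy()` on each request so consecutive fetches don't share an IP.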

Dependencies

pip install httpx curl-cffi beautifulsoup4 lxml selectolax playwright playwright-stealth
playwright install chromium

Method 1: curl_cffi (Fastest for Simple Scraping)

from curl_cffi import requests as curl_requests
from bs4 import BeautifulSoup
import json
import re
import time
import random

HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.etsy.com/",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Dest": "document",
    "sec-ch-ua": '"Chromium";v="124", "Google Chrome";v="124"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
}

PROXY = "http://user:[email protected]:9000"  # ThorData residential

def get_etsy_session():
    """Create a curl_cffi session that impersonates Chrome's TLS fingerprint."""
    session = curl_requests.Session()
    # Warm up the session by hitting the homepage first
    session.get(
        "https://www.etsy.com/",
        impersonate="chrome124",
        headers=HEADERS,
        proxies={"https": PROXY},
        timeout=20,
    )
    time.sleep(random.uniform(1.5, 3.0))
    return session

def search_etsy_curl(query: str, max_pages: int = 3, session=None) -> list[dict]:
    """Scrape Etsy search results using TLS impersonation."""
    if session is None:
        session = get_etsy_session()

    all_results = []

    for page in range(1, max_pages + 1):
        # Encode spaces in the query so multi-word searches form a valid URL
        params = {"q": query.replace(" ", "+"), "explicit": "1", "page": str(page)}
        param_str = "&".join(f"{k}={v}" for k, v in params.items())
        url = f"https://www.etsy.com/search?{param_str}"

        try:
            resp = session.get(
                url,
                impersonate="chrome124",
                headers=HEADERS,
                proxies={"https": PROXY},
                timeout=30,
            )
            resp.raise_for_status()
        except Exception as e:
            print(f"Page {page} error: {e}")
            break

        listings = parse_search_results(resp.text)
        all_results.extend(listings)
        print(f"Page {page}: {len(listings)} listings")

        time.sleep(random.uniform(2.0, 4.5))

    return all_results

Method 2: Full Playwright (Best for Dynamic Content)

import asyncio
import random
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

async def scrape_etsy_playwright(
    query: str,
    max_pages: int = 3,
    proxy_url: str = None,
) -> list[dict]:
    """Scrape Etsy using full browser automation — best for complex filtering."""
    all_results = []

    async with async_playwright() as p:
        launch_kwargs = {
            "headless": True,
            "args": [
                "--no-sandbox",
                "--disable-blink-features=AutomationControlled",
                "--window-size=1440,900",
            ],
        }
        if proxy_url:
            launch_kwargs["proxy"] = {"server": proxy_url}

        browser = await p.chromium.launch(**launch_kwargs)
        context = await browser.new_context(
            viewport={"width": 1440, "height": 900},
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
            locale="en-US",
        )

        page = await context.new_page()
        await stealth_async(page)

        # Warm up — visit homepage first
        await page.goto("https://www.etsy.com/", wait_until="domcontentloaded")
        await page.wait_for_timeout(random.randint(2000, 4000))

        for page_num in range(1, max_pages + 1):
            url = f"https://www.etsy.com/search?q={query.replace(' ', '+')}&page={page_num}"

            await page.goto(url, wait_until="networkidle", timeout=45000)
            await page.wait_for_timeout(2000)

            # Scroll to trigger lazy-loaded content
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.5)")
            await page.wait_for_timeout(1000)
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(1500)

            html = await page.content()
            listings = parse_search_results(html)
            all_results.extend(listings)
            print(f"Page {page_num}: {len(listings)} listings")

            await asyncio.sleep(random.uniform(3.0, 6.0))

        await browser.close()

    return all_results

Parsing Search Results

from bs4 import BeautifulSoup
import json
import re

def parse_search_results(html: str) -> list[dict]:
    """Parse Etsy search results HTML into structured data."""
    soup = BeautifulSoup(html, "lxml")
    results = []

    # Etsy uses multiple possible container selectors — try in order
    container_selectors = [
        "[data-search-results-lg] .wt-grid__item-xs-6",
        "li[data-palette-listing-id]",
        ".v2-listing-card",
        "[data-listing-id]",
    ]

    cards = []
    for selector in container_selectors:
        cards = soup.select(selector)
        if cards:
            break

    if not cards:
        # Try JSON-LD embedded in the page
        return extract_json_ld_listings(soup)

    for card in cards:
        try:
            result = parse_listing_card(card)
            if result and result.get("title"):
                results.append(result)
        except Exception as e:
            continue

    return results

def parse_listing_card(card) -> dict:
    """Parse a single Etsy listing card."""
    # Title
    title_el = card.select_one("h3, .wt-text-caption, [data-search-results]")
    title = title_el.get_text(strip=True) if title_el else None

    # Price - Etsy shows prices with a specific class structure
    price_el = card.select_one(
        ".currency-value, .wt-text-title-01, [data-wt-currency-value]"
    )
    price = price_el.get_text(strip=True) if price_el else None

    # Currency
    currency_el = card.select_one(".currency-symbol, [data-currency-symbol]")
    currency = currency_el.get_text(strip=True) if currency_el else "USD"

    # Product URL
    link_el = card.select_one("a.listing-link, a[href*='/listing/']")
    url = None
    listing_id = None
    if link_el:
        href = link_el.get("href", "")
        # Extract clean URL without query params
        url = href.split("?")[0]
        # Extract listing ID
        match = re.search(r"/listing/(\d+)/", href)
        if match:
            listing_id = match.group(1)

    # Shop name
    shop_el = card.select_one(".shop-name, .wt-text-body-01 .a-link-subdued, a[href*='/shop/']")
    shop_name = shop_el.get_text(strip=True) if shop_el else None

    # Rating
    rating_el = card.select_one("[aria-label*='star'], [aria-label*='rating'], .wt-icon-star")
    rating = None
    if rating_el:
        aria = rating_el.get("aria-label", "")
        match = re.search(r"(\d+\.?\d*)", aria)
        rating = float(match.group(1)) if match else None

    # Review count
    reviews_el = card.select_one(".text-body-smaller, .wt-text-caption-xs:last-child")
    review_count = None
    if reviews_el:
        text = reviews_el.get_text(strip=True)
        match = re.search(r"([\d,]+)", text)
        if match:
            review_count = int(match.group(1).replace(",", ""))

    # Thumbnail
    img_el = card.select_one("img")
    thumbnail = None
    if img_el:
        thumbnail = img_el.get("src") or img_el.get("data-src")

    return {
        "listing_id": listing_id,
        "title": title,
        "price": price,
        "currency": currency,
        "url": url,
        "shop_name": shop_name,
        "rating": rating,
        "review_count": review_count,
        "thumbnail": thumbnail,
    }

def extract_json_ld_listings(soup) -> list[dict]:
    """Fallback: extract listings from JSON-LD structured data."""
    results = []
    for script in soup.select("script[type='application/ld+json']"):
        try:
            data = json.loads(script.string)
            if isinstance(data, list):
                items = data
            elif data.get("@type") == "ItemList":
                items = data.get("itemListElement", [])
            else:
                continue
            for item in items:
                if item.get("@type") in ("Product", "ListItem"):
                    offer = item.get("offers", {})
                    results.append({
                        "title": item.get("name"),
                        "price": offer.get("price"),
                        "currency": offer.get("priceCurrency"),
                        "url": item.get("url"),
                        "rating": item.get("aggregateRating", {}).get("ratingValue"),
                        "review_count": item.get("aggregateRating", {}).get("reviewCount"),
                    })
        except Exception:
            continue
    return results

Parsing Individual Product Pages

Individual listing pages contain far more data than search results. The most reliable extraction method is the embedded JSON-LD structured data:

def parse_listing_page(html: str) -> dict:
    """Extract full product data from an Etsy listing page."""
    soup = BeautifulSoup(html, "lxml")

    # Method 1: JSON-LD structured data (most reliable)
    product = {}
    json_scripts = soup.select("script[type='application/ld+json']")
    for script in json_scripts:
        try:
            data = json.loads(script.string)
            if data.get("@type") == "Product":
                product = {
                    "name": data.get("name"),
                    "price": data.get("offers", {}).get("price"),
                    "currency": data.get("offers", {}).get("priceCurrency"),
                    "availability": data.get("offers", {}).get("availability", "").split("/")[-1],
                    "rating": data.get("aggregateRating", {}).get("ratingValue"),
                    "review_count": data.get("aggregateRating", {}).get("reviewCount"),
                    "seller": data.get("seller", {}).get("name"),
                    "description": data.get("description"),
                    "url": data.get("url"),
                    "images": data.get("image", []) if isinstance(data.get("image"), list) else [data.get("image")],
                }
                break
        except Exception:
            continue

    # Method 2: Supplement with HTML parsing for data not in JSON-LD
    if not product:
        product = {}
        title_el = soup.select_one("h1[data-buy-box-listing-title], h1.wt-text-body-03")
        product["name"] = title_el.get_text(strip=True) if title_el else None

        price_el = soup.select_one(".wt-text-title-03.wt-mr-xs-1, p[data-buy-box-region='price']")
        product["price"] = price_el.get_text(strip=True) if price_el else None

    # Shipping info
    shipping_el = soup.select_one("[data-buy-box-region='shipping'] .wt-text-caption")
    if shipping_el:
        product["shipping"] = shipping_el.get_text(strip=True)

    # Processing time
    processing_el = soup.select_one("[class*='processing-time'], [data-region='processing_time']")
    if processing_el:
        product["processing_time"] = processing_el.get_text(strip=True)

    # Tags
    tag_els = soup.select("a[href*='/search?q='][class*='tag'], .wt-badge--tag a")
    product["tags"] = [t.get_text(strip=True) for t in tag_els]

    # Breadcrumbs / categories
    breadcrumbs = soup.select("#breadcrumbs a, nav[aria-label='breadcrumb'] a")
    product["categories"] = [b.get_text(strip=True) for b in breadcrumbs]

    # All product images
    image_els = soup.select("ul.carousel-pane-list img, [data-carousel-item] img")
    product["images"] = list(set([
        img.get("src", img.get("data-src"))
        for img in image_els
        if img.get("src") or img.get("data-src")
    ]))

    # Number of sales (sometimes shown on listing page)
    sales_el = soup.select_one("[data-shop-sales-count], .wt-text-body-01:-soup-contains('sales')")
    if sales_el:
        product["shop_sales"] = sales_el.get_text(strip=True)

    # Variations/options (colors, sizes)
    variations = {}
    for select_el in soup.select("select[id*='variation'], select[data-feature-key]"):
        label = select_el.get("aria-label") or select_el.get("name", "option")
        options = [opt.get_text(strip=True) for opt in select_el.select("option") if opt.get("value")]
        if options:
            variations[label] = options
    product["variations"] = variations

    return product

Scraping Seller Reviews

Reviews are paginated and require a separate request. Etsy's review system uses an internal API endpoint:

import httpx

def scrape_reviews(listing_id: str, max_pages: int = 5, proxy: str = None) -> list[dict]:
    """Scrape reviews for a specific Etsy listing via Etsy's internal API."""
    reviews = []
    proxies = {"https://": proxy} if proxy else None

    for page in range(1, max_pages + 1):
        url = "https://www.etsy.com/api/v3/ajax/bespoke/member/neu/specs/reviews"
        params = {
            "listing_id": listing_id,
            "page": str(page),
            "should_show_translations": "false",
            "target_currency": "USD",
        }
        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
            "Accept": "application/json, text/javascript, */*; q=0.01",
            "X-Requested-With": "XMLHttpRequest",
            "Referer": f"https://www.etsy.com/listing/{listing_id}",
        }

        try:
            resp = httpx.get(url, params=params, headers=headers,
                             proxies=proxies, timeout=20)
            if resp.status_code != 200:
                break
            data = resp.json()
        except Exception as e:
            print(f"Review page {page} error: {e}")
            break

        html_content = data.get("output", {}).get("reviews", "")
        review_soup = BeautifulSoup(html_content, "lxml")

        page_reviews = []
        for review in review_soup.select(".review-item, li[data-review-id]"):
            reviewer = review.select_one(".reviewer-name, .wt-text-body-01")
            text = review.select_one(".review-text, .wt-body-text")
            stars = review.select("span.icon-star-solid, .wt-icon-star-full")
            date_el = review.select_one(".review-date, span[title]")
            helpful_el = review.select_one(".review-helpfulness-count")

            page_reviews.append({
                "reviewer": reviewer.get_text(strip=True) if reviewer else None,
                "text": text.get_text(strip=True) if text else None,
                "rating": len(stars),
                "date": date_el.get_text(strip=True) if date_el else None,
                "helpful_votes": helpful_el.get_text(strip=True) if helpful_el else "0",
            })

        if not page_reviews:
            break

        reviews.extend(page_reviews)
        print(f"  Reviews page {page}: {len(page_reviews)} reviews")
        time.sleep(random.uniform(1.0, 2.5))

    return reviews

Scraping Shop Pages

Seller shop pages list all their products and show shop-level metrics:

def scrape_shop(shop_name: str, max_pages: int = 5, proxy: str = None) -> dict:
    """Scrape an Etsy shop page for all listings and shop metrics."""
    proxies = {"https": proxy} if proxy else None
    session = curl_requests.Session()

    shop_data = {"shop_name": shop_name, "listings": []}

    for page in range(1, max_pages + 1):
        url = f"https://www.etsy.com/shop/{shop_name}?page={page}"

        try:
            resp = session.get(
                url,
                impersonate="chrome124",
                headers=HEADERS,
                proxies=proxies,
                timeout=30,
            )
            if resp.status_code != 200:
                break
        except Exception as e:
            print(f"Shop page {page} error: {e}")
            break

        soup = BeautifulSoup(resp.text, "lxml")

        # Shop stats (first page only)
        if page == 1:
            sales_el = soup.select_one("[data-shop-sales-count], .shop-sales-count")
            if sales_el:
                shop_data["total_sales"] = sales_el.get_text(strip=True)

            admirers_el = soup.select_one("[data-shop-admirers-count]")
            if admirers_el:
                shop_data["admirers"] = admirers_el.get_text(strip=True)

            description_el = soup.select_one(".shop-home-description, [data-shop-description]")
            if description_el:
                shop_data["description"] = description_el.get_text(strip=True)

            location_el = soup.select_one(".shop-location, [data-shop-location]")
            if location_el:
                shop_data["location"] = location_el.get_text(strip=True)

            rating_el = soup.select_one("[data-shop-rating], .shop-review-count")
            if rating_el:
                shop_data["rating"] = rating_el.get_text(strip=True)

        # Listings on this page
        listings = parse_search_results(resp.text)
        if not listings:
            break

        shop_data["listings"].extend(listings)
        print(f"Shop {shop_name} page {page}: {len(listings)} listings")

        time.sleep(random.uniform(2.0, 4.0))

    return shop_data

Pagination Handling

Etsy search supports pagination via the page parameter. Handle it robustly:

def scrape_all_pages(
    query: str,
    max_results: int = 500,
    proxy: str = None,
) -> list[dict]:
    """Scrape all pages of Etsy search results up to max_results."""
    all_listings = []
    page = 1
    seen_urls = set()

    session = get_etsy_session()

    while len(all_listings) < max_results:
        url = f"https://www.etsy.com/search?q={query.replace(' ', '+')}&page={page}"
        try:
            resp = session.get(
                url,
                impersonate="chrome124",
                headers=HEADERS,
                proxies={"https": proxy} if proxy else None,
                timeout=30,
            )
            resp.raise_for_status()
        except Exception as e:
            print(f"Page {page} error: {e}")
            break

        listings = parse_search_results(resp.text)

        # Detect end of results: Etsy repeats listings past the last page
        new_listings = [l for l in listings if l.get("url") not in seen_urls]
        if not new_listings:
            print(f"No new listings on page {page}, stopping")
            break

        for l in new_listings:
            seen_urls.add(l.get("url"))

        all_listings.extend(new_listings)
        print(f"Page {page}: {len(new_listings)} new, {len(all_listings)} total")

        # Etsy limits to ~30 pages (about 600-900 listings per search)
        if page >= 30:
            break

        page += 1
        time.sleep(random.uniform(2.5, 5.0))

    return all_listings[:max_results]

Complete Pipeline: Search, Extract, and Store

Here's a full pipeline that ties search, product parsing, and data storage together:

import sqlite3
from datetime import datetime
from pathlib import Path

def create_db(db_path: str = "etsy_data.db") -> sqlite3.Connection:
    """Set up SQLite database for storing scraped data."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            listing_id TEXT PRIMARY KEY,
            title TEXT,
            price REAL,
            currency TEXT DEFAULT 'USD',
            rating REAL,
            review_count INTEGER,
            seller TEXT,
            url TEXT UNIQUE,
            categories TEXT,
            tags TEXT,
            shipping TEXT,
            thumbnail TEXT,
            scraped_at TEXT
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS reviews (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            listing_id TEXT,
            reviewer TEXT,
            rating INTEGER,
            review_text TEXT,
            review_date TEXT,
            helpful_votes TEXT,
            scraped_at TEXT,
            UNIQUE(listing_id, reviewer, review_text)
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS shops (
            shop_name TEXT PRIMARY KEY,
            total_sales TEXT,
            admirers TEXT,
            rating TEXT,
            location TEXT,
            description TEXT,
            scraped_at TEXT
        )
    """)
    conn.commit()
    return conn

def save_product(conn: sqlite3.Connection, product: dict):
    """Save a parsed product to the database."""
    price_str = product.get("price", "") or ""
    price_match = re.search(r"[\d.]+", price_str.replace(",", ""))
    price_float = float(price_match.group()) if price_match else None

    conn.execute("""
        INSERT OR REPLACE INTO products
        (listing_id, title, price, currency, rating, review_count,
         seller, url, categories, tags, shipping, thumbnail, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        product.get("listing_id"),
        product.get("name") or product.get("title"),
        price_float,
        product.get("currency", "USD"),
        product.get("rating"),
        product.get("review_count"),
        product.get("seller") or product.get("shop_name"),
        product.get("url"),
        json.dumps(product.get("categories", [])),
        json.dumps(product.get("tags", [])),
        product.get("shipping"),
        product.get("thumbnail"),
        datetime.utcnow().isoformat(),
    ))
    conn.commit()

def run_pipeline(
    query: str,
    max_pages: int = 3,
    scrape_detail_pages: bool = True,
    scrape_reviews_flag: bool = False,
    proxy: str = None,
) -> int:
    """Full Etsy scraping pipeline. Returns number of products saved."""
    conn = create_db()
    saved = 0

    # Step 1: Search
    results = search_etsy_curl(query, max_pages=max_pages)
    print(f"Found {len(results)} listings for '{query}'")

    # Step 2: Optionally fetch detail pages
    for i, result in enumerate(results):
        if not result.get("url"):
            continue

        if scrape_detail_pages:
            try:
                session = curl_requests.Session()
                resp = session.get(
                    result["url"],
                    impersonate="chrome124",
                    headers=HEADERS,
                    proxies={"https": proxy} if proxy else None,
                    timeout=30,
                )
                if resp.status_code == 200:
                    product = parse_listing_page(resp.text)
                    product["listing_id"] = result.get("listing_id")
                    save_product(conn, product)
                    saved += 1

                    # Optionally scrape reviews
                    if scrape_reviews_flag and result.get("listing_id"):
                        reviews = scrape_reviews(result["listing_id"], proxy=proxy)
                        for rev in reviews:
                            conn.execute("""
                                INSERT OR IGNORE INTO reviews
                                (listing_id, reviewer, rating, review_text, review_date, helpful_votes, scraped_at)
                                VALUES (?, ?, ?, ?, ?, ?, ?)
                            """, (
                                result["listing_id"],
                                rev.get("reviewer"), rev.get("rating"),
                                rev.get("text"), rev.get("date"),
                                rev.get("helpful_votes"),
                                datetime.utcnow().isoformat(),
                            ))
                        conn.commit()

            except Exception as e:
                print(f"  [{i+1}] Error: {e}")
        else:
            save_product(conn, result)
            saved += 1

        print(f"  [{i+1}/{len(results)}] {(result.get('title') or 'N/A')[:60]}")
        time.sleep(random.uniform(2.0, 4.5))

    conn.close()
    print(f"Pipeline complete. {saved} products saved.")
    return saved

# Usage
run_pipeline(
    "handmade leather wallet",
    max_pages=3,
    scrape_detail_pages=True,
    proxy="http://user:[email protected]:9000"
)

Rate Limiting and Politeness

Even with residential proxies, respect Etsy's infrastructure:

- Add 1-2 second delays between requests (randomized to look natural)
- Rotate user agents from a pool of 10-15 real browser strings
- Don't scrape the same seller repeatedly in short windows
- Cache results — product data doesn't change minute-to-minute
- Run scrapers during off-peak hours (2-6 AM EST)
- Cap your concurrency at 2-3 simultaneous requests max

USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:125.0) Gecko/20100101 Firefox/125.0",
]

def get_random_headers() -> dict:
    """Return headers with a rotated user agent."""
    return {**HEADERS, "User-Agent": random.choice(USER_AGENTS)}
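The concurrency cap is easy to enforce with an asyncio semaphore if you're using the Playwright or httpx async paths. A sketch; `polite_fetch` and its delay parameters are my own names:

```python
import asyncio
import random

SEM = asyncio.Semaphore(3)  # at most 3 requests in flight at once

async def polite_fetch(fetch_fn, url: str,
                       min_delay: float = 1.0, max_delay: float = 2.0):
    """Wrap any async fetch callable with a concurrency cap
    plus a randomized post-request delay."""
    async with SEM:
        result = await fetch_fn(url)
        await asyncio.sleep(random.uniform(min_delay, max_delay))
        return result
```

Because the sleep happens inside the semaphore, each of the three slots is also rate-limited, so bursts can't exceed roughly three requests per second even under heavy fan-out.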

Error Handling and Retry Logic

Anti-bot systems occasionally trip even legitimate-looking requests. Build in retries:

import time
from functools import wraps

def with_retry(max_attempts: int = 3, backoff_base: float = 2.0):
    """Decorator for retrying scraping functions with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_error = e
                    if attempt < max_attempts - 1:
                        wait = backoff_base ** attempt + random.uniform(0, 1)
                        print(f"Attempt {attempt+1} failed: {e}. Retrying in {wait:.1f}s")
                        time.sleep(wait)
            raise last_error
        return wrapper
    return decorator

@with_retry(max_attempts=3)
def fetch_listing_with_retry(url: str, proxy: str = None) -> str:
    """Fetch a listing page with automatic retry on failure."""
    session = curl_requests.Session()
    resp = session.get(
        url,
        impersonate="chrome124",
        headers=get_random_headers(),
        proxies={"https": proxy} if proxy else None,
        timeout=30,
    )
    if resp.status_code == 403:
        raise Exception("403 blocked; rotate the proxy")
    if resp.status_code != 200:
        raise Exception(f"HTTP {resp.status_code}")
    return resp.text

Business Use Cases

Price Monitoring

Track competitor pricing across product categories. Set up daily scrapes and alert on price drops greater than 10%. Useful for sellers who want to stay competitive without manually checking hundreds of listings.
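Because `save_product` overwrites rows with INSERT OR REPLACE, run the comparison before saving. A sketch against the `products` schema above, assuming prices are already parsed to floats; `detect_price_drops` is my own name:

```python
import sqlite3

def detect_price_drops(conn: sqlite3.Connection,
                       new_products: list[dict],
                       threshold: float = 0.10) -> list[dict]:
    """Compare freshly scraped prices against stored ones;
    flag any drop larger than `threshold` (default 10%)."""
    alerts = []
    for p in new_products:
        row = conn.execute(
            "SELECT price FROM products WHERE listing_id = ?",
            (p.get("listing_id"),),
        ).fetchone()
        if row and row[0] and p.get("price"):
            old, new = row[0], p["price"]
            if new < old * (1 - threshold):
                alerts.append({"listing_id": p["listing_id"],
                               "old_price": old, "new_price": new})
    return alerts
```

Call this with the day's scrape results before writing them, then send the alert list wherever you like (email, Slack webhook).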

Market Research

Identify trending products by scraping search results for broad categories ("home decor", "personalized gifts") and tracking which listings appear consistently in top positions. Cross-reference with review counts to find products with high demand but few competitors.

def analyze_category_competition(query: str, listings: list[dict]) -> dict:
    """Analyze competitive landscape for a search query."""
    prices = [
        float(re.search(r"[\d.]+", l.get("price", "") or "").group())
        for l in listings
        if l.get("price") and re.search(r"[\d.]+", l.get("price", ""))
    ]
    ratings = [l["rating"] for l in listings if l.get("rating")]
    review_counts = [l["review_count"] for l in listings if l.get("review_count")]

    return {
        "query": query,
        "total_listings": len(listings),
        "avg_price": sum(prices) / len(prices) if prices else 0,
        "min_price": min(prices) if prices else 0,
        "max_price": max(prices) if prices else 0,
        "avg_rating": sum(ratings) / len(ratings) if ratings else 0,
        "avg_reviews": sum(review_counts) / len(review_counts) if review_counts else 0,
        "high_review_products": len([r for r in review_counts if r > 100]),
    }

SEO and Tag Analysis

Extract tags and categories from top-performing listings in your niche. Etsy's search algorithm weighs listing tags heavily — knowing what tags successful competitors use gives you a direct optimization strategy for your own shop.
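Since the pipeline stores tags as JSON arrays in the `products` table, a frequency count over your scraped niche is a few lines. A sketch; `top_tags` is my own name:

```python
import json
import sqlite3
from collections import Counter

def top_tags(conn: sqlite3.Connection, limit: int = 20) -> list[tuple[str, int]]:
    """Count tag frequency across all stored products
    (tags are stored as JSON-encoded lists)."""
    counter = Counter()
    for (tags_json,) in conn.execute(
        "SELECT tags FROM products WHERE tags IS NOT NULL"
    ):
        for tag in json.loads(tags_json):
            counter[tag.lower()] += 1
    return counter.most_common(limit)
```

Run it after scraping the top few pages of your target query; tags that appear on most high-ranking listings are the ones worth testing in your own titles.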

Review Sentiment Analysis

Aggregate customer reviews across similar products to identify common complaints and feature requests. If 30% of reviews for "leather wallets" mention "card slots too tight", that's a product design opportunity.
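A simple way to quantify that kind of signal is a phrase-frequency pass over the review dicts returned by `scrape_reviews`. A sketch; `phrase_frequency` is my own name, and real sentiment analysis would go further than substring matching:

```python
def phrase_frequency(reviews: list[dict], phrases: list[str]) -> dict[str, float]:
    """Fraction of reviews mentioning each phrase (case-insensitive substring match)."""
    total = len(reviews) or 1  # avoid division by zero on empty input
    counts = {p: 0 for p in phrases}
    for r in reviews:
        text = (r.get("text") or "").lower()
        for p in phrases:
            if p.lower() in text:
                counts[p] += 1
    return {p: counts[p] / total for p in phrases}
```

Feed it candidate complaint phrases ("too tight", "arrived late", "smaller than expected") across all reviews for a category to rank the most common issues.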

Supplier Discovery

Monitor seller shops that consistently rank high for specific product categories. Track their new listings, pricing changes, and review trends to identify potential wholesale suppliers or partnership opportunities.

Legal Considerations

Scraping public product data from Etsy for non-commercial research has generally been treated as permissible under the hiQ v. LinkedIn line of cases, though that precedent is not a blanket green light. Etsy's ToS prohibits automated access, but enforcement has historically focused on accounts violating marketplace rules (fake reviews, listing manipulation). Scraping public pages without logging in, without overloading Etsy's servers, and without republishing copyrighted content falls in a different legal category.

Key rules: don't scrape behind login walls, don't copy product images for your own listings, don't republish review text as your own content. For commercial use cases, consult your counsel.


See also: How to Scrape Amazon Product Data | Scraping TripAdvisor Reviews | Proxy Types for Web Scraping