
Scraping Zara Product Data: Catalog, Stock Levels and Sale Tracking (2026)


Zara's business model is built on speed. New collections drop twice a week. Items sell out in hours. Prices change without warning and sale items disappear fast. If you're building a fashion price tracker, a resale arbitrage tool, or doing competitive research on fast fashion inventory behavior, Zara is one of the most data-dense targets you can scrape — and one of the harder ones to do reliably.

What Data Is Available

Zara's catalog exposes a surprising amount of structured data through its internal API:

- Product name, numeric ID, and reference (SKU) code
- Current price and sale price (both returned in cents), plus a discount rate on marked-down items
- Per-color, per-size availability status (in_stock, low_on_stock, out_of_stock)
- The full category tree, including seasonal subcategories
- Image URLs for every color variant

Commercial Use Cases

People build real businesses on Zara product data:

- Fashion price trackers that alert users when an item goes on sale
- Resale arbitrage tools that flag items selling below market on resale platforms
- Restock and new-drop alert services for high-demand items
- Competitive research on fast fashion inventory and markdown behavior
- Multi-country price comparison, since the same item is priced differently per market

Anti-Bot Measures

Zara runs Akamai Bot Manager across its infrastructure. This is enterprise-grade bot detection, not a trivial hurdle.

TLS fingerprinting. Akamai inspects the TLS ClientHello at the connection layer — cipher suite ordering, supported extensions, GREASE values. A stock Python httpx or requests client produces a fingerprint that is trivially distinguishable from Chrome. You need to either use a library that mimics Chrome's TLS stack (like curl_cffi with impersonate="chrome") or route through proxies whose TLS termination handles this for you.
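One way to clear the TLS check from Python is curl_cffi's impersonation mode. A minimal sketch, with the import guarded so the snippet degrades gracefully when the library isn't installed:

```python
# Sketch: fetching with a Chrome-like TLS fingerprint via curl_cffi.
try:
    from curl_cffi import requests as curl_requests
    HAVE_CURL_CFFI = True
except ImportError:
    HAVE_CURL_CFFI = False


def fetch_with_chrome_tls(url: str, timeout: int = 20):
    """GET a URL while presenting Chrome's TLS ClientHello."""
    if not HAVE_CURL_CFFI:
        raise RuntimeError("curl_cffi is required: pip install curl_cffi")
    # impersonate="chrome" matches cipher ordering, extensions, and
    # GREASE values to a recent Chrome build
    return curl_requests.get(url, impersonate="chrome", timeout=timeout)
```

If you stick with plain httpx, as the examples below do, you are relying on your proxy layer to handle the TLS side for you.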

JavaScript challenges. Akamai injects sensor data collection scripts that compute a behavioral fingerprint — mouse movement, timing, canvas rendering — and bundle it into a _abck cookie. This cookie is validated server-side on subsequent requests. Without a valid _abck, category and product API endpoints return 403s or redirect loops.

IP reputation scoring. Akamai maintains reputation scores for IP ranges. Datacenter ASNs (AWS, GCP, Hetzner, DigitalOcean) are blocklisted by default. Even a single request from a flagged IP will fail the reputation check before any other signal is evaluated.

Rate limiting. Requests beyond roughly 30-40 per minute per session trigger soft throttling — responses still return 200 but product arrays come back empty. Hard rate limiting kicks in faster on search and category endpoints than on individual product detail calls.
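Staying under that ceiling is easiest with a client-side throttle. A minimal sketch; the default budget is derived from the rough 30-40/minute estimate above, not from any documented quota:

```python
import time
import random


class Throttle:
    """Spaces requests out so a session stays under a per-minute budget."""

    def __init__(self, max_per_minute: int = 25, jitter: float = 0.5):
        self.min_interval = 60.0 / max_per_minute
        self.jitter = jitter
        self.last_request = 0.0

    def wait(self) -> float:
        """Sleep just long enough to respect the budget; returns the delay."""
        now = time.monotonic()
        elapsed = now - self.last_request
        delay = max(0.0, self.min_interval - elapsed)
        delay += random.uniform(0, self.jitter)  # jitter avoids a fixed cadence
        if delay > 0:
            time.sleep(delay)
        self.last_request = time.monotonic()
        return delay
```

Call `throttle.wait()` before each request; the jitter matters because a perfectly regular cadence is itself a bot signal.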

Zara's Internal API

Zara's frontend communicates with a versioned internal API at www.zara.com/itxrest/. You can discover the exact endpoints by opening Chrome DevTools, navigating to the Network tab, filtering by Fetch/XHR, and browsing a category page or product page on Zara.com. The API calls are visible immediately — look for requests to itxrest/2/catalog/ paths.

The two most useful endpoints:

- GET /itxrest/2/catalog/store/{storeId}/category/{categoryId}/product — every product listed in a category
- GET /itxrest/2/catalog/store/{storeId}/product/{productId}/detail — full detail for one product, including per-size, per-color stock

Store IDs are country-specific. The US store ID is 11719. UK is 10701. Germany is 10103. Spain (home market) is 10706. You can find others by watching the network requests on country-specific Zara URLs, or by hitting the store discovery endpoint.
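Since the endpoint paths follow a fixed pattern, a small URL builder keeps the store/category plumbing in one place (store IDs as listed above):

```python
STORES = {"us": "11719", "uk": "10701", "de": "10103", "es": "10706"}


def category_products_url(country: str, category_id: str) -> str:
    """Build the itxrest category-products URL for a country store."""
    store_id = STORES[country]
    return (f"https://www.zara.com/itxrest/2/catalog/"
            f"store/{store_id}/category/{category_id}/product")


def product_detail_url(country: str, product_id: str) -> str:
    """Build the itxrest product-detail URL for a country store."""
    store_id = STORES[country]
    return (f"https://www.zara.com/itxrest/2/catalog/"
            f"store/{store_id}/product/{product_id}/detail")
```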

import httpx
import time
import random

STORE_ID = "11719"  # US store
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.zara.com/us/",
    "Origin": "https://www.zara.com",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
}

Fetching Category Products

def fetch_category_products(category_id: str,
                              store_id: str = STORE_ID,
                              proxy: str = None) -> list[dict]:
    """Fetch all products listed in a Zara category."""
    url = (f"https://www.zara.com/itxrest/2/catalog/"
           f"store/{store_id}/category/{category_id}/product")
    params = {
        "languageId": -1,
        "appVersion": "5.27.0",
    }

    with httpx.Client(
        headers=HEADERS,
        proxy=proxy,
        follow_redirects=True,
        timeout=20,
    ) as client:
        resp = client.get(url, params=params)

    if resp.status_code != 200:
        print(f"Category {category_id} failed: {resp.status_code}")
        return []

    data = resp.json()
    product_groups = data.get("productGroups", [])

    # Flatten product groups into individual products
    products = []
    for group in product_groups:
        for element in group.get("elements", []):
            # commercialComponents can be missing or empty; guard the [0]
            components = element.get("commercialComponents") or []
            if components:
                products.append(components[0])

    return products


def discover_categories(store_id: str = STORE_ID,
                         proxy: str = None) -> list[dict]:
    """Discover all available category IDs for a store."""
    url = (f"https://www.zara.com/itxrest/2/catalog/"
           f"store/{store_id}/category")
    params = {"languageId": -1, "appVersion": "5.27.0"}

    with httpx.Client(
        headers=HEADERS, proxy=proxy,
        follow_redirects=True, timeout=20,
    ) as client:
        resp = client.get(url, params=params)

    if resp.status_code != 200:
        return []

    data = resp.json()
    categories = []

    def walk_categories(cats, parent_name=""):
        for cat in cats:
            if not isinstance(cat, dict):
                continue
            cat_id = cat.get("id")
            cat_name = cat.get("name", "")
            full_name = f"{parent_name} > {cat_name}" if parent_name else cat_name

            if cat_id:
                categories.append({
                    "id": str(cat_id),
                    "name": full_name,
                    "key": cat.get("seoName", ""),
                })

            # Recurse into subcategories
            for sub_key in ["subcategories", "subCategories", "children"]:
                subs = cat.get(sub_key, [])
                if subs:
                    walk_categories(subs, full_name)

    cats = data.get("categories", [])
    if isinstance(cats, list):
        walk_categories(cats)

    return categories

Fetching Product Detail with Stock Levels

def fetch_product_detail(product_id: str,
                           store_id: str = STORE_ID,
                           proxy: str = None) -> dict:
    """Fetch full product detail including size/color stock."""
    url = (f"https://www.zara.com/itxrest/2/catalog/"
           f"store/{store_id}/product/{product_id}/detail")
    params = {"languageId": -1}

    with httpx.Client(
        headers=HEADERS,
        proxy=proxy,
        follow_redirects=True,
        timeout=20,
    ) as client:
        resp = client.get(url, params=params)

    if resp.status_code != 200:
        return {}

    return resp.json()


def extract_stock_snapshot(product_data: dict) -> list[dict]:
    """Parse stock levels per size/color from a product detail response."""
    snapshots = []
    product_id = product_data.get("productId")
    name = product_data.get("name", "")
    reference = product_data.get("reference", "")

    # Prices are in cents; guard against missing fields
    price = (product_data.get("price") or 0) / 100
    sale_price = None
    special = product_data.get("specialPrice")
    if special:
        sale_price = (special.get("price") or 0) / 100

    discount_rate = product_data.get("saleDiscountRate")

    for color in product_data.get("detail", {}).get("colors", []):
        color_name = color.get("name", "")
        color_id = color.get("id", "")

        for size in color.get("sizes", []):
            snapshots.append({
                "product_id": product_id,
                "name": name,
                "reference": reference,
                "color": color_name,
                "color_id": color_id,
                "size_name": size.get("name"),
                "size_id": size.get("id"),
                "availability": size.get("availability"),
                "price": price,
                "sale_price": sale_price,
                "discount_rate": discount_rate,
                "is_on_sale": sale_price is not None,
            })

    return snapshots

Proxy Configuration

Akamai's IP reputation system is the first gate. Requests from datacenter IPs fail before any session or fingerprint check even runs. ThorData's residential proxies route traffic through real ISP-assigned IPs, which clears Akamai's ASN reputation filter. For Zara US, use US residential IPs — the store API returns different product sets depending on geolocation, and mismatched country signals (US store ID + non-US IP) increase detection probability.

PROXY_USER = "your_user"
PROXY_PASS = "your_pass"
PROXY_HOST = "proxy.thordata.com"
PROXY_PORT = 9000


def get_proxy(country: str = "us") -> str:
    """Build a ThorData residential proxy URL."""
    return (
        f"http://{PROXY_USER}:{PROXY_PASS}"
        f"@{PROXY_HOST}:{PROXY_PORT}?country={country}"
    )


PROXY = get_proxy("us")

Session rotation matters too. Don't reuse a single proxy connection for more than 15-20 requests. ThorData supports sticky sessions (same IP for a defined duration) and rotating sessions — use rotating for catalog crawls and sticky for multi-request product detail fetches where you need consistent session state.
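Many residential providers encode session stickiness in the proxy username. The exact syntax varies by provider, so treat the `-session-<id>` suffix below as illustrative rather than ThorData's documented format; the rotation-every-N-requests logic is the point:

```python
import itertools


class SessionRotator:
    """Hands out a proxy URL, rotating the session tag every N requests.

    NOTE: the `-session-<id>` username suffix is illustrative; check your
    provider's docs for the real sticky-session syntax.
    """

    def __init__(self, user, password, host, port, requests_per_session=15):
        self.user, self.password = user, password
        self.host, self.port = host, port
        self.requests_per_session = requests_per_session
        self._counter = itertools.count()

    def proxy_url(self) -> str:
        n = next(self._counter)
        session_id = n // self.requests_per_session  # new ID every N calls
        return (f"http://{self.user}-session-{session_id}:{self.password}"
                f"@{self.host}:{self.port}")
```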

SQLite Schema for Inventory Tracking

The key is capturing each check as a snapshot row rather than overwriting the previous state — that's what lets you detect when a size came back in stock or a price dropped:

import sqlite3
from datetime import datetime

def init_db(db_path: str = "zara.db") -> sqlite3.Connection:
    """Initialize SQLite schema for Zara inventory tracking."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            product_id TEXT PRIMARY KEY,
            name TEXT,
            reference TEXT,
            category_id TEXT,
            store_id TEXT,
            first_seen TEXT,
            last_seen TEXT
        );

        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id TEXT,
            price REAL,
            sale_price REAL,
            discount_rate REAL,
            is_on_sale INTEGER DEFAULT 0,
            checked_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS stock_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id TEXT,
            color TEXT,
            color_id TEXT,
            size_name TEXT,
            size_id TEXT,
            availability TEXT,
            checked_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS restock_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id TEXT,
            color TEXT,
            size_name TEXT,
            previous_status TEXT,
            new_status TEXT,
            detected_at TEXT DEFAULT (datetime('now'))
        );

        CREATE INDEX IF NOT EXISTS idx_stock_product
            ON stock_snapshots(product_id, checked_at);
        CREATE INDEX IF NOT EXISTS idx_price_product
            ON price_history(product_id, checked_at);
        CREATE INDEX IF NOT EXISTS idx_restock_product
            ON restock_events(product_id);
    """)
    conn.commit()
    return conn


def save_snapshot(conn: sqlite3.Connection,
                   snapshots: list[dict]) -> None:
    """Save stock snapshot and detect changes."""
    now = datetime.utcnow().isoformat()

    for s in snapshots:
        pid = s["product_id"]
        color = s["color"]
        size = s["size_name"]
        status = s["availability"]

        # Check previous status for this size/color
        prev = conn.execute("""
            SELECT availability FROM stock_snapshots
            WHERE product_id = ? AND color = ? AND size_name = ?
            ORDER BY checked_at DESC LIMIT 1
        """, (pid, color, size)).fetchone()

        # Detect restock events
        if prev and prev[0] != status:
            if (prev[0] in ("out_of_stock", "low_on_stock")
                    and status == "in_stock"):
                conn.execute("""
                    INSERT INTO restock_events
                    (product_id, color, size_name,
                     previous_status, new_status, detected_at)
                    VALUES (?,?,?,?,?,?)
                """, (pid, color, size, prev[0], status, now))
                print(f"  RESTOCK: {s['name']} "
                      f"{color} {size}: {prev[0]} -> {status}")

        # Save stock snapshot
        conn.execute("""
            INSERT INTO stock_snapshots
            (product_id, color, color_id, size_name, size_id,
             availability, checked_at)
            VALUES (?,?,?,?,?,?,?)
        """, (pid, color, s.get("color_id"), size,
               s.get("size_id"), status, now))

        # Save price record
        conn.execute("""
            INSERT INTO price_history
            (product_id, price, sale_price, discount_rate,
             is_on_sale, checked_at)
            VALUES (?,?,?,?,?,?)
        """, (
            pid, s.get("price"), s.get("sale_price"),
            s.get("discount_rate"),
            1 if s.get("is_on_sale") else 0, now,
        ))

    conn.commit()


def detect_price_drops(conn: sqlite3.Connection,
                        hours_back: int = 24) -> list[dict]:
    """Find items where price dropped in the last N hours."""
    from datetime import timedelta  # local import keeps this function self-contained

    drops = []
    cutoff = (datetime.utcnow() - timedelta(hours=hours_back)).isoformat()

    for row in conn.execute("""
        SELECT p.product_id, p.name,
               ph1.price as old_price, ph2.price as new_price,
               ph2.sale_price, ph2.discount_rate
        FROM products p
        JOIN price_history ph1 ON p.product_id = ph1.product_id
        JOIN price_history ph2 ON p.product_id = ph2.product_id
        WHERE ph2.price < ph1.price
          AND ph2.checked_at > ph1.checked_at
          AND ph2.checked_at > ?
          AND ph2.is_on_sale = 1
        GROUP BY p.product_id
        ORDER BY (ph1.price - ph2.price) DESC
    """, (cutoff,)):
        drops.append({
            "product_id": row[0],
            "name": row[1],
            "old_price": row[2],
            "new_price": row[3],
            "sale_price": row[4],
            "discount_pct": (
                (row[2] - row[3]) / row[2] * 100 if row[2] else 0
            ),
        })

    return drops

Full Category Monitor

def monitor_category(category_id: str,
                      store_id: str = STORE_ID,
                      proxy: str = None,
                      db_path: str = "zara.db") -> None:
    """Run a full category stock and price check."""
    conn = init_db(db_path)
    now = datetime.utcnow().isoformat()

    print(f"Fetching category {category_id}...")
    products = fetch_category_products(
        category_id, store_id=store_id, proxy=proxy
    )
    print(f"Found {len(products)} products")

    for i, product in enumerate(products, 1):
        pid = str(product.get("id", ""))
        name = product.get("name", "")
        if not pid:
            continue

        # Register product
        conn.execute("""
            INSERT OR IGNORE INTO products
            (product_id, name, category_id, store_id, first_seen, last_seen)
            VALUES (?,?,?,?,?,?)
        """, (pid, name, category_id, store_id, now, now))
        conn.execute("""
            UPDATE products SET last_seen = ? WHERE product_id = ?
        """, (now, pid))

        print(f"  [{i}/{len(products)}] {name[:40]}...", end=" ")

        # Fetch detail
        detail = fetch_product_detail(pid, store_id=store_id, proxy=proxy)
        if detail:
            snapshots = extract_stock_snapshot(detail)
            save_snapshot(conn, snapshots)
            in_stock = sum(
                1 for s in snapshots if s["availability"] == "in_stock"
            )
            price = snapshots[0]["price"] if snapshots else 0
            print(f"${price:.2f}, {in_stock}/{len(snapshots)} sizes in stock")
        else:
            print("(no detail data)")

        time.sleep(random.uniform(1.5, 3.0))

    # Report price drops found
    drops = detect_price_drops(conn)
    if drops:
        print(f"\n=== {len(drops)} price drops detected ===")
        for drop in drops[:5]:
            print(f"  {drop['name'][:40]}: "
                  f"${drop['old_price']:.2f} -> "
                  f"${drop['new_price']:.2f} "
                  f"(-{drop['discount_pct']:.0f}%)")

    conn.close()

Practical Tips and Gotchas

Category IDs change. Zara reorganizes its category structure seasonally. Don't hardcode category IDs — discover them by hitting the category endpoint and walking the tree. Cache the structure and refresh weekly.

Prices are in cents. The API returns integers (e.g., 2995 = $29.95). Divide by 100 before storing.

Low stock is a signal. The "low_on_stock" availability status often precedes full sellout by 1-3 hours during drops. If you're building a restock alert system, track transitions from out_of_stock back to in_stock and from in_stock to low_on_stock.

Sales follow a pattern. Zara markdown events typically hit in late January and late June for major clearance, with smaller mid-season markdowns. During sale periods, the saleDiscountRate field activates on eligible items — filter on this field to track markdown depth and progression over time.
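Filtering on that field can be sketched like this, using the same field names (`saleDiscountRate`, `specialPrice`) that appear in the API responses above:

```python
def markdown_depths(products: list[dict]) -> list[dict]:
    """Extract markdown depth for items with an active saleDiscountRate."""
    out = []
    for p in products:
        rate = p.get("saleDiscountRate")
        if not rate:
            continue  # not on sale
        price = (p.get("price") or 0) / 100
        special = p.get("specialPrice") or {}
        sale_price = (special.get("price") or 0) / 100
        out.append({
            "id": p.get("id"),
            "price": price,
            "sale_price": sale_price,
            "discount_rate": rate,
        })
    # deepest markdowns first
    return sorted(out, key=lambda d: d["discount_rate"], reverse=True)
```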

The _abck cookie problem. For long-running scrapes, you'll eventually encounter Akamai's behavioral challenge even with residential IPs. If you start seeing 403s on endpoints that were working, rotate your proxy session and add a 30-60 second pause. If 403s persist, the session is flagged — drop it entirely and start fresh. ThorData's rotating residential pool handles this automatically since each rotation gives you a fresh IP identity.

Don't scrape images at scale. Image CDN requests go through a different infrastructure layer and get rate-limited independently. If you need images, store the URLs and fetch them lazily rather than pulling all image assets during your catalog crawl.

Multi-country pricing differences. Zara prices the same item differently across markets. Compare the same product across US, UK, and Spain store IDs to see pricing strategy. Some items are only available in certain markets.

Zara's Terms of Service prohibit automated scraping of their website and APIs. Their internal API endpoints are not publicly documented and accessing them may be considered unauthorized access under computer fraud laws in some jurisdictions. Additionally, using this data for commercial purposes — particularly for competitive pricing tools or resale — may trigger additional legal review. Consult legal counsel before building a commercial product on Zara API data.

Multi-Country Price Comparison

One underutilized capability of Zara's API structure is comparing the same product across multiple country stores. Zara prices the same item at different price points in different markets:

STORE_IDS = {
    "us": "11719",
    "uk": "10701",
    "de": "10103",
    "es": "10706",
    "fr": "10209",
    "it": "10211",
}

COUNTRY_CURRENCY = {
    "us": "USD",
    "uk": "GBP",
    "de": "EUR",
    "es": "EUR",
    "fr": "EUR",
    "it": "EUR",
}


def compare_product_prices(product_id: str,
                             proxy_fn=None) -> dict:
    """Fetch the same product across all stores and compare pricing."""
    prices = {}

    for country, store_id in STORE_IDS.items():
        proxy = proxy_fn(country) if proxy_fn else None
        detail = fetch_product_detail(
            product_id, store_id=store_id, proxy=proxy
        )
        if not detail:
            continue

        price_cents = detail.get("price", 0)
        sale_cents = None
        special = detail.get("specialPrice")
        if special:
            sale_cents = special.get("price", 0)

        prices[country] = {
            "price": price_cents / 100,
            "sale_price": sale_cents / 100 if sale_cents else None,
            "currency": COUNTRY_CURRENCY[country],
            "is_available": bool(detail.get("detail", {}).get("colors")),
        }

        time.sleep(random.uniform(1, 2))

    return prices


# Find the cheapest country for a product
def find_cheapest_market(product_id: str) -> tuple[str | None, float]:
    prices = compare_product_prices(product_id)
    available = {
        k: v for k, v in prices.items()
        if v["is_available"] and v["price"] > 0
    }
    if not available:
        return None, 0

    cheapest = min(available.items(), key=lambda x: x[1]["price"])
    return cheapest[0], cheapest[1]["price"]

Sale Detection and Discount Depth Analysis

Zara's markdown strategy is more systematic than it appears. Tracking discount depth over time reveals patterns:

def analyze_sale_patterns(conn: sqlite3.Connection,
                            min_samples: int = 3) -> None:
    """Analyze sale timing and discount depth patterns."""
    print("=== Sale Pattern Analysis ===\n")

    # Average discount depth when items go on sale
    for row in conn.execute("""
        SELECT
            STRFTIME('%m', checked_at) as month,
            COUNT(DISTINCT product_id) as items_on_sale,
            AVG(
                CASE WHEN price > 0 AND sale_price > 0
                THEN (price - sale_price) / price * 100
                ELSE 0 END
            ) as avg_discount_pct
        FROM price_history
        WHERE is_on_sale = 1
        GROUP BY month
        ORDER BY month
    """):
        print(f"  Month {row[0]}: {row[1]} items on sale, "
              f"{row[2]:.1f}% avg discount")

    # Items that restocked after selling out
    print("\nMost-restocked items (high demand signal):")
    for row in conn.execute("""
        SELECT p.name, COUNT(*) as restock_events
        FROM restock_events re
        JOIN products p ON re.product_id = p.product_id
        WHERE re.new_status = 'in_stock'
          AND re.previous_status = 'out_of_stock'
        GROUP BY re.product_id
        ORDER BY restock_events DESC
        LIMIT 10
    """):
        print(f"  {row[0][:40]}: {row[1]} restocks")

    # Time from 'in_stock' to 'out_of_stock' (sell-through speed)
    print("\nFastest-selling items (by avg hours to sell out):")
    for row in conn.execute("""
        SELECT
            p.name,
            ss1.color,
            ss1.size_name,
            MIN(
                (JULIANDAY(ss2.checked_at) - JULIANDAY(ss1.checked_at)) * 24
            ) as hours_to_sell
        FROM stock_snapshots ss1
        JOIN stock_snapshots ss2
            ON ss1.product_id = ss2.product_id
            AND ss1.color = ss2.color
            AND ss1.size_name = ss2.size_name
            AND ss1.availability = 'in_stock'
            AND ss2.availability = 'out_of_stock'
            AND ss2.checked_at > ss1.checked_at
        JOIN products p ON ss1.product_id = p.product_id
        GROUP BY ss1.product_id, ss1.color, ss1.size_name
        HAVING hours_to_sell < 48
        ORDER BY hours_to_sell
        LIMIT 10
    """):
        print(f"  {row[0][:30]} {row[1]} {row[2]}: "
              f"sold out in {row[3]:.1f}h")

Handling New Collection Drops

Zara drops new collections twice a week, typically Monday/Thursday or Tuesday/Friday. During drops, new product IDs appear in category listings that weren't there before. A simple diff-based detection:

def detect_new_products(category_id: str, conn: sqlite3.Connection,
                          store_id: str = STORE_ID,
                          proxy: str = None) -> list[str]:
    """Detect products that appeared since the last check."""
    # Get product IDs from the current API response
    current_products = fetch_category_products(
        category_id, store_id=store_id, proxy=proxy
    )
    current_ids = {
        str(p.get("id")) for p in current_products if p.get("id")
    }

    # Get product IDs we already know about for this category
    known_ids = {
        row[0] for row in conn.execute(
            "SELECT product_id FROM products WHERE category_id = ?",
            (category_id,)
        )
    }

    new_ids = current_ids - known_ids
    if new_ids:
        print(f"New products in category {category_id}: "
              f"{len(new_ids)}")
        for pid in new_ids:
            # Find the name for this new product
            product = next(
                (p for p in current_products
                 if str(p.get("id")) == pid), {}
            )
            print(f"  NEW: {product.get('name', pid)}")

    return list(new_ids)

Running this check every few hours during collection drop days lets you be among the first to detect and notify about new items — which is the core value proposition of a restock/new-drop alert service.
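That cadence can be encoded as a tiny scheduling helper. The weekday set below reflects the Monday/Thursday-or-Tuesday/Friday pattern noted above; it is an observed tendency, not a published schedule, so tune it to what you see in your own data:

```python
from datetime import date

# Weekdays when new-collection drops typically land (Monday=0 ... Friday=4).
LIKELY_DROP_WEEKDAYS = {0, 1, 3, 4}


def check_interval_hours(d: date) -> int:
    """Poll more often on likely drop days, less often otherwise."""
    return 2 if d.weekday() in LIKELY_DROP_WEEKDAYS else 6
```

Feed the returned interval into whatever scheduler drives `detect_new_products` so you tighten the loop only when a drop is plausible.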