Scraping AliExpress Products Without Getting Blocked (2026)
AliExpress is one of the most challenging e-commerce platforms to scrape at scale. Owned by Alibaba Group, the platform hosts hundreds of millions of product listings from sellers worldwide, making it a critical data source for competitive price intelligence, dropshipping research, supplier discovery, product trend analysis, and marketplace analytics. Yet getting that data reliably requires understanding — and systematically defeating — a multi-layered anti-bot defense that Alibaba has been refining for years.
The challenge with AliExpress is not just one problem; it is three interlocking problems. First, most product data is served through JavaScript execution rather than static HTML. A standard HTTP request for a product page returns mostly empty placeholder elements — the actual price, variant information, seller rating, and shipping cost all get populated after JavaScript runs and makes additional API calls to Alibaba's backend. This means any scraper that cannot execute JavaScript will receive incomplete or useless data.
Second, AliExpress uses device fingerprinting that goes far beyond IP-based blocking. It analyzes the TLS handshake, HTTP/2 frame ordering, browser API behavior, canvas fingerprint, WebGL renderer, audio context characteristics, and dozens of other signals to determine whether a visitor is a real Chrome browser or an automated tool. Even well-configured headless browsers fail this check unless they have been explicitly hardened against fingerprint detection.
Third, the site rate-limits aggressively by multiple independent signals simultaneously. Your IP address is one dimension. Your cookie session is another. The frequency of requests from a given device fingerprint is a third. You can rotate IPs and still get blocked because your cookie session has been flagged. You can clear cookies and rotate IPs and still get blocked because your TLS fingerprint matches known scraper tooling. A production AliExpress scraper has to address all three dimensions simultaneously to maintain high success rates over time.
This guide covers the complete technical approach: Playwright-based scraping with stealth configuration, extraction from the window.__INIT_DATA__ object for structured data, residential proxy rotation with ThorData, CAPTCHA handling strategies, retry logic, and five complete use cases with working Python code.
Why Standard Requests-Based Scraping Fails
Before covering what works, it is worth understanding exactly why the naive approach fails so you can anticipate the same failure modes in your own code.
A bare requests.get("https://www.aliexpress.com/item/...") call returns the page skeleton — the HTML document loaded before JavaScript runs. The product title might be there as an H1 tag, but the price, variant options, stock count, and seller information are missing. They live inside <script> tags as JavaScript initialization data, or they are fetched from Alibaba's API after page load. Without executing that JavaScript, you get maybe 20% of the data you need.
Even if you switch to a headless browser and execute the JavaScript, you immediately hit fingerprinting. Playwright and Puppeteer in their default configurations have known fingerprints that AliExpress detects reliably. The navigator.webdriver property is set to true, the Chrome automation extension is visible in navigator.plugins, and the headless browser's canvas rendering differs from a real GPU-accelerated browser in ways that anti-bot systems have learned to detect.
With the fingerprinting problem solved, you then hit rate limiting. AliExpress will serve you product pages successfully for the first 15-30 requests in a session, then silently degrade the response — returning old cached data, or redirecting you to a simplified "lite" version of the page that lacks the structured data objects. After another 20-30 requests it escalates to CAPTCHA challenges, and eventually to outright 403 blocks on the IP.
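One practical consequence: a scraper should validate each response before trusting it, because a degraded "lite" page still comes back as HTTP 200. A minimal heuristic sketch, assuming the structured-data variable names covered later in this guide (window.__INIT_DATA__ and window.runParams):

```python
def looks_degraded(html: str) -> bool:
    """Heuristic check: a full AliExpress product page carries one of the
    structured-data objects; the simplified "lite" variant does not.
    The marker names are assumptions and may shift with page layouts."""
    markers = ("window.__INIT_DATA__", "window.runParams")
    return not any(marker in html for marker in markers)
```

If this returns True, parsing is pointless: retire the session and retry with a fresh IP rather than store stale data.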
The solution requires stacking multiple mitigations: browser stealth patching (via playwright-stealth or rebrowser-patches), residential proxy rotation, realistic request timing, session cookie management, and graceful degradation when blocks do occur.
Setting Up Playwright with Stealth
Install the required packages:
pip install playwright playwright-stealth httpx beautifulsoup4 tenacity
playwright install chromium
The playwright-stealth library patches the most common fingerprinting signals that get headless browsers detected:
import asyncio
import json
import random
import time
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeout
from playwright_stealth import stealth_async
STEALTH_USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36",
]
async def create_stealthy_browser(proxy_url: str = None):
"""Create a Playwright browser with stealth patches applied."""
playwright = await async_playwright().start()
launch_args = {
"headless": True,
"args": [
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-infobars",
"--window-position=0,0",
            "--ignore-certificate-errors",
            "--ignore-certificate-errors-spki-list",
"--disable-extensions",
],
}
if proxy_url:
launch_args["proxy"] = {"server": proxy_url}
browser = await playwright.chromium.launch(**launch_args)
ua = random.choice(STEALTH_USER_AGENTS)
context = await browser.new_context(
user_agent=ua,
viewport={"width": 1366, "height": 768},
locale="en-US",
timezone_id="America/New_York",
geolocation={"longitude": -73.935242, "latitude": 40.730610},
permissions=["geolocation"],
extra_http_headers={
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
},
)
page = await context.new_page()
# Apply stealth patches — removes webdriver signals, patches navigator, etc.
await stealth_async(page)
# Additional manual patches for AliExpress specifically
await page.add_init_script("""
// Override the plugins length to appear non-headless
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
// Override languages
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en'],
});
// Prevent WebGL fingerprinting
const getParameter = WebGLRenderingContext.prototype.getParameter;
WebGLRenderingContext.prototype.getParameter = function(parameter) {
if (parameter === 37445) {
return 'Intel Open Source Technology Center';
}
if (parameter === 37446) {
return 'Mesa DRI Intel(R) Ivybridge Mobile';
}
return getParameter.call(this, parameter);
};
""")
return playwright, browser, context, page
Extracting Product Data from window.__INIT_DATA__
AliExpress embeds a large JSON object called window.__INIT_DATA__ in every product page. This is the canonical data source — it contains the full product record including all SKU variants, pricing tiers, seller information, shipping options, review aggregates, and promotion data. Extracting this object is far more reliable than parsing DOM elements, which change layout frequently.
async def scrape_aliexpress_product(url: str, proxy_url: str = None) -> dict:
"""
Scrape a single AliExpress product page and return structured data.
Returns a dict with keys: title, price, original_price, discount,
sold_count, rating, review_count, shipping, seller, sku_data, images
"""
playwright, browser, context, page = await create_stealthy_browser(proxy_url)
result = {}
try:
# Warm up with a visit to the homepage first — reduces detection
await page.goto("https://www.aliexpress.com/", wait_until="domcontentloaded", timeout=20000)
await asyncio.sleep(random.uniform(1.5, 3.0))
# Navigate to the product page
await page.goto(url, wait_until="networkidle", timeout=45000)
# Check if we got a CAPTCHA or block page
page_title = await page.title()
if any(word in page_title.lower() for word in ["captcha", "verify", "blocked", "access denied"]):
raise Exception(f"Bot detection triggered: {page_title}")
# Wait for product title to confirm page loaded
try:
await page.wait_for_selector("h1.product-title-text", timeout=15000)
except PlaywrightTimeout:
# Try alternative selectors used on different page layouts
await page.wait_for_selector("[data-pl='product-title']", timeout=10000)
# Simulate human-like behavior: scroll down slightly
await page.evaluate("window.scrollBy(0, 300)")
await asyncio.sleep(random.uniform(0.5, 1.5))
# Extract window.__INIT_DATA__
init_data_raw = await page.evaluate("""
() => {
try {
if (window.__INIT_DATA__) {
return JSON.stringify(window.__INIT_DATA__);
}
// Some pages use a different variable name
if (window.runParams) {
return JSON.stringify(window.runParams);
}
return null;
} catch(e) {
return null;
}
}
""")
if init_data_raw:
init_data = json.loads(init_data_raw)
result = parse_init_data(init_data)
# Fall back to DOM extraction if __INIT_DATA__ parsing failed
        if not result.get("title"):
            # query_selector returns None instead of raising, unlike text_content
            for selector in ("h1.product-title-text", "[data-pl='product-title']"):
                el = await page.query_selector(selector)
                if el:
                    result["title"] = ((await el.text_content()) or "").strip()
                    break
if not result.get("price"):
price_el = await page.query_selector(".product-price-value")
if price_el:
result["price"] = await price_el.text_content()
# Get product images
images = await page.evaluate("""
() => {
const imgs = document.querySelectorAll('.slider-image img, .product-image img');
return Array.from(imgs).map(img => img.src || img.dataset.src).filter(Boolean);
}
""")
        result["images"] = list(dict.fromkeys(images))[:10]  # deduplicate (order-preserving), cap at 10
result["url"] = url
result["scraped_at"] = time.time()
finally:
await browser.close()
await playwright.stop()
return result
def parse_init_data(data: dict) -> dict:
"""Parse the window.__INIT_DATA__ structure into a clean product dict."""
result = {}
# The data key contains the main product info
product_data = data.get("data", data) # some pages omit the outer "data" wrapper
# Product info component
info = product_data.get("productInfoComponent", {})
result["title"] = info.get("subject", "")
result["item_id"] = info.get("productId", "")
# Price component
price = product_data.get("priceComponent", {})
result["price"] = price.get("formatedActivityPrice", "") or price.get("formatedPrice", "")
result["original_price"] = price.get("formatedPrice", "")
result["discount"] = price.get("discount", "")
# Seller / trade component
trade = product_data.get("tradeComponent", {})
result["sold_count"] = trade.get("formatTradeCount", "")
# Review component
review = product_data.get("reviewComponent", {})
result["rating"] = review.get("averageStar", "")
result["review_count"] = review.get("totalValidNum", 0)
# Shipping component
shipping = product_data.get("shippingComponent", {})
dynamic = shipping.get("shippingInfo", {}).get("shippingList", [])
if dynamic:
first = dynamic[0]
result["shipping"] = {
"method": first.get("serviceName", ""),
"price": first.get("freightAmount", {}).get("formatedAmount", ""),
"delivery_days": first.get("deliveryDayMax", ""),
}
# Seller component
seller = product_data.get("sellerComponent", {})
result["seller"] = {
"name": seller.get("storeName", ""),
"id": seller.get("storeNum", ""),
"positive_feedback": seller.get("positiveRate", ""),
"followers": seller.get("followingNumber", 0),
}
# SKU data — all variant combinations and their prices
sku = product_data.get("skuComponent", {})
result["sku_data"] = {
"props": sku.get("productSKUPropertyList", []),
"price_list": sku.get("skuPriceList", []),
}
return result
Proxy Rotation with ThorData
Residential proxies are non-negotiable for AliExpress scraping at any volume beyond a few dozen requests. AliExpress blocks datacenter IP ranges (AWS, GCP, DigitalOcean, Linode, etc.) aggressively. A residential proxy routes your requests through real ISP-assigned IP addresses, making them indistinguishable from organic traffic at the network layer.
ThorData provides rotating residential proxies with geo-targeting support. For AliExpress, geo-targeting matters: a US-based IP scraping AliExpress sees US pricing and USD displays, which simplifies price normalization in your data pipeline.
import asyncio
import random
import time
from dataclasses import dataclass
from typing import Optional
@dataclass
class ProxyConfig:
host: str
port: int
username: str
password: str
country: str = "US"
def to_url(self) -> str:
return f"http://{self.username}:{self.password}@{self.host}:{self.port}"
def to_playwright_proxy(self) -> dict:
return {
"server": f"http://{self.host}:{self.port}",
"username": self.username,
"password": self.password,
}
class ProxyRotator:
"""Manages a pool of proxies with failure tracking and rotation."""
def __init__(self, proxies: list[ProxyConfig]):
self.proxies = proxies
self.failure_counts: dict[str, int] = {}
self.cooldown_until: dict[str, float] = {}
self.MAX_FAILURES = 3
self.COOLDOWN_SECONDS = 300 # 5 minutes
def get_proxy(self) -> Optional[ProxyConfig]:
"""Get a healthy proxy, avoiding recently failed ones."""
now = time.time()
available = [
p for p in self.proxies
if self.failure_counts.get(p.host, 0) < self.MAX_FAILURES
and self.cooldown_until.get(p.host, 0) < now
]
if not available:
            # Pool exhausted: reset tracking rather than stall the scraper
self.failure_counts.clear()
self.cooldown_until.clear()
available = self.proxies
return random.choice(available) if available else None
def mark_failure(self, proxy: ProxyConfig):
"""Record a failure for a proxy; put it in cooldown after threshold."""
count = self.failure_counts.get(proxy.host, 0) + 1
self.failure_counts[proxy.host] = count
if count >= self.MAX_FAILURES:
self.cooldown_until[proxy.host] = time.time() + self.COOLDOWN_SECONDS
print(f"Proxy {proxy.host} in cooldown for {self.COOLDOWN_SECONDS}s")
def mark_success(self, proxy: ProxyConfig):
"""Reset failure count on success."""
self.failure_counts[proxy.host] = 0
# ThorData rotating residential proxy setup
# Replace with your ThorData credentials from https://thordata.partnerstack.com/partner/0a0x4nzh
THORDATA_PROXIES = [
ProxyConfig(
host="rotating.thordata.net",
port=9080,
username="your_username-country-US",
password="your_password",
country="US",
),
ProxyConfig(
host="rotating.thordata.net",
port=9080,
username="your_username-country-GB",
password="your_password",
country="GB",
),
ProxyConfig(
host="rotating.thordata.net",
port=9080,
username="your_username-country-DE",
password="your_password",
country="DE",
),
]
rotator = ProxyRotator(THORDATA_PROXIES)
async def scrape_with_rotation(urls: list[str], max_concurrent: int = 3) -> list[dict]:
"""Scrape multiple AliExpress URLs with proxy rotation."""
semaphore = asyncio.Semaphore(max_concurrent)
async def scrape_one(url: str) -> dict:
async with semaphore:
proxy = rotator.get_proxy()
proxy_url = proxy.to_url() if proxy else None
# Add jitter to avoid synchronized request patterns
await asyncio.sleep(random.uniform(1.0, 4.0))
try:
result = await scrape_aliexpress_product(url, proxy_url=proxy_url)
if proxy:
rotator.mark_success(proxy)
return result
except Exception as e:
print(f"Failed {url}: {e}")
if proxy:
rotator.mark_failure(proxy)
return {"url": url, "error": str(e)}
tasks = [scrape_one(url) for url in urls]
results = await asyncio.gather(*tasks, return_exceptions=True)
return [r for r in results if isinstance(r, dict)]
CAPTCHA Handling
AliExpress uses two types of CAPTCHA challenges: a slider CAPTCHA (drag the piece to complete the puzzle) and an image selection CAPTCHA. Both are served as interstitial pages before or instead of the product content.
The most reliable strategy is avoidance — slowing down requests and using residential proxies reduces the CAPTCHA encounter rate dramatically. But when you do hit one, you have two options: route through a CAPTCHA solving service, or skip and retry with a fresh session and IP.
import httpx
async def solve_captcha_2captcha(page, api_key: str) -> bool:
"""
Attempt to solve an AliExpress slider CAPTCHA using 2captcha.
Returns True if solved, False if failed or no CAPTCHA found.
"""
# Check if we're on a CAPTCHA page
captcha_frame = await page.query_selector("iframe[src*='captcha']")
if not captcha_frame:
return True # No CAPTCHA present
# Get the CAPTCHA challenge image URL
challenge_url = await page.evaluate("""
() => {
const img = document.querySelector('.captcha-img, [class*="captcha"] img');
return img ? img.src : null;
}
""")
if not challenge_url:
return False
    # Download the challenge image, then submit it to 2captcha as base64
    # (method=base64 requires the image in the "body" field)
    import base64  # stdlib; kept local so the snippet stands alone
    async with httpx.AsyncClient() as client:
        img_resp = await client.get(challenge_url)
        img_b64 = base64.b64encode(img_resp.content).decode()
        submit_resp = await client.post(
            "http://2captcha.com/in.php",
            data={
                "key": api_key,
                "method": "base64",
                "body": img_b64,
                "json": 1,
                "type": "rotate",  # or "slider" depending on CAPTCHA type
            },
        )
task_data = submit_resp.json()
if task_data.get("status") != 1:
return False
task_id = task_data["request"]
# Poll for result (2captcha typically takes 5-20 seconds)
for _ in range(20):
await asyncio.sleep(3)
result_resp = await client.get(
f"http://2captcha.com/res.php?key={api_key}&action=get&id={task_id}&json=1"
)
result_data = result_resp.json()
if result_data.get("status") == 1:
# Got the solution
angle = float(result_data["request"])
# Apply the rotation to the slider
slider = await page.query_selector(".captcha-slider, [class*='slider']")
if slider:
box = await slider.bounding_box()
await page.mouse.move(box["x"] + box["width"] / 2, box["y"] + box["height"] / 2)
await page.mouse.down()
# Move proportionally to the angle value
await page.mouse.move(
box["x"] + (angle / 360) * 280,
box["y"] + box["height"] / 2,
steps=15,
)
await page.mouse.up()
await asyncio.sleep(1.5)
return True
elif result_data.get("request") == "ERROR_CAPTCHA_UNSOLVABLE":
return False
return False
async def handle_captcha_or_skip(page, proxy_rotator: ProxyRotator, current_proxy: ProxyConfig) -> bool:
"""
Check for CAPTCHA and either solve it or mark proxy as failed and signal retry.
Returns True to continue scraping, False to trigger a retry with fresh session.
"""
is_captcha = await page.evaluate("""
() => {
const indicators = ['captcha', 'verify', 'robot', 'challenge'];
const text = document.body.innerText.toLowerCase();
return indicators.some(ind => text.includes(ind));
}
""")
if is_captcha:
print("CAPTCHA detected — retiring current session")
if current_proxy:
proxy_rotator.mark_failure(current_proxy)
return False
return True
Rate Limiting and Request Timing
Beyond proxies, request timing is one of the most impactful levers for avoiding detection. Real users do not make requests at perfectly regular intervals. They read the page for several seconds, maybe scroll, click around, then navigate to the next product. Mimicking this behavior pattern dramatically reduces bot detection scores.
import asyncio
import random
import time
class RateLimiter:
"""Token bucket rate limiter with jitter for human-like request pacing."""
def __init__(self, requests_per_minute: float = 10.0):
self.min_interval = 60.0 / requests_per_minute
self.last_request_time = 0.0
self.jitter_factor = 0.3 # ±30% randomness
async def wait(self):
"""Wait the appropriate time before making the next request."""
now = time.time()
elapsed = now - self.last_request_time
base_wait = max(0, self.min_interval - elapsed)
# Add random jitter: ±30% of the base interval
jitter = base_wait * self.jitter_factor * (random.random() * 2 - 1)
total_wait = max(0, base_wait + jitter)
if total_wait > 0:
await asyncio.sleep(total_wait)
self.last_request_time = time.time()
async def simulate_human_page_interaction(page):
"""Simulate human-like browsing behavior on a product page."""
# Random scroll pattern
scroll_positions = [300, 600, 400, 800, 500]
for pos in scroll_positions:
await page.evaluate(f"window.scrollTo(0, {pos + random.randint(-50, 50)})")
await asyncio.sleep(random.uniform(0.3, 0.8))
# Sometimes move the mouse around
if random.random() > 0.5:
for _ in range(random.randint(2, 5)):
x = random.randint(200, 1100)
y = random.randint(100, 600)
await page.mouse.move(x, y)
await asyncio.sleep(random.uniform(0.1, 0.4))
# Occasionally "read" the page for a few seconds
read_time = random.uniform(2.0, 6.0)
await asyncio.sleep(read_time)
Use Case 1: Price Intelligence for Dropshipping
Dropshippers need to monitor supplier prices and update their storefronts when AliExpress prices change. This scraper polls a list of tracked products and writes changes to a SQLite database.
import sqlite3
import json
import asyncio
from datetime import datetime
def setup_price_db(db_path: str = "aliexpress_prices.db"):
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS price_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
item_id TEXT NOT NULL,
url TEXT NOT NULL,
title TEXT,
price TEXT,
original_price TEXT,
discount TEXT,
sold_count TEXT,
rating TEXT,
scraped_at REAL,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_item_id ON price_history(item_id)")
conn.commit()
return conn
async def monitor_prices(tracked_urls: list[str], db_path: str = "aliexpress_prices.db"):
"""Scrape tracked products and store price history. Run on a schedule (e.g., every 6 hours)."""
conn = setup_price_db(db_path)
rate_limiter = RateLimiter(requests_per_minute=8)
print(f"Monitoring {len(tracked_urls)} products...")
for url in tracked_urls:
await rate_limiter.wait()
proxy = rotator.get_proxy()
try:
data = await scrape_aliexpress_product(url, proxy_url=proxy.to_url() if proxy else None)
if data.get("price"):
conn.execute("""
INSERT INTO price_history
(item_id, url, title, price, original_price, discount, sold_count, rating, scraped_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
data.get("item_id", ""),
url,
data.get("title", ""),
data.get("price", ""),
data.get("original_price", ""),
data.get("discount", ""),
data.get("sold_count", ""),
data.get("rating", ""),
data.get("scraped_at", 0),
))
conn.commit()
print(f"Saved: {data.get('title', url)[:60]} @ {data.get('price', 'N/A')}")
if proxy:
rotator.mark_success(proxy)
except Exception as e:
print(f"Error on {url}: {e}")
if proxy:
rotator.mark_failure(proxy)
conn.close()
print("Price monitoring run complete")
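Once a few runs have accumulated, a companion helper can surface the actual price movements. This is a hypothetical addition built on the price_history table above, not part of the monitoring loop itself:

```python
import sqlite3

def detect_price_changes(conn: sqlite3.Connection) -> list[dict]:
    """Compare the two most recent price rows per item in price_history
    and return the items whose recorded price string changed."""
    rows = conn.execute("""
        SELECT item_id, price FROM price_history
        ORDER BY item_id, scraped_at DESC
    """).fetchall()
    history: dict[str, list[str]] = {}
    for item_id, price in rows:
        history.setdefault(item_id, []).append(price)
    return [
        {"item_id": item_id, "new_price": prices[0], "old_price": prices[1]}
        for item_id, prices in history.items()
        if len(prices) >= 2 and prices[0] != prices[1]
    ]
```

The output feeds naturally into a storefront repricing job or an alerting hook.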
Use Case 2: Category-Level Product Discovery
Scraping AliExpress search results to discover products in a category — useful for market research, finding bestsellers, or building comparison databases.
async def scrape_category(
category_url: str,
max_pages: int = 5,
proxy_url: str = None,
) -> list[dict]:
"""
Scrape product listings from an AliExpress category or search results page.
Returns list of product summaries (without full detail data).
"""
all_products = []
playwright, browser, context, page = await create_stealthy_browser(proxy_url)
try:
for page_num in range(1, max_pages + 1):
# Add page number to URL
if "?" in category_url:
url = f"{category_url}&page={page_num}"
else:
url = f"{category_url}?page={page_num}"
await page.goto(url, wait_until="networkidle", timeout=45000)
await asyncio.sleep(random.uniform(2.0, 4.0))
            # Try structured listing data first (search pages expose window.__DATA__)
listing_data = await page.evaluate("""
() => {
// AliExpress search pages sometimes expose window.__DATA__
if (window.__DATA__) return JSON.stringify(window.__DATA__);
return null;
}
""")
products_on_page = []
if listing_data:
data = json.loads(listing_data)
items = (data.get("data", {})
.get("itemList", {})
.get("content", []))
for item in items:
products_on_page.append({
"item_id": item.get("itemId", ""),
"title": item.get("title", {}).get("displayTitle", ""),
"price": item.get("prices", {}).get("salePrice", {}).get("formattedPrice", ""),
"rating": item.get("evaluation", {}).get("starRating", ""),
"sold_count": item.get("trade", {}).get("tradeDesc", ""),
"url": f"https://www.aliexpress.com/item/{item.get('itemId', '')}.html",
"image": item.get("image", {}).get("imgUrl", ""),
})
# Fall back to DOM parsing if structured data not available
if not products_on_page:
cards = await page.query_selector_all("[class*='product-snippet'], [class*='list--item']")
for card in cards:
try:
title_el = await card.query_selector("[class*='title']")
price_el = await card.query_selector("[class*='price']")
link_el = await card.query_selector("a[href*='/item/']")
                    title = ((await title_el.text_content()) or "").strip() if title_el else ""
                    price = ((await price_el.text_content()) or "").strip() if price_el else ""
link = await link_el.get_attribute("href") if link_el else ""
if title:
products_on_page.append({
"title": title,
"price": price,
"url": link,
})
except Exception:
continue
print(f"Page {page_num}: found {len(products_on_page)} products")
all_products.extend(products_on_page)
# Simulate reading the page before going to next
await simulate_human_page_interaction(page)
finally:
await browser.close()
await playwright.stop()
return all_products
Use Case 3: Seller Store Scraping
Research a specific AliExpress seller's complete product catalog — useful for supplier vetting or competitive intelligence on a specific seller.
async def scrape_seller_store(store_url: str, proxy_url: str = None) -> dict:
"""
Scrape all products from an AliExpress seller's store.
store_url example: https://www.aliexpress.com/store/12345678
"""
playwright, browser, context, page = await create_stealthy_browser(proxy_url)
store_data = {"url": store_url, "products": [], "seller_info": {}}
try:
await page.goto(store_url, wait_until="networkidle", timeout=45000)
await asyncio.sleep(random.uniform(2.0, 3.0))
# Extract seller info from the store header
store_name_el = await page.query_selector("[class*='store-name'], .shop-name")
if store_name_el:
store_data["seller_info"]["name"] = await store_name_el.text_content()
feedback_el = await page.query_selector("[class*='feedback'], [class*='score']")
if feedback_el:
store_data["seller_info"]["feedback"] = await feedback_el.text_content()
# Scroll through and collect product listings
last_count = 0
scroll_attempts = 0
max_scroll_attempts = 20
while scroll_attempts < max_scroll_attempts:
# Scroll to bottom to trigger infinite scroll loading
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await asyncio.sleep(random.uniform(1.5, 3.0))
# Count current products
products = await page.query_selector_all("[class*='product-item'], [class*='item-card']")
current_count = len(products)
if current_count == last_count:
break # No new products loaded
last_count = current_count
scroll_attempts += 1
print(f"Store scroll {scroll_attempts}: {current_count} products visible")
# Extract all product data from the final DOM state
products_data = await page.evaluate("""
() => {
const cards = document.querySelectorAll('[class*="product-item"], [class*="item-card"]');
return Array.from(cards).map(card => {
const titleEl = card.querySelector('[class*="title"]');
const priceEl = card.querySelector('[class*="price"]');
const linkEl = card.querySelector('a[href*="/item/"]');
const imgEl = card.querySelector('img');
return {
title: titleEl ? titleEl.textContent.trim() : '',
price: priceEl ? priceEl.textContent.trim() : '',
url: linkEl ? linkEl.href : '',
image: imgEl ? (imgEl.src || imgEl.dataset.src) : '',
};
}).filter(p => p.title);
}
""")
store_data["products"] = products_data
store_data["total_products"] = len(products_data)
finally:
await browser.close()
await playwright.stop()
return store_data
Use Case 4: Bulk Product Enrichment Pipeline
When you have a list of AliExpress item IDs (e.g., from a CSV export or API response), enrich them with full product detail data including SKU matrix, shipping options, and seller ratings.
import csv
import json
async def enrich_product_ids(
item_ids: list[str],
output_file: str = "enriched_products.jsonl",
workers: int = 3,
) -> int:
"""
Take a list of AliExpress item IDs and fetch full product data for each.
Writes results as JSON lines to output_file. Returns count of successes.
Skips IDs that already appear in the output file (resumable).
"""
# Load already-processed IDs to support resuming
processed_ids = set()
try:
with open(output_file, "r") as f:
for line in f:
try:
record = json.loads(line)
if record.get("item_id"):
processed_ids.add(record["item_id"])
except json.JSONDecodeError:
pass
except FileNotFoundError:
pass
remaining = [id_ for id_ in item_ids if id_ not in processed_ids]
print(f"{len(remaining)} items to process ({len(processed_ids)} already done)")
semaphore = asyncio.Semaphore(workers)
rate_limiter = RateLimiter(requests_per_minute=10)
success_count = 0
outfile = open(output_file, "a")
async def process_one(item_id: str):
nonlocal success_count
async with semaphore:
await rate_limiter.wait()
url = f"https://www.aliexpress.com/item/{item_id}.html"
proxy = rotator.get_proxy()
try:
data = await scrape_aliexpress_product(url, proxy_url=proxy.to_url() if proxy else None)
if data.get("title"):
outfile.write(json.dumps(data) + "\n")
outfile.flush()
success_count += 1
if proxy:
rotator.mark_success(proxy)
except Exception as e:
print(f"Failed {item_id}: {e}")
if proxy:
rotator.mark_failure(proxy)
await asyncio.gather(*[process_one(id_) for id_ in remaining])
outfile.close()
return success_count
Use Case 5: Trend Monitoring and Keyword Research
Track which product types are trending on AliExpress by monitoring the "Hot Products" and "New Arrivals" sections and tracking sold counts over time.
async def track_trending_products(
keywords: list[str],
db_path: str = "trends.db",
) -> dict:
"""
Search AliExpress for each keyword, record the top results and their metrics.
Run this daily to build a trend timeline.
"""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS trend_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
keyword TEXT,
item_id TEXT,
title TEXT,
price TEXT,
sold_count TEXT,
rating TEXT,
position INTEGER,
run_date TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
)
""")
conn.commit()
run_date = datetime.now().strftime("%Y-%m-%d")
results_summary = {}
for keyword in keywords:
search_url = f"https://www.aliexpress.com/w/wholesale-{keyword.replace(' ', '-')}.html?SortType=total_tranpro_desc"
proxy = rotator.get_proxy()
products = await scrape_category(
search_url,
max_pages=2,
proxy_url=proxy.to_url() if proxy else None,
)
for position, product in enumerate(products[:20], 1):
conn.execute("""
INSERT INTO trend_data (keyword, item_id, title, price, sold_count, rating, position, run_date)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
keyword,
product.get("item_id", ""),
product.get("title", ""),
product.get("price", ""),
product.get("sold_count", ""),
product.get("rating", ""),
position,
run_date,
))
conn.commit()
results_summary[keyword] = len(products)
print(f"'{keyword}': tracked {len(products)} products")
await asyncio.sleep(random.uniform(5.0, 10.0))
conn.close()
return results_summary
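With daily runs recorded, a small query helper can surface items climbing the rankings for a keyword. A hypothetical addition over the trend_data table above:

```python
import sqlite3

def position_movers(conn: sqlite3.Connection, keyword: str) -> list[dict]:
    """Compare the two most recent run dates in trend_data for a keyword
    and return items whose search position improved (lower = better)."""
    dates = [r[0] for r in conn.execute(
        "SELECT DISTINCT run_date FROM trend_data WHERE keyword = ? "
        "ORDER BY run_date DESC LIMIT 2", (keyword,)
    )]
    if len(dates) < 2:
        return []  # need at least two runs to compare
    newest, previous = dates

    def positions(run_date: str) -> dict:
        return dict(conn.execute(
            "SELECT item_id, position FROM trend_data "
            "WHERE keyword = ? AND run_date = ?", (keyword, run_date)
        ))

    new_pos, old_pos = positions(newest), positions(previous)
    return [
        {"item_id": item_id, "from": old_pos[item_id], "to": pos}
        for item_id, pos in new_pos.items()
        if item_id in old_pos and pos < old_pos[item_id]
    ]
```

Items that appear in the newest run but not the previous one are worth a separate "new entrant" query.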
Output Schema
A complete product record from the scraper looks like this:
{
"item_id": "1005006789012345",
"url": "https://www.aliexpress.com/item/1005006789012345.html",
"title": "2024 New LED Desk Lamp Wireless Charging USB Reading Light Study Lamp for Bedroom Office",
"price": "US $12.99",
"original_price": "US $18.56",
"discount": "30%",
"sold_count": "1,234 sold",
"rating": "4.8",
"review_count": 892,
"shipping": {
"method": "AliExpress Standard Shipping",
"price": "Free",
"delivery_days": 20
},
"seller": {
"name": "Electronics World Store",
"id": "12345678",
"positive_feedback": "97.8%",
"followers": 4521
},
"sku_data": {
"props": [
{
"skuPropertyName": "Color",
"skuPropertyValues": [
{"propertyValueName": "Black", "skuPropertyImagePath": "//ae01.alicdn.com/..."},
{"propertyValueName": "White", "skuPropertyImagePath": "//ae01.alicdn.com/..."}
]
}
],
"price_list": [
{"skuPropIds": "200000182:201441035", "skuVal": {"skuAmount": {"value": "12.99"}}},
{"skuPropIds": "200000182:201441036", "skuVal": {"skuAmount": {"value": "13.49"}}}
]
},
"images": [
"//ae01.alicdn.com/kf/S123abc.jpg",
"//ae01.alicdn.com/kf/S456def.jpg"
],
"scraped_at": 1743436800.0
}
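Note that `price`, `sold_count`, and `rating` arrive as display strings, which is awkward for analysis. A small normalizer can derive numeric companions before the record hits storage; this is a sketch whose regexes assume the display formats shown in the example record above, so verify them against your own output:

```python
import re


def normalize_record(record: dict) -> dict:
    """Add numeric companions for the display-string fields in a product record."""
    out = dict(record)
    # "US $12.99" -> 12.99 (first number, commas allowed)
    price = re.search(r"[\d,]+(?:\.\d+)?", record.get("price") or "")
    out["price_value"] = float(price.group().replace(",", "")) if price else None
    # "1,234 sold" -> 1234
    sold = re.search(r"[\d,]+", record.get("sold_count") or "")
    out["sold_value"] = int(sold.group().replace(",", "")) if sold else None
    # "4.8" -> 4.8; missing or non-numeric ratings become None
    try:
        out["rating_value"] = float(record.get("rating") or "")
    except ValueError:
        out["rating_value"] = None
    return out
```

Keeping the original strings alongside the parsed values means a format change on AliExpress's side degrades to `None` fields rather than losing data.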
Error Handling and Retry Logic
Production scrapers need to distinguish transient errors (network timeouts, temporary blocks) from permanent failures (removed products, invalid URLs): the former deserve retries with backoff, while the latter should be logged and skipped without wasting attempts.
from tenacity import (
retry, stop_after_attempt, wait_exponential,
retry_if_exception_type, before_sleep_log
)
import logging
logger = logging.getLogger(__name__)
class ProductRemovedError(Exception):
"""Product no longer exists on AliExpress."""
pass
class BotDetectedError(Exception):
"""Anti-bot triggered — needs fresh session/proxy."""
pass
class ScrapingError(Exception):
"""General scraping failure."""
pass
@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=2, min=5, max=60),
    retry=retry_if_exception_type(ScrapingError),
    before_sleep=before_sleep_log(logger, logging.WARNING),
    reraise=True,
)
async def scrape_with_retry(url: str, proxy_rotator: ProxyRotator) -> dict:
    """
    Scrape with automatic retry for transient errors.
    Wait time doubles per attempt, clamped between 5s and 60s.
    """
proxy = proxy_rotator.get_proxy()
try:
result = await scrape_aliexpress_product(
url,
proxy_url=proxy.to_url() if proxy else None,
)
if not result:
raise ScrapingError("Empty result returned")
if result.get("error"):
error_msg = result["error"]
if "404" in error_msg or "item not found" in error_msg.lower():
raise ProductRemovedError(f"Product removed: {url}")
if any(word in error_msg.lower() for word in ["captcha", "blocked", "bot"]):
proxy_rotator.mark_failure(proxy)
raise ScrapingError(f"Bot detection: {error_msg}")
raise ScrapingError(f"Scrape error: {error_msg}")
if proxy:
proxy_rotator.mark_success(proxy)
return result
    except ProductRemovedError:
        raise  # Don't retry — product is gone
    except ScrapingError:
        raise  # Already classified as transient; let tenacity retry without re-wrapping
except PlaywrightTimeout as e:
if proxy:
proxy_rotator.mark_failure(proxy)
raise ScrapingError(f"Timeout: {e}")
except Exception as e:
if proxy:
proxy_rotator.mark_failure(proxy)
raise ScrapingError(f"Unexpected error: {e}")
async def bulk_scrape_safe(
    urls: list[str],
    rotator: ProxyRotator,
    output_path: str = "results.jsonl",
    error_path: str = "errors.jsonl",
) -> tuple[int, int]:
"""
Scrape a list of URLs with full error handling.
Writes successes to output_path and failures to error_path.
Returns (success_count, error_count).
"""
success_count = 0
error_count = 0
with open(output_path, "a") as out, open(error_path, "a") as err:
for url in urls:
try:
result = await scrape_with_retry(url, rotator)
out.write(json.dumps(result) + "\n")
out.flush()
success_count += 1
except ProductRemovedError as e:
err.write(json.dumps({"url": url, "error": "removed", "detail": str(e)}) + "\n")
err.flush()
error_count += 1
except Exception as e:
err.write(json.dumps({"url": url, "error": "failed", "detail": str(e)}) + "\n")
err.flush()
error_count += 1
logger.error(f"Failed after all retries: {url} — {e}")
# Delay between each URL regardless of success/failure
await asyncio.sleep(random.uniform(2.0, 5.0))
return success_count, error_count
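Because `bulk_scrape_safe` tags every failure with a reason, the error log doubles as a retry queue: `"failed"` entries are worth a second pass, `"removed"` entries are not. A minimal sketch, assuming the JSONL layout written by the `err.write` calls above (the function name is illustrative):

```python
import json


def requeue_failures(error_path: str = "errors.jsonl") -> list[str]:
    """Collect URLs worth retrying: transient failures, not removed products."""
    retry_urls = []
    with open(error_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry.get("error") == "failed":  # skip "removed" — permanent
                retry_urls.append(entry["url"])
    # Deduplicate while preserving first-seen order
    return list(dict.fromkeys(retry_urls))
```

Feeding the result back into `bulk_scrape_safe` with a fresh error file gives a cheap two-pass strategy that recovers most transient failures without re-scraping successes.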
Common Errors and What They Mean
TimeoutError waiting for product selector — The page did not load the product content within the timeout. Either the page is genuinely slow, you received a redirect to a block/CAPTCHA page, or the site layout changed. Check await page.screenshot(path="debug.png") to see what the page actually looks like.
Empty price field despite page loading — JavaScript did not finish executing before you extracted content. Increase the timeout on wait_until="networkidle" or add an explicit wait for the price element: await page.wait_for_selector(".product-price-value", timeout=15000).
window.__INIT_DATA__ is null — This happens on some AliExpress page variants (mobile-routed pages, lite pages for blocked IPs, or new A/B test layouts). Fall back to DOM parsing and check window.runParams as an alternative data source.
403 on search pages, 200 on product pages — Search pages have stricter anti-bot thresholds. Use a different User-Agent for search, add more delay between paginated requests, and ensure your cookie session was established by visiting the homepage first.
Prices in wrong currency — Your proxy IP is routing through a country with different pricing. Specify country-targeting in your ThorData proxy username (e.g., username-country-US) to ensure consistent USD pricing.
Silent degradation (data looks old or incomplete) — AliExpress sometimes returns cached/stale data to suspected bots rather than blocking outright. Check if scraped_at timestamps in your data correspond to fresh scrapes, and compare product data against the browser manually to verify freshness.
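The silent-degradation failure mode is worth an automated check rather than a manual spot check: compare each record's `scraped_at` against the wall clock and alert when a batch skews stale. A sketch, assuming the `scraped_at` epoch-seconds field from the output schema above (the 6-hour threshold is an arbitrary choice):

```python
import time


def stale_fraction(records: list[dict], max_age_seconds: float = 6 * 3600) -> float:
    """Return the fraction of records whose scraped_at is older than the threshold.

    Records missing scraped_at entirely are counted as stale.
    """
    if not records:
        return 0.0
    now = time.time()
    stale = sum(1 for r in records if now - r.get("scraped_at", 0) > max_age_seconds)
    return stale / len(records)
```

Wiring this into the end of a batch run (e.g., warn when the fraction exceeds a few percent) catches cached-response poisoning long before it corrupts a price-history dataset.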
AliExpress scraping is a moving target. The selectors change, window.__INIT_DATA__ key paths shift occasionally, and Alibaba's anti-bot systems receive regular updates. The window.__INIT_DATA__ approach is significantly more stable than DOM parsing — prioritize it and fall back to DOM extraction only when the structured data is missing. Residential proxies and stealth browser patching are the two non-negotiables for maintaining reliable access at scale.