How to Scrape AliExpress Product Data in 2026: Prices, Reviews & Seller Ratings

AliExpress has hundreds of millions of product listings across every category imaginable. Prices fluctuate constantly — the same item from ten different sellers can vary by 300%. Seller ratings and review counts tell you who's actually moving product versus who's a ghost storefront. And regional pricing means what a buyer sees in Poland differs from what they see in the US.

For dropshippers, price comparison tools, and market researchers, this data is genuinely useful. The problem is getting it.

What's Worth Scraping

AliExpress product pages expose a solid set of data points: title, sale price, original price, and discount badge; store name and URL; average product rating, review count, and order count; shipping methods with costs and delivery estimates; product variants and image sets; and category breadcrumbs.

That's a lot of data, and no official API exposes it cleanly. AliExpress does have a partner API, but it's restricted, rate-limited, and doesn't give you the seller analytics you actually want.

AliExpress Anti-Bot Measures

AliExpress is owned by Alibaba, and they take scraping seriously. Their defenses layer on top of each other:

Cloudflare protection — search result pages and category pages sit behind Cloudflare's bot detection. Standard requests calls return a challenge page almost immediately.

Browser fingerprinting — AliExpress runs JavaScript checks on canvas rendering, WebGL, fonts, and navigator properties. Headless Chromium without stealth patching gets flagged within a few page loads.

CAPTCHA challenges — Alibaba uses a slider CAPTCHA that's harder than Google's reCAPTCHA. You'll hit it after 20-30 requests from a single IP, sometimes sooner on new IPs.

Rate limiting by IP and session — even if you pass the fingerprint checks, rapid sequential requests from the same IP trigger soft blocks. Pages return 200 but with empty product containers.

Dynamic rendering — prices, seller ratings, and review counts load via XHR after initial page load. A plain HTML scrape gets the skeleton, not the data.

Regional gating — AliExpress serves different prices and sometimes different products based on the request's apparent country. A German IP sees EUR prices; a US IP sees USD.

This is why requests + BeautifulSoup doesn't work here. You need a real browser, and you need clean residential IPs.
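A cheap first line of defense against wasted crawls is detecting the soft-block case described above, where a 200 response comes back with no product containers. A minimal heuristic sketch — the marker strings here are assumptions and will need tuning against real responses:

```python
def looks_soft_blocked(html: str) -> bool:
    """Heuristic check for a 200 response that is actually a soft block:
    no product cards rendered, or an explicit challenge marker present."""
    lowered = html.lower()
    # "captcha"/"punish" markers are assumptions based on Alibaba challenge pages
    has_challenge = "captcha" in lowered or "punish" in lowered
    # Card markers match the selectors used by the scraper further down
    has_cards = "data-item-id" in html or "search-item-card" in html
    return has_challenge or not has_cards
```

If this returns True, rotate the proxy and retry rather than parsing an empty page.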

Installation

pip install playwright beautifulsoup4 lxml
playwright install chromium

Core Playwright Scraper

import asyncio
import json
import random
import re
import sqlite3
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeout
from bs4 import BeautifulSoup

STEALTH_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'plugins', {
    get: () => [
        { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
        { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
        { name: 'Native Client', filename: 'internal-nacl-plugin' },
    ]
});
window.chrome = { runtime: {}, loadTimes: function() {}, csi: function() {}, app: {} };
"""

SEARCH_URL = "https://www.aliexpress.com/wholesale?SearchText={query}&page={page}&SortType=default"

async def create_browser(proxy_url: str | None = None, headless: bool = True):
    """Create a stealth Playwright browser."""
    p = await async_playwright().start()
    browser = await p.chromium.launch(
        headless=headless,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--no-sandbox",
            "--disable-dev-shm-usage",
            "--disable-infobars",
        ],
        proxy={"server": proxy_url} if proxy_url else None,
    )
    context = await browser.new_context(
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        viewport={"width": 1440, "height": 900},
        locale="en-US",
        timezone_id="America/New_York",
    )
    await context.add_init_script(STEALTH_SCRIPT)
    return p, browser, context

Search Results Scraper

async def scrape_search_results(
    query: str,
    pages: int = 3,
    proxy_url: str | None = None,
) -> list[dict]:
    """Scrape AliExpress search results for a given query."""
    results = []
    p, browser, context = await create_browser(proxy_url)

    try:
        page = await context.new_page()

        # Warm up with homepage visit
        await page.goto("https://www.aliexpress.com/", wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(random.uniform(2, 4))

        for page_num in range(1, pages + 1):
            url = SEARCH_URL.format(query=query.replace(" ", "+"), page=page_num)
            print(f"Fetching page {page_num}: {query}")

            try:
                await page.goto(url, wait_until="domcontentloaded", timeout=30000)
                # Wait for product cards to render
                await page.wait_for_selector(
                    "[data-item-id], .search-item-card-wrapper-gallery",
                    timeout=15000,
                )
                # Let XHR finish loading prices
                await asyncio.sleep(random.uniform(2, 3))

                # Scroll to trigger lazy-loaded images and prices
                await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.5)")
                await asyncio.sleep(1)
                await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                await asyncio.sleep(1.5)

            except PlaywrightTimeout:
                print(f"Timeout on page {page_num}, skipping")
                continue

            html = await page.content()
            soup = BeautifulSoup(html, "lxml")

            cards = soup.select("[data-item-id]")
            if not cards:
                cards = soup.select(".search-item-card-wrapper-gallery")

            for card in cards:
                try:
                    item = parse_search_card(card)
                    if item:
                        results.append(item)
                except Exception:
                    continue

            print(f"  Page {page_num}: {len(cards)} products")
            await asyncio.sleep(random.uniform(3, 6))

    finally:
        await browser.close()
        await p.stop()

    return results


def parse_search_card(card) -> dict | None:
    """Parse a single AliExpress search result card."""
    item_id = card.get("data-item-id")
    if not item_id:
        return None

    # Title
    title_el = card.select_one("h3, [class*='titleText'], .multi--titleText--nXeOvyr")
    title = title_el.get_text(strip=True) if title_el else None

    # Sale price
    price_el = card.select_one(
        ".multi--price-sale--U-S0jtj, [class*='price-sale'], "
        ".price--currentPriceText--V8_y_b, [class*='sale-price']"
    )
    price_raw = price_el.get_text(strip=True) if price_el else None

    # Original price (before discount)
    orig_price_el = card.select_one(
        ".multi--price-original--1zEQqOK, [class*='price-original'], "
        "[class*='original-price']"
    )
    original_price = orig_price_el.get_text(strip=True) if orig_price_el else None

    # Discount badge
    discount_el = card.select_one("[class*='discount'], [class*='sale-tag']")
    discount = discount_el.get_text(strip=True) if discount_el else None

    # Store name
    store_el = card.select_one(
        ".multi--shop-name--wt9Xr, [class*='shop-name'], .cards--storeLink--1J-Bkvy"
    )
    store_name = store_el.get_text(strip=True) if store_el else None

    # Rating
    rating_el = card.select_one("[class*='star-view'], [class*='rating-score']")
    rating = None
    if rating_el:
        aria = rating_el.get("aria-label", "")
        match = re.search(r"(\d+\.?\d*)", aria)
        rating = float(match.group(1)) if match else rating_el.get_text(strip=True) or None

    # Review count
    reviews_el = card.select_one("[class*='review'], [class*='feedback']")
    review_count_raw = reviews_el.get_text(strip=True) if reviews_el else None

    # Orders
    orders_el = card.select_one("[class*='trade'], [class*='order-count'], [class*='sold']")
    orders = orders_el.get_text(strip=True) if orders_el else None

    # Shipping
    shipping_el = card.select_one("[class*='shipping'], [class*='delivery-text']")
    shipping = shipping_el.get_text(strip=True) if shipping_el else None

    # Thumbnail
    img_el = card.select_one("img")
    thumbnail = (img_el.get("src") or img_el.get("data-src")) if img_el else None

    return {
        "item_id": item_id,
        "title": title,
        "price": price_raw,
        "original_price": original_price,
        "discount": discount,
        "store_name": store_name,
        "rating": rating,
        "review_count": review_count_raw,
        "orders": orders,
        "shipping": shipping,
        "thumbnail": thumbnail,
        "url": f"https://www.aliexpress.com/item/{item_id}.html",
    }
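The selectors above mix exact hashed class names (like multi--titleText--nXeOvyr) with substring matches ([class*='titleText']) because AliExpress regenerates the hash suffixes between frontend builds. A standalone check of the substring pattern against a hypothetical card (the markup and hashes here are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical card markup; the hashed suffixes change between builds,
# but the "titleText" / "price-sale" fragments tend to survive.
sample = """
<div data-item-id="100500">
  <h3 class="multi--titleText--zz9XyZq">USB-C Cable 1m</h3>
  <div class="multi--price-sale--aB3cDeF">US $1.99</div>
</div>
"""
card = BeautifulSoup(sample, "html.parser").select_one("[data-item-id]")
title = card.select_one("[class*='titleText']").get_text(strip=True)
price = card.select_one("[class*='price-sale']").get_text(strip=True)
```

The substring selectors keep working after a frontend deploy changes the hash, at the cost of occasionally matching an unrelated element — which is why parse_search_card lists exact classes first as the preferred match.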

Individual Product Page Scraper

async def scrape_product_page(
    item_url: str,
    proxy_url: str | None = None,
) -> dict:
    """Scrape detailed data from an AliExpress product page."""
    p, browser, context = await create_browser(proxy_url)
    result = {"url": item_url}

    try:
        page = await context.new_page()
        await page.goto(item_url, wait_until="domcontentloaded", timeout=30000)

        # Wait for price element
        try:
            await page.wait_for_selector(
                "[class*='product-price'], [class*='uniform-banner'], "
                "[class*='price-current']",
                timeout=15000,
            )
        except PlaywrightTimeout:
            result["error"] = "Price element not found"

        # Scroll to load seller info and shipping details
        await page.evaluate("window.scrollTo(0, 600)")
        await asyncio.sleep(1.5)
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.4)")
        await asyncio.sleep(2)

        html = await page.content()
        result.update(parse_product_html(html))

    except Exception as e:
        result["error"] = str(e)
    finally:
        await browser.close()
        await p.stop()

    return result


def parse_product_html(html: str) -> dict:
    """Parse product page HTML into structured data."""
    soup = BeautifulSoup(html, "lxml")
    result = {}

    # Title
    title_el = soup.select_one(
        "h1.product-title-text, [class*='product-title'], "
        ".pdp-product-title"
    )
    result["title"] = title_el.get_text(strip=True) if title_el else None

    # Current price
    price_el = soup.select_one(
        "[class*='product-price-value'], .uniform-banner-box-price, "
        "[class*='price-current']"
    )
    result["price"] = price_el.get_text(strip=True) if price_el else None

    # Original price
    orig_el = soup.select_one("[class*='product-price-original'], [class*='price-origin']")
    result["original_price"] = orig_el.get_text(strip=True) if orig_el else None

    # Seller name
    store_el = soup.select_one(
        "a[href*='/store/'] .store-header-name, [class*='shop-name-text']"
    )
    result["store_name"] = store_el.get_text(strip=True) if store_el else None

    # Store URL
    store_link_el = soup.select_one("a[href*='/store/']")
    result["store_url"] = store_link_el.get("href") if store_link_el else None

    # Seller ratings (3 sub-scores: communication, shipping speed, item accuracy)
    rating_groups = soup.select("[class*='seller-score'] .score-item, [class*='store-detail'] .score-row")
    ratings = {}
    for item in rating_groups:
        label_el = item.select_one("[class*='label'], span:first-child")
        score_el = item.select_one("[class*='score'], [class*='value'], span:last-child")
        if label_el and score_el:
            ratings[label_el.get_text(strip=True)] = score_el.get_text(strip=True)
    result["seller_ratings"] = ratings

    # Average product rating
    avg_rating_el = soup.select_one(
        "[class*='overview-rating-average'], [class*='score-average'], "
        "[class*='rating-value']"
    )
    result["avg_rating"] = avg_rating_el.get_text(strip=True) if avg_rating_el else None

    # Review count
    review_count_el = soup.select_one(
        "[class*='review-count'], [class*='feedback-total'], "
        "[class*='product-review-count']"
    )
    result["review_count"] = review_count_el.get_text(strip=True) if review_count_el else None

    # Order count
    orders_el = soup.select_one("[class*='order-count'], [class*='trade-count']")
    result["orders"] = orders_el.get_text(strip=True) if orders_el else None

    # Shipping options
    shipping_options = []
    for row in soup.select("[class*='shipping-item'], [class*='delivery-option']"):
        method_el = row.select_one("[class*='carrier'], [class*='shipping-name']")
        cost_el = row.select_one("[class*='fee'], [class*='shipping-price'], [class*='price']")
        time_el = row.select_one("[class*='estimate'], [class*='shipping-time'], [class*='days']")
        if method_el or cost_el:
            shipping_options.append({
                "method": method_el.get_text(strip=True) if method_el else None,
                "cost": cost_el.get_text(strip=True) if cost_el else None,
                "estimated_days": time_el.get_text(strip=True) if time_el else None,
            })
    result["shipping_options"] = shipping_options

    # Product variants
    variants = []
    for sku in soup.select("[class*='sku-item'], [data-sku-id], [class*='sku-prop-item']"):
        name_el = sku.select_one("span, img[alt]")
        if name_el:
            variant_name = name_el.get("alt") or name_el.get_text(strip=True)
            if variant_name and len(variant_name) < 100:
                variants.append(variant_name)
    result["variants"] = list(set(variants))[:20]

    # Product images
    images = []
    for img in soup.select("[class*='image-view'] img, .product-image-thumb img"):
        src = img.get("src") or img.get("data-src")
        if src and "aliexpress" in src:
            # Get full-size version
            src = re.sub(r'_\d+x\d+\.', '_960x960.', src)
            images.append(src)
    result["images"] = list(set(images))[:10]

    # Description (truncated if needed)
    desc_el = soup.select_one("[class*='product-description'], #product-description")
    if desc_el:
        result["description"] = desc_el.get_text(strip=True)[:2000]

    # Category breadcrumbs
    breadcrumbs = soup.select(".breadcrumb a, [class*='breadcrumb'] a")
    result["categories"] = [b.get_text(strip=True) for b in breadcrumbs if b.get_text(strip=True)]

    return result
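The image handling above leans on one CDN convention: thumbnail URLs encode their size as an _WxH suffix, so rewriting the suffix requests a larger rendition. A quick standalone check (the URL itself is hypothetical):

```python
import re

# Hypothetical thumbnail URL carrying the _WxH size suffix
thumb = "https://ae01.alicdn.com/kf/Habc123.jpg_220x220.jpg"
# Same rewrite used in parse_product_html to get a full-size image
full = re.sub(r"_\d+x\d+\.", "_960x960.", thumb)
# full → "https://ae01.alicdn.com/kf/Habc123.jpg_960x960.jpg"
```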

Scraping Product Reviews

async def scrape_product_reviews(
    item_id: str,
    proxy_url: str | None = None,
    max_pages: int = 5,
) -> list[dict]:
    """Scrape reviews for an AliExpress product."""
    reviews = []
    p, browser, context = await create_browser(proxy_url)

    try:
        page = await context.new_page()
        url = f"https://www.aliexpress.com/item/{item_id}.html"
        await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(2)

        # Scroll to reviews section
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.7)")
        await asyncio.sleep(2)

        for page_num in range(1, max_pages + 1):
            html = await page.content()
            soup = BeautifulSoup(html, "lxml")

            review_items = soup.select("[class*='review-item'], .feedback--wrap--lnFPDMK")
            if not review_items and page_num == 1:
                # Try alternate selector pattern
                review_items = soup.select("[class*='feedback-item']")

            for item in review_items:
                reviewer_el = item.select_one(
                    "[class*='buyer-name'], [class*='reviewer-name'], [class*='user-name']"
                )
                stars_el = item.select_one(
                    "[class*='star-view'], [class*='rating-star']"
                )
                text_el = item.select_one(
                    "[class*='review-content'], [class*='feedback-content'], "
                    "[class*='review-text']"
                )
                date_el = item.select_one("[class*='review-date'], [class*='date']")
                country_el = item.select_one(
                    "[class*='buyer-country'], [class*='country']"
                )
                helpful_el = item.select_one("[class*='helpful-count']")

                # Parse star count from style or aria-label
                star_count = 0
                if stars_el:
                    aria = stars_el.get("aria-label", "")
                    style = stars_el.get("style", "")
                    star_match = re.search(r"(\d+)\.?\d*\s*(star|out)", aria)
                    if star_match:
                        star_count = int(star_match.group(1))
                    elif "width" in style:
                        # Width percentage: 20% = 1 star, 100% = 5 stars
                        w_match = re.search(r"width:\s*(\d+)%", style)
                        if w_match:
                            star_count = round(int(w_match.group(1)) / 20)

                # Images attached to review
                review_images = [
                    img.get("src") for img in item.select("[class*='review-img'] img")
                    if img.get("src")
                ]

                reviews.append({
                    "reviewer": reviewer_el.get_text(strip=True) if reviewer_el else None,
                    "rating": star_count or None,
                    "text": text_el.get_text(strip=True) if text_el else None,
                    "date": date_el.get_text(strip=True) if date_el else None,
                    "country": country_el.get_text(strip=True) if country_el else None,
                    "helpful": helpful_el.get_text(strip=True) if helpful_el else None,
                    "images": review_images,
                })

            if not review_items:
                break

            # Click "Next" in reviews pagination
            next_btn = await page.query_selector(
                "[class*='review-pagination'] [class*='next']:not([class*='disabled']), "
                "[class*='pagination-next']:not(.disabled)"
            )
            if not next_btn:
                break

            await next_btn.click()
            await asyncio.sleep(random.uniform(2, 4))

    finally:
        await browser.close()
        await p.stop()

    return reviews
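The trickiest part of the review parser is the star rating, which often arrives as a CSS width on the star bar (20% per star) rather than as text. Pulled out as a standalone helper so the mapping is easy to test:

```python
import re

def stars_from_style(style: str) -> int:
    """Map a star-bar width (20% per star) to a 0-5 rating."""
    match = re.search(r"width:\s*(\d+)%", style)
    return round(int(match.group(1)) / 20) if match else 0

stars_from_style("width: 80%")  # 4
```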

Anti-Detection in Production

AliExpress's bot detection is geo-aware. Datacenter IPs from AWS or DigitalOcean get blocked before the first product loads. Rotating through a pool of residential IPs bypasses most of this.

For production scraping, ThorData provides a rotating residential proxy pool with country/city-level targeting. This is essential for AliExpress since their regional pricing means you need to control the apparent request origin:

THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9000

def get_proxy(country: str = "US", city: str | None = None) -> str:
    user = f"{THORDATA_USER}_country-{country}"
    if city:
        user += f"_city-{city}"
    return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"

# Compare prices across regions
proxy_us = get_proxy("US")
proxy_de = get_proxy("DE")
proxy_pl = get_proxy("PL")

Data Parsing and Cleaning

Prices from AliExpress come back as strings like US $4.29 or €3,99. Clean them before storing:

def parse_price(raw: str | None) -> float | None:
    """Parse an AliExpress price string to a float."""
    if not raw:
        return None
    # Handle price ranges (e.g., "US $4.29 - 6.80") before stripping
    # non-numeric characters — stripping removes the separator, so split
    # first and take the lower bound
    raw = raw.split("-")[0]
    cleaned = re.sub(r"[^\d.,]", "", raw)
    # Handle European decimal format: 3,99 → 3.99
    if re.match(r"^\d{1,3},\d{2}$", cleaned):
        cleaned = cleaned.replace(",", ".")
    else:
        cleaned = cleaned.replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None

def parse_review_count(raw: str | None) -> int | None:
    """Parse review count like '1,234 reviews' or '1.2k'."""
    if not raw:
        return None
    raw = raw.lower().replace(",", "")
    match = re.search(r"([\d.]+)\s*([km])?", raw)
    if not match:
        return None
    val = float(match.group(1))
    suffix = match.group(2)
    if suffix == "k":
        val *= 1000
    elif suffix == "m":
        val *= 1000000
    return int(val)

def parse_orders(raw: str | None) -> int | None:
    """Parse order count like '1.2k+ sold' or '12345 orders'."""
    if not raw:
        return None
    raw = raw.lower().replace(",", "")
    match = re.search(r"([\d.]+)\s*([km])?", raw)
    if not match:
        return None
    val = float(match.group(1))
    suffix = match.group(2)
    if suffix == "k":
        val *= 1000
    elif suffix == "m":
        val *= 1000000
    return int(val)

SQLite Storage

def init_db(db_path: str = "aliexpress.db") -> sqlite3.Connection:
    """Initialize the AliExpress data database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            item_id TEXT PRIMARY KEY,
            title TEXT,
            price REAL,
            original_price REAL,
            discount TEXT,
            store_name TEXT,
            store_url TEXT,
            avg_rating REAL,
            review_count INTEGER,
            orders INTEGER,
            shipping TEXT,
            categories TEXT,
            variants TEXT,
            images TEXT,
            url TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS reviews (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id TEXT NOT NULL,
            reviewer TEXT,
            rating INTEGER,
            review_text TEXT,
            review_date TEXT,
            country TEXT,
            helpful TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id TEXT NOT NULL,
            price REAL,
            original_price REAL,
            captured_at TEXT DEFAULT (datetime('now'))
        );

        CREATE INDEX IF NOT EXISTS idx_products_store ON products(store_name);
        CREATE INDEX IF NOT EXISTS idx_price_history_item ON price_history(item_id, captured_at);
    """)
    conn.commit()
    return conn

def save_product(conn: sqlite3.Connection, product: dict):
    """Save a product with price tracking."""
    price = parse_price(product.get("price"))
    orig_price = parse_price(product.get("original_price"))
    reviews = parse_review_count(product.get("review_count"))
    orders = parse_orders(product.get("orders"))
    # avg_rating is scraped as text like "4.8"; coerce it for the REAL column
    raw_rating = product.get("avg_rating")
    rating_match = re.search(r"\d+\.?\d*", str(raw_rating)) if raw_rating else None
    avg_rating = float(rating_match.group()) if rating_match else None

    conn.execute("""
        INSERT OR REPLACE INTO products
        (item_id, title, price, original_price, discount, store_name, store_url,
         avg_rating, review_count, orders, shipping, categories, variants, images, url)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        product.get("item_id"),
        product.get("title"),
        price, orig_price,
        product.get("discount"),
        product.get("store_name"),
        product.get("store_url"),
        avg_rating,
        reviews, orders,
        product.get("shipping"),
        json.dumps(product.get("categories", [])),
        json.dumps(product.get("variants", [])),
        json.dumps(product.get("images", [])),
        product.get("url"),
    ))

    # Log to price history
    if price:
        conn.execute(
            "INSERT INTO price_history (item_id, price, original_price) VALUES (?, ?, ?)",
            (product.get("item_id"), price, orig_price),
        )

    conn.commit()

Rate Limiting

AliExpress soft-blocks aggressive scrapers even with residential IPs. Use these delays:

DELAYS = {
    "search_page": (3.0, 7.0),
    "product_page": (5.0, 12.0),
    "review_page": (2.0, 5.0),
    "store_page": (4.0, 9.0),
}

async def polite_delay(action: str = "product_page"):
    """Apply randomized delay appropriate for the action type."""
    min_s, max_s = DELAYS.get(action, (2.0, 5.0))
    await asyncio.sleep(random.uniform(min_s, max_s))

async def full_pipeline(
    query: str,
    pages: int = 3,
    proxy_url: str | None = None,
    db_path: str = "aliexpress.db",
) -> dict:
    """
    Full AliExpress scraping pipeline:
    1. Search for query
    2. Get details for each product
    3. Store everything in SQLite
    """
    conn = init_db(db_path)
    stats = {"searched": 0, "detailed": 0, "errors": 0}

    # Step 1: Search
    print(f"Searching AliExpress for: {query}")
    search_results = await scrape_search_results(query, pages=pages, proxy_url=proxy_url)
    stats["searched"] = len(search_results)
    print(f"Found {len(search_results)} products")

    # Step 2: Get details for each
    for i, product in enumerate(search_results):
        item_id = product.get("item_id")
        if not item_id:
            continue

        print(f"[{i+1}/{len(search_results)}] Getting details for {item_id}")
        await polite_delay("product_page")

        try:
            detail = await scrape_product_page(product["url"], proxy_url=proxy_url)
            product.update(detail)
            save_product(conn, product)
            stats["detailed"] += 1
        except Exception as e:
            print(f"  Error: {e}")
            # Save search-level data at minimum
            save_product(conn, product)
            stats["errors"] += 1

    conn.close()
    return stats

# Run pipeline
stats = asyncio.run(full_pipeline(
    "wireless earbuds",
    pages=3,
    proxy_url="http://user:[email protected]:9000",
))
print(f"Pipeline complete: {stats}")

Where This Goes

AliExpress's inventory changes fast. Sellers appear and disappear. Prices drop 40% for a week during a sale, then go back up. The interesting data isn't a single snapshot — it's the delta over time. Once you have a working scraper, the value is in running it daily and tracking what changes: price drops and recoveries in the price_history table, sellers entering or leaving a niche, and order counts climbing on new listings.

That's where the database starts earning its keep.
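As one sketch of what that looks like, the price_history table defined earlier can already answer "which items swing the most" with a single query (in-memory database and hypothetical price points here for illustration):

```python
import sqlite3

# In-memory stand-in for aliexpress.db, with two captured price points
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        item_id TEXT NOT NULL,
        price REAL,
        original_price REAL,
        captured_at TEXT DEFAULT (datetime('now'))
    )
""")
conn.executemany(
    "INSERT INTO price_history (item_id, price, captured_at) VALUES (?, ?, ?)",
    [("100500", 4.29, "2026-01-01"), ("100500", 2.99, "2026-01-08")],
)
# Biggest discount window per item, as a percentage of the peak price
item_id, low, high, swing_pct = conn.execute("""
    SELECT item_id, MIN(price), MAX(price),
           ROUND((MAX(price) - MIN(price)) / MAX(price) * 100, 1)
    FROM price_history
    GROUP BY item_id
    ORDER BY 4 DESC
""").fetchone()
# swing_pct ≈ 30.3 for this hypothetical item
```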

Business Use Cases

Dropshipping research — Find products with high order counts and good seller ratings. Track which suppliers have stable pricing vs erratic price swings. Monitor new product launches in your niche.

Price comparison tools — Build a real-time price tracker for specific product categories. Alert users when products drop below a threshold. Compare AliExpress prices against Amazon and eBay to find arbitrage opportunities.

Market trend detection — Monitor which product categories are seeing sudden increases in new listings and order counts. Rising order counts + new seller entries = emerging market trend.

Supplier evaluation — Evaluate potential dropshipping suppliers by their seller rating trajectory over time, response rate, and review sentiment analysis.