How to Scrape AliExpress Product Data in 2026: Prices, Reviews & Seller Ratings
AliExpress has hundreds of millions of product listings across every category imaginable. Prices fluctuate constantly — the same item from ten different sellers can vary by 300%. Seller ratings and review counts tell you who's actually moving product versus who's a ghost storefront. And regional pricing means what a buyer sees in Poland differs from what they see in the US.
For dropshippers, price comparison tools, and market researchers, this data is genuinely useful. The problem is getting it.
What's Worth Scraping
AliExpress product pages expose a solid set of data points:
- Product title and description — often keyword-stuffed, but useful for categorization
- Price — both the listed price and any promotional/sale price, which differ regularly
- Seller name and store URL — each seller has their own storefront
- Seller rating — a score out of 5 covering communication, shipping speed, and item accuracy
- Review count — total reviews plus star distribution
- Order count — how many times the item has sold, a rough proxy for popularity
- Shipping info — carrier, cost, estimated delivery window, country of origin
- Variants — color/size options with per-variant pricing
- Category path — breadcrumb hierarchy for classification
That's a lot of data that no official API exposes cleanly. AliExpress does have a partner API, but it's restricted, rate-limited, and doesn't give you the seller analytics you actually want.
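Before writing any scraping code, it helps to pin down the record you want per listing. A minimal sketch — the field names below are this article's own convention, not an official AliExpress schema, and the sample values are illustrative:

```python
# Illustrative shape of one scraped listing. Field names are this guide's
# convention; the sample values are made up for demonstration.
sample_product = {
    "item_id": "1005006789012345",
    "title": "TWS Wireless Earbuds Bluetooth 5.3",
    "price": "US $4.29",
    "original_price": "US $12.99",
    "store_name": "Example Audio Store",
    "rating": 4.7,
    "review_count": "1.2k",
    "orders": "5,000+ sold",
    "shipping": "Free shipping",
    "variants": ["Black", "White"],
    "categories": ["Consumer Electronics", "Earphones & Headphones"],
}

def is_complete(product: dict) -> bool:
    """Check that the fields worth storing are all present and non-empty."""
    required = ("item_id", "title", "price", "store_name")
    return all(product.get(k) for k in required)
```

Deciding up front which fields are required makes it easy to drop half-parsed cards before they pollute the database.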
AliExpress Anti-Bot Measures
AliExpress is owned by Alibaba, and they take scraping seriously. Their defenses layer on top of each other:
Cloudflare protection — search result pages and category pages sit behind Cloudflare's bot detection. Standard requests calls return a challenge page almost immediately.
Browser fingerprinting — AliExpress runs JavaScript checks on canvas rendering, WebGL, fonts, and navigator properties. Headless Chromium without stealth patching gets flagged within a few page loads.
CAPTCHA challenges — Alibaba uses a slider CAPTCHA that's considerably harder for automated solvers than Google's reCAPTCHA. You'll typically hit it after 20-30 requests from a single IP, sometimes sooner on fresh IPs.
Rate limiting by IP and session — even if you pass the fingerprint checks, rapid sequential requests from the same IP trigger soft blocks. Pages return 200 but with empty product containers.
Dynamic rendering — prices, seller ratings, and review counts load via XHR after initial page load. A plain HTML scrape gets the skeleton, not the data.
Regional gating — AliExpress serves different prices and sometimes different products based on the request's apparent country. A German IP sees EUR prices; a US IP sees USD.
This is why requests + BeautifulSoup doesn't work here. You need a real browser, and you need clean residential IPs.
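One practical consequence of the soft blocks: you can't trust an HTTP 200. A quick triage of the returned HTML catches both challenge pages and empty-container responses. The marker strings below are heuristics I'm assuming from observed behavior, not guaranteed-stable markup:

```python
def classify_response(html: str) -> str:
    """Rough triage of an AliExpress response body.

    The marker strings are heuristics -- adjust them as the site's
    markup changes.
    """
    lowered = html.lower()
    # Challenge / slider-CAPTCHA pages
    if "captcha" in lowered or "slide to verify" in lowered:
        return "challenge"
    # A 200 with no product containers at all is a soft block
    if "data-item-id" not in html and "search-item-card" not in html:
        return "soft_block"
    return "ok"
```

Running every response through a check like this lets you rotate the IP and retry instead of silently storing empty pages.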
Installation
pip install playwright beautifulsoup4 lxml httpx
playwright install chromium
Core Playwright Scraper
import asyncio
import json
import re
import sqlite3
import time
import random
from datetime import datetime
from playwright.async_api import async_playwright, TimeoutError as PlaywrightTimeout
from bs4 import BeautifulSoup

STEALTH_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'plugins', {
    get: () => [
        { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
        { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
        { name: 'Native Client', filename: 'internal-nacl-plugin' },
    ]
});
window.chrome = { runtime: {}, loadTimes: function() {}, csi: function() {}, app: {} };
"""
SEARCH_URL = "https://www.aliexpress.com/wholesale?SearchText={query}&page={page}&SortType=default"
async def create_browser(proxy_url: str | None = None, headless: bool = True):
    """Create a stealth Playwright browser."""
    p = await async_playwright().start()
    browser = await p.chromium.launch(
        headless=headless,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--no-sandbox",
            "--disable-dev-shm-usage",
            "--disable-infobars",
        ],
        proxy={"server": proxy_url} if proxy_url else None,
    )
    context = await browser.new_context(
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/124.0.0.0 Safari/537.36"
        ),
        viewport={"width": 1440, "height": 900},
        locale="en-US",
        timezone_id="America/New_York",
    )
    await context.add_init_script(STEALTH_SCRIPT)
    return p, browser, context
Search Results Scraper
async def scrape_search_results(
    query: str,
    pages: int = 3,
    proxy_url: str | None = None,
) -> list[dict]:
    """Scrape AliExpress search results for a given query."""
    results = []
    p, browser, context = await create_browser(proxy_url)
    try:
        page = await context.new_page()
        # Warm up with a homepage visit so the session looks organic
        await page.goto("https://www.aliexpress.com/", wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(random.uniform(2, 4))
        for page_num in range(1, pages + 1):
            url = SEARCH_URL.format(query=query.replace(" ", "+"), page=page_num)
            print(f"Fetching page {page_num}: {query}")
            try:
                await page.goto(url, wait_until="domcontentloaded", timeout=30000)
                # Wait for product cards to render
                await page.wait_for_selector(
                    "[data-item-id], .search-item-card-wrapper-gallery",
                    timeout=15000,
                )
                # Let XHR finish loading prices
                await asyncio.sleep(random.uniform(2, 3))
                # Scroll to trigger lazy-loaded images and prices
                await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.5)")
                await asyncio.sleep(1)
                await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
                await asyncio.sleep(1.5)
            except PlaywrightTimeout:
                print(f"Timeout on page {page_num}, skipping")
                continue
            html = await page.content()
            soup = BeautifulSoup(html, "lxml")
            cards = soup.select("[data-item-id]")
            if not cards:
                cards = soup.select(".search-item-card-wrapper-gallery")
            for card in cards:
                try:
                    item = parse_search_card(card)
                    if item:
                        results.append(item)
                except Exception:
                    # Skip malformed cards rather than abort the whole page
                    continue
            print(f"  Page {page_num}: {len(cards)} products")
            await asyncio.sleep(random.uniform(3, 6))
    finally:
        await browser.close()
        await p.stop()
    return results
def parse_search_card(card) -> dict | None:
    """Parse a single AliExpress search result card."""
    item_id = card.get("data-item-id")
    if not item_id:
        return None
    # Title
    title_el = card.select_one("h3, [class*='titleText'], .multi--titleText--nXeOvyr")
    title = title_el.get_text(strip=True) if title_el else None
    # Sale price
    price_el = card.select_one(
        ".multi--price-sale--U-S0jtj, [class*='price-sale'], "
        ".price--currentPriceText--V8_y_b, [class*='sale-price']"
    )
    price_raw = price_el.get_text(strip=True) if price_el else None
    # Original price (before discount)
    orig_price_el = card.select_one(
        ".multi--price-original--1zEQqOK, [class*='price-original'], "
        "[class*='original-price']"
    )
    original_price = orig_price_el.get_text(strip=True) if orig_price_el else None
    # Discount badge
    discount_el = card.select_one("[class*='discount'], [class*='sale-tag']")
    discount = discount_el.get_text(strip=True) if discount_el else None
    # Store name
    store_el = card.select_one(
        ".multi--shop-name--wt9Xr, [class*='shop-name'], .cards--storeLink--1J-Bkvy"
    )
    store_name = store_el.get_text(strip=True) if store_el else None
    # Rating: pull the first number from the aria-label, falling back to the
    # element text, so the value is always a float or None (never a string)
    rating_el = card.select_one("[class*='star-view'], [class*='rating-score']")
    rating = None
    if rating_el:
        source = rating_el.get("aria-label", "") or rating_el.get_text(strip=True)
        match = re.search(r"(\d+\.?\d*)", source)
        rating = float(match.group(1)) if match else None
    # Review count
    reviews_el = card.select_one("[class*='review'], [class*='feedback']")
    review_count_raw = reviews_el.get_text(strip=True) if reviews_el else None
    # Orders
    orders_el = card.select_one("[class*='trade'], [class*='order-count'], [class*='sold']")
    orders = orders_el.get_text(strip=True) if orders_el else None
    # Shipping
    shipping_el = card.select_one("[class*='shipping'], [class*='delivery-text']")
    shipping = shipping_el.get_text(strip=True) if shipping_el else None
    # Thumbnail
    img_el = card.select_one("img")
    thumbnail = (img_el.get("src") or img_el.get("data-src")) if img_el else None
    return {
        "item_id": item_id,
        "title": title,
        "price": price_raw,
        "original_price": original_price,
        "discount": discount,
        "store_name": store_name,
        "rating": rating,
        "review_count": review_count_raw,
        "orders": orders,
        "shipping": shipping,
        "thumbnail": thumbnail,
        "url": f"https://www.aliexpress.com/item/{item_id}.html",
    }
Individual Product Page Scraper
async def scrape_product_page(
    item_url: str,
    proxy_url: str | None = None,
) -> dict:
    """Scrape detailed data from an AliExpress product page."""
    p, browser, context = await create_browser(proxy_url)
    result = {"url": item_url}
    try:
        page = await context.new_page()
        await page.goto(item_url, wait_until="domcontentloaded", timeout=30000)
        # Wait for the price element
        try:
            await page.wait_for_selector(
                "[class*='product-price'], [class*='uniform-banner'], "
                "[class*='price-current']",
                timeout=15000,
            )
        except PlaywrightTimeout:
            result["error"] = "Price element not found"
        # Scroll to load seller info and shipping details
        await page.evaluate("window.scrollTo(0, 600)")
        await asyncio.sleep(1.5)
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.4)")
        await asyncio.sleep(2)
        html = await page.content()
        result.update(parse_product_html(html))
    except Exception as e:
        result["error"] = str(e)
    finally:
        await browser.close()
        await p.stop()
    return result
def parse_product_html(html: str) -> dict:
    """Parse product page HTML into structured data."""
    soup = BeautifulSoup(html, "lxml")
    result = {}
    # Title
    title_el = soup.select_one(
        "h1.product-title-text, [class*='product-title'], "
        ".pdp-product-title"
    )
    result["title"] = title_el.get_text(strip=True) if title_el else None
    # Current price
    price_el = soup.select_one(
        "[class*='product-price-value'], .uniform-banner-box-price, "
        "[class*='price-current']"
    )
    result["price"] = price_el.get_text(strip=True) if price_el else None
    # Original price
    orig_el = soup.select_one("[class*='product-price-original'], [class*='price-origin']")
    result["original_price"] = orig_el.get_text(strip=True) if orig_el else None
    # Seller name
    store_el = soup.select_one(
        "a[href*='/store/'] .store-header-name, [class*='shop-name-text']"
    )
    result["store_name"] = store_el.get_text(strip=True) if store_el else None
    # Store URL
    store_link_el = soup.select_one("a[href*='/store/']")
    result["store_url"] = store_link_el.get("href") if store_link_el else None
    # Seller ratings (3 sub-scores: communication, shipping speed, item accuracy)
    rating_groups = soup.select(
        "[class*='seller-score'] .score-item, [class*='store-detail'] .score-row"
    )
    ratings = {}
    for item in rating_groups:
        label_el = item.select_one("[class*='label'], span:first-child")
        score_el = item.select_one("[class*='score'], [class*='value'], span:last-child")
        if label_el and score_el:
            ratings[label_el.get_text(strip=True)] = score_el.get_text(strip=True)
    result["seller_ratings"] = ratings
    # Average product rating
    avg_rating_el = soup.select_one(
        "[class*='overview-rating-average'], [class*='score-average'], "
        "[class*='rating-value']"
    )
    result["avg_rating"] = avg_rating_el.get_text(strip=True) if avg_rating_el else None
    # Review count
    review_count_el = soup.select_one(
        "[class*='review-count'], [class*='feedback-total'], "
        "[class*='product-review-count']"
    )
    result["review_count"] = review_count_el.get_text(strip=True) if review_count_el else None
    # Order count
    orders_el = soup.select_one("[class*='order-count'], [class*='trade-count']")
    result["orders"] = orders_el.get_text(strip=True) if orders_el else None
    # Shipping options
    shipping_options = []
    for row in soup.select("[class*='shipping-item'], [class*='delivery-option']"):
        method_el = row.select_one("[class*='carrier'], [class*='shipping-name']")
        cost_el = row.select_one("[class*='fee'], [class*='shipping-price'], [class*='price']")
        time_el = row.select_one("[class*='estimate'], [class*='shipping-time'], [class*='days']")
        if method_el or cost_el:
            shipping_options.append({
                "method": method_el.get_text(strip=True) if method_el else None,
                "cost": cost_el.get_text(strip=True) if cost_el else None,
                "estimated_days": time_el.get_text(strip=True) if time_el else None,
            })
    result["shipping_options"] = shipping_options
    # Product variants
    variants = []
    for sku in soup.select("[class*='sku-item'], [data-sku-id], [class*='sku-prop-item']"):
        name_el = sku.select_one("span, img[alt]")
        if name_el:
            variant_name = name_el.get("alt") or name_el.get_text(strip=True)
            if variant_name and len(variant_name) < 100:
                variants.append(variant_name)
    result["variants"] = list(set(variants))[:20]
    # Product images
    images = []
    for img in soup.select("[class*='image-view'] img, .product-image-thumb img"):
        src = img.get("src") or img.get("data-src")
        if src and "aliexpress" in src:
            # Swap the thumbnail suffix for the full-size version
            src = re.sub(r'_\d+x\d+\.', '_960x960.', src)
            images.append(src)
    result["images"] = list(set(images))[:10]
    # Description (truncated to keep rows manageable)
    desc_el = soup.select_one("[class*='product-description'], #product-description")
    if desc_el:
        result["description"] = desc_el.get_text(strip=True)[:2000]
    # Category breadcrumbs
    breadcrumbs = soup.select(".breadcrumb a, [class*='breadcrumb'] a")
    result["categories"] = [b.get_text(strip=True) for b in breadcrumbs if b.get_text(strip=True)]
    return result
Scraping Product Reviews
async def scrape_product_reviews(
    item_id: str,
    proxy_url: str | None = None,
    max_pages: int = 5,
) -> list[dict]:
    """Scrape reviews for an AliExpress product."""
    reviews = []
    p, browser, context = await create_browser(proxy_url)
    try:
        page = await context.new_page()
        url = f"https://www.aliexpress.com/item/{item_id}.html"
        await page.goto(url, wait_until="domcontentloaded", timeout=30000)
        await asyncio.sleep(2)
        # Scroll to the reviews section
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight * 0.7)")
        await asyncio.sleep(2)
        for page_num in range(1, max_pages + 1):
            html = await page.content()
            soup = BeautifulSoup(html, "lxml")
            review_items = soup.select("[class*='review-item'], .feedback--wrap--lnFPDMK")
            if not review_items and page_num == 1:
                # Try an alternate selector pattern
                review_items = soup.select("[class*='feedback-item']")
            if not review_items:
                break
            for item in review_items:
                reviewer_el = item.select_one(
                    "[class*='buyer-name'], [class*='reviewer-name'], [class*='user-name']"
                )
                stars_el = item.select_one(
                    "[class*='star-view'], [class*='rating-star']"
                )
                text_el = item.select_one(
                    "[class*='review-content'], [class*='feedback-content'], "
                    "[class*='review-text']"
                )
                date_el = item.select_one("[class*='review-date'], [class*='date']")
                country_el = item.select_one(
                    "[class*='buyer-country'], [class*='country']"
                )
                helpful_el = item.select_one("[class*='helpful-count']")
                # Parse star count from aria-label or the style width
                star_count = 0
                if stars_el:
                    aria = stars_el.get("aria-label", "")
                    style = stars_el.get("style", "")
                    star_match = re.search(r"(\d+)\.?\d*\s*(star|out)", aria)
                    if star_match:
                        star_count = int(star_match.group(1))
                    elif "width" in style:
                        # Width percentage: 20% = 1 star, 100% = 5 stars
                        w_match = re.search(r"width:\s*(\d+)%", style)
                        if w_match:
                            star_count = round(int(w_match.group(1)) / 20)
                # Images attached to the review
                review_images = [
                    img.get("src") for img in item.select("[class*='review-img'] img")
                    if img.get("src")
                ]
                reviews.append({
                    "reviewer": reviewer_el.get_text(strip=True) if reviewer_el else None,
                    "rating": star_count or None,
                    "text": text_el.get_text(strip=True) if text_el else None,
                    "date": date_el.get_text(strip=True) if date_el else None,
                    "country": country_el.get_text(strip=True) if country_el else None,
                    "helpful": helpful_el.get_text(strip=True) if helpful_el else None,
                    "images": review_images,
                })
            # Click "Next" in the reviews pagination
            next_btn = await page.query_selector(
                "[class*='review-pagination'] [class*='next']:not([class*='disabled']), "
                "[class*='pagination-next']:not(.disabled)"
            )
            if not next_btn:
                break
            await next_btn.click()
            await asyncio.sleep(random.uniform(2, 4))
    finally:
        await browser.close()
        await p.stop()
    return reviews
Anti-Detection in Production
AliExpress's bot detection is geo-aware. Datacenter IPs from AWS or DigitalOcean get blocked before the first product loads. Rotating through a pool of residential IPs bypasses most of this.
For production scraping, ThorData provides a rotating residential proxy pool with country/city-level targeting. This is essential for AliExpress since their regional pricing means you need to control the apparent request origin:
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9000
def get_proxy(country: str = "US", city: str | None = None) -> str:
    user = f"{THORDATA_USER}_country-{country}"
    if city:
        user += f"_city-{city}"
    return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
# Compare prices across regions
proxy_us = get_proxy("US")
proxy_de = get_proxy("DE")
proxy_pl = get_proxy("PL")
Data Parsing and Cleaning
Prices from AliExpress come back as strings like US $4.29 or €3,99. Clean them before storing:
def parse_price(raw: str | None) -> float | None:
    """Parse an AliExpress price string to a float."""
    if not raw:
        return None
    # Handle price ranges (e.g., "US $4.29 - 6.80"): take the lower bound.
    # This must happen before stripping characters, because the cleaning
    # step removes the spaces and dash that mark the range.
    raw = raw.split("-")[0]
    cleaned = re.sub(r"[^\d.,]", "", raw)
    # Handle European decimal format: 3,99 → 3.99
    if re.match(r"^\d{1,3},\d{2}$", cleaned):
        cleaned = cleaned.replace(",", ".")
    else:
        cleaned = cleaned.replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None
def parse_review_count(raw: str | None) -> int | None:
    """Parse review count like '1,234 reviews' or '1.2k'."""
    if not raw:
        return None
    raw = raw.lower().replace(",", "")
    match = re.search(r"([\d.]+)\s*([km])?", raw)
    if not match:
        return None
    val = float(match.group(1))
    suffix = match.group(2)
    if suffix == "k":
        val *= 1000
    elif suffix == "m":
        val *= 1000000
    return int(val)

def parse_orders(raw: str | None) -> int | None:
    """Parse order count like '1.2k+ sold' or '12345 orders'.

    Same number format as review counts, so delegate to the same parser.
    """
    return parse_review_count(raw)
SQLite Storage
def init_db(db_path: str = "aliexpress.db") -> sqlite3.Connection:
    """Initialize the AliExpress data database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS products (
            item_id TEXT PRIMARY KEY,
            title TEXT,
            price REAL,
            original_price REAL,
            discount TEXT,
            store_name TEXT,
            store_url TEXT,
            avg_rating REAL,
            review_count INTEGER,
            orders INTEGER,
            shipping TEXT,
            categories TEXT,
            variants TEXT,
            images TEXT,
            url TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        );
        CREATE TABLE IF NOT EXISTS reviews (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id TEXT NOT NULL,
            reviewer TEXT,
            rating INTEGER,
            review_text TEXT,
            review_date TEXT,
            country TEXT,
            helpful TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        );
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            item_id TEXT NOT NULL,
            price REAL,
            original_price REAL,
            captured_at TEXT DEFAULT (datetime('now'))
        );
        CREATE INDEX IF NOT EXISTS idx_products_store ON products(store_name);
        CREATE INDEX IF NOT EXISTS idx_price_history_item ON price_history(item_id, captured_at);
    """)
    conn.commit()
    return conn
def save_product(conn: sqlite3.Connection, product: dict):
    """Save a product with price tracking."""
    price = parse_price(product.get("price"))
    orig_price = parse_price(product.get("original_price"))
    reviews = parse_review_count(product.get("review_count"))
    orders = parse_orders(product.get("orders"))
    conn.execute("""
        INSERT OR REPLACE INTO products
        (item_id, title, price, original_price, discount, store_name, store_url,
         avg_rating, review_count, orders, shipping, categories, variants, images, url)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        product.get("item_id"),
        product.get("title"),
        price, orig_price,
        product.get("discount"),
        product.get("store_name"),
        product.get("store_url"),
        product.get("avg_rating"),
        reviews, orders,
        product.get("shipping"),
        json.dumps(product.get("categories", [])),
        json.dumps(product.get("variants", [])),
        json.dumps(product.get("images", [])),
        product.get("url"),
    ))
    # Log to price history (check for None explicitly: 0.0 is a valid price)
    if price is not None:
        conn.execute(
            "INSERT INTO price_history (item_id, price, original_price) VALUES (?, ?, ?)",
            (product.get("item_id"), price, orig_price),
        )
    conn.commit()
Rate Limiting
AliExpress soft-blocks aggressive scrapers even with residential IPs. Use these delays:
DELAYS = {
    "search_page": (3.0, 7.0),
    "product_page": (5.0, 12.0),
    "review_page": (2.0, 5.0),
    "store_page": (4.0, 9.0),
}

async def polite_delay(action: str = "product_page"):
    """Apply a randomized delay appropriate for the action type."""
    min_s, max_s = DELAYS.get(action, (2.0, 5.0))
    await asyncio.sleep(random.uniform(min_s, max_s))
async def full_pipeline(
    query: str,
    pages: int = 3,
    proxy_url: str | None = None,
    db_path: str = "aliexpress.db",
) -> dict:
    """
    Full AliExpress scraping pipeline:
    1. Search for query
    2. Get details for each product
    3. Store everything in SQLite
    """
    conn = init_db(db_path)
    stats = {"searched": 0, "detailed": 0, "errors": 0}
    # Step 1: Search
    print(f"Searching AliExpress for: {query}")
    search_results = await scrape_search_results(query, pages=pages, proxy_url=proxy_url)
    stats["searched"] = len(search_results)
    print(f"Found {len(search_results)} products")
    # Step 2: Get details for each product
    for i, product in enumerate(search_results):
        item_id = product.get("item_id")
        if not item_id:
            continue
        print(f"[{i+1}/{len(search_results)}] Getting details for {item_id}")
        await polite_delay("product_page")
        try:
            detail = await scrape_product_page(product["url"], proxy_url=proxy_url)
            product.update(detail)
            save_product(conn, product)
            stats["detailed"] += 1
        except Exception as e:
            print(f"  Error: {e}")
            # Save search-level data at minimum
            save_product(conn, product)
            stats["errors"] += 1
    conn.close()
    return stats

# Run the pipeline
stats = asyncio.run(full_pipeline(
    "wireless earbuds",
    pages=3,
    proxy_url="http://user:[email protected]:9000",
))
print(f"Pipeline complete: {stats}")
Where This Goes
AliExpress's inventory changes fast. Sellers appear and disappear. Prices drop 40% for a week during a sale, then go back up. The interesting data isn't a single snapshot — it's the delta over time. Once you have a working scraper, the value is in running it daily and tracking what changes:
- Which sellers are gaining reviews (signals product quality improving)
- Which products are dropping in price (competitive pressure or slow sales)
- What's selling across a category before it shows up on mainstream trend sites
- Price differences across regions (arbitrage opportunities)
That's where the database starts earning its keep.
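Once price_history has a few days of rows, the deltas fall out of a single query. A sketch against the schema above — the 20% threshold is arbitrary, and for simplicity the query compares every snapshot with every later one (in production you'd restrict it to the latest two captures per item):

```python
import sqlite3

# Pairs each price_history row with every later row for the same item and
# keeps pairs where the price fell by 20% or more. Threshold is arbitrary.
DROP_QUERY = """
    SELECT h1.item_id,
           h1.price AS old_price,
           h2.price AS new_price,
           ROUND((h2.price - h1.price) / h1.price * 100, 1) AS pct_change
    FROM price_history h1
    JOIN price_history h2
      ON h1.item_id = h2.item_id
     AND h2.captured_at > h1.captured_at
    WHERE (h2.price - h1.price) / h1.price <= -0.20
"""

def find_price_drops(conn: sqlite3.Connection) -> list[tuple]:
    """Return (item_id, old_price, new_price, pct_change) rows for big drops."""
    return conn.execute(DROP_QUERY).fetchall()
```

Run it on a daily cron after each scrape and the output is a ready-made list of items under competitive pressure or heading into a sale.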
Business Use Cases
Dropshipping research — Find products with high order counts and good seller ratings. Track which suppliers have stable pricing vs erratic price swings. Monitor new product launches in your niche.
Price comparison tools — Build a real-time price tracker for specific product categories. Alert users when products drop below a threshold. Compare AliExpress prices against Amazon and eBay to find arbitrage opportunities.
Market trend detection — Monitor which product categories are seeing sudden increases in new listings and order counts. Rising order counts + new seller entries = emerging market trend.
Supplier evaluation — Evaluate potential dropshipping suppliers by their seller rating trajectory over time, response rate, and review sentiment analysis.
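For the supplier-evaluation angle, the reviews table above already supports simple aggregates. A sketch computing per-product rating volume and the share of negative reviews — a crude proxy, not real sentiment analysis:

```python
import sqlite3

def review_summary(conn: sqlite3.Connection, item_id: str) -> dict:
    """Average star rating, review volume, and share of 1-2 star reviews."""
    row = conn.execute(
        """
        SELECT COUNT(*),
               AVG(rating),
               SUM(CASE WHEN rating <= 2 THEN 1 ELSE 0 END)
        FROM reviews
        WHERE item_id = ? AND rating IS NOT NULL
        """,
        (item_id,),
    ).fetchone()
    count, avg, negative = row
    return {
        "count": count,
        "avg_rating": round(avg, 2) if avg is not None else None,
        "negative_share": round(negative / count, 2) if count else None,
    }
```

Snapshot these numbers alongside each scrape and the trajectory — rising volume with a flat or improving negative share — becomes the supplier signal.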