
How to Scrape Amazon Product Data in Python (2026)

April 17, 2026 · 11 min read
Contents

  - Why Amazon scraping is hard
  - What you need in 2026
  - Understanding ASINs and URLs
  - A working Python scraper
  - Parsing the product page
  - Retry logic and backoff
  - Simpler alternative: a managed API
  - Approach comparison

Amazon has the most aggressive anti-bot stack of any major ecommerce site. Cloudflare, in-house fingerprinting, session-level rate limits, geographic CAPTCHAs, and an HTML layout that changes just often enough to break your parser -- all on top of the largest product catalog on the planet. If you have ever tried to run a price tracker, competitive monitor, or affiliate tool against Amazon, you know the pain.

This post is the short, honest version of what actually works in 2026. Real code, real limits, and when to stop fighting and use a managed scraper.

Why Amazon scraping is hard

Three things make Amazon uniquely painful compared to, say, eBay or AliExpress:

  1. Detection depth. Cloudflare sits in front of Amazon's own fingerprinting, session-level rate limits, and geographic CAPTCHAs, so one misstep can flag an entire session.
  2. Layout churn. Amazon runs many HTML variants at the same time and rotates them often enough to break any single-selector parser.
  3. Scale. It is the largest product catalog on the planet, so any useful crawl volume runs straight into per-IP rate limits.

Legal note: Amazon's Conditions of Use prohibit scraping without written permission. Courts have generally held that scraping public data is legal (see hiQ v. LinkedIn), but Amazon has sued scrapers and won settlements. Personal-use price trackers are widely tolerated; redistributing Amazon catalog data commercially is a lawsuit waiting to happen.

What you need in 2026

A scraper that survives more than 50 requests against Amazon needs all four of these -- skip any one and you will get blocked fast:

  1. Residential or mobile proxies. Datacenter IPs (AWS, DigitalOcean, Hetzner) are flagged within a handful of requests. Residential IPs from pools like Bright Data, Oxylabs, or Smartproxy survive much longer.
  2. User-Agent rotation. Pick from a pool of real Chrome/Firefox/Safari strings. Match the sec-ch-ua and Accept-Language headers to the UA you picked.
  3. Retry with backoff. 503s and CAPTCHA pages are normal -- assume 10-30% of requests will fail on first try. Retry on a different proxy, not the same one.
  4. Polite velocity. 1-2 requests per second per IP is a reasonable ceiling. Burst traffic gets you banned even on good proxies.
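Points 1 and 2 can be sketched as a header builder. The UA strings and sec-ch-ua values below are illustrative examples of matched pairs, not a maintained list -- in production, refresh them from real browser releases:

```python
import random

# Each entry pairs a User-Agent with the sec-ch-ua value a real Chrome of
# that version would send -- mismatched pairs are an easy fingerprint.
# Illustrative examples only; keep your pool current in production.
UA_POOL = [
    ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
     '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
     '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"'),
    ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 '
     '(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
     '"Not_A Brand";v="8", "Chromium";v="119", "Google Chrome";v="119"'),
]

def build_headers() -> dict:
    """Pick a UA at random and return a header set consistent with it."""
    ua, sec_ch_ua = random.choice(UA_POOL)
    return {
        'User-Agent': ua,
        'sec-ch-ua': sec_ch_ua,  # must match the Chrome version in the UA
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    }
```

Feed the result into `requests.Session().headers.update(...)` and rebuild it whenever you rotate proxies, so the identity changes as a unit.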

Understanding ASINs and URLs

Every Amazon product has a 10-character ASIN (Amazon Standard Identification Number). It is the stable ID you should key off in your database -- URLs change, ASINs do not.

The canonical product URL is https://www.amazon.com/dp/{ASIN}. You will see longer URLs with slugs and referral params in the wild, but /dp/{ASIN} always resolves to the same page and is what you should scrape. Amazon regional domains (.co.uk, .de, .co.jp) use the same /dp/ pattern with the same ASIN.
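A small helper makes this concrete -- the regex is a best-effort sketch covering the /dp/ and /gp/product/ URL shapes, and the ASINs in the examples are made up:

```python
import re
from typing import Optional

# Matches /dp/{ASIN} and the older /gp/product/{ASIN} URL shape
ASIN_RE = re.compile(r'/(?:dp|gp/product)/([A-Z0-9]{10})(?=[/?]|$)')

def extract_asin(url: str) -> Optional[str]:
    """Return the 10-character ASIN embedded in an Amazon product URL."""
    match = ASIN_RE.search(url)
    return match.group(1) if match else None

def canonical_url(asin: str, domain: str = 'www.amazon.com') -> str:
    """Canonical /dp/ URL -- the same pattern works on regional domains."""
    return f'https://{domain}/dp/{asin}'
```

Normalize every inbound URL through `extract_asin` before it touches your database, and store only the ASIN.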

A working Python scraper

Rather than maintaining a scraper with User-Agent rotation, proxy pools, and fragile fallback selectors, call a managed actor that handles all of it:

# Managed actor call -- proxy rotation, UA headers, and selector upkeep handled for you
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/amazon-scraper').call(
    run_input={'categoryUrls': ['https://www.amazon.com/s?k=headphones'], 'maxItems': 50}
)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
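If you do roll your own request layer, the fetch itself is short; most of the work is recognizing Amazon's block pages. A sketch -- the proxy endpoint is hypothetical, and the two marker strings are ones Amazon's CAPTCHA interstitial is known to contain:

```python
PROXY = 'http://user:pass@gate.example-proxy.com:8000'  # hypothetical residential endpoint

def looks_blocked(status_code, html):
    """Heuristics for Amazon block/CAPTCHA interstitial pages."""
    return (
        status_code in (403, 503)
        or 'Enter the characters you see below' in html
        or 'api-services-support@amazon.com' in html
    )

def fetch_product(asin, session):
    """Fetch raw product HTML.

    `session` is anything with a requests-style .get() -- e.g. a
    requests.Session carrying rotated, internally consistent headers.
    """
    resp = session.get(
        f'https://www.amazon.com/dp/{asin}',
        proxies={'http': PROXY, 'https': PROXY},
        timeout=15,
    )
    if looks_blocked(resp.status_code, resp.text):
        raise RuntimeError(f'blocked while fetching {asin}')
    return resp.text
```

Raising on a block page (rather than returning partial HTML) matters: it lets the retry layer swap proxies instead of feeding a CAPTCHA page to your parser.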

Parsing the product page

A robust parser needs fallback selector chains for every field, because Amazon runs many layout variants at the same time. A few notes from production experience:
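A sketch of that pattern with BeautifulSoup -- the selectors listed are ones Amazon has used across layout variants, but treat them as illustrative, not a maintained list:

```python
from bs4 import BeautifulSoup

# Ordered from the current common layout to legacy fallbacks.
# These rot -- log raw HTML whenever every selector in a chain misses.
FIELD_SELECTORS = {
    'title': ['#productTitle', '#title span'],
    'price': ['span.a-price span.a-offscreen',
              '#priceblock_ourprice', '#priceblock_dealprice'],
    'rating': ['#acrPopover span.a-icon-alt', 'span.a-icon-alt'],
}

def parse_product(html: str) -> dict:
    """Extract fields by trying each selector chain until one hits."""
    soup = BeautifulSoup(html, 'html.parser')
    out = {}
    for field, selectors in FIELD_SELECTORS.items():
        out[field] = None
        for sel in selectors:
            node = soup.select_one(sel)
            if node and node.get_text(strip=True):
                out[field] = node.get_text(strip=True)
                break
    return out
```

Missing fields come back as `None` rather than raising, so one layout change degrades a single column instead of killing the whole run.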

Retry logic and backoff

A working scraper is 20% parsing and 80% error handling. Treat every request as likely to fail, and design the retry loop carefully: retry on a fresh proxy, back off exponentially between attempts, and cap the attempt count so one stubborn ASIN cannot stall the crawl.
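A sketch of that loop, with the HTTP call injected as a callable so the retry policy itself stays testable without network access (the proxy URLs are placeholders):

```python
import random
import time

PROXIES = ['http://proxy-a:8000', 'http://proxy-b:8000', 'http://proxy-c:8000']  # placeholders

class BlockedError(Exception):
    """Raised by the fetch layer on 503s, CAPTCHA pages, or empty bodies."""

def fetch_with_retry(fetch, url, max_attempts=5, base_delay=1.0):
    """Retry `fetch(url, proxy)` with exponential backoff plus jitter,
    switching to a different proxy on every attempt."""
    order = random.sample(PROXIES, len(PROXIES))  # shuffled copy of the pool
    last_exc = None
    for attempt in range(max_attempts):
        proxy = order[attempt % len(order)]  # never the same proxy twice in a row
        try:
            return fetch(url, proxy)
        except BlockedError as exc:
            last_exc = exc
            # base_delay, 2x, 4x ... plus jitter so retries don't synchronize
            time.sleep(base_delay * (2 ** attempt + random.random()))
    raise last_exc
```

On real traffic, `fetch` would wrap the session call and raise `BlockedError` whenever it detects a 503 or CAPTCHA interstitial.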

Simpler alternative: a managed API

Maintaining a working Amazon scraper is a part-time job. Proxy costs alone run $50-500/month for any serious volume, plus engineering time on selector updates, retry logic, and CAPTCHA handling.

For most use cases -- price tracking, affiliate catalogs, competitive monitoring, or building a dataset -- a managed scraper API is cheaper than doing it yourself. You pass an ASIN or URL and get back structured JSON. The operator eats the proxy bill and the cat-and-mouse game against Amazon's anti-bot team.

Our Amazon Scraper actor on Apify handles exactly this. Pass a list of ASINs or product URLs and get titles, prices, ratings, review counts, images, availability, and variations. Pricing is pay-per-use, so a price tracker that checks 500 products once a day costs cents, not a monthly minimum.

When to roll your own vs use a managed actor: if you are scraping fewer than ~100 products/month for a personal project, a DIY scraper will work. If you are scraping thousands of products or running commercially, the cost of proxies plus maintenance almost always exceeds the cost of a managed actor.

Approach comparison

| Approach                        | Cost          | Volume           | Reliability          |
|---------------------------------|---------------|------------------|----------------------|
| Plain requests from your laptop | Free          | <50 before block | Near zero            |
| Python + residential proxies    | $50-500/mo    | Medium           | Requires maintenance |
| Playwright + stealth + proxies  | $100-1000/mo  | High             | Slow, memory heavy   |
| Managed actor (Apify)           | Pay-per-use   | High             | High                 |

Whichever path you take, build defensively: pin ASINs in your database, log raw HTML when parsers fail, and treat Amazon as a moving target. The selectors in this post will rot -- the strategy around them is what lasts.

