Amazon has the most aggressive anti-bot stack of any major ecommerce site. An in-house WAF, browser fingerprinting, session-level rate limits, geographic CAPTCHAs, and an HTML layout that changes just often enough to break your parser -- all on top of the largest product catalog on the planet. If you have ever tried to run a price tracker, competitive monitor, or affiliate tool against Amazon, you know the pain.
This post is the short, honest version of what actually works in 2026. Real code, real limits, and when to stop fighting and use a managed scraper.
Three things make Amazon uniquely painful compared to, say, eBay or AliExpress:
- Instant datacenter bans: a plain `requests.get()` from a datacenter IP gets a 503 Service Unavailable or a "Sorry, we just need to make sure you're not a robot" page within 1-2 hits.
- Constant layout churn: a selector that works on Monday returns `None` on Friday. You need multiple fallback selectors for every field.
- Session-level tracking: even on clean IPs, Amazon fingerprints sessions and rate-limits them, throwing geographic CAPTCHAs at anything that looks automated.

A scraper that survives more than 50 requests against Amazon needs all four of these -- skip any one and you will get blocked fast:

1. Rotate realistic browser User-Agents.
2. Match the sec-ch-ua and Accept-Language headers to the UA you picked.
3. Route traffic through residential proxies, not datacenter IPs.
4. Throttle with randomized delays between requests.

Every Amazon product has a 10-character ASIN (Amazon Standard Identification Number). It is the stable ID you should key off in your database -- URLs change, ASINs do not.
The canonical product URL is https://www.amazon.com/dp/{ASIN}. You will see longer URLs with slugs and referral params in the wild, but /dp/{ASIN} always resolves to the same page and is what you should scrape. Amazon regional domains (.co.uk, .de, .co.jp) use the same /dp/ pattern with the same ASIN.
Rather than maintaining a scraper with User-Agent rotation, proxy pools, and fragile fallback selectors, call a managed actor that handles all of it:
```python
# Managed actor call — skip guest tokens, rotating proxies, and brittle selectors
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

run = client.actor('cryptosignals/amazon-scraper').call(
    run_input={'categoryUrls': ['https://www.amazon.com/s?k=headphones'], 'maxItems': 50}
)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
```
A hand-rolled parser needs fallback selector chains for every field, because Amazon runs many layout variants at the same time. A few notes from production experience:
- Ratings arrive as an aria-label-style string like "4.6 out of 5 stars". Parse the leading float; do not try to extract a value from the star image.
- Product variations (size, color) share a parent ASIN -- look for data-parent-asin in the buybox if you need to dedupe.
- Sponsored search results carry a data-component-type="s-sponsored-label-text" marker; use it to filter them out.

A working scraper is 20% parsing and 80% error handling. Treat every request as likely to fail and design the retry loop carefully:
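A minimal sketch of such a retry loop, plus the fallback-chain pattern for selectors. The status checks, backoff parameters, and `fetch` callable are assumptions -- wire in whatever request function and limits fit your setup:

```python
import random
import time

class BlockedError(Exception):
    """Raised when Amazon keeps serving 503s or robot-check pages."""

def fetch_with_retry(fetch, url, max_attempts=5, base_delay=2.0):
    """Call fetch(url) -> (status, html), retrying on blocks.

    Treats 503s and the robot-check interstitial as retryable,
    with exponential backoff plus jitter so retries do not land
    on a predictable schedule.
    """
    for attempt in range(max_attempts):
        status, html = fetch(url)
        blocked = status == 503 or "not a robot" in html.lower()
        if status == 200 and not blocked:
            return html
        if attempt + 1 == max_attempts:
            break
        # Exponential backoff with jitter: ~2s, ~4s, ~8s, ...
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise BlockedError(f"gave up on {url} after {max_attempts} attempts")

def first_text(soup, selectors):
    """Try CSS selectors in order; return the first match's text.

    This is the fallback-chain pattern: Amazon runs several layout
    variants at once, so every field needs more than one selector.
    """
    for sel in selectors:
        node = soup.select_one(sel)
        if node:
            return node.get_text(strip=True)
    return None
```

Passing `fetch` in as a callable also makes the loop trivial to test with a fake that fails a few times before succeeding.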
Maintaining a working Amazon scraper is a part-time job. Proxy costs alone run $50-500/month for any serious volume, plus engineering time on selector updates, retry logic, and CAPTCHA handling.
For most use cases -- price tracking, affiliate catalogs, competitive monitoring, or building a dataset -- a managed scraper API is cheaper than doing it yourself. You pass an ASIN or URL and get back structured JSON. The operator eats the proxy bill and the cat-and-mouse game against Amazon's anti-bot team.
Our Amazon Scraper actor on Apify handles exactly this. Pass a list of ASINs or product URLs and get titles, prices, ratings, review counts, images, availability, and variations. Pricing is pay-per-use, so a price tracker that checks 500 products once a day costs cents, not a monthly minimum.
| Approach | Cost | Volume | Reliability |
|---|---|---|---|
| Plain requests from your laptop | Free | <50 before block | Near zero |
| Python + residential proxies | $50-500/mo | Medium | Requires maintenance |
| Playwright + stealth + proxies | $100-1000/mo | High | Slow, memory heavy |
| Managed actor (Apify) | Pay-per-use | High | High |
Whichever path you take, build defensively: pin ASINs in your database, log raw HTML when parsers fail, and treat Amazon as a moving target. The selectors in this post will rot -- the strategy around them is what lasts.
Try Apify free — the platform powering these scrapers. Get started →