How to Scope a Custom Web Scraper Project (and What It Should Cost)

Every week, someone messages us saying "I need a scraper for [website]." The conversation that follows determines whether the project takes two days or two months. The difference isn't the scraper itself — it's the scoping.

Bad scoping leads to scope creep, blown budgets, and scrapers that work on day one but break on day three. Good scoping means you know exactly what you're getting, what it costs, and when it's done.

Here's the framework we use internally to estimate every custom scraping project.

The Five Questions That Define Every Scraper

Before writing a single line of code, you need clear answers to these:

1. What's the target site? Not just the domain — the specific pages. "Scrape LinkedIn" is not a spec. "Extract job title, company, and location from LinkedIn job postings matching 'data engineer' in the US" is a spec.

2. What data fields do you need? List every field. If you want product names, prices, ratings, review counts, availability, and seller info from Amazon — say so upfront. Adding "oh, and also the shipping cost" after the scraper is built means restructuring the parser.

3. How often does it need to run? Once? Daily? Every hour? A one-time export of 500 records is a fundamentally different project than a daily pipeline refreshing 50,000 listings.

4. What volume are we talking about? 100 pages and 100,000 pages are different engineering problems. At scale, you need proxy rotation, rate limiting, retry logic, and sometimes distributed execution.

5. Does the site require authentication or have anti-bot protection? Sites behind login walls (Glassdoor, Facebook, most B2B platforms) add significant complexity. Sites with Cloudflare, DataDome, or PerimeterX protection require specialized browser automation and sometimes residential proxies.
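Question 4's scale concerns translate directly into code. As a minimal sketch, here is the kind of rate limiter a higher-volume scraper needs so it doesn't hammer the target; the clock and sleep hooks are injectable so the logic can be tested without real delays.

```python
import time


class RateLimiter:
    """Allow at most one request per `1 / requests_per_second` seconds."""

    def __init__(self, requests_per_second: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self.clock = clock
        self.sleep = sleep
        self.last = None  # timestamp of the previous request, if any

    def wait(self):
        """Block just long enough to respect the configured rate."""
        now = self.clock()
        if self.last is not None:
            elapsed = now - self.last
            if elapsed < self.min_interval:
                self.sleep(self.min_interval - elapsed)
        self.last = self.clock()
```

In practice you would call `limiter.wait()` before every HTTP request; at 100,000 pages, even a generous 5 requests/second keeps you far below most sites' abuse thresholds.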

The Three Complexity Tiers

Based on the answers above, most projects fall into one of three tiers:

| Factor | Simple (~$100) | Standard (~$300) | Complex (~$500+) |
| --- | --- | --- | --- |
| Site structure | Single page type, static HTML | Multiple page types, pagination | Dynamic loading, infinite scroll |
| JavaScript | None needed | Required (React, Vue, etc.) | Heavy SPA with API calls |
| Authentication | Public pages | None, but rate-limited | Login required |
| Anti-bot | None | Basic (rate limits, headers) | Cloudflare, CAPTCHA, fingerprinting |
| Data volume | Under 1,000 records | 1,000-50,000 records | 50,000+ records |
| Output | Single JSON/CSV file | Structured dataset, deduped | Multi-format, incremental updates |
| Proxy needs | None | Datacenter proxies | Residential/mobile proxies |
| Maintenance | Minimal | Occasional updates | Active monitoring needed |

Simple Tier (~$100)

Straightforward extraction from static sites with predictable HTML structure.

Real examples:

- Scraping a company directory from Crunchbase public profiles — name, industry, location, employee count
- Extracting job listings from a regional job board like jobs.ac.uk — title, employer, salary, deadline
- Pulling product names and prices from a small e-commerce catalog with server-rendered HTML

These scrapers typically use HTTP requests with BeautifulSoup or Cheerio. No browser needed. A competent developer builds one in a few hours.
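A Simple-tier scraper really is this small. The sketch below uses `requests` plus BeautifulSoup; the URL and CSS selectors are hypothetical placeholders, since the real ones depend on the target site's markup.

```python
import requests
from bs4 import BeautifulSoup


def parse_listings(html: str) -> list[dict]:
    """Extract title and price from server-rendered product cards.

    The `div.product`, `h2`, and `.price` selectors are placeholders —
    swap in the target site's actual markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product"):
        rows.append({
            "title": card.select_one("h2").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
        })
    return rows


def scrape(url: str) -> list[dict]:
    """Fetch one page and parse it. No browser, no JavaScript."""
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    return parse_listings(resp.text)
```

Keeping parsing separate from fetching makes the scraper testable against saved HTML, which matters when the site later changes its layout.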

Standard Tier (~$300)

Sites that require JavaScript rendering, handle pagination, or serve data through internal APIs.

Real examples:

- Amazon product data — title, price, ratings, review count, seller, and availability across categories with pagination
- Indeed job postings — full listing details across multiple search queries with location filters
- News article extraction from sites like Reuters or TechCrunch — headline, author, date, full text, tags

These need a headless browser (Playwright or Puppeteer), proper request headers, and often pagination logic that handles edge cases. The scraper also needs error handling for missing fields and retry logic for failed requests.
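The retry and pagination plumbing looks roughly like this sketch. The fetch callable is injected — in production it would drive Playwright or call the site's internal API; the `?page=N` URL scheme is an assumption, since real sites paginate in many ways.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry a flaky fetch with exponential backoff; re-raise on final failure."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def paginate(fetch, base_url, max_pages=100):
    """Walk ?page=1, ?page=2, ... until a page comes back empty."""
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_with_retry(fetch, f"{base_url}?page={page}")
        if not batch:
            break  # empty page means we've run out of listings
        results.extend(batch)
    return results
```

The `max_pages` cap is deliberate: a site that silently serves page 1 forever should exhaust the cap, not the scraper's budget.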

Complex Tier (~$500+)

Sites with aggressive anti-bot measures, required authentication, or geo-restricted content.

Real examples:

- Glassdoor salary data — requires login, aggressive bot detection, data only visible after contributing
- Facebook Ads Library — political ad data with dynamic loading, rate limiting, and geographic filtering
- Zillow property data — Cloudflare protection, IP-based rate limiting, and data that varies by location

These projects require residential proxy rotation, browser fingerprint management, session handling, and sometimes CAPTCHA solving. The engineering time is higher, but so is the ongoing maintenance — these scrapers need monitoring because the target sites actively try to block them.
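One building block from that list, proxy rotation, can be sketched as a small pool that cycles endpoints and retires ones that keep failing. The proxy URLs here are placeholders; a real pool would hold residential endpoints from your provider, and session and fingerprint handling would sit on top.

```python
import itertools


class ProxyPool:
    """Round-robin over proxy endpoints, dropping any that fail repeatedly."""

    def __init__(self, proxies, max_failures=3):
        self.max_failures = max_failures
        self.failures = {p: 0 for p in proxies}  # consecutive failures per proxy
        self._cycle = itertools.cycle(list(proxies))

    def get(self):
        """Return the next proxy that hasn't exceeded the failure limit."""
        alive = {p for p, n in self.failures.items() if n < self.max_failures}
        if not alive:
            raise RuntimeError("all proxies exhausted")
        while True:
            proxy = next(self._cycle)
            if proxy in alive:
                return proxy

    def report_failure(self, proxy):
        self.failures[proxy] += 1

    def report_success(self, proxy):
        self.failures[proxy] = 0  # a success resets the streak
```

The scraper calls `get()` before each request and reports the outcome, so a blocked IP is rotated out automatically instead of burning retries.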

What 'Delivered' Actually Means

Vague deliverables create disputes. Here's what we include with every custom scraper:

A working scraper deployed on Apify. Not a script on someone's laptop. A production-ready actor on the Apify platform that you can run on-demand or schedule. You get access through your own Apify account, so you can run it whenever you want.

Clean output in JSON and CSV. Every run produces structured data you can import directly into Excel, Google Sheets, a database, or your own pipeline. Fields are consistent, nulls are handled, and duplicates are removed.

Tested against the live site. Before delivery, we run the scraper against the actual target and verify the output. You get a sample dataset to review before signing off.

30-day fix guarantee. If the target site changes its structure within 30 days of delivery and the scraper breaks, we fix it at no additional cost. This covers layout changes, new anti-bot measures, or API modifications — anything that wasn't caused by your own changes to the scraper.

How to Get an Accurate Quote

The more specific your request, the faster and more accurate the quote. Include your answers to the five questions above: the exact pages, every data field, how often it runs, the volume, and any login walls or anti-bot protection you know about.

A vague request like "scrape real estate data" gets a vague estimate. A specific request like "extract price, address, bedrooms, bathrooms, and square footage from all active Realtor.com listings in Austin, TX — approximately 3,000 listings, refreshed weekly" gets a firm quote within hours.

Get Started

We've built and maintained dozens of production scrapers across e-commerce, real estate, job boards, social media, and B2B data. Every project starts with a proper scope.

Request a custom scraper at https://frog03-20494.wykr.es/custom-scraper — describe your target site and data needs, and get a quote within 4 hours.