The Data Collector Blog

Technical articles on web scraping, data collection, and the anti-bot arms race.

Scraping Foursquare Venue Data with Python (2026)

Access venue data, categories, ratings, tips, and popularity from the Foursquare Places API v3. Working Python code for location data collection, bulk city scanning, and supplementary web scraping techniques.

How to Scrape Behance Portfolio Data with Python (2026)

Extract creative portfolio data, project statistics, designer profiles, and trending work from Behance using Python — covering both the Adobe API and web scraping approaches with full code examples.

Scraping Twitter Spaces: Metadata, Participants and Topics (2026)

Extract Twitter/X Spaces data including space metadata, host and speaker info, participant counts, and trending topics using the Twitter API v2 and Python. Covers pagination, SQLite storage, proxy integration, and monitoring strategies.

How to Scrape Flickr Photo Data with Python (2026)

Extract photo metadata, EXIF data, group pools, and user galleries from Flickr using the API and Python — with working code for search, geo-tagged photos, and bulk downloads.

Scraping Kickstarter Project Data (2026)

How to scrape Kickstarter campaigns — discover API, project details, backer counts, funding progress, creator data, and reward tiers using Python in 2026. Covers pagination, anti-detection, ThorData proxy integration, and SQLite storage.

How to Scrape Kickstarter Campaign Data with Python (2026 Guide)

Extract Kickstarter crowdfunding data — campaign funding, backer counts, reward tiers, and category trends — using Python with the discovery API, GraphQL interception, and Playwright. Full working code included.

How to Scrape Bing Search Results with Python in 2026

Complete guide to scraping Bing SERP data using Python, Playwright, and API alternatives. Covers pagination, rate limiting, anti-detection, ThorData proxy integration, and structured data extraction.

How to Scrape FBref for Football Stats with Python (2026)

Complete guide to scraping FBref football stats with Python in 2026. League tables, player xG, passing networks, shot maps, match data — with working code, anti-bot handling, proxy rotation, and output schemas.

How to Scrape Coursera Course Data with Python (2026)

Extract Coursera course listings, enrollment stats, ratings, instructor data, and syllabus information using the Coursera API and web scraping with Python.

Scraping Best Buy Product Data with Python (2026)

Extract electronics product data, prices, specifications, and customer reviews from Best Buy using their API and web scraping. Complete Python code for price monitoring, product research, and anti-bot bypass.

How to Scrape Bandcamp Music Data with Python (2026)

Scrape Bandcamp artist pages, album listings, track data, fan collections, and estimated sales using Python and BeautifulSoup. Working code with anti-bot handling, ThorData proxy integration, and full data pipeline.

How to Scrape Tumblr Data in 2026 (API + Web Scraping)

A practical guide to collecting Tumblr posts, tags, media, and reblog chains using the Tumblr API v2 and web scraping fallbacks for content the API does not expose.

How to Scrape Glassdoor Reviews with Python (2026)

Extract Glassdoor company reviews, salary data, and employee sentiment using Python — reverse-engineering the GraphQL API and bypassing anti-scraping protections.

Scraping Open Food Facts for Nutrition Data with Python (2026 Complete Guide)

Pull product nutrition data, ingredients, allergens, Nutri-Score, and barcodes from Open Food Facts using their free API. Complete Python guide with bulk collection, cross-referencing retail sites with ThorData proxy rotation, error handling, and output schemas.

How to Scrape ESPN Sports Stats in 2026: Scores, Player Stats & Standings

Learn how to scrape ESPN scores, player stats, and standings using Python. Covers ESPN's hidden API, sports-reference.com scraping, anti-bot evasion, SQLite storage, proxy integration, and building analytics dashboards.

Scraping Indeed Company Reviews with Python (2026)

Collect company culture reviews, CEO approval ratings, and work-life balance scores from Indeed using Python. Anti-bot bypass techniques and structured data extraction.

How to Scrape Dribbble Design Data with Python (2026)

Extract design shots, designer profiles, project collections, and popular design trends from Dribbble using Python. Covers API access, web scraping techniques, Cloudflare bypass, SQLite storage, proxy integration, and building design trend datasets.

How to Scrape Public Court Records: PACER, CourtListener & State Courts (2026)

Extract public court records from PACER, CourtListener API, and state court systems with Python. Complete working code, proxy rotation, anti-detection, error handling, output schemas, and 7 real-world use cases for legal data analysis in 2026.

Scraping Google Scholar Citations and Author Profiles with Python (2026)

Complete guide to scraping Google Scholar for citation counts, h-index, author profiles, and publication data using Python in 2026. Includes working code for scholarly, httpx, Playwright, proxy rotation via ThorData, anti-bot evasion, and full error handling.

How to Scrape Substack Newsletters with Python (2026 Guide)

Extract Substack newsletter posts, subscriber estimates, author profiles, and archives using Python. Covers the undocumented Substack API, RSS feeds, anti-bot handling, and proxy rotation.

Scraping Dev.to Tag Analytics and Trending Patterns with Python (2026)

Analyze dev.to tag popularity, trending article patterns, and author follower growth using the dev.to API. Working Python code for building developer content analytics — with pagination, error handling, data storage, and proxy configuration.

How to Scrape eBay Auction Data in 2026: Sold Listings, Prices & Seller Stats

Learn how to scrape eBay sold listings, auction prices, seller data, and trending items using Python. Covers eBay Finding API, Browse API, web scraping with BeautifulSoup, error handling, pagination, and residential proxy integration with ThorData.

How to Scrape Google Play Store with Python (2026)

Extract app details, ratings, reviews, and developer info from Google Play Store using Python — working with batchexecute endpoints and handling Google's anti-bot systems.

Scraping LinkedIn Company Pages: Python Guide (2026)

How to scrape LinkedIn company pages using Python — guest API endpoints, Voyager API structure, employee counts, job postings, TLS fingerprinting challenges, SQLite storage, and proxy integration.

How to Scrape Upwork Freelancer Data in 2026: Profiles, Rates & Job Postings

Learn how to extract Upwork freelancer profiles, hourly rates, skills, and job postings using the Upwork API with a Python web scraping fallback. Covers OAuth authentication, rate limits, anti-bot measures, ThorData proxy integration, and SQLite storage.

Scraping Twitter/X Followers and Following (2026)

How to scrape Twitter/X follower and following lists in 2026 - guest tokens, GraphQL endpoints, cursor pagination, rate limits, and alternative approaches.

How to Scrape Yahoo Finance with Python (2026)

Extract stock quotes, financial statements, historical price data, and options chains from Yahoo Finance using Python — working with the v8 chart and v10 quoteSummary APIs. Includes SQLite storage, error handling, and proxy rotation.

How to Scrape Houzz Interior Design Data in 2026 (Playwright Guide)

A practical guide to scraping Houzz interior design photos, product listings, professional profiles, and project collections using Playwright browser automation with proxy rotation, stealth configuration, SQLite storage, and data analysis pipelines.

How to Scrape Eventbrite Events in 2026: Listings, Prices & Organizer Data

Learn how to extract Eventbrite event listings, ticket prices, attendee counts, and organizer data using the Eventbrite API v3 and Python web scraping. Covers full pagination, authentication, anti-bot handling, data storage, and ThorData proxy integration for scale.

Pulling World Bank Economic Data with Python (2026)

Access World Bank economic indicators, GDP data, and country statistics using the World Bank Open Data API with Python. Full guide with async bulk collection, SQLite storage, and proxy rotation.

Scrape LinkedIn Post Engagement Data with Python (2026)

Extract LinkedIn post engagement metrics, hashtag performance, and company content analytics using Python — a practical guide with working code, Voyager API access, Playwright fallback, and SQLite storage.

How to Scrape Drugs.com for Medication Data with Python (2026)

Scrape Drugs.com for drug information, dosage guides, interaction data, and user reviews using Python. Includes working code, proxy rotation, anti-detection techniques, CAPTCHA handling, and complete error handling.

Scraping Product Hunt Launches: Python Guide (2026)

How to scrape Product Hunt launches using Python — GraphQL API, pagination, upvote counts, maker profiles, and daily rankings with working code.

Scraping Vimeo Video Data with Python (2026)

Extract Vimeo video metadata, view counts, channel information, and embed data using the Vimeo API and oEmbed endpoint. Complete guide with async collection, SQLite storage, error handling, and proxy setup.

How to Scrape Etsy Listings in 2026: Shops, Products & Reviews

Extract Etsy shop data, product listings, pricing, and reviews using Python. Covers the bespoke AJAX API, public shop pages, anti-bot bypass techniques, SQLite storage, and proxy configuration.

Scraping Numbeo Cost of Living Data for City Comparisons (2026)

How to scrape Numbeo for cost of living indices, city comparisons, quality of life data, and property prices using Python. Covers anti-bot detection, proxy rotation, error handling, and SQLite storage.

Scraping Semantic Scholar Paper Metadata and Citations with Python (2026)

Extract academic paper metadata, citation graphs, author h-index, and reference lists from Semantic Scholar's public Graph API using Python. Covers batch fetching, SQLite storage, rate limiting, and scaling strategies.

Scraping IKEA Product Data and Prices Across Countries (2026)

Extract IKEA product catalog data, compare prices across countries, check store availability, and access assembly information using Python and IKEA's internal search API. Covers Akamai bypass, geo-IP proxy targeting, and SQLite storage.

Scraping Basketball-Reference for NBA Stats with Python (2026)

The complete guide to scraping Basketball-Reference for NBA player stats, game logs, advanced metrics, team data, and historical records using Python. Includes anti-detection, proxy rotation, and production-ready code.

How to Scrape Walmart Product Data with Python in 2026

Technical guide to extracting Walmart product prices, reviews, and inventory data. Covers the GraphQL API, price history tracking, and competitor monitoring strategies.

Scraping Trustpilot Reviews at Scale (2026)

Scrape Trustpilot company reviews, ratings, and consumer feedback using their public API with Python. Covers pagination, dynamic loading, and fake review detection.

How to Scrape Etsy Shop Analytics with Python (2026)

Scrape Etsy shop analytics with Python: listing performance, review patterns, and sales estimates, using the Etsy API v3 with a web scraping fallback.

Scraping Patreon Creator Data with Python (2026)

Extract Patreon creator profiles, patron counts, tier pricing, and post frequency using the Patreon API v2 and Python web scraping techniques. Covers OAuth setup, Cloudflare bypass, ThorData proxy integration, RSS feed analysis, and SQLite storage.

How to Scrape npm Package Data with Python (2026 Guide)

Extract npm package metadata, download counts, version history, and dependency graphs using the npm registry API and Python. Includes rate limiting strategies, proxy rotation, and full dataset collection scripts.

How to Scrape Booking.com Hotel Data with Python in 2026

Complete guide to scraping Booking.com for hotel prices, availability, and reviews. Covers API endpoints, Playwright automation, anti-bot bypass, ThorData proxy integration, and price monitoring pipeline.

How to Scrape Yelp Business Data in 2026: A Complete Guide

Learn how to scrape Yelp business listings, reviews, and ratings using Python. Covers Yelp's anti-bot protections, structured data extraction, and proxy rotation strategies.

Scrape Facebook Public Pages & Post Engagement with Python (2026)

How to scrape Facebook public pages, post engagement metrics, and group data using the Meta Graph API and Playwright. Full code with authentication, rate limits, anti-detection, batch requests, proxy configuration, and data storage.

How to Scrape GitHub Repositories with Python (2026)

Extract GitHub repository data — stars, contributors, topics, and code search — using Python and the GitHub REST API. Covers rate limits, token auth, and pagination.

Scraping Roblox Game Statistics and Player Data with Python (2026)

Extract Roblox game visit counts, player concurrency, asset thumbnails, game passes, and developer stats using the Roblox API v2 and Python. Covers rate limits, async collection, ThorData proxy integration, and SQLite storage.

How to Scrape Craigslist Listings with Python (2026)

Scrape Craigslist listings across cities for pricing trends and geographic analysis. Covers RSS feeds, city-specific URLs, and anti-bot handling with Python.

Scraping Instacart Grocery Prices with Python (2026)

Track grocery item prices across stores on Instacart using web scraping. Working Python code for price comparison, availability monitoring, and deal tracking.

How to Scrape Metacritic Scores: Python Guide (2026)

Scrape Metacritic game and movie scores with Python — critic vs user ratings, review aggregation, search API, JSON-LD structured data extraction, and anti-bot handling.

How to Scrape Last.fm Music Data with Python (2026)

Extract scrobble history, artist stats, track info, and user listening habits from Last.fm using their public API and Python — with working code for all major endpoints.

Scraping Google Reviews and Business Data (2026)

Technical guide to scraping Google Maps reviews and business data in 2026 — place_id extraction, review pagination, and bypassing DataDome.

Advanced LinkedIn Profile Scraping Techniques (2026)

Deep dive into LinkedIn's Voyager API for scraping profiles, skills, endorsements, and connection graphs in 2026 - with Python code and anti-detection strategies.

How to Scrape News Articles via RSS in 2026: Full-Text Extraction at Scale

Extract full news articles using RSS feeds, newspaper4k, readability-lxml, CommonCrawl, and archive.org in 2026. Covers feed discovery, paywall bypass, anti-detection, ThorData proxy integration, and building a production SQLite pipeline.

How to Scrape LinkedIn Job Postings in 2026: No Login Required

Extract LinkedIn job postings, salaries, company data, and descriptions using Python without authentication. Covers pagination, anti-bot measures, proxy strategies, and building salary databases.

Scraping Apple Podcasts Data: Charts, Episodes and Reviews (2026)

Pull podcast chart rankings, episode metadata, and review data from Apple Podcasts using the iTunes Search API and targeted web scraping — with working Python code.

How to Scrape Apple App Store Data in 2026 (Python Guide)

Learn to extract app metadata, reviews, and rankings from Apple's App Store using Python. Covers iTunes Search API, app lookup, review endpoints, and residential proxy setup.

Scraping BoardGameGeek Data with Python (2026)

Extract board game ratings, mechanics, user collections, and play logs from BoardGameGeek using the BGG XML API v2 and Python.

Scraping Fandango Movie Showtimes and Ticket Prices with Python (2026)

How to scrape Fandango for movie showtimes, theater locations, and ticket pricing using Python. Covers anti-bot protections, Akamai bypass techniques, SQLite storage, proxy integration, and building regional showtime databases.

Scraping AngelList/Wellfound Jobs (2026)

How to scrape startup jobs from Wellfound (formerly AngelList Talent) — GraphQL API, job listings, salary ranges, equity data, and startup info in 2026.

Scraping TikTok in 2026: Video Data, Profiles, and the Unofficial API

TikTok is one of the most aggressively protected platforms on the internet. This guide covers the signature system (msToken, X-Bogus), public profile scraping, video data extraction, embedded JSON parsing, the Research API, pagination, data storage, proxy strategies, and realistic alternatives for 2026.

How to Scrape Redfin Real Estate Data in 2026 (API + Web Scraping)

A complete technical guide to extracting property listings, price history, market stats, comparable homes, and school data from Redfin using internal API endpoints, Python, and residential proxies — with full working code examples.

Scraping Thingiverse 3D Model Data and Remix Networks with Python (2026)

Extract 3D printable model metadata, download counts, remix relationships, and creator profiles from Thingiverse using the MakerBot API and Python. Covers authentication, rate limits, anti-detection, ThorData proxy integration, and SQLite storage.

Scraping Freelancer.com: Project Data, Bids and Skill Trends (2026)

How to pull project listings, bid counts, skill demand, and budget data from Freelancer.com using the official API and Python. Covers authentication, pagination, skill aggregation, and proxy configuration.

Scraping Uber Eats Restaurant Data with Python (2026)

Extract restaurant menus, delivery estimates, ratings, and menu item prices from Uber Eats using their internal API and Playwright. Working Python code with anti-bot bypass techniques.

Scraping bioRxiv Preprints: Author Networks and Topic Clusters (2026)

Extract biology preprint metadata, build author collaboration networks, and cluster research topics using the bioRxiv API and Python web scraping. Complete code with storage, pagination, and ThorData proxy integration.

Scraping Google News Articles in 2026 (RSS + Topic APIs)

How to scrape Google News articles using RSS feeds and topic endpoints, then build a deduplicated news aggregator in Python. Covers anti-bot measures, proxies, and full article content extraction at scale.

Scraping YouTube: Channels, Playlists, Comments, and Video Metadata (2026)

Go beyond basic video stats. Extract channels, playlists, comment threads, and metadata from YouTube using the Data API v3 and the unofficial InnerTube API. Covers quota management, batch requests, pagination, proxy strategies, data storage, and real use cases.

Scraping Instagram Data in 2026: Profiles, Posts, Reels, and the Mobile API

Instagram is one of the hardest platforms to scrape. This guide covers the official Graph API, public profile scraping with og:meta tags, the mobile/private API, session cookies, rate limits, CDN URL expiry, pagination, data storage, and proxy strategies for 2026.

Scraping GitHub: Repos, Stars, Issues, and User Profiles in 2026

A practical guide to extracting data from GitHub using REST API v3, GraphQL API v4, and archived datasets. Covers authentication, rate limits, pagination, proxy rotation, and scaling strategies.

Scraping Zillow and Real Estate Data (2026)

Zillow's Zestimate API is gone. This guide covers current scraping approaches using curl-cffi, ZPID extraction, Zillow's anti-bot measures, and when to use residential proxies.

Scraping TripAdvisor Reviews and Business Data (2026)

Extract TripAdvisor restaurant, hotel, and attraction reviews using Python and Playwright. Covers JSON-LD structured data, lazy-loaded review pagination, and proxy rotation.

Scraping Booking.com Hotel Data (2026)

How to extract hotel listings, prices, availability, ratings, and review counts from Booking.com in 2026. Covers the unofficial search JSON endpoint, URL construction, Playwright stealth, ThorData proxies, and full data pipeline.

Scraping eBay Products and Prices (2026)

eBay's Finding API is dead. This guide covers the current Browse API, direct HTML scraping with httpx and BeautifulSoup, pagination, seller ratings, bid prices, error handling, data storage, and residential proxy integration for scale.

Scraping LinkedIn Profiles and Job Listings in 2026 (Without Getting Banned)

How to scrape LinkedIn profiles and job listings in 2026 using curl-cffi, Playwright stealth, JSON-LD extraction, residential proxies, and pagination. Covers the 999 error, auth walls, bulk job scraping, data storage, and legal considerations.

Using Google Trends Unofficial API with Python (2026)

Access Google Trends data programmatically using the undocumented API. Extract interest over time, related queries, and regional breakdowns with raw HTTP requests in Python.

How to Scrape Pinterest Boards and Pins in 2026: The Definitive Python Guide

A comprehensive guide to extracting Pinterest boards, pins, search results, comments, trending data, and shopping pins with Python. Covers anti-detection, CSRF handling, proxy strategy, SQLite storage, and complete runnable scripts for every use case.

Scraping Cloudflare-Protected Sites in 2026

Cloudflare blocks most datacenter scrapers by default. This guide covers the techniques that actually work in 2026 — from TLS fingerprint spoofing to residential proxies, Playwright stealth, CAPTCHA handling, and proxy rotation with ThorData.

Playwright for Web Scraping in 2026: A Complete Practical Guide

How to use Playwright for scraping JavaScript-heavy sites in 2026 — setup, stealth, proxy rotation with ThorData, CAPTCHA handling, pagination, fingerprint spoofing, and production-ready patterns.

20+ Free Web Scrapers for Developers in 2026 (No API Key Required)

A curated list of free, production-ready scrapers for LinkedIn, Reddit, Twitter, Amazon, TikTok, YouTube, and 15+ more platforms — no API key required. Includes Python code examples, anti-detection techniques, and proxy rotation strategies.

Storing Scraped Data: SQLite, PostgreSQL, CSV, and JSON Compared (2026)

The complete guide to storing web scraping output — choosing between SQLite, PostgreSQL, CSV files, JSON, and cloud databases — with Python code examples, deduplication patterns, schema design, and when to use each option.

How to Find and Use Unofficial APIs for Web Scraping (2026 Complete Guide)

Every modern web app runs on an internal API that's far easier to scrape than HTML. Here's how to find those APIs with browser DevTools and mitmproxy, reproduce them in Python, handle authentication and rate limits, and build robust scrapers that don't break.

JavaScript Rendering for Web Scraping: When You Actually Need a Browser (2026)

The complete guide to JavaScript rendering in web scraping — how to detect when you need a headless browser, when to skip it, Playwright vs Puppeteer comparison, hidden API discovery, anti-detection techniques, and performance optimization.

How to Scrape Twitter/X Without the API in 2026 (Complete Guide)

Twitter's API costs $100-5000/month. Here's how to scrape tweets, profiles, and search results without it using Python, Playwright, and proxy rotation — with full working code and anti-detection strategies.

Scraping AliExpress Products Without Getting Blocked (2026)

AliExpress runs aggressive bot detection. This guide covers Playwright with residential proxies, window.__INIT_DATA__ extraction, proxy rotation strategies, CAPTCHA handling, and complete Python examples that actually work in 2026.

How to Scrape Reddit Without the API in 2026 (Complete Python Guide)

Reddit's API costs killed third-party apps. Here's what still works for scraping posts, comments, user profiles, and search results in 2026 — with full Python code, data storage, and anti-blocking strategies.

Extracting Structured Data from HTML: The Complete Python Guide (2026)

Master every HTML data extraction technique — CSS selectors, XPath, regex, JSON-LD, microdata, Open Graph, and JavaScript-rendered content — with production-ready Python examples and proxy integration.

BeautifulSoup Web Scraping Tutorial: Complete Python Guide (2026)

A complete BeautifulSoup scraping tutorial — parsing HTML, navigating the DOM, extracting data, proxy rotation, anti-detection headers, CAPTCHA handling, retry logic, and production patterns for 2026.

How to Scrape JavaScript-Heavy Websites with Playwright (2026)

Learn how to scrape dynamic, JavaScript-rendered websites using Playwright in Python. Covers setup, auto-wait, screenshots, performance tricks, proxy integration, anti-detection, and real-world use cases.

How to Structure a Web Scraping Project in Python (2026)

Complete guide to organizing production Python web scraping projects — folder layout, config management, proxy rotation, anti-detection, error handling, retry logic, scheduling, and real-world patterns from projects that actually run reliably.

Web Scraping Without Getting Blocked: A 2026 Practical Guide

A practical, layered guide to avoiding blocks while web scraping in 2026. Covers IP rotation with ThorData, headers, browser fingerprinting, behavioral analysis, CAPTCHA handling, and complete Python code examples.