Scraping Freelancer.com: Project Data, Bids and Skill Trends (2026)
Freelancer.com lists millions of active projects across every conceivable category — software development, design, writing, data entry, engineering. That makes it one of the better sources for freelance market intelligence: what skills are in demand, how much clients are willing to pay, how competitive specific categories are, which niche is heating up. If you're doing pricing research, building a job aggregator, or trying to spot emerging technical trends before they show up in salary surveys, the data is worth collecting.
Unlike Upwork's, Freelancer's official API is comparatively accessible. You don't need a business account or a formal approval process to start querying project data. That said, there are still rate limits, Cloudflare on the web layer, and a few gotchas worth knowing before you write a single line of code.
What Data Is Available
The Freelancer API exposes a solid chunk of what you'd want:
- Project listings: title, description, project type (fixed vs. hourly), status (open, closed, frozen), category, subcategory
- Budget data: minimum and maximum budget ranges per project
- Bid counts: number of bids received, average bid amount — useful for gauging competition levels
- Skills required: each project tags required skills, which is the cleanest source for demand analysis
- Employer details: username, reputation score, country, hire rate, review count
- Timestamps: project creation date and deadline, which lets you track posting velocity over time
Category breakdowns and subcategories are also available, which lets you segment the market by domain without having to infer it from titles.
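To make the field list concrete, here is a trimmed sketch of what a single project object looks like. This is illustrative, not an exact payload: the field names match what the extraction code later in this article reads, but a real response wraps projects in `{"result": {"projects": [...]}}` and nests considerably more data.

```python
# Hypothetical, trimmed shape of one project object from the projects endpoint.
sample_project = {
    "id": 39012345,
    "title": "Build a machine learning price-prediction model",
    "status": "active",
    "type": "fixed",                       # fixed vs. hourly
    "budget": {"minimum": 250, "maximum": 750},
    "currency": {"code": "USD"},
    "bid_stats": {"bid_count": 23, "bid_avg": 412.5},
    "jobs": [{"name": "Python"}, {"name": "Machine Learning (ML)"}],
    "time_submitted": 1767225600,          # Unix timestamp
}

# Pull out the competition signal the same way the extractor below does:
print(sample_project["bid_stats"]["bid_count"])  # 23
```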
Anti-Bot Measures
Freelancer uses Cloudflare at the web layer, so direct HTML scraping from a clean datacenter IP will get challenged quickly. The API itself doesn't have that problem — authenticated API requests go through a different path — but you'll still run into rate limits.
Unauthenticated requests are capped at a small number of calls per hour and return stripped-down data. Authenticated requests (with an OAuth2 token) have higher limits, roughly 600 requests per 10 minutes per token. Heavy usage — large batch jobs, rapid pagination — can trigger temporary IP-level throttling even with a valid token. If you're scraping supplementary web pages (employer profiles, reviews, project pages not exposed by the API), you'll need residential proxies to get through Cloudflare reliably.
CAPTCHA appears on the web interface under heavy automated access. The API path avoids this if you stay within rate limits and rotate IPs sensibly.
Using the Freelancer API
Get an OAuth2 token from the Freelancer Developer Portal. The API base is https://www.freelancer.com/api/projects/0.1/. All responses are JSON.
import httpx
import asyncio
import json
from typing import AsyncGenerator
API_BASE = "https://www.freelancer.com/api/projects/0.1"
TOKEN = "your_oauth2_token_here"
HEADERS = {
    "freelancer-oauth-v1": TOKEN,
    "Content-Type": "application/json",
}
# Optional: residential proxy for supplementary web requests
PROXY = "http://USER:[email protected]:9000"
async def fetch_projects(
    client: httpx.AsyncClient,
    query: str = "",
    offset: int = 0,
    limit: int = 100,
) -> dict:
    params = {
        "offset": offset,
        "limit": limit,
        "job_details": True,
        "full_description": False,
        "compact": False,
    }
    if query:
        params["query"] = query
    resp = await client.get(
        f"{API_BASE}/projects/",
        params=params,
        headers=HEADERS,
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()
async def paginate_projects(query: str = "", max_results: int = 500) -> AsyncGenerator[dict, None]:
    """Paginate through project results, yielding individual projects."""
    async with httpx.AsyncClient() as client:
        offset = 0
        limit = 100
        fetched = 0
        while fetched < max_results:
            try:
                data = await fetch_projects(client, query=query, offset=offset, limit=limit)
            except httpx.HTTPStatusError as e:
                print(f"HTTP error at offset {offset}: {e.response.status_code}")
                break
            except httpx.RequestError as e:
                print(f"Request error: {e}")
                break
            result = data.get("result", {})
            projects = result.get("projects", [])
            if not projects:
                break
            for project in projects:
                yield project
                fetched += 1
                if fetched >= max_results:
                    return
            total = result.get("total_count", 0)
            offset += limit
            if offset >= total:
                break
            await asyncio.sleep(0.5)  # stay within rate limits
def extract_project(raw: dict) -> dict:
    """Normalize a raw project object into a flat record."""
    budget = raw.get("budget", {})
    jobs = raw.get("jobs", [])
    return {
        "id": raw.get("id"),
        "title": raw.get("title"),
        "status": raw.get("status"),
        "type": raw.get("type"),
        "budget_min": budget.get("minimum"),
        "budget_max": budget.get("maximum"),
        "currency": raw.get("currency", {}).get("code"),
        "bid_count": raw.get("bid_stats", {}).get("bid_count", 0),
        "avg_bid": raw.get("bid_stats", {}).get("bid_avg"),
        "skills": [j.get("name") for j in jobs],
        "category": raw.get("category", {}).get("name"),
        "subcategory": raw.get("sub_category", {}).get("name"),
        "employer_id": raw.get("owner_id"),  # needed later for employer enrichment
        "employer_country": raw.get("owner", {}).get("location", {}).get("country", {}).get("name"),
        "posted_at": raw.get("time_submitted"),
        "deadline": raw.get("deadline"),
    }
async def collect_projects(query: str = "", max_results: int = 500) -> list[dict]:
    projects = []
    async for raw in paginate_projects(query=query, max_results=max_results):
        projects.append(extract_project(raw))
    return projects

if __name__ == "__main__":
    results = asyncio.run(collect_projects(query="machine learning", max_results=200))
    print(f"Collected {len(results)} projects")
    with open("projects.json", "w") as f:
        json.dump(results, f, indent=2)
Analyzing Skill Demand
Once you have a batch of projects, skill aggregation is straightforward. The jobs field on each project contains tagged skills — these map to Freelancer's internal job category taxonomy, which is fairly consistent.
from collections import Counter, defaultdict
import statistics
def analyze_skills(projects: list[dict]) -> dict:
    skill_counts = Counter()
    skill_budgets: dict[str, list[float]] = defaultdict(list)
    for project in projects:
        skills = project.get("skills", [])
        budget_max = project.get("budget_max")
        for skill in skills:
            if not skill:
                continue
            skill_counts[skill] += 1
            if budget_max and isinstance(budget_max, (int, float)):
                skill_budgets[skill].append(float(budget_max))
    results = []
    for skill, count in skill_counts.most_common(50):
        budgets = skill_budgets.get(skill, [])
        results.append({
            "skill": skill,
            "project_count": count,
            "median_budget_max": round(statistics.median(budgets), 2) if budgets else None,
            "avg_budget_max": round(statistics.mean(budgets), 2) if budgets else None,
        })
    return {"top_skills": results, "total_projects_analyzed": len(projects)}

# Usage
projects = asyncio.run(collect_projects(max_results=1000))
analysis = analyze_skills(projects)
for row in analysis["top_skills"][:10]:
    print(f"{row['skill']}: {row['project_count']} projects, median budget ${row['median_budget_max']}")
Running this across a few thousand projects gives you a ranked list of skills with budget context — not just what's in demand but what clients are paying for it.
Proxy Configuration
When querying the API at volume, or when you need to scrape supplementary web pages — employer profiles, project pages outside the API's coverage — residential proxies become necessary. Cloudflare on the web layer blocks datacenter IPs reliably. ThorData's residential proxies work well here; their rotating pool gives you a different IP per request without managing a list manually.
async def fetch_with_proxy(url: str, headers: dict) -> dict:
    proxy = "http://USER:[email protected]:9000"
    async with httpx.AsyncClient(proxy=proxy) as client:
        resp = await client.get(url, headers=headers, timeout=30.0)
        resp.raise_for_status()
        return resp.json()
For pure API calls that stay within rate limits, a proxy is optional. Add it when you're hitting throttling from a single IP or when pulling web pages that the API doesn't expose. ThorData's residential proxy format supports per-request IP rotation by default — no session stickiness unless you explicitly configure it.
Tracking Market Trends
A one-off scrape tells you what the market looks like today. Run the same query weekly, store the results in SQLite, and you can detect real shifts.
import sqlite3
import json
from datetime import date
def save_skill_snapshot(analysis: dict, db_path: str = "freelancer_trends.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS skill_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            snapshot_date TEXT,
            skill TEXT,
            project_count INTEGER,
            median_budget_max REAL,
            avg_budget_max REAL
        )
    """)
    snapshot_date = date.today().isoformat()
    rows = [
        (snapshot_date, row["skill"], row["project_count"],
         row["median_budget_max"], row["avg_budget_max"])
        for row in analysis["top_skills"]
    ]
    conn.executemany(
        "INSERT INTO skill_snapshots (snapshot_date, skill, project_count, median_budget_max, avg_budget_max) VALUES (?,?,?,?,?)",
        rows,
    )
    conn.commit()
    conn.close()
    print(f"Saved {len(rows)} skill rows for {snapshot_date}")
To detect trends, compare a skill's project_count week-over-week. A skill jumping from 200 to 350 appearances in two weeks is a signal worth paying attention to.
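The week-over-week comparison can also be done directly in SQL with a self-join on the snapshots table. Here is a self-contained sketch against the schema above, seeded with two synthetic weekly snapshots for illustration:

```python
import sqlite3

# In-memory DB with the skill_snapshots schema from save_skill_snapshot,
# seeded with two synthetic weekly snapshots.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE skill_snapshots (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        snapshot_date TEXT, skill TEXT, project_count INTEGER,
        median_budget_max REAL, avg_budget_max REAL
    )
""")
conn.executemany(
    "INSERT INTO skill_snapshots (snapshot_date, skill, project_count) VALUES (?,?,?)",
    [("2026-01-05", "Machine Learning", 200), ("2026-01-12", "Machine Learning", 350),
     ("2026-01-05", "Data Entry", 500), ("2026-01-12", "Data Entry", 490)],
)

# Join each skill's latest snapshot against the previous one.
rows = conn.execute("""
    SELECT cur.skill, prev.project_count, cur.project_count,
           cur.project_count - prev.project_count AS delta
    FROM skill_snapshots cur
    JOIN skill_snapshots prev
      ON prev.skill = cur.skill
     AND prev.snapshot_date = '2026-01-05'
    WHERE cur.snapshot_date = '2026-01-12'
    ORDER BY delta DESC
""").fetchall()
for skill, before, after, delta in rows:
    print(f"{skill}: {before} -> {after} ({delta:+d})")
```

With real data, the hardcoded dates would come from `SELECT DISTINCT snapshot_date` ordered descending.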
Practical Gotchas
Pagination caps out: The API's total_count field is sometimes inaccurate for large result sets. Don't rely on it to know when to stop — stop when you get an empty projects array instead.
Budget ranges are loose: Many clients post "$10-$500" ranges. Median budget across a skill category is more useful than average, since a handful of large projects skew averages significantly.
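A quick demonstration of why the median is the safer aggregate here: one enterprise-sized outlier moves the mean by an order of magnitude but barely touches the median. The numbers are made up for illustration.

```python
import statistics

# Four hypothetical budget_max values: three typical projects, one outlier
budgets = [100, 120, 150, 5000]
print(statistics.mean(budgets))    # 1342.5 (dominated by the outlier)
print(statistics.median(budgets))  # 135.0  (still reflects the typical project)
```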
Skill names drift: Freelancer periodically renames or merges job categories. "Machine Learning" and "Machine Learning (ML)" have appeared as separate tags at different points. Normalize skill names to lowercase and strip parentheticals before aggregating.
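A minimal normalizer for that drift might look like this. `normalize_skill` is not part of the API, just a helper you could apply to each tag before feeding it into the skill counter:

```python
import re

def normalize_skill(name: str) -> str:
    """Lowercase a skill tag and strip trailing parentheticals like '(ML)'."""
    return re.sub(r"\s*\([^)]*\)", "", name).strip().lower()

print(normalize_skill("Machine Learning (ML)"))  # machine learning
print(normalize_skill("Machine Learning"))       # machine learning
```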
Status filtering matters: The default query returns all project statuses including frozen and closed ones. Add status=open to the request params if you only want active projects, which is usually what you want for demand analysis.
Token expiry: OAuth2 tokens from Freelancer's API do expire. Build in token refresh logic before you schedule anything long-running, or the job silently stops returning data after the first few hours.
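One way to build that in is a thin wrapper that retries a request once with a fresh token after a 401. This is a sketch of the pattern, not Freelancer's actual refresh flow; `refresh_fn` stands in for whatever token-renewal call your OAuth2 setup uses.

```python
import asyncio

class RefreshingToken:
    """Sketch: retry an API call once with a fresh token after a 401.
    `refresh_fn` is a placeholder for your own token-renewal logic."""

    def __init__(self, token: str, refresh_fn):
        self.token = token
        self.refresh_fn = refresh_fn

    async def request(self, call):
        # `call` is an async callable that takes the current token and
        # returns (status_code, payload)
        status, payload = await call(self.token)
        if status == 401:  # token expired: refresh once and retry
            self.token = self.refresh_fn()
            status, payload = await call(self.token)
        return status, payload

# Simulated API: rejects the stale token, accepts the refreshed one
async def fake_api(token):
    return (200, {"ok": True}) if token == "fresh" else (401, None)

tm = RefreshingToken("stale", refresh_fn=lambda: "fresh")
status, payload = asyncio.run(tm.request(fake_api))
print(status, payload)
```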
Employer Analysis and Quality Scoring
Not all clients are equal. Here is how to score employers by hire rate, review quality, and budget reliability:
import httpx
import asyncio
import json
import time
import random

API_BASE = "https://www.freelancer.com/api/projects/0.1"
TOKEN = "your_token_here"
HEADERS = {"freelancer-oauth-v1": TOKEN}
async def fetch_employer_profile(
    client: httpx.AsyncClient,
    user_id: int,
) -> dict:
    """Fetch employer profile data including hire rate and review count."""
    url = f"https://www.freelancer.com/api/users/0.1/users/{user_id}"
    params = {
        "user_details": True,
        "employer_reputation": True,
        "jobs": True,
    }
    try:
        resp = await client.get(url, params=params, headers=HEADERS, timeout=20)
        resp.raise_for_status()
    except httpx.HTTPStatusError:
        return {}
    data = resp.json().get("result", {})
    reputation = data.get("employer_reputation", {})
    stats = reputation.get("entire_history", {})
    return {
        "user_id": user_id,
        "username": data.get("username"),
        "country": data.get("location", {}).get("country", {}).get("name"),
        "hire_rate": stats.get("hire_rate"),
        "review_count": stats.get("reviews"),
        "all_time_projects": stats.get("all"),
        "complete_rate": stats.get("complete"),
        "reputation_score": stats.get("overall"),
    }
def score_employer_quality(employer: dict) -> float:
    """
    Score an employer 0-100 based on available signals.
    Higher = more likely to be a good client.
    """
    score = 50.0  # start at neutral
    hire_rate = employer.get("hire_rate")
    if hire_rate is not None:
        if hire_rate >= 0.8:
            score += 20
        elif hire_rate >= 0.5:
            score += 10
        elif hire_rate < 0.2:
            score -= 20
    review_count = employer.get("review_count", 0) or 0
    if review_count >= 50:
        score += 15
    elif review_count >= 10:
        score += 8
    elif review_count == 0:
        score -= 10
    reputation = employer.get("reputation_score")
    if reputation is not None:
        score += (reputation - 0.5) * 30  # scale around a 0.5 baseline
    return max(0, min(100, round(score, 1)))
async def enrich_projects_with_employer_data(
    projects: list,
    max_employer_fetches: int = 50,
) -> list:
    """Add employer quality scores to a list of projects."""
    # Get unique employer IDs (limit to avoid burning API calls)
    employer_ids = list(set(
        p.get("employer_id") for p in projects
        if p.get("employer_id")
    ))[:max_employer_fetches]
    employer_data = {}
    async with httpx.AsyncClient() as client:
        for uid in employer_ids:
            profile = await fetch_employer_profile(client, uid)
            if profile:
                profile["quality_score"] = score_employer_quality(profile)
                employer_data[uid] = profile
            await asyncio.sleep(0.5)
    # Merge into projects
    enriched = []
    for project in projects:
        uid = project.get("employer_id")
        if uid and uid in employer_data:
            project["employer_profile"] = employer_data[uid]
            project["employer_quality_score"] = employer_data[uid]["quality_score"]
        else:
            project["employer_quality_score"] = None
        enriched.append(project)
    return enriched
Budget Distribution Analysis
Understanding what clients actually pay is more useful than what they budget:
import statistics
from collections import Counter
def analyze_budget_distribution(projects: list) -> dict:
    """Analyze budget ranges and bid patterns to understand actual market rates."""
    avg_bids = [p.get("avg_bid") for p in projects if p.get("avg_bid") and p["avg_bid"] > 0]
    bid_counts = [p.get("bid_count") for p in projects if p.get("bid_count") is not None]
    # Budget range analysis: use the midpoint of each posted range
    budget_ranges = []
    for p in projects:
        bmin = p.get("budget_min", 0) or 0
        bmax = p.get("budget_max", 0) or 0
        if bmin > 0 and bmax > bmin:
            budget_ranges.append((bmin + bmax) / 2)
    # Competition analysis (bid count distribution)
    competition_levels = Counter()
    for bc in bid_counts:
        if bc == 0:
            competition_levels["no_bids"] += 1
        elif bc <= 5:
            competition_levels["low"] += 1
        elif bc <= 20:
            competition_levels["medium"] += 1
        elif bc <= 50:
            competition_levels["high"] += 1
        else:
            competition_levels["very_high"] += 1
    result = {
        "total_projects": len(projects),
        "budget_ranges": {},
        "competition_distribution": dict(competition_levels),
    }
    if budget_ranges:
        ordered = sorted(budget_ranges)
        result["budget_ranges"] = {
            "median_midpoint": round(statistics.median(budget_ranges), 2),
            "mean_midpoint": round(statistics.mean(budget_ranges), 2),
            "p25": ordered[len(ordered) // 4],
            "p75": ordered[3 * len(ordered) // 4],
        }
    if avg_bids:
        result["avg_bid_stats"] = {
            "median": round(statistics.median(avg_bids), 2),
            "mean": round(statistics.mean(avg_bids), 2),
        }
    return result
Skill Demand Forecasting
Use time-series data to predict which skills are growing:
import sqlite3

def forecast_skill_demand(
    skill: str,
    db_path: str = "freelancer_trends.db",
    weeks_back: int = 8,
) -> dict:
    """
    Analyze a skill's demand trend over the last `weeks_back` snapshots
    and project forward. Uses linear regression on weekly project counts.
    """
    conn = sqlite3.connect(db_path)
    rows = conn.execute("""
        SELECT snapshot_date, project_count
        FROM skill_snapshots
        WHERE skill = ?
        ORDER BY snapshot_date DESC
        LIMIT ?
    """, (skill, weeks_back)).fetchall()
    conn.close()
    rows.reverse()  # back to chronological order
    if len(rows) < 3:
        return {"skill": skill, "error": "insufficient_data", "data_points": len(rows)}
    dates = [row[0] for row in rows]
    counts = [row[1] for row in rows]
    # Simple linear regression for trend
    n = len(counts)
    x_values = list(range(n))
    x_mean = sum(x_values) / n
    y_mean = sum(counts) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, counts))
    denominator = sum((x - x_mean) ** 2 for x in x_values)
    slope = numerator / denominator if denominator else 0
    intercept = y_mean - slope * x_mean
    # Project 4 weeks ahead
    next_4_weeks = [
        max(0, slope * (n + i) + intercept)
        for i in range(4)
    ]
    # Trend strength (% change from first to last observation)
    pct_change = ((counts[-1] - counts[0]) / counts[0]) * 100 if counts[0] > 0 else 0
    return {
        "skill": skill,
        "current_count": counts[-1],
        "trend_slope": round(slope, 2),
        "pct_change_over_period": round(pct_change, 1),
        "trend_direction": "up" if slope > 0.5 else "down" if slope < -0.5 else "stable",
        "projected_next_4_weeks": [round(v) for v in next_4_weeks],
        "data_points": len(rows),
        "date_range": f"{dates[0]} to {dates[-1]}",
    }
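The regression inside forecast_skill_demand reduces to a few lines; on three synthetic weekly counts it behaves like this:

```python
# Three weeks of hypothetical counts for one skill
counts = [200, 260, 350]
n = len(counts)
xs = range(n)
x_mean = sum(xs) / n
y_mean = sum(counts) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts)) \
    / sum((x - x_mean) ** 2 for x in xs)
print(slope)  # 75.0: roughly 75 extra projects per week, a clear "up" trend
```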
Proxy and Rate Limit Configuration
The Freelancer API allows ~600 requests per 10 minutes per token. For web scraping fallback:
import time

PROXY = "http://USER:[email protected]:9000"

class RateLimitedAPIClient:
    """API client with automatic rate limiting and retry logic."""

    def __init__(self, token: str, proxy: str = None, requests_per_10min: int = 500):
        self.token = token
        self.proxy = proxy
        self.requests_per_10min = requests_per_10min
        self._request_times = []

    async def _wait_if_needed(self):
        now = time.time()
        # Drop requests older than 10 minutes from the sliding window
        self._request_times = [t for t in self._request_times if now - t < 600]
        if len(self._request_times) >= self.requests_per_10min:
            # Wait until the oldest request falls out of the window
            wait_time = 600 - (now - self._request_times[0]) + 1
            if wait_time > 0:
                print(f"  Rate limit pause: {wait_time:.0f}s")
                await asyncio.sleep(wait_time)  # non-blocking, unlike time.sleep
        self._request_times.append(time.time())

    async def get(self, url: str, params: dict = None) -> dict:
        await self._wait_if_needed()
        client_kwargs = {
            "headers": {"freelancer-oauth-v1": self.token},
            "timeout": 30,
        }
        if self.proxy:
            client_kwargs["proxy"] = self.proxy
        async with httpx.AsyncClient(**client_kwargs) as client:
            for attempt in range(3):
                try:
                    resp = await client.get(url, params=params)
                    if resp.status_code == 200:
                        return resp.json()
                    elif resp.status_code == 429:
                        wait = 2 ** attempt * 30
                        print(f"  429 rate limited, waiting {wait}s")
                        await asyncio.sleep(wait)
                    elif resp.status_code == 401:
                        return {"error": "unauthorized", "status": 401}
                    else:
                        return {"error": f"http_{resp.status_code}"}
                except httpx.RequestError as e:
                    if attempt == 2:
                        return {"error": str(e)}
                    await asyncio.sleep(5)
            return {"error": "max_retries"}
Complete Analytics Pipeline
async def run_full_market_analysis(
    search_queries: list,
    db_path: str = "freelancer_trends.db",
    proxy: str = None,
):
    """
    Complete market analysis pipeline:
    1. Collect projects across multiple query categories
    2. Analyze skill demand and save a snapshot for trend tracking
    3. Analyze budget distribution and competition
    4. Report growing skills from historical snapshots
    """
    # Available for supplementary calls that need rate limiting or a proxy;
    # the collection below goes through paginate_projects directly.
    api_client = RateLimitedAPIClient(TOKEN, proxy=proxy)

    print("Phase 1: Collecting projects")
    all_projects = []
    for query in search_queries:
        print(f"  Query: '{query}'")
        async for raw in paginate_projects(query=query, max_results=200):
            all_projects.append(extract_project(raw))
        await asyncio.sleep(random.uniform(3, 8))
    print(f"Collected {len(all_projects)} projects")

    print("\nPhase 2: Skill analysis")
    skill_analysis = analyze_skills(all_projects)
    save_skill_snapshot(skill_analysis, db_path)

    print("\nPhase 3: Budget analysis")
    budget_stats = analyze_budget_distribution(all_projects)

    print("\nPhase 4: Trend analysis (if historical data exists)")
    skill_trends = []
    for skill_row in skill_analysis["top_skills"][:20]:
        trend = forecast_skill_demand(skill_row["skill"], db_path)
        if "error" not in trend:
            skill_trends.append(trend)

    # Report
    print("\n=== Market Report ===")
    print("\nTop 10 skills by demand:")
    for row in skill_analysis["top_skills"][:10]:
        print(f"  {row['skill']:<35} {row['project_count']:>5} projects  ${row['median_budget_max']}")
    print(f"\nCompetition distribution: {budget_stats['competition_distribution']}")
    if skill_trends:
        growing = [t for t in skill_trends if t["trend_direction"] == "up"]
        print(f"\nGrowing skills ({len(growing)} of {len(skill_trends)} analyzed):")
        for t in sorted(growing, key=lambda x: x["pct_change_over_period"], reverse=True)[:5]:
            print(f"  {t['skill']:<35} +{t['pct_change_over_period']:.0f}% trend")

# Run analysis
asyncio.run(run_full_market_analysis(
    search_queries=[
        "machine learning AI",
        "web scraping python",
        "react nextjs",
        "data analytics",
        "blockchain smart contract",
        "mobile app flutter",
    ],
    proxy="http://USER:[email protected]:9000",
))
Key Takeaways
- Freelancer's official OAuth2 API provides structured project data at 600 requests/10 min -- start here before any scraping
- The paginate_projects generator handles pagination correctly; stop on empty responses rather than relying on total_count
- Skill demand analysis with weekly snapshots stored in SQLite is the highest-value use of this data
- Budget medians are more meaningful than averages for freelance market research -- a few large enterprise projects skew means significantly
- For web page scraping (employer profiles, supplementary data), Cloudflare protection requires ThorData's residential proxies
- The employer quality score helps filter out low-quality clients before bidding -- hire rate and reputation score are the two strongest signals
- Trend direction (slope of weekly project counts) is more actionable than absolute counts for identifying growing skill areas