How to Scrape Upwork Freelancer Data in 2026: Profiles, Rates & Job Postings
Upwork has over 18 million registered freelancers as of 2026. That's a massive dataset sitting in public profiles — hourly rates, skills, job success scores, earnings history. If you're doing market rate research, building a talent sourcing tool, tracking how rates shift across categories, or understanding supply and demand in a freelance niche, Upwork is one of the best sources around. The problem is getting the data out cleanly.
There are two paths: the official Upwork API (OAuth 1.0a, limited but stable), and direct page scraping as a fallback when the API doesn't cover what you need. This guide covers both, along with the anti-detection strategies that actually work and how to store everything systematically.
Why Upwork Data Is Valuable
Before diving into the technical approach, it's worth understanding what makes this data set uniquely useful:
Hourly rates are verified by the market — unlike survey-based salary data, Upwork rates represent what freelancers actually charge and clients actually pay. A $120/hr rate on an active profile with a 95% job success score and $500K+ in earnings is a credible market price.
Skill demand signals are real-time — as new technologies emerge, Upwork job postings reflect demand within weeks. Tracking skill frequency across job postings gives you an early indicator of what's gaining traction in the market.
Geographic rate differentials — Upwork's global freelancer base with USD rates makes it one of the few sources for cross-country rate comparison without currency conversion complexity.
Job success scores — Upwork's JSS metric aggregates client feedback into a single score that correlates strongly with actual work quality. Few other platforms offer an equivalent quality proxy.
The Upwork API: OAuth 1.0a Setup
Upwork offers a real developer API. The legacy REST endpoints used in this guide authenticate with OAuth 1.0a (newer parts of Upwork's API use OAuth 2.0, so check which scheme your keys target). Get your client key and secret from the Upwork Developer Portal, then complete the three-legged OAuth flow to get access tokens.
uv pip install requests requests-oauthlib beautifulsoup4
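The three-legged flow itself can be scripted with requests-oauthlib. A sketch follows; the request-token, authorize, and access-token endpoint URLs are assumptions modeled on Upwork's legacy OAuth 1.0a documentation, so verify them against the Developer Portal before use.

```python
from requests_oauthlib import OAuth1Session

# Endpoint paths are assumptions based on Upwork's legacy OAuth 1.0a docs;
# confirm them in the Developer Portal before relying on this.
REQUEST_TOKEN_URL = "https://www.upwork.com/api/auth/v1/oauth/token/request"
AUTHORIZE_URL = "https://www.upwork.com/services/api/auth"
ACCESS_TOKEN_URL = "https://www.upwork.com/api/auth/v1/oauth/token/access"

def run_three_legged_flow(client_key: str, client_secret: str) -> tuple[str, str]:
    """Interactive OAuth 1.0a flow: returns (access_token, access_token_secret)."""
    oauth = OAuth1Session(client_key, client_secret=client_secret, callback_uri="oob")
    oauth.fetch_request_token(REQUEST_TOKEN_URL)
    # The user opens this URL, approves access, and receives a verifier code
    print("Authorize here:", oauth.authorization_url(AUTHORIZE_URL))
    verifier = input("Paste the verifier code: ").strip()
    tokens = oauth.fetch_access_token(ACCESS_TOKEN_URL, verifier=verifier)
    return tokens["oauth_token"], tokens["oauth_token_secret"]
```

Run this once, then export the returned tokens as UPWORK_ACCESS_TOKEN and UPWORK_ACCESS_TOKEN_SECRET for the session code below.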
Setting Up OAuth Authentication
import requests
from requests_oauthlib import OAuth1Session
import json
import os
# Store credentials in environment variables, never in code
CLIENT_KEY = os.environ["UPWORK_CLIENT_KEY"]
CLIENT_SECRET = os.environ["UPWORK_CLIENT_SECRET"]
ACCESS_TOKEN = os.environ["UPWORK_ACCESS_TOKEN"]
ACCESS_TOKEN_SECRET = os.environ["UPWORK_ACCESS_TOKEN_SECRET"]
def get_oauth_session() -> OAuth1Session:
"""Create an authenticated OAuth 1.0a session for Upwork API calls."""
return OAuth1Session(
CLIENT_KEY,
client_secret=CLIENT_SECRET,
resource_owner_key=ACCESS_TOKEN,
resource_owner_secret=ACCESS_TOKEN_SECRET,
)
def api_request(
endpoint: str,
params: dict = None,
oauth: OAuth1Session = None,
max_retries: int = 3,
) -> dict:
"""
Make an Upwork API request with rate limit handling and retries.
Returns the parsed JSON response, or {} on failure.
"""
import time
if oauth is None:
oauth = get_oauth_session()
url = f"https://www.upwork.com{endpoint}"
default_params = {"format": "json"}
if params:
default_params.update(params)
for attempt in range(max_retries):
try:
resp = oauth.get(url, params=default_params, timeout=20)
except Exception as e:
print(f"Request error (attempt {attempt + 1}): {e}")
time.sleep(2 ** attempt)
continue
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 429:
wait = int(resp.headers.get("Retry-After", 30))
print(f"Rate limited. Waiting {wait}s (attempt {attempt + 1}/{max_retries})")
time.sleep(wait)
elif resp.status_code == 403:
print(f"Access denied for {endpoint}")
return {}
elif resp.status_code == 404:
print(f"Not found: {endpoint}")
return {}
else:
print(f"Error {resp.status_code} on {endpoint}: {resp.text[:200]}")
return {}
return {}
Fetching Freelancer Profiles via API
def get_freelancer_profile(username: str, oauth: OAuth1Session = None) -> dict:
"""
Fetch a freelancer's profile by Upwork username.
The profile contains hourly rate, skills, JSS, earnings, and portfolio info.
Note: username here is the Upwork profile URL slug, not the ~ID format.
"""
data = api_request(
f"/api/profiles/v1/providers/{username}",
oauth=oauth,
)
profile = data.get("profile", {})
if not profile:
return {}
# Extract skills list (response format can vary)
skills_raw = profile.get("skills", {})
if isinstance(skills_raw, dict):
skill_items = skills_raw.get("skill", [])
elif isinstance(skills_raw, list):
skill_items = skills_raw
else:
skill_items = []
skills = [
s.get("o:skill") if isinstance(s, dict) else str(s)
for s in skill_items
]
return {
"username": username,
"name": profile.get("dev_full_name"),
"title": profile.get("dev_blurb"),
"hourly_rate": profile.get("dev_bill_rate"),
"skills": skills,
"job_success_score": profile.get("dev_recent_rank_percentile"),
"total_earnings": profile.get("dev_total_revenue"),
"total_hours": profile.get("dev_total_hours_rounded"),
"country": profile.get("dev_country"),
"timezone": profile.get("dev_timezone"),
"member_since": profile.get("dev_member_since"),
"last_activity": profile.get("dev_last_activity"),
"availability": profile.get("dev_availability"),
"profile_url": profile.get("profile_url"),
"feedback_score": profile.get("dev_score"),
"response_time": profile.get("dev_response_time"),
}
# Fetch a specific profile
oauth = get_oauth_session()
profile = get_freelancer_profile("some-username", oauth=oauth)
if profile:
    print(f"{profile['name']} — ${profile['hourly_rate']}/hr — JSS: {profile['job_success_score']}")
Searching Freelancers
The profile search endpoint lets you find freelancers by skills and categories:
def search_freelancers(
query: str,
category: str = None,
min_rate: float = None,
max_rate: float = None,
min_jss: float = None,
page: int = 0,
per_page: int = 10,
oauth: OAuth1Session = None,
) -> list[dict]:
"""
Search for freelancers by skill keyword and filters.
Args:
query: Skill keyword (e.g., "python machine learning")
category: Category v2 name (e.g., "Web, Mobile & Software Dev")
min_rate/max_rate: Hourly rate range in USD
min_jss: Minimum job success score (0-100)
page: Page offset (0-based, 10 results per page)
"""
params = {
"q": query,
"paging": f"{page * per_page};{per_page}",
}
if category:
params["category2"] = category
if min_rate:
params["hourly_rate"] = f"{int(min_rate)}:"
if max_rate:
existing_rate = params.get("hourly_rate", ":")
min_part = existing_rate.split(":")[0]
params["hourly_rate"] = f"{min_part}:{int(max_rate)}"
data = api_request("/api/profiles/v2/search/providers", params=params, oauth=oauth)
freelancers = []
providers = data.get("providers", {})
if isinstance(providers, dict):
provider_list = providers.get("provider", [])
else:
provider_list = []
for p in provider_list:
freelancers.append({
"username": p.get("dev_username"),
"name": p.get("dev_full_name"),
"title": p.get("dev_blurb"),
"hourly_rate": p.get("dev_bill_rate"),
"country": p.get("dev_country"),
"jss": p.get("dev_recent_rank_percentile"),
"total_hours": p.get("dev_total_hours_rounded"),
"profile_url": p.get("profile_url"),
})
return freelancers
def search_all_freelancers(
query: str,
max_results: int = 100,
oauth: OAuth1Session = None,
) -> list[dict]:
"""Paginate through freelancer search results."""
all_results = []
page = 0
while len(all_results) < max_results:
batch = search_freelancers(query, page=page, oauth=oauth)
if not batch:
break
all_results.extend(batch)
page += 1
import time
time.sleep(2.0) # 30 req/min = 2s between requests
return all_results[:max_results]
Searching Job Postings via API
def search_jobs(
query: str,
category: str = None,
job_type: str = None,
min_budget: float = None,
page: int = 0,
max_results: int = 50,
oauth: OAuth1Session = None,
) -> list[dict]:
"""
Search Upwork job postings via the API.
Args:
query: Keyword query
category: Category v2 name
job_type: "hourly" or "fixed-price"
min_budget: Minimum budget in USD
"""
jobs = []
page_num = page
while len(jobs) < max_results:
params = {
"q": query,
"paging": f"{page_num * 10};10",
}
if category:
params["category2"] = category
if job_type:
params["job_type"] = job_type
data = api_request(
"/api/profiles/v2/search/jobs",
params=params,
oauth=oauth,
)
job_data = data.get("jobs", {})
if isinstance(job_data, dict):
results = job_data.get("job", [])
else:
results = []
if not results:
break
for job in results:
skills = job.get("skills", {})
if isinstance(skills, dict):
skill_list = skills.get("skill", [])
else:
skill_list = []
jobs.append({
"title": job.get("title"),
"description": (job.get("description") or "")[:500],
"budget": job.get("budget"),
"job_type": job.get("job_type"),
"duration": job.get("duration"),
"skills": skill_list if isinstance(skill_list, list) else [skill_list],
"posted": job.get("date_created"),
"job_id": job.get("id"),
"url": job.get("url"),
"client_country": job.get("client", {}).get("country") if isinstance(job.get("client"), dict) else None,
"client_total_spent": job.get("client", {}).get("total_spent") if isinstance(job.get("client"), dict) else None,
})
page_num += 1
import time
time.sleep(2.0)
return jobs[:max_results]
The API rate limit is 30 requests per minute per token. Stay well under it — Upwork will hard-block your key if you exceed it repeatedly.
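A small client-side throttle makes that pacing explicit instead of scattering sleep calls around. A minimal sketch (the 25/min ceiling is a deliberately conservative choice on my part, not an Upwork number):

```python
import time

class Throttle:
    """Enforce a minimum interval between calls so a scraper stays
    safely under Upwork's 30 requests/minute API limit."""

    def __init__(self, max_per_minute: int = 25):
        # Leave headroom below the documented 30/min ceiling
        self.min_interval = 60.0 / max_per_minute
        self._last_call = 0.0

    def wait(self) -> None:
        """Sleep just long enough to honor the interval, then stamp the call."""
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```

Create one `Throttle(max_per_minute=25)` and call `throttle.wait()` immediately before each `api_request(...)`; the first call passes through instantly and later calls absorb whatever delay is still owed.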
Web Scraping Fallback: When the API Isn't Enough
The API doesn't expose everything. Portfolio items, detailed earnings breakdowns, review text, client feedback — those live on the profile page, not in the API response. Here's a scraping approach using requests and BeautifulSoup.
Profile Page Structure
Upwork renders profile pages server-side for public views, so basic HTTP requests work for getting the initial content. The profile data is embedded in Next.js page props or directly in the HTML.
import requests
from bs4 import BeautifulSoup
import json
import time
import random
from typing import Optional
BROWSER_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/125.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Referer": "https://www.upwork.com/",
}
def scrape_freelancer_page(
profile_url: str,
proxy: Optional[str] = None,
) -> dict:
"""
Scrape a public Upwork freelancer profile page.
profile_url: Full URL like https://www.upwork.com/freelancers/~01234567890abcdef
proxy: Optional proxy URL for bypassing blocks
"""
kwargs = {
"headers": BROWSER_HEADERS,
"timeout": 25,
"allow_redirects": True,
}
if proxy:
kwargs["proxies"] = {"http": proxy, "https": proxy}
try:
resp = requests.get(profile_url, **kwargs)
except requests.exceptions.RequestException as e:
print(f"Request failed for {profile_url}: {e}")
return {}
if resp.status_code == 403:
print(f"Blocked (403) — try with residential proxy")
return {}
if resp.status_code == 404:
print(f"Profile not found: {profile_url}")
return {}
if resp.status_code != 200:
print(f"Status {resp.status_code} for {profile_url}")
return {}
soup = BeautifulSoup(resp.text, "html.parser")
# Try to extract embedded JSON first (most reliable)
next_data = soup.find("script", {"id": "__NEXT_DATA__"})
if next_data:
try:
data = json.loads(next_data.string)
# Navigate Next.js page props
profile_data = (
data.get("props", {})
.get("pageProps", {})
.get("profile", {})
)
if profile_data:
return _extract_from_next_data(profile_data, profile_url)
except (json.JSONDecodeError, AttributeError):
pass
# Fall back to HTML parsing
return _extract_from_html(soup, profile_url)
def _extract_from_html(soup: BeautifulSoup, url: str) -> dict:
"""Extract profile data from HTML elements (fallback method)."""
def safe_text(selector: str) -> Optional[str]:
el = soup.select_one(selector)
return el.get_text(strip=True) if el else None
# Name and title
name = safe_text("h1[itemprop='name']") or safe_text("[data-test='freelancer-name']")
title = safe_text("p.title") or safe_text("[data-test='freelancer-title']")
# Rate
rate_el = soup.select_one("[data-test='dev-bill-rate']") or soup.select_one(".rate")
rate = rate_el.get_text(strip=True) if rate_el else None
# Skills
skill_els = (
soup.select("[data-test='badge-label']") or
soup.select(".skills-list .skill") or
soup.select("[data-test='freelancer-skill']")
)
skills = [s.get_text(strip=True) for s in skill_els]
# Stats
jss = safe_text("[data-test='job-success-score']") or safe_text(".job-success-score")
earnings = safe_text("[data-test='earned-amount']") or safe_text(".total-earned")
hours = safe_text("[data-test='total-hours']")
# Reviews
reviews = []
for review_el in soup.select("[data-test='feedback-card']")[:5]:
reviews.append({
"text": (review_el.get_text(strip=True) or "")[:300],
})
return {
"name": name,
"title": title,
"hourly_rate": rate,
"skills": skills,
"job_success_score": jss,
"total_earnings": earnings,
"total_hours": hours,
"recent_reviews": reviews,
"url": url,
}
def _extract_from_next_data(profile_data: dict, url: str) -> dict:
"""Extract profile data from Next.js embedded JSON."""
return {
"name": profile_data.get("name"),
"title": profile_data.get("title"),
"hourly_rate": profile_data.get("hourlyRate", {}).get("amount") if isinstance(profile_data.get("hourlyRate"), dict) else profile_data.get("hourlyRate"),
"skills": [s.get("prettyName", s.get("name", "")) for s in profile_data.get("skills", [])],
"job_success_score": profile_data.get("jobSuccessScore"),
"total_earnings": profile_data.get("totalEarnings"),
"total_hours": profile_data.get("totalHours"),
"country": profile_data.get("location", {}).get("country") if isinstance(profile_data.get("location"), dict) else None,
"availability": profile_data.get("availability"),
"url": url,
}
Anti-Bot Measures: The Full Picture
Upwork's bot detection is aggressive compared to most job marketplaces — they're protecting commercial data with real monetary value. Here's a systematic breakdown of what you're dealing with.
Layer 1: Cloudflare
The first obstacle. Most requests from datacenter IPs (AWS, DigitalOcean, Hetzner, Vultr ranges) get challenged or silently 403'd before they reach Upwork's servers. Cloudflare validates:
- TLS fingerprint — the cipher suite ordering in your TLS handshake. Python's requests has a characteristic fingerprint that differs from browser TLS. Tools like curl_cffi mimic Chrome's TLS fingerprint to bypass this.
- IP reputation — datacenter IP ranges have poor reputation scores with Cloudflare. Residential IPs from real ISPs score much better.
- HTTP/2 fingerprint — the order and values of HTTP/2 headers differ between browsers and HTTP clients.
Layer 2: Session Validation
Upwork checks that your cookies, localStorage tokens, and request patterns look like a real browser session. A cold HTTP request with no prior cookies gets flagged quickly. When you visit Upwork for the first time, their JavaScript sets several tracking cookies and localStorage values that subsequent requests are expected to carry.
Layer 3: Behavioral Analysis
Continuous requests at uniform intervals look automated. Browser users click, scroll, pause, navigate — the timing patterns are irregular. Very regular request intervals (like a scraper with time.sleep(5)) can trigger increased scrutiny even with legitimate session cookies.
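A small helper that randomizes the gap between requests avoids that uniform-interval signature. A minimal sketch:

```python
import random
import time

def human_pause(base: float = 6.0, jitter: float = 0.5) -> float:
    """Sleep for a randomized interval around `base` seconds. A bare
    time.sleep(5) loop produces identical gaps every time; drawing the
    delay from a range breaks the pattern. Returns the delay used."""
    delay = base * random.uniform(1.0 - jitter, 1.0 + jitter)
    time.sleep(delay)
    return delay
```

Call `human_pause()` between page fetches instead of a fixed sleep; with the defaults each gap lands anywhere from 3 to 9 seconds.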
Layer 4: IP Reputation Scoring
Upwork maintains reputation scores for IP ranges. IPs that have accessed Upwork at scale before — even if they haven't violated rate limits — may have elevated risk scores. This is why IP rotation matters even when individual requests are within rate limits.
Using curl_cffi for TLS Fingerprint Spoofing
from curl_cffi import requests as cffi_requests
import time
import random
def scrape_upwork_profile_cffi(
profile_url: str,
proxy: Optional[str] = None,
) -> dict:
"""
Scrape Upwork profile using curl_cffi to mimic Chrome's TLS fingerprint.
Significantly reduces Cloudflare detection vs. standard requests/httpx.
"""
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
"accept-language": "en-US,en;q=0.9",
"accept-encoding": "gzip, deflate, br",
"sec-ch-ua": '"Google Chrome";v="124", "Chromium";v="124", "Not.A/Brand";v="24"',  # keep in sync with impersonate target
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"Windows"',
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"upgrade-insecure-requests": "1",
}
kwargs = {
"headers": headers,
"impersonate": "chrome124", # Mimic Chrome 124's full TLS+HTTP2 fingerprint
"timeout": 25,
}
if proxy:
kwargs["proxies"] = {"http": proxy, "https": proxy}
try:
resp = cffi_requests.get(profile_url, **kwargs)
except Exception as e:
return {"error": str(e)}
if resp.status_code != 200:
return {"status_code": resp.status_code}
soup = BeautifulSoup(resp.text, "html.parser")
return _extract_from_html(soup, profile_url)
Install with: uv pip install curl-cffi
Proxy Strategy with ThorData
Upwork's bot detection requires residential proxies — datacenter ranges are consistently blocked at Cloudflare. ThorData's residential proxy network provides IPs from real ISP ranges that pass Cloudflare's IP reputation checks.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "gate.thordata.net"
THORDATA_PORT = 9000
def make_proxy(country: str = "us", session_id: str = None) -> str:
"""
Build a ThorData residential proxy URL.
Use session_id for sticky sessions (same IP across multiple requests).
Sticky sessions are important for Upwork to maintain session continuity.
"""
user = f"{THORDATA_USER}-country-{country}"
if session_id:
user += f"-session-{session_id}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
def scrape_upwork_batch(
profile_urls: list[str],
delay_range: tuple = (5, 12),
) -> list[dict]:
"""
Scrape a batch of Upwork profiles with randomized delays and proxy rotation.
Rotate proxy per profile but use sticky session within profile scrape.
"""
import random
import string
results = []
for url in profile_urls:
# Fresh sticky session per profile
session_id = "".join(random.choices(string.ascii_lowercase, k=8))
proxy = make_proxy(country="us", session_id=session_id)
profile = scrape_freelancer_page(url, proxy=proxy)
if profile:
results.append(profile)
print(f"Got: {profile.get('name', 'unknown')} @ {profile.get('hourly_rate', '?')}/hr")
# Randomized delay mimics human browsing patterns
delay = random.uniform(*delay_range)
time.sleep(delay)
return results
Rotate the proxy per profile scrape, not per individual request within a profile. Reusing the same residential IP for 50+ consecutive requests still triggers rate limits, but using a fresh IP per profile while keeping the same IP for the page load chain of a single profile prevents session mismatch detection.
Storing Data in SQLite
import sqlite3
import json
from datetime import datetime, timezone
from typing import Optional
def init_db(db_path: str = "upwork_data.db") -> sqlite3.Connection:
"""Initialize the Upwork data database."""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS freelancers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT UNIQUE,
profile_url TEXT,
name TEXT,
title TEXT,
hourly_rate TEXT,
hourly_rate_numeric REAL,
skills TEXT,
job_success_score TEXT,
total_earnings TEXT,
total_hours TEXT,
country TEXT,
availability TEXT,
first_seen TEXT,
last_updated TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS rate_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT NOT NULL,
hourly_rate REAL,
job_success_score REAL,
recorded_at TEXT NOT NULL
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS jobs (
id TEXT PRIMARY KEY,
title TEXT,
description TEXT,
budget TEXT,
job_type TEXT,
duration TEXT,
skills TEXT,
posted TEXT,
client_country TEXT,
client_total_spent REAL,
scraped_at TEXT NOT NULL
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_freelancers_country ON freelancers(country)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_rate_snapshots_user ON rate_snapshots(username)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_jobs_posted ON jobs(posted)")
conn.commit()
return conn
def parse_rate(rate_str: Optional[str]) -> Optional[float]:
"""Parse '$125.00' or '125' to float."""
if not rate_str:
return None
try:
cleaned = rate_str.replace("$", "").replace(",", "").split("/")[0].strip()
return float(cleaned)
except (ValueError, AttributeError):
return None
def save_freelancer(conn: sqlite3.Connection, profile: dict):
"""Save or update a freelancer profile, tracking rate history."""
now = datetime.now(timezone.utc).isoformat()
username = profile.get("username") or (profile.get("url", "").split("/")[-1])
rate_numeric = parse_rate(profile.get("hourly_rate"))
# Track rate snapshot
if username and rate_numeric:
conn.execute("""
INSERT INTO rate_snapshots (username, hourly_rate, job_success_score, recorded_at)
VALUES (?,?,?,?)
""", (username, rate_numeric, profile.get("job_success_score"), now))
conn.execute("""
INSERT INTO freelancers
(username, profile_url, name, title, hourly_rate, hourly_rate_numeric,
skills, job_success_score, total_earnings, total_hours,
country, availability, first_seen, last_updated)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
ON CONFLICT(username) DO UPDATE SET
name=excluded.name,
title=excluded.title,
hourly_rate=excluded.hourly_rate,
hourly_rate_numeric=excluded.hourly_rate_numeric,
skills=excluded.skills,
job_success_score=excluded.job_success_score,
total_earnings=excluded.total_earnings,
total_hours=excluded.total_hours,
country=excluded.country,
availability=excluded.availability,
last_updated=excluded.last_updated
""", (
username,
profile.get("profile_url") or profile.get("url"),
profile.get("name"),
profile.get("title"),
profile.get("hourly_rate"),
rate_numeric,
json.dumps(profile.get("skills", [])),
str(profile.get("job_success_score", "")),
profile.get("total_earnings"),
profile.get("total_hours"),
profile.get("country"),
profile.get("availability"),
now,
now,
))
conn.commit()
def save_jobs(conn: sqlite3.Connection, jobs: list[dict]):
"""Save job postings to the database."""
now = datetime.now(timezone.utc).isoformat()
for job in jobs:
budget = job.get("budget")
budget_numeric = None
if isinstance(budget, (int, float)):
budget_numeric = float(budget)
elif isinstance(budget, str):
budget_numeric = parse_rate(budget)
conn.execute("""
INSERT OR REPLACE INTO jobs
(id, title, description, budget, job_type, duration, skills,
posted, client_country, client_total_spent, scraped_at)
VALUES (?,?,?,?,?,?,?,?,?,?,?)
""", (
str(job.get("job_id", "")),
job.get("title"),
job.get("description"),
str(budget) if budget else None,
job.get("job_type"),
job.get("duration"),
json.dumps(job.get("skills", [])),
job.get("posted"),
job.get("client_country"),
budget_numeric,
now,
))
conn.commit()
Market Rate Analysis
With freelancer data in SQLite, you can build useful market intelligence:
def rate_distribution_by_skill(conn: sqlite3.Connection, skill: str) -> dict:
"""
Analyze hourly rate distribution for freelancers with a specific skill.
Useful for understanding market rates in a niche.
"""
rows = conn.execute("""
SELECT hourly_rate_numeric, country
FROM freelancers
WHERE skills LIKE ?
AND hourly_rate_numeric IS NOT NULL
AND hourly_rate_numeric > 0
ORDER BY hourly_rate_numeric
""", (f"%{skill}%",)).fetchall()
if not rows:
return {}
rates = [r[0] for r in rows]
rates.sort()
n = len(rates)
p25 = rates[int(n * 0.25)]
p50 = rates[int(n * 0.50)]
p75 = rates[int(n * 0.75)]
p90 = rates[int(n * 0.90)]
# Rate by country
from collections import defaultdict
by_country = defaultdict(list)
for rate, country in rows:
if country:
by_country[country].append(rate)
country_medians = {
c: sorted(r)[len(r)//2]
for c, r in by_country.items()
if len(r) >= 3
}
return {
"skill": skill,
"sample_size": n,
"p25_rate": p25,
"median_rate": p50,
"p75_rate": p75,
"p90_rate": p90,
"avg_rate": round(sum(rates) / n, 2),
"country_medians": dict(sorted(country_medians.items(), key=lambda x: x[1], reverse=True)[:10]),
}
def trending_skills_from_jobs(conn: sqlite3.Connection, days: int = 30) -> list[dict]:
"""
Find skills that appear most frequently in recent job postings.
Higher frequency = more demand = potentially higher rates.
"""
from datetime import datetime, timedelta, timezone
cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
rows = conn.execute("""
SELECT skills FROM jobs
WHERE posted >= ? AND skills IS NOT NULL
""", (cutoff,)).fetchall()
from collections import Counter
skill_counts = Counter()
for row in rows:
try:
skills = json.loads(row[0])
for skill in skills:
if isinstance(skill, str) and skill.strip():
skill_counts[skill.strip()] += 1
except (json.JSONDecodeError, TypeError):
pass
return [
{"skill": skill, "job_count": count}
for skill, count in skill_counts.most_common(50)
]
Ethical Considerations and Rate Limiting
Upwork's ToS prohibits scraping — that's standard. For the API, you're on firmer ground since they've given you authorized access, but the rate limits exist for a reason. Guidelines:
- API: Stay under 30 requests per minute. Upwork will hard-block your key if you repeatedly exceed this.
- Web scraping: Minimum 5 seconds between profile requests, ideally 8-12 with randomization.
- Data use: Hourly rates and skills are publicly visible. Job success scores appear on public profiles. Earnings are aggregated, not transaction-level. None of this is particularly sensitive in its published form.
- Volume: Building a dataset for research or tooling is reasonable. Building a database to sell profiles is not.
- Residential proxies: Required for web scraping at scale due to Cloudflare. ThorData works well here — their residential pool has good coverage, and residential IPs don't get flagged as quickly as datacenter ranges.
# Complete pipeline example
if __name__ == "__main__":
oauth = get_oauth_session()
conn = init_db()
# Collect jobs via API
print("Fetching Python ML jobs...")
jobs = search_jobs("python machine learning", max_results=50, oauth=oauth)
save_jobs(conn, jobs)
print(f"Saved {len(jobs)} jobs")
# Collect freelancers via API
print("Fetching ML freelancers...")
freelancers = search_all_freelancers("machine learning python", max_results=50, oauth=oauth)
for f in freelancers:
save_freelancer(conn, f)
print(f"Saved {len(freelancers)} freelancers")
# Analyze rates
stats = rate_distribution_by_skill(conn, "machine learning")
print(f"\nMachine Learning rates (n={stats.get('sample_size', 0)}):")
print(f" 25th pct: ${stats.get('p25_rate', 0):.0f}/hr")
print(f" Median: ${stats.get('median_rate', 0):.0f}/hr")
print(f" 75th pct: ${stats.get('p75_rate', 0):.0f}/hr")
print(f" 90th pct: ${stats.get('p90_rate', 0):.0f}/hr")
# Trending skills
print("\nTop 10 trending skills in last 30 days:")
trending = trending_skills_from_jobs(conn)
for item in trending[:10]:
print(f" {item['skill']}: {item['job_count']} jobs")
conn.close()
Where This Goes
The interesting use cases for Upwork data aren't one-off scrapes — they're longitudinal. Rate tracking over time shows how category markets move. Skill demand correlation with job posting volume tells you what's hot before LinkedIn does. And since Upwork is global with rates in USD, it's one of the few sources for cross-country freelance rate benchmarking without currency conversion headaches.
Start with the API for volume, use scraping (with curl_cffi + ThorData residential proxies) for the profile details the API misses, and keep your request pacing generous. The data compounds over time — a 6-month dataset of rate snapshots tells a much richer story than a single snapshot.