How to Scrape Upwork Freelancer Data in 2026: Profiles, Rates & Job Postings
Upwork has over 18 million registered freelancers as of 2026. That's a massive dataset sitting in public profiles — hourly rates, skills, job success scores, earnings history. If you're doing market rate research, building a talent sourcing tool, tracking how rates shift across categories, or understanding supply and demand in a freelance niche, Upwork is one of the best sources around. The problem is getting the data out cleanly.
There are two paths: the official Upwork API (OAuth 1.0a, limited but stable), and direct page scraping as a fallback when the API doesn't cover what you need. This guide covers both, along with the anti-detection strategies that actually work and how to store everything systematically.
Why Upwork Data Is Valuable
Before diving into the technical approach, it's worth understanding what makes this data set uniquely useful:
Hourly rates are verified by the market — unlike survey-based salary data, Upwork rates represent what freelancers actually charge and clients actually pay. A $120/hr rate on an active profile with a 95% job success score and $500K+ in earnings is a credible market price.
Skill demand signals are real-time — as new technologies emerge, Upwork job postings reflect demand within weeks. Tracking skill frequency across job postings gives you an early indicator of what's gaining traction in the market.
Geographic rate differentials — Upwork's global freelancer base with USD rates makes it one of the few sources for cross-country rate comparison without currency conversion complexity.
Job success scores — Upwork's JSS metric aggregates client feedback into a single score that correlates strongly with actual work quality. Few other platforms offer an equivalent quality proxy.
The Upwork API: OAuth 1.0a Setup
Upwork offers a real developer API. The legacy REST endpoints used in this guide authenticate with OAuth 1.0a (newer parts of Upwork's API use OAuth 2.0, so check which scheme your keys target). Get your client key and secret from the Upwork Developer Portal, then complete the three-legged OAuth flow to get access tokens.
uv pip install requests requests-oauthlib beautifulsoup4
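The three-legged flow itself can be scripted with requests-oauthlib. A sketch follows; the request-token, authorize, and access-token endpoint URLs are assumptions modeled on Upwork's legacy OAuth 1.0a documentation, so verify them against the Developer Portal before use.

```python
from requests_oauthlib import OAuth1Session

# Endpoint paths are assumptions based on Upwork's legacy OAuth 1.0a docs;
# confirm them in the Developer Portal before relying on this.
REQUEST_TOKEN_URL = "https://www.upwork.com/api/auth/v1/oauth/token/request"
AUTHORIZE_URL = "https://www.upwork.com/services/api/auth"
ACCESS_TOKEN_URL = "https://www.upwork.com/api/auth/v1/oauth/token/access"

def run_three_legged_flow(client_key: str, client_secret: str) -> tuple[str, str]:
    """Interactive OAuth 1.0a flow: returns (access_token, access_token_secret)."""
    oauth = OAuth1Session(client_key, client_secret=client_secret, callback_uri="oob")
    oauth.fetch_request_token(REQUEST_TOKEN_URL)
    # The user opens this URL, approves access, and receives a verifier code
    print("Authorize here:", oauth.authorization_url(AUTHORIZE_URL))
    verifier = input("Paste the verifier code: ").strip()
    tokens = oauth.fetch_access_token(ACCESS_TOKEN_URL, verifier=verifier)
    return tokens["oauth_token"], tokens["oauth_token_secret"]
```

Run this once, then export the returned tokens as UPWORK_ACCESS_TOKEN and UPWORK_ACCESS_TOKEN_SECRET for the session code below.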
Setting Up OAuth Authentication
import requests
from requests_oauthlib import OAuth1Session
import json
import os
# Store credentials in environment variables, never in code
CLIENT_KEY = os.environ["UPWORK_CLIENT_KEY"]
CLIENT_SECRET = os.environ["UPWORK_CLIENT_SECRET"]
ACCESS_TOKEN = os.environ["UPWORK_ACCESS_TOKEN"]
ACCESS_TOKEN_SECRET = os.environ["UPWORK_ACCESS_TOKEN_SECRET"]
def get_oauth_session() -> OAuth1Session:
"""Create an authenticated OAuth 1.0a session for Upwork API calls."""
return OAuth1Session(
CLIENT_KEY,
client_secret=CLIENT_SECRET,
resource_owner_key=ACCESS_TOKEN,
resource_owner_secret=ACCESS_TOKEN_SECRET,
)
def api_request(
endpoint: str,
params: dict = None,
oauth: OAuth1Session = None,
max_retries: int = 3,
) -> dict:
"""
Make an Upwork API request with rate limit handling and retries.
Returns the parsed JSON response, or {} on failure.
"""
import time
if oauth is None:
oauth = get_oauth_session()
url = f"https://www.upwork.com{endpoint}"
default_params = {"format": "json"}
if params:
default_params.update(params)
for attempt in range(max_retries):
try:
resp = oauth.get(url, params=default_params, timeout=20)
except Exception as e:
print(f"Request error (attempt {attempt + 1}): {e}")
time.sleep(2 ** attempt)
continue
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 429:
wait = int(resp.headers.get("Retry-After", 30))
print(f"Rate limited. Waiting {wait}s (attempt {attempt + 1}/{max_retries})")
time.sleep(wait)
elif resp.status_code == 403:
print(f"Access denied for {endpoint}")
return {}
elif resp.status_code == 404:
print(f"Not found: {endpoint}")
return {}
else:
print(f"Error {resp.status_code} on {endpoint}: {resp.text[:200]}")
return {}
return {}
Fetching Freelancer Profiles via API
def get_freelancer_profile(username: str, oauth: OAuth1Session = None) -> dict:
"""
Fetch a freelancer's profile by Upwork username.
The profile contains hourly rate, skills, JSS, earnings, and portfolio info.
Note: username here is the Upwork profile URL slug, not the ~ID format.
"""
data = api_request(
f"/api/profiles/v1/providers/{username}",
oauth=oauth,
)
profile = data.get("profile", {})
if not profile:
return {}
# Extract skills list (response format can vary)
skills_raw = profile.get("skills", {})
if isinstance(skills_raw, dict):
skill_items = skills_raw.get("skill", [])
elif isinstance(skills_raw, list):
skill_items = skills_raw
else:
skill_items = []
skills = [
s.get("o:skill") if isinstance(s, dict) else str(s)
for s in skill_items
]
return {
"username": username,
"name": profile.get("dev_full_name"),
"title": profile.get("dev_blurb"),
"hourly_rate": profile.get("dev_bill_rate"),
"skills": skills,
"job_success_score": profile.get("dev_recent_rank_percentile"),
"total_earnings": profile.get("dev_total_revenue"),
"total_hours": profile.get("dev_total_hours_rounded"),
"country": profile.get("dev_country"),
"timezone": profile.get("dev_timezone"),
"member_since": profile.get("dev_member_since"),
"last_activity": profile.get("dev_last_activity"),
"availability": profile.get("dev_availability"),
"profile_url": profile.get("profile_url"),
"feedback_score": profile.get("dev_score"),
"response_time": profile.get("dev_response_time"),
}
# Fetch a specific profile
oauth = get_oauth_session()
profile = get_freelancer_profile("some-username", oauth=oauth)
if profile:
    print(f"{profile['name']} — ${profile['hourly_rate']}/hr — JSS: {profile['job_success_score']}")
Searching Freelancers
The profile search endpoint lets you find freelancers by skills and categories:
def search_freelancers(
query: str,
category: str = None,
min_rate: float = None,
max_rate: float = None,
min_jss: float = None,
page: int = 0,
per_page: int = 10,
oauth: OAuth1Session = None,
) -> list[dict]:
"""
Search for freelancers by skill keyword and filters.
Args:
query: Skill keyword (e.g., "python machine learning")
category: Category v2 name (e.g., "Web, Mobile & Software Dev")
min_rate/max_rate: Hourly rate range in USD
min_jss: Minimum job success score (0-100)
page: Page offset (0-based, 10 results per page)
"""
params = {
"q": query,
"paging": f"{page * per_page};{per_page}",
}
if category:
params["category2"] = category
if min_rate:
params["hourly_rate"] = f"{int(min_rate)}:"
if max_rate:
existing_rate = params.get("hourly_rate", ":")
min_part = existing_rate.split(":")[0]
params["hourly_rate"] = f"{min_part}:{int(max_rate)}"
data = api_request("/api/profiles/v2/search/providers", params=params, oauth=oauth)
freelancers = []
providers = data.get("providers", {})
if isinstance(providers, dict):
provider_list = providers.get("provider", [])
else:
provider_list = []
for p in provider_list:
freelancers.append({
"username": p.get("dev_username"),
"name": p.get("dev_full_name"),
"title": p.get("dev_blurb"),
"hourly_rate": p.get("dev_bill_rate"),
"country": p.get("dev_country"),
"jss": p.get("dev_recent_rank_percentile"),
"total_hours": p.get("dev_total_hours_rounded"),
"profile_url": p.get("profile_url"),
})
return freelancers
def search_all_freelancers(
query: str,
max_results: int = 100,
oauth: OAuth1Session = None,
) -> list[dict]:
"""Paginate through freelancer search results."""
all_results = []
page = 0
while len(all_results) < max_results:
batch = search_freelancers(query, page=page, oauth=oauth)
if not batch:
break
all_results.extend(batch)
page += 1
import time
time.sleep(2.0) # 30 req/min = 2s between requests
return all_results[:max_results]
Searching Job Postings via API
def search_jobs(
query: str,
category: str = None,
job_type: str = None,
min_budget: float = None,
page: int = 0,
max_results: int = 50,
oauth: OAuth1Session = None,
) -> list[dict]:
"""
Search Upwork job postings via the API.
Args:
query: Keyword query
category: Category v2 name
job_type: "hourly" or "fixed-price"
min_budget: Minimum budget in USD
"""
jobs = []
page_num = page
while len(jobs) < max_results:
params = {
"q": query,
"paging": f"{page_num * 10};10",
}
if category:
params["category2"] = category
if job_type:
params["job_type"] = job_type
data = api_request(
"/api/profiles/v2/search/jobs",
params=params,
oauth=oauth,
)
job_data = data.get("jobs", {})
if isinstance(job_data, dict):
results = job_data.get("job", [])
else:
results = []
if not results:
break
for job in results:
skills = job.get("skills", {})
if isinstance(skills, dict):
skill_list = skills.get("skill", [])
else:
skill_list = []
jobs.append({
"title": job.get("title"),
"description": (job.get("description") or "")[:500],
"budget": job.get("budget"),
"job_type": job.get("job_type"),
"duration": job.get("duration"),
"skills": skill_list if isinstance(skill_list, list) else [skill_list],
"posted": job.get("date_created"),
"job_id": job.get("id"),
"url": job.get("url"),
"client_country": job.get("client", {}).get("country") if isinstance(job.get("client"), dict) else None,
"client_total_spent": job.get("client", {}).get("total_spent") if isinstance(job.get("client"), dict) else None,
})
page_num += 1
import time
time.sleep(2.0)
return jobs[:max_results]
The API rate limit is 30 requests per minute per token. Stay well under it — Upwork will hard-block your key if you exceed it repeatedly.
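A small client-side throttle makes that pacing explicit instead of scattering sleep calls around. A minimal sketch (the 25/min ceiling is a deliberately conservative choice on my part, not an Upwork number):

```python
import time

class Throttle:
    """Enforce a minimum interval between calls so a scraper stays
    safely under Upwork's 30 requests/minute API limit."""

    def __init__(self, max_per_minute: int = 25):
        # Leave headroom below the documented 30/min ceiling
        self.min_interval = 60.0 / max_per_minute
        self._last_call = 0.0

    def wait(self) -> None:
        """Sleep just long enough to honor the interval, then stamp the call."""
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()
```

Create one `Throttle(max_per_minute=25)` and call `throttle.wait()` immediately before each `api_request(...)`; the first call passes through instantly and later calls absorb whatever delay is still owed.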
Web Scraping Fallback: When the API Isn't Enough
The API doesn't expose everything. Portfolio items, detailed earnings breakdowns, review text, client feedback — those live on the profile page, not in the API response. Here's a scraping approach using requests and BeautifulSoup.
Profile Page Structure
Upwork renders profile pages server-side for public views, so basic HTTP requests work for getting the initial content. The profile data is embedded in Next.js page props or directly in the HTML.
import requests
from bs4 import BeautifulSoup
import json
import time
import random
from typing import Optional
BROWSER_HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/125.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Referer": "https://www.upwork.com/",
}
def scrape_freelancer_page(
profile_url: str,
proxy: Optional[str] = None,
) -> dict:
"""
Scrape a public Upwork freelancer profile page.
profile_url: Full URL like https://www.upwork.com/freelancers/~01234567890abcdef
proxy: Optional proxy URL for bypassing blocks
"""
kwargs = {
"headers": BROWSER_HEADERS,
"timeout": 25,
"allow_redirects": True,
}
if proxy:
kwargs["proxies"] = {"http": proxy, "https": proxy}
try:
resp = requests.get(profile_url, **kwargs)
except requests.exceptions.RequestException as e:
print(f"Request failed for {profile_url}: {e}")
return {}
if resp.status_code == 403:
print(f"Blocked (403) — try with residential proxy")
return {}
if resp.status_code == 404:
print(f"Profile not found: {profile_url}")
return {}
if resp.status_code != 200:
print(f"Status {resp.status_code} for {profile_url}")
return {}
soup = BeautifulSoup(resp.text, "html.parser")
# Try to extract embedded JSON first (most reliable)
next_data = soup.find("script", {"id": "__NEXT_DATA__"})
if next_data:
try:
data = json.loads(next_data.string)
# Navigate Next.js page props
profile_data = (
data.get("props", {})
.get("pageProps", {})
.get("profile", {})
)
if profile_data:
return _extract_from_next_data(profile_data, profile_url)
except (json.JSONDecodeError, AttributeError):
pass
# Fall back to HTML parsing
return _extract_from_html(soup, profile_url)
def _extract_from_html(soup: BeautifulSoup, url: str) -> dict:
"""Extract profile data from HTML elements (fallback method)."""
def safe_text(selector: str) -> Optional[str]:
el = soup.select_one(selector)
return el.get_text(strip=True) if el else None
# Name and title
name = safe_text("h1[itemprop='name']") or safe_text("[data-test='freelancer-name']")
title = safe_text("p.title") or safe_text("[data-test='freelancer-title']")
# Rate
rate_el = soup.select_one("[data-test='dev-bill-rate']") or soup.select_one(".rate")
rate = rate_el.get_text(strip=True) if rate_el else None
# Skills
skill_els = (
soup.select("[data-test='badge-label']") or
soup.select(".skills-list .skill") or
soup.select("[data-test='freelancer-skill']")
)
skills = [s.get_text(strip=True) for s in skill_els]
# Stats
jss = safe_text("[data-test='job-success-score']") or safe_text(".job-success-score")
earnings = safe_text("[data-test='earned-amount']") or safe_text(".total-earned")
hours = safe_text("[data-test='total-hours']")
# Reviews
reviews = []
for review_el in soup.select("[data-test='feedback-card']")[:5]:
reviews.append({
"text": (review_el.get_text(strip=True) or "")[:300],
})
return {
"name": name,
"title": title,
"hourly_rate": rate,
"skills": skills,
"job_success_score": jss,
"total_earnings": earnings,
"total_hours": hours,
"recent_reviews": reviews,
"url": url,
}
def _extract_from_next_data(profile_data: dict, url: str) -> dict:
"""Extract profile data from Next.js embedded JSON."""
return {
"name": profile_data.get("name"),
"title": profile_data.get("title"),
"hourly_rate": profile_data.get("hourlyRate", {}).get("amount") if isinstance(profile_data.get("hourlyRate"), dict) else profile_data.get("hourlyRate"),
"skills": [s.get("prettyName", s.get("name", "")) for s in profile_data.get("skills", [])],
"job_success_score": profile_data.get("jobSuccessScore"),
"total_earnings": profile_data.get("totalEarnings"),
"total_hours": profile_data.get("totalHours"),
"country": profile_data.get("location", {}).get("country") if isinstance(profile_data.get("location"), dict) else None,
"availability": profile_data.get("availability"),
"url": url,
}
Anti-Bot Measures: The Full Picture
Upwork's bot detection is aggressive compared to most job marketplaces — they're protecting commercial data with real monetary value. Here's a systematic breakdown of what you're dealing with.
Layer 1: Cloudflare
The first obstacle. Most requests from datacenter IPs (AWS, DigitalOcean, Hetzner, Vultr ranges) get challenged or silently 403'd before they reach Upwork's servers. Cloudflare validates:
- TLS fingerprint — the cipher suite ordering in your TLS handshake. Python's requests has a characteristic fingerprint that differs from browser TLS. Tools like curl_cffi mimic Chrome's TLS fingerprint to bypass this.
- IP reputation — datacenter IP ranges have poor reputation scores with Cloudflare. Residential IPs from real ISPs score much better.
- HTTP/2 fingerprint — the order and values of HTTP/2 headers differ between browsers and HTTP clients.
Layer 2: Session Validation
Upwork checks that your cookies, localStorage tokens, and request patterns look like a real browser session. A cold HTTP request with no prior cookies gets flagged quickly. When you visit Upwork for the first time, their JavaScript sets several tracking cookies and localStorage values that subsequent requests are expected to carry.
Layer 3: Behavioral Analysis
Continuous requests at uniform intervals look automated. Browser users click, scroll, pause, navigate — the timing patterns are irregular. Very regular request intervals (like a scraper with time.sleep(5)) can trigger increased scrutiny even with legitimate session cookies.
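A small helper that randomizes the gap between requests avoids that uniform-interval signature. A minimal sketch:

```python
import random
import time

def human_pause(base: float = 6.0, jitter: float = 0.5) -> float:
    """Sleep for a randomized interval around `base` seconds. A bare
    time.sleep(5) loop produces identical gaps every time; drawing the
    delay from a range breaks the pattern. Returns the delay used."""
    delay = base * random.uniform(1.0 - jitter, 1.0 + jitter)
    time.sleep(delay)
    return delay
```

Call `human_pause()` between page fetches instead of a fixed sleep; with the defaults each gap lands anywhere from 3 to 9 seconds.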
Layer 4: IP Reputation Scoring
Upwork maintains reputation scores for IP ranges. IPs that have accessed Upwork at scale before — even if they haven't violated rate limits — may have elevated risk scores. This is why IP rotation matters even when individual requests are within rate limits.
Using curl_cffi for TLS Fingerprint Spoofing
from curl_cffi import requests as cffi_requests
import time
import random
def scrape_upwork_profile_cffi(
profile_url: str,
proxy: Optional[str] = None,
) -> dict:
"""
Scrape Upwork profile using curl_cffi to mimic Chrome's TLS fingerprint.
Significantly reduces Cloudflare detection vs. standard requests/httpx.
"""
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
"accept-language": "en-US,en;q=0.9",
"accept-encoding": "gzip, deflate, br",
"sec-ch-ua": '"Google Chrome";v="124", "Chromium";v="124", "Not.A/Brand";v="24"',  # keep in sync with impersonate target
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"Windows"',
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "none",
"upgrade-insecure-requests": "1",
}
kwargs = {
"headers": headers,
"impersonate": "chrome124", # Mimic Chrome 124's full TLS+HTTP2 fingerprint
"timeout": 25,
}
if proxy:
kwargs["proxies"] = {"http": proxy, "https": proxy}
try:
resp = cffi_requests.get(profile_url, **kwargs)
except Exception as e:
return {"error": str(e)}
if resp.status_code != 200:
return {"status_code": resp.status_code}
soup = BeautifulSoup(resp.text, "html.parser")
return _extract_from_html(soup, profile_url)
Install with: uv pip install curl-cffi
Proxy Strategy with ThorData
Upwork's bot detection requires residential proxies — datacenter ranges are consistently blocked at Cloudflare. ThorData's residential proxy network provides IPs from real ISP ranges that pass Cloudflare's IP reputation checks.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "gate.thordata.net"
THORDATA_PORT = 9000
def make_proxy(country: str = "us", session_id: str = None) -> str:
"""
Build a ThorData residential proxy URL.
Use session_id for sticky sessions (same IP across multiple requests).
Sticky sessions are important for Upwork to maintain session continuity.
"""
user = f"{THORDATA_USER}-country-{country}"
if session_id:
user += f"-session-{session_id}"
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
def scrape_upwork_batch(
profile_urls: list[str],
delay_range: tuple = (5, 12),
) -> list[dict]:
"""
Scrape a batch of Upwork profiles with randomized delays and proxy rotation.
Rotate proxy per profile but use sticky session within profile scrape.
"""
import random
import string
results = []
for url in profile_urls:
# Fresh sticky session per profile
session_id = "".join(random.choices(string.ascii_lowercase, k=8))
proxy = make_proxy(country="us", session_id=session_id)
profile = scrape_freelancer_page(url, proxy=proxy)
if profile:
results.append(profile)
print(f"Got: {profile.get('name', 'unknown')} @ {profile.get('hourly_rate', '?')}/hr")
# Randomized delay mimics human browsing patterns
delay = random.uniform(*delay_range)
time.sleep(delay)
return results
Rotate the proxy per profile scrape, not per individual request within a profile. Reusing the same residential IP for 50+ consecutive requests still triggers rate limits, but using a fresh IP per profile while keeping the same IP for the page load chain of a single profile prevents session mismatch detection.
Storing Data in SQLite
import sqlite3
import json
from datetime import datetime, timezone
from typing import Optional
def init_db(db_path: str = "upwork_data.db") -> sqlite3.Connection:
"""Initialize the Upwork data database."""
conn = sqlite3.connect(db_path)
conn.execute("""
CREATE TABLE IF NOT EXISTS freelancers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT UNIQUE,
profile_url TEXT,
name TEXT,
title TEXT,
hourly_rate TEXT,
hourly_rate_numeric REAL,
skills TEXT,
job_success_score TEXT,
total_earnings TEXT,
total_hours TEXT,
country TEXT,
availability TEXT,
first_seen TEXT,
last_updated TEXT
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS rate_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
username TEXT NOT NULL,
hourly_rate REAL,
job_success_score REAL,
recorded_at TEXT NOT NULL
)
""")
conn.execute("""
CREATE TABLE IF NOT EXISTS jobs (
id TEXT PRIMARY KEY,
title TEXT,
description TEXT,
budget TEXT,
job_type TEXT,
duration TEXT,
skills TEXT,
posted TEXT,
client_country TEXT,
client_total_spent REAL,
scraped_at TEXT NOT NULL
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_freelancers_country ON freelancers(country)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_rate_snapshots_user ON rate_snapshots(username)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_jobs_posted ON jobs(posted)")
conn.commit()
return conn
def parse_rate(rate_str: Optional[str]) -> Optional[float]:
"""Parse '$125.00' or '125' to float."""
if not rate_str:
return None
try:
cleaned = rate_str.replace("$", "").replace(",", "").split("/")[0].strip()
return float(cleaned)
except (ValueError, AttributeError):
return None
def save_freelancer(conn: sqlite3.Connection, profile: dict):
"""Save or update a freelancer profile, tracking rate history."""
now = datetime.now(timezone.utc).isoformat()
username = profile.get("username") or (profile.get("url", "").split("/")[-1])
rate_numeric = parse_rate(profile.get("hourly_rate"))
# Track rate snapshot
if username and rate_numeric:
conn.execute("""
INSERT INTO rate_snapshots (username, hourly_rate, job_success_score, recorded_at)
VALUES (?,?,?,?)
""", (username, rate_numeric, profile.get("job_success_score"), now))
conn.execute("""
INSERT INTO freelancers
(username, profile_url, name, title, hourly_rate, hourly_rate_numeric,
skills, job_success_score, total_earnings, total_hours,
country, availability, first_seen, last_updated)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?)
ON CONFLICT(username) DO UPDATE SET
name=excluded.name,
title=excluded.title,
hourly_rate=excluded.hourly_rate,
hourly_rate_numeric=excluded.hourly_rate_numeric,
skills=excluded.skills,
job_success_score=excluded.job_success_score,
total_earnings=excluded.total_earnings,
total_hours=excluded.total_hours,
country=excluded.country,
availability=excluded.availability,
last_updated=excluded.last_updated
""", (
username,
profile.get("profile_url") or profile.get("url"),
profile.get("name"),
profile.get("title"),
profile.get("hourly_rate"),
rate_numeric,
json.dumps(profile.get("skills", [])),
str(profile.get("job_success_score", "")),
profile.get("total_earnings"),
profile.get("total_hours"),
profile.get("country"),
profile.get("availability"),
now,
now,
))
conn.commit()
def save_jobs(conn: sqlite3.Connection, jobs: list[dict]):
"""Save job postings to the database."""
now = datetime.now(timezone.utc).isoformat()
for job in jobs:
budget = job.get("budget")
budget_numeric = None
if isinstance(budget, (int, float)):
budget_numeric = float(budget)
elif isinstance(budget, str):
budget_numeric = parse_rate(budget)
conn.execute("""
INSERT OR REPLACE INTO jobs
(id, title, description, budget, job_type, duration, skills,
posted, client_country, client_total_spent, scraped_at)
VALUES (?,?,?,?,?,?,?,?,?,?,?)
""", (
str(job.get("job_id", "")),
job.get("title"),
job.get("description"),
str(budget) if budget else None,
job.get("job_type"),
job.get("duration"),
json.dumps(job.get("skills", [])),
job.get("posted"),
job.get("client_country"),
budget_numeric,
now,
))
conn.commit()
Market Rate Analysis
With freelancer data in SQLite, you can build useful market intelligence:
def rate_distribution_by_skill(conn: sqlite3.Connection, skill: str) -> dict:
"""
Analyze hourly rate distribution for freelancers with a specific skill.
Useful for understanding market rates in a niche.
"""
rows = conn.execute("""
SELECT hourly_rate_numeric, country
FROM freelancers
WHERE skills LIKE ?
AND hourly_rate_numeric IS NOT NULL
AND hourly_rate_numeric > 0
ORDER BY hourly_rate_numeric
""", (f"%{skill}%",)).fetchall()
if not rows:
return {}
rates = [r[0] for r in rows]
rates.sort()
n = len(rates)
p25 = rates[int(n * 0.25)]
p50 = rates[int(n * 0.50)]
p75 = rates[int(n * 0.75)]
p90 = rates[int(n * 0.90)]
# Rate by country
from collections import defaultdict
by_country = defaultdict(list)
for rate, country in rows:
if country:
by_country[country].append(rate)
country_medians = {
c: sorted(r)[len(r)//2]
for c, r in by_country.items()
if len(r) >= 3
}
return {
"skill": skill,
"sample_size": n,
"p25_rate": p25,
"median_rate": p50,
"p75_rate": p75,
"p90_rate": p90,
"avg_rate": round(sum(rates) / n, 2),
"country_medians": dict(sorted(country_medians.items(), key=lambda x: x[1], reverse=True)[:10]),
}
def trending_skills_from_jobs(conn: sqlite3.Connection, days: int = 30) -> list[dict]:
"""
Find skills that appear most frequently in recent job postings.
Higher frequency = more demand = potentially higher rates.
"""
from datetime import datetime, timedelta, timezone
cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
rows = conn.execute("""
SELECT skills FROM jobs
WHERE posted >= ? AND skills IS NOT NULL
""", (cutoff,)).fetchall()
from collections import Counter
skill_counts = Counter()
for row in rows:
try:
skills = json.loads(row[0])
for skill in skills:
if isinstance(skill, str) and skill.strip():
skill_counts[skill.strip()] += 1
except (json.JSONDecodeError, TypeError):
pass
return [
{"skill": skill, "job_count": count}
for skill, count in skill_counts.most_common(50)
]
Ethical Considerations and Rate Limiting
Upwork's ToS prohibits scraping — that's standard. For the API, you're on firmer ground since they've given you authorized access, but the rate limits exist for a reason. Guidelines:
- API: Stay under 30 requests per minute. Upwork will hard-block your key if you repeatedly exceed this.
- Web scraping: Minimum 5 seconds between profile requests, ideally 8-12 with randomization.
- Data use: Hourly rates and skills are publicly visible. Job success scores appear on public profiles. Earnings are aggregated, not transaction-level. None of this is particularly sensitive in its published form.
- Volume: Building a dataset for research or tooling is reasonable. Building a database to sell profiles is not.
- Residential proxies: Required for web scraping at scale due to Cloudflare. ThorData works well here — their residential pool has good coverage, and residential IPs don't get flagged as quickly as datacenter ranges.
# Complete pipeline example
if __name__ == "__main__":
oauth = get_oauth_session()
conn = init_db()
# Collect jobs via API
print("Fetching Python ML jobs...")
jobs = search_jobs("python machine learning", max_results=50, oauth=oauth)
save_jobs(conn, jobs)
print(f"Saved {len(jobs)} jobs")
# Collect freelancers via API
print("Fetching ML freelancers...")
freelancers = search_all_freelancers("machine learning python", max_results=50, oauth=oauth)
for f in freelancers:
save_freelancer(conn, f)
print(f"Saved {len(freelancers)} freelancers")
# Analyze rates
stats = rate_distribution_by_skill(conn, "machine learning")
print(f"\nMachine Learning rates (n={stats.get('sample_size', 0)}):")
print(f" 25th pct: ${stats.get('p25_rate', 0):.0f}/hr")
print(f" Median: ${stats.get('median_rate', 0):.0f}/hr")
print(f" 75th pct: ${stats.get('p75_rate', 0):.0f}/hr")
print(f" 90th pct: ${stats.get('p90_rate', 0):.0f}/hr")
# Trending skills
print("\nTop 10 trending skills in last 30 days:")
trending = trending_skills_from_jobs(conn)
for item in trending[:10]:
print(f" {item['skill']}: {item['job_count']} jobs")
conn.close()
Where This Goes
The interesting use cases for Upwork data aren't one-off scrapes — they're longitudinal. Rate tracking over time shows how category markets move. Skill demand correlation with job posting volume tells you what's hot before LinkedIn does. And since Upwork is global with rates in USD, it's one of the few sources for cross-country freelance rate benchmarking without currency conversion headaches.
Start with the API for volume, use scraping (with curl_cffi + ThorData residential proxies) for the profile details the API misses, and keep your request pacing generous. The data compounds over time — a 6-month dataset of rate snapshots tells a much richer story than a single snapshot.