LinkedIn is the world's largest professional network with over 1 billion members across 200 countries. The profile data on LinkedIn -- job titles, employers, career histories, skills, education, and professional connections -- is some of the most valuable professional data available anywhere. For recruiters, sales teams, market researchers, and data analysts, access to this data at scale can transform workflows that would otherwise take hundreds of manual hours.
The challenge is that LinkedIn guards this data more aggressively than almost any other platform. Their official API requires a partner program application that rejects most independent developers, and even approved partners get extremely limited data access. Meanwhile, LinkedIn's bot detection system is one of the most sophisticated on the web, capable of detecting and blocking automated access within just a handful of requests.
But there is a practical path forward. LinkedIn public profiles contain structured data in their HTML -- Open Graph meta tags and JSON-LD schema markup -- that was designed for search engines and social media link previews. This data is served in the initial HTML response, before any JavaScript executes, and it includes names, job titles, employers, and profile photos. For use cases that need this subset of profile data, it is accessible without any API key or authenticated session.
This guide covers the complete workflow for extracting LinkedIn profile data in 2026: from the simplest meta tag approach to full headless browser scraping, including the specific anti-detection techniques that work against LinkedIn's defenses, proxy rotation strategies, error handling, and real-world use cases. Every code example is tested and working.
LinkedIn shut down most of its public API access years ago. What remains -- the Marketing and Compliance APIs -- comes with steep requirements.
The Marketing API is designed for advertising platforms and HR tech companies with established businesses and compliance departments. If you are an independent developer building a research tool, a startup validating a market, or an analyst who needs professional data for a one-off project, the official API route is effectively closed to you.
This is where the publicly available HTML data becomes useful. LinkedIn serves structured data in every public profile page for the explicit purpose of search engine indexing and social media link previews. This is the same data Google crawls, the same data that appears when someone shares a LinkedIn profile on Twitter or Slack, and the same data the profile owner chose to make public by setting their profile visibility to "public."
When you load a LinkedIn public profile in a browser, the page source contains Open Graph meta tags and optionally JSON-LD structured data. These are present in the initial HTML response -- no JavaScript rendering required.
<!-- Available on every public LinkedIn profile -->
<meta property="og:title" content="Jane Smith - VP of Engineering at TechCorp">
<meta property="og:description" content="Experience: VP of Engineering at TechCorp. Education: MIT...">
<meta property="og:image" content="https://media.licdn.com/dms/image/v2/...">
<meta property="og:url" content="https://www.linkedin.com/in/janesmith">
<meta property="og:type" content="profile">
<meta property="profile:first_name" content="Jane">
<meta property="profile:last_name" content="Smith">
<!-- Present on ~60-70% of public profiles -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Smith",
  "jobTitle": "VP of Engineering",
  "worksFor": {
    "@type": "Organization",
    "name": "TechCorp"
  },
  "url": "https://www.linkedin.com/in/janesmith",
  "image": "https://media.licdn.com/dms/image/v2/...",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "San Francisco Bay Area"
  }
}
</script>
| Data | Meta Tags | JSON-LD | Headless Browser |
|---|---|---|---|
| Full name | Yes | Yes | Yes |
| Current job title | In og:title | Yes | Yes |
| Current employer | In og:title | Yes | Yes |
| Profile photo URL | Yes | Yes | Yes |
| Location | Partial | Yes | Yes |
| Summary/bio | Truncated | No | Yes |
| Full work history | No | No | Yes |
| Education | Partial | No | Yes |
| Skills list | No | No | Yes |
| Connection count | No | No | Sometimes |
| Contact info | No | No | Auth required |
This approach fetches the raw HTML of a public profile and extracts data from meta tags and JSON-LD. It is fast (sub-second per profile), lightweight (no browser needed), and works for the subset of data that LinkedIn serves in the initial HTML.
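A minimal sketch of this approach using only the Python standard library. The User-Agent is a placeholder, and a production version would route the request through a residential proxy as described later in this guide:

```python
# Fetch a public profile and extract Open Graph / profile meta tags.
# Standard library only -- no browser, no third-party dependencies.
import urllib.request
from html.parser import HTMLParser

class OGMetaParser(HTMLParser):
    """Collect og:* and profile:* meta tags from raw HTML."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop, content = attrs.get("property"), attrs.get("content")
        if prop and content and prop.startswith(("og:", "profile:")):
            self.meta[prop] = content

def parse_profile_html(html: str) -> dict:
    """Map the raw meta tags onto clean profile fields."""
    parser = OGMetaParser()
    parser.feed(html)
    m = parser.meta
    first = m.get("profile:first_name", "")
    last = m.get("profile:last_name", "")
    return {
        "full_name": f"{first} {last}".strip(),
        "headline": m.get("og:title", ""),
        "image_url": m.get("og:image", ""),
        "profile_url": m.get("og:url", ""),
    }

def fetch_profile(url: str) -> dict:
    # Placeholder headers -- see the header-completeness section below
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_profile_html(resp.read().decode("utf-8", "replace"))
```

`parse_profile_html` is deliberately separate from the network call so you can unit-test the extraction against saved HTML fixtures.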
When a profile includes JSON-LD (roughly 60-70% of public profiles), it follows the schema.org Person specification. This is the cleanest data source on the page because it uses a standardized format that LinkedIn maintains for SEO purposes.
Here is the full range of what the JSON-LD block can contain:
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Smith",
  "jobTitle": "VP of Engineering",
  "worksFor": {
    "@type": "Organization",
    "name": "TechCorp"
  },
  "url": "https://www.linkedin.com/in/janesmith",
  "image": "https://media.licdn.com/dms/image/v2/...",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "San Francisco Bay Area"
  },
  "alumniOf": {
    "@type": "EducationalOrganization",
    "name": "Massachusetts Institute of Technology"
  },
  "sameAs": [
    "https://twitter.com/janesmith",
    "https://github.com/janesmith"
  ]
}
A robust JSON-LD parser that handles all the variants I have encountered:
def parse_json_ld_person(data: dict) -> dict:
    """Parse a schema.org Person JSON-LD object into clean fields."""
    result = {}

    # Basic identity
    result["full_name"] = data.get("name", "")
    result["current_title"] = data.get("jobTitle", "")

    # Current employer -- can be a string or an Organization object
    works_for = data.get("worksFor", {})
    if isinstance(works_for, dict):
        result["current_company"] = works_for.get("name", "")
    elif isinstance(works_for, str):
        result["current_company"] = works_for

    # Location -- can be a string or a PostalAddress object
    address = data.get("address", {})
    if isinstance(address, dict):
        result["location"] = address.get("addressLocality", "")
    elif isinstance(address, str):
        result["location"] = address

    # Education -- can be a single object or a list
    alumni_of = data.get("alumniOf", [])
    if isinstance(alumni_of, dict):
        alumni_of = [alumni_of]
    result["education"] = []
    for edu in alumni_of:
        if isinstance(edu, dict):
            result["education"].append(edu.get("name", ""))
        elif isinstance(edu, str):
            result["education"].append(edu)

    # Social links
    same_as = data.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    result["social_links"] = same_as

    # Profile image and URL
    result["image_url"] = data.get("image", "")
    result["profile_url"] = data.get("url", "")

    return result
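To feed that parser, you first have to locate the JSON-LD block in the page source. A small standard-library sketch; the `@graph` handling is defensive generality for schema.org documents, not a documented LinkedIn behavior:

```python
# Find the schema.org Person object inside raw profile HTML.
import json
import re

JSON_LD_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
)

def extract_person_json_ld(html: str):
    """Return the first Person JSON-LD dict in the HTML, or None."""
    for match in JSON_LD_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed block -- keep scanning
        # Handle a bare object, an @graph wrapper, or a top-level list
        candidates = data.get("@graph", [data]) if isinstance(data, dict) else data
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Person":
                return item
    return None
```

The return value plugs straight into `parse_json_ld_person` above; a `None` result is your signal to fall back to meta tag parsing.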
When you need more than what meta tags provide -- full work history, education details, skills, and about section -- you need a headless browser that renders the full JavaScript application.
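A hedged Playwright sketch of that workflow (`pip install playwright`, then `playwright install chromium`). The CSS selectors are illustrative placeholders, not LinkedIn's real markup -- verify current selectors in DevTools before relying on them:

```python
# Render the full profile page and pull the JavaScript-loaded sections.
def split_experience(entry: str) -> dict:
    """Split a scraped 'Title at Company' string into structured fields."""
    title, _, company = entry.partition(" at ")
    return {"title": title.strip(), "company": company.strip()}

def scrape_full_profile(url: str) -> dict:
    from playwright.sync_api import sync_playwright  # third-party import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/133.0.0.0 Safari/537.36"
        )
        page.goto(url, wait_until="domcontentloaded", timeout=60_000)
        page.wait_for_timeout(3_000)  # let lazy-loaded sections render
        profile = {
            "full_name": (page.text_content("h1") or "").strip(),
            # Placeholder selectors below -- adjust to the current markup
            "about": (page.text_content("section.about") or "").strip(),
            "experience": [
                split_experience(el.inner_text())
                for el in page.query_selector_all("section.experience li")
            ],
        }
        browser.close()
        return profile
```

Keeping `split_experience` as a pure function makes the text-mangling logic testable without launching a browser.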
Here is an output schema for a scraped LinkedIn profile. Fields are populated based on which scraping method you use:
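A representative per-profile record, using the field names from the JSON-LD parser above; the headless-only field names (`about`, `experience`, `skills`, `scraped_at`) are a suggested convention, not a fixed standard:

```json
{
  "profile_url": "https://www.linkedin.com/in/janesmith",
  "full_name": "Jane Smith",
  "current_title": "VP of Engineering",
  "current_company": "TechCorp",
  "location": "San Francisco Bay Area",
  "about": "Engineering leader focused on ...",
  "image_url": "https://media.licdn.com/dms/image/v2/...",
  "experience": [
    {"title": "VP of Engineering", "company": "TechCorp"}
  ],
  "education": ["Massachusetts Institute of Technology"],
  "skills": ["Distributed Systems"],
  "social_links": ["https://github.com/janesmith"],
  "scraped_at": "2026-03-29T10:00:00Z",
  "method": "meta_tags"
}
```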
For batch results, wrap in a container:
{
  "scrape_run": {
    "timestamp": "2026-03-29T10:00:00Z",
    "total_profiles": 100,
    "successful": 87,
    "auth_walled": 8,
    "blocked": 5,
    "method": "meta_tags"
  },
  "profiles": [ ... ],
  "errors": [
    {
      "url": "https://www.linkedin.com/in/example",
      "error": "LinkedIn 999 -- bot detection",
      "timestamp": "2026-03-29T10:05:23Z"
    }
  ]
}
LinkedIn's bot detection is arguably the most aggressive of any major platform. Here are the specific techniques that matter:
LinkedIn blocks datacenter IPs almost instantly. You will get HTTP 999 on your first request from an AWS or GCP IP. Residential proxies route through real ISP connections that LinkedIn cannot easily distinguish from real users.
ThorData's rotating residential proxies work well for LinkedIn specifically. Their pool includes IPs from ISPs that LinkedIn does not flag as aggressively as typical proxy network ranges. The per-GB pricing makes sense when you are fetching individual profile pages.
LinkedIn checks for missing or inconsistent headers. A real browser sends 12+ headers. Your scraper should too:
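A header set modeled on what a current Chrome sends for a top-level navigation. The browser version strings are illustrative and go stale; refresh them from a real browser's network tab periodically:

```python
# Full header set for unauthenticated profile requests.
# Values below mirror a Chrome navigation request; keep them mutually
# consistent (a Chrome UA with Firefox Sec-Ch-Ua headers is a red flag).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,"
              "image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.google.com/",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "Sec-Fetch-User": "?1",
    "Sec-Ch-Ua": '"Chromium";v="133", "Not(A:Brand";v="99"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Upgrade-Insecure-Requests": "1",
    "Connection": "keep-alive",
}
```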
Real users do not navigate directly to profile URLs. They come from Google, LinkedIn search, or other LinkedIn pages. Setting a plausible referrer improves success rates:
# Option 1: Come from Google (most natural for public profiles)
headers["Referer"] = f"https://www.google.com/search?q={profile_name}+linkedin"
# Option 2: Come from LinkedIn search (for bulk scraping)
headers["Referer"] = "https://www.linkedin.com/search/results/people/"
LinkedIn is one of the hardest targets for proxy-based scraping. Here are the specific requirements:
| Proxy Type | LinkedIn Success Rate | Cost | Recommended |
|---|---|---|---|
| Datacenter | 0-5% | $1-5/GB | No |
| Residential rotating | 50-70% | $5-15/GB | Yes |
| ISP (static residential) | 70-85% | $15-30/GB | Best |
| Mobile | 80-90% | $20-40/GB | Overkill |
LinkedIn's rate limits for unauthenticated profile access are undocumented and shift over time, but in practice a single IP tolerates only a slow trickle of profile requests before tripping the 999 response described below.
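The 8+ second delays recommended in the fixes below can be implemented as a simple jittered throttle, so the request cadence never looks machine-regular:

```python
# Jittered throttle: at least `base` seconds between requests, randomized
# upward so the interval pattern is not suspiciously uniform.
import random
import time

def jittered_delay(base: float = 8.0, spread: float = 7.0) -> float:
    """Return a delay between `base` and `base + spread` seconds."""
    return base + random.uniform(0, spread)

def fetch_all(urls, fetch_fn):
    """Fetch each URL via fetch_fn, sleeping a jittered interval between calls."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no delay before the first request
            time.sleep(jittered_delay())
        results.append(fetch_fn(url))
    return results
```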
Cause: HTTP 999 is LinkedIn's custom status code meaning "we know you are a bot." It is triggered by datacenter IPs, missing headers, or too many requests.
Fix: Switch to residential proxy. Ensure full header set. Add 8+ second delays between requests. Do not retry from the same IP for at least 2 hours.
Cause: LinkedIn redirected to login page. Can happen even for public profiles when the IP has low reputation or the request lacks proper headers.
Fix: Add complete Sec-Fetch-* headers. Use a residential proxy from the US or EU. Include a plausible Referer header. Wait 30 minutes before retrying from the same IP.
Cause: The profile is not set to public, or LinkedIn served a minimal page without meta tags to this specific request.
Fix: Verify the profile is actually public by checking in a real browser. Try the Playwright approach which renders JavaScript and may get more data. Some profiles genuinely have no public data.
Cause: Not all profiles include JSON-LD. It is present on roughly 60-70% of public profiles.
Fix: This is expected. Fall back to meta tag parsing which is available on all public profiles. The meta tags give you name, headline, photo, and a truncated description.
Cause: LinkedIn CDN URLs are time-limited and IP-restricted. The URL you scraped may have expired or only works from specific IPs.
Fix: Download the photo immediately during scraping. Do not store the URL for later download. Use the same proxy for the image request that you used for the profile page.
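A sketch of that fix, assuming `proxy_url` is the same residential proxy endpoint used for the profile request (all values are placeholders):

```python
# Download the profile photo immediately, through the same proxy that
# fetched the profile page, instead of storing the time-limited CDN URL.
import os
import urllib.request
from urllib.parse import urlparse

def photo_filename(profile_url: str) -> str:
    """Derive a stable local filename from the profile's public URL slug."""
    slug = urlparse(profile_url).path.rstrip("/").split("/")[-1]
    return f"{slug}.jpg"

def download_photo(image_url, profile_url, proxy_url, out_dir="photos"):
    os.makedirs(out_dir, exist_ok=True)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    path = os.path.join(out_dir, photo_filename(profile_url))
    with opener.open(image_url, timeout=30) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path
```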
Sales teams use LinkedIn profile data to enrich their CRM with up-to-date job titles, employers, and locations. When a lead's title changes from "Engineering Manager" to "VP of Engineering," that is a trigger for a sales conversation about tools for larger teams. Key fields: current_title, current_company, location.
Recruiters search for candidates matching specific criteria and need structured data to filter and rank them. Scraping public profiles for people in specific roles at specific companies creates candidate pipelines faster than manual LinkedIn searching. Key fields: current_title, experience, education, location.
Analyzing the professional backgrounds of employees at competitor companies reveals hiring patterns, team structures, and strategic priorities. If a fintech startup hires 15 ML engineers in three months, they are likely building an AI product. Key fields: current_company, current_title, experience.
VCs and investors analyze founder backgrounds, team composition, and employee growth as part of due diligence. The professional history of a founding team -- where they worked before, what they studied, how long they have been in the industry -- is a signal for startup viability. Key fields: experience, education, full_name.
Researchers study labor market dynamics, career mobility, gender representation in leadership, and professional network structures using LinkedIn data. Systematic collection of public profile data enables large-scale empirical studies that would be impossible manually. Key fields: all fields, especially experience history for career trajectory analysis.
If you need LinkedIn profile data at scale without maintaining proxy rotation, browser fingerprinting, and rate limit logic yourself, managed scrapers handle it.
I built a LinkedIn Profile Scraper on Apify that extracts public profile data including name, headline, current position, location, and profile image. It handles proxy rotation, retries, and LinkedIn's bot detection internally. You pass in profile URLs and get structured JSON back.
The advantage is maintenance. LinkedIn changes their bot detection every few weeks and their page structure every few months. A managed actor absorbs those changes so your pipeline does not break. The cost per profile is typically a fraction of a cent, which is almost always cheaper than the engineering time to maintain your own scraper.
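A sketch of calling that actor with the official Apify Python client (`pip install apify-client`); the token and profile URL are placeholders:

```python
# Managed actor call -- the actor handles proxies, retries, and detection.
def build_run_input(profile_urls, max_items=10):
    """Assemble the actor's input payload."""
    return {"profileUrls": list(profile_urls), "maxItems": max_items}

if __name__ == "__main__":
    from apify_client import ApifyClient  # third-party client

    client = ApifyClient("YOUR_APIFY_TOKEN")
    run = client.actor("cryptosignals/linkedin-profile-scraper").call(
        run_input=build_run_input(["https://www.linkedin.com/in/satyanadella/"])
    )
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)  # one structured JSON record per profile
```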
LinkedIn scraping exists in a legal gray area. US courts have generally held that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act (the hiQ Labs v. LinkedIn litigation), but LinkedIn's User Agreement explicitly prohibits scraping, and privacy laws such as the GDPR can apply to the personal data you collect.
For production use cases, genuinely evaluate whether the official API partner program could work for you before choosing the scraping route. If scraping is the right choice, keep volumes reasonable, respect rate limits, and have a clear legitimate purpose for the data you collect.
LinkedIn profile scraping in 2026 has two viable paths: meta tag extraction for lightweight data (name, title, company, photo) and headless browser scraping for comprehensive data (full work history, education, about section). Both require residential proxies and careful rate limiting.
The meta tag approach is the right starting point for most use cases. It is fast, lightweight, and gives you the most commonly needed fields. Graduate to Playwright only when you specifically need full work history or education details.
For anything beyond a few dozen profiles, invest in a proper proxy rotation setup or use a managed scraping service. LinkedIn's bot detection is too aggressive to fight with a single IP and basic headers. The engineering time you save by using the right infrastructure from the start will pay for itself quickly.
Built by Crypto Volume Signal Scanner -- tools for developers who work with web data. See also: Scrape Google Search Results | Scraping AliExpress Products | YouTube Stats Without the API