Crunchbase used to have a genuinely useful free API. From 2012 until early 2022 developers could sign up, get a key, and pull company profiles, funding rounds, and investor records at a generous rate. That is gone. As of 2026 the free "Open Data Map" tier no longer accepts new registrations, and the full REST API is part of Crunchbase Enterprise -- priced around $2,000/month at the low end, with real seat counts starting closer to $15k/year.
For most developers and researchers that pricing is a non-starter. But Crunchbase company pages, funding round pages, and investor profiles are still served as public HTML. This post covers what you can realistically extract with Python in April 2026 and where the walls are.
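The current developer portal lists two products:
- Crunchbase Enterprise API: the full REST API, priced from around $2,000/month.
- Crunchbase Basic: $49/month, limited volume, no funding data.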
The free tier that existed before 2022 -- 200 calls/min with full funding data -- does not exist anymore. Legacy API keys from before the pricing change have been rotated out.
What this means in practice: if you want to build a deal-flow dashboard, a competitor tracker, a sector-level funding heatmap, or a research dataset, the official API path costs $24k/year minimum. The data itself is mostly on the public site.
Crunchbase serves three useful public surfaces without login:
- crunchbase.com/organization/<slug> renders the overview, description, founded date, headquarters, total funding, last funding round, and top investors.
- crunchbase.com/funding_round/<uuid> shows the round type, amount, date, and participating investors.
- crunchbase.com/organization/<slug> for the investor side lists portfolio companies and recent investments.

What requires login: the full investor list beyond the first 3-5, detailed cap table entries, advanced search filters, and the "similar companies" graph. The login wall is lazy -- most of the JSON data is embedded in the initial HTML payload before the login check fires.
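The two URL patterns above can be built with a small helper. This is a minimal sketch: the function names and the slug pattern (lowercase letters, digits, and hyphens) are assumptions for illustration, not something Crunchbase documents.

```python
import re
import uuid

BASE = "https://www.crunchbase.com"

# Assumption: slugs consist of lowercase letters, digits, and single hyphens.
SLUG_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def organization_url(slug: str) -> str:
    """Build the public organization page URL for a slug."""
    slug = slug.strip().lower()
    if not SLUG_RE.match(slug):
        raise ValueError(f"unexpected slug format: {slug!r}")
    return f"{BASE}/organization/{slug}"


def funding_round_url(round_id: str) -> str:
    """Build the public funding round URL; uuid.UUID rejects malformed IDs."""
    return f"{BASE}/funding_round/{uuid.UUID(round_id)}"
```

Validating inputs before any request is made keeps malformed slugs from burning proxy bandwidth on guaranteed 404s.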
The cleanest approach exploits a simple fact: Crunchbase is a Next.js app. Every public page embeds a __NEXT_DATA__ script tag containing the pre-rendered props as JSON. That JSON is a near-complete dump of what the React tree will render, including fields that are hidden behind paywalls in the UI.
```python
# Managed actor call: skips guest tokens, rotating proxies, and brittle selectors
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor run and block until it finishes
run = client.actor('cryptosignals/crunchbase-scraper').call(
    run_input={'companyUrls': ['https://www.crunchbase.com/organization/openai'], 'maxItems': 25}
)

# Read the scraped records from the run's default dataset
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
```
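For the DIY route, the __NEXT_DATA__ extraction described above can be sketched with the standard library alone. The helper names here are hypothetical. The `props.pageProps` nesting is the usual Next.js layout and is assumed, not confirmed, for Crunchbase pages. Fetching the HTML is left out on purpose.

```python
import json
import re

# Matches the script tag Next.js uses to embed pre-rendered props.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> dict:
    """Return the parsed __NEXT_DATA__ JSON from a page's HTML."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))


def page_props(html: str) -> dict:
    """Return props.pageProps (assumed nesting), or {} if it is absent."""
    return extract_next_data(html).get("props", {}).get("pageProps", {})
```

A missing tag raises instead of returning an empty dict, so a challenge page or a layout change shows up as an error rather than as silently empty records.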
For funding round detail pages, the same __NEXT_DATA__ trick applies.
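Because the exact nesting of a funding round payload is not documented, one hedged way to pull the round type, amount, date, and investors is to search the parsed JSON for keys by name. The key names in `ROUND_KEYS` are assumptions for illustration; inspect a real payload and adjust them before relying on the output.

```python
from typing import Any, Iterator


def iter_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from iter_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from iter_values(item, key)


def first_value(node: Any, key: str, default: Any = None) -> Any:
    """Return the first match in depth-first order, or `default`."""
    return next(iter_values(node, key), default)


# Hypothetical key names: verify against an actual payload.
ROUND_KEYS = {
    "type": "investment_type",
    "amount": "money_raised",
    "date": "announced_on",
    "investors": "investor_identifiers",
}


def summarize_round(props: dict) -> dict:
    """Collect the round fields wherever they sit in the payload."""
    return {field: first_value(props, key) for field, key in ROUND_KEYS.items()}
```

Searching by key name instead of by fixed path trades precision for resilience: the summary survives a reshuffled payload, but can pick up the wrong value if a key name repeats.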
The __NEXT_DATA__ payload is what hydrates the React tree on initial load. It ships with every page, so it stays in lockstep with what the UI shows, but it carries data rather than layout: when Crunchbase changes the rendered markup, the field names in the JSON tend to stay the same. That makes it much more stable than scraping the rendered DOM.
Crunchbase sits behind Cloudflare Bot Management. That means JA3/JA4 TLS fingerprinting, Turnstile challenges on suspicious requests, and aggressive IP reputation checks. What works in 2026:
- curl_cffi with impersonate="chrome120" or similar.
- The cf_clearance cookie, once obtained, is good for ~30 minutes of requests from the same IP.
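Those two points can be combined into one small wrapper: a single curl_cffi session keeps its cookies, stays on one proxy, and is rebuilt once the roughly 30-minute window has passed. This is a sketch under assumptions. The class name and the fixed 30-minute TTL are illustrative, and nothing here guarantees that a given request clears Cloudflare.

```python
import time

# Assumption: treat the ~30 minute cf_clearance lifetime as a hard TTL.
CLEARANCE_TTL_S = 30 * 60


class ImpersonatingFetcher:
    """Reuse one session (and its cookies) per proxy until the TTL lapses."""

    def __init__(self, proxy=None, impersonate="chrome120", clock=time.monotonic):
        self.proxy = proxy              # keep one IP so the cookie stays valid
        self.impersonate = impersonate  # browser TLS fingerprint to mimic
        self.clock = clock
        self._session = None
        self._born = 0.0

    def _new_session(self):
        # Imported lazily so the class can be exercised without curl_cffi.
        from curl_cffi import requests
        return requests.Session()

    def session(self):
        """Return the current session, rebuilding it after the TTL."""
        expired = self.clock() - self._born >= CLEARANCE_TTL_S
        if self._session is None or expired:
            self._session = self._new_session()
            self._born = self.clock()
        return self._session

    def get(self, url):
        proxies = {"http": self.proxy, "https": self.proxy} if self.proxy else None
        return self.session().get(
            url, impersonate=self.impersonate, proxies=proxies, timeout=30
        )
```

Typical use would be one `ImpersonatingFetcher` per proxy, with `fetcher.get(url).text` passed on to the parsing step.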
A minimal schema for most deal-tracking and research use cases:
```json
{
  "name": "Stripe",
  "description": "Financial infrastructure for the internet.",
  "fundingTotal": 9200000000,
  "lastFundingDate": "2023-03-15",
  "lastFundingType": "series_i",
  "investors": ["Andreessen Horowitz", "Sequoia Capital", "Founders Fund"],
  "url": "https://www.crunchbase.com/organization/stripe"
}
```
| Field | Source | Always present? |
|---|---|---|
| name | __NEXT_DATA__ / identifier | Yes |
| description | short_description | Yes |
| fundingTotal | funding_total.value_usd | If disclosed |
| lastFundingDate | last_funding_at | If any rounds |
| investors | investors_headline card | Top 3-5 only |
| full investor list | paywalled | Login required |
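The field mapping in the table can be sketched as a single function that turns a parsed payload into the minimal schema. Several things are assumed here and should be checked against a real page: that the source fields sit at the top level of the dict passed in, that `identifier` and the investor entries are either plain strings or objects with a `value` key, and that the round type lives under `last_funding_type`, which the table does not list.

```python
from typing import Any, Optional


def dig(node: Any, *path: str) -> Optional[Any]:
    """Walk nested dicts along `path`; return None if any step is missing."""
    for key in path:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def _label(entry: Any) -> Any:
    # Assumption: entries are plain strings or {"value": ...} objects.
    return entry.get("value") if isinstance(entry, dict) else entry


def to_record(props: dict, url: str) -> dict:
    """Map source fields to the minimal schema; absent fields become None."""
    return {
        "name": _label(props.get("identifier")),
        "description": props.get("short_description"),
        "fundingTotal": dig(props, "funding_total", "value_usd"),
        "lastFundingDate": props.get("last_funding_at"),
        "lastFundingType": props.get("last_funding_type"),  # assumed key
        "investors": [_label(i) for i in props.get("investors_headline") or []],
        "url": url,
    }
```

Fields marked "If disclosed" or "If any rounds" come back as `None` rather than raising, which keeps undisclosed funding distinct from a parsing failure.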
The DIY approach works for small-scale research -- a few hundred companies pulled once. Above that, the operational cost of proxies, TLS fingerprint maintenance, Cloudflare challenge handling, and cookie warmup becomes a real engineering effort.
Our Crunchbase Scraper on Apify handles the full operational layer. You pass a list of slugs, a search URL, or a category filter, and it returns structured JSON: name, description, funding total, last funding round, investor list, and headquarters. Pay-per-result pricing means a one-time pull of 10,000 companies does not commit you to an Enterprise contract.
| Approach | Cost | Volume | Reliability |
|---|---|---|---|
| Crunchbase Enterprise API | $2k+/mo | Unlimited | High |
| Crunchbase Basic | $49/mo | Limited | No funding data |
| DIY __NEXT_DATA__ scrape | Proxy cost | Low-Medium | Brittle |
| Managed actor (Apify) | Pay-per-result | High | High |
Crunchbase pulled up the ladder on free API access in 2022, but the data itself is still served as public HTML with a clean JSON payload in __NEXT_DATA__. With residential proxies, a proper TLS fingerprint, and gentle rate limiting, extracting company profiles and funding rounds is very workable in 2026.
Build defensively: pin your extraction on __NEXT_DATA__ rather than DOM selectors, persist the raw JSON when parsing fails, and treat 403 as a rotate-and-retry signal rather than an error. Or use a managed scraper and skip the operational surface entirely.
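Two of those defensive habits, rotate-and-retry on 403 and persisting raw JSON when parsing fails, can be sketched as follows. The function names, the injected `fetch(url, proxy)` callable returning `(status_code, body)`, the linear backoff, and the `raw_failures` directory are all assumptions made for illustration.

```python
import itertools
import json
import time
from pathlib import Path


def fetch_with_rotation(url, fetch, proxies, max_attempts=4, delay_s=2.0, sleep=time.sleep):
    """Call fetch(url, proxy); on 403 back off and move to the next proxy."""
    pool = itertools.cycle(proxies)
    for attempt in range(max_attempts):
        status, body = fetch(url, next(pool))
        if status == 200:
            return body
        if status != 403:
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt < max_attempts - 1:
            sleep(delay_s * (attempt + 1))  # gentle linear backoff
    raise RuntimeError(f"still blocked after {max_attempts} attempts: {url}")


def parse_or_persist(raw, parse, slug, out_dir="raw_failures"):
    """Run parse(raw); if it fails, save the raw JSON for later inspection."""
    try:
        return parse(raw)
    except (KeyError, TypeError, ValueError):
        folder = Path(out_dir)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{slug}.json").write_text(json.dumps(raw), encoding="utf-8")
        return None
```

Keeping the raw payload means a parser fix can be replayed against saved files instead of re-fetching pages through the proxy pool.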