Crunchbase Data Without the Enterprise API: Python Guide 2026

April 17, 2026 · 10 min read

Contents

The Crunchbase API situation
What data is publicly accessible
Python code for company data
Rate limits and anti-bot measures
Output schema
Managed alternative
Conclusion

Crunchbase used to have a genuinely useful free API. From 2012 until early 2022 developers could sign up, get a key, and pull company profiles, funding rounds, and investor records at a generous rate. That is gone. As of 2026 the free "Open Data Map" tier no longer accepts new registrations, and the full REST API is part of Crunchbase Enterprise -- priced around $2,000/month (roughly $24k/year) at the low end, and more once real seat counts are added.

For most developers and researchers that pricing is a non-starter. But Crunchbase company pages, funding round pages, and investor profiles are still served as public HTML. This post covers what you can realistically extract with Python in April 2026 and where the walls are.

The Crunchbase API situation

The current developer portal lists two products:

Crunchbase Basic: self-serve, ~$49/mo. Limited to lookups by name, no bulk search, 1,000 calls/day, no funding round endpoint. Useful for hobby projects, not for anything commercial.
Crunchbase Enterprise: starts at $2k/month (about $24k/year) and scales up with seat count. Full API, bulk CSV exports, search endpoints, funding round detail.

The free tier that existed before 2022 -- 200 calls/min with full funding data -- does not exist anymore. Legacy API keys from before the pricing change have been rotated out.

What this means in practice: if you want to build a deal-flow dashboard, a competitor tracker, a sector-level funding heatmap, or a research dataset, the official API path costs $24k/year minimum. The data itself is mostly on the public site.

Legal note: Crunchbase's Terms of Service prohibit scraping. The hiQ v. LinkedIn ruling gives some cover for public data under the CFAA, but Crunchbase has sent cease-and-desist letters to commercial scrapers, and at least one company (2021) settled under contract-law claims. Research and personal use are widely tolerated; commercial redistribution is higher risk.

What data is publicly accessible

Crunchbase serves three useful public surfaces without login:

Company profile: crunchbase.com/organization/<slug> renders the overview, description, founded date, headquarters, total funding, last funding round, and top investors.
Funding round detail: crunchbase.com/funding_round/<uuid> shows the round type, amount, date, and participating investors.
Investor profile: crunchbase.com/organization/<slug> for the investor side lists portfolio companies and recent investments.
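The URL patterns above can be sketched as two small helpers. This is a minimal illustration, assuming the patterns are exactly as listed; the function names are hypothetical, and the UUID check simply follows the `<uuid>` placeholder in the funding round pattern.

```python
import uuid
from urllib.parse import quote

BASE_URL = "https://www.crunchbase.com"


def organization_url(slug):
    """Build the public profile URL for a company or investor slug."""
    if not slug:
        raise ValueError("slug must be a non-empty string")
    # quote() escapes anything that is not safe inside a URL path segment
    return f"{BASE_URL}/organization/{quote(slug, safe='')}"


def funding_round_url(round_id):
    """Build the funding round URL; raises ValueError if round_id is not a UUID."""
    normalized = str(uuid.UUID(round_id))
    return f"{BASE_URL}/funding_round/{normalized}"


print(organization_url("openai"))
```

Company and investor profiles share the same helper because both live under the /organization/ path.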

What requires login: the full investor list beyond the first 3-5, detailed cap table entries, advanced search filters, and the "similar companies" graph. The login wall is lazy -- most of the JSON data is embedded in the initial HTML payload before the login check fires.

Python code for extracting Crunchbase company data

The cleanest approach exploits a simple fact: Crunchbase is a Next.js app. Every public page embeds a __NEXT_DATA__ script tag containing the pre-rendered props as JSON. That JSON is a near-complete dump of what the React tree will render, including fields that are hidden behind paywalls in the UI.

# Managed route: run a hosted Apify actor instead of maintaining proxies and selectors yourself
from apify_client import ApifyClient

# Replace the placeholder with your own Apify API token
client = ApifyClient('YOUR_APIFY_TOKEN')

# Start the actor and block until the run finishes
run = client.actor('cryptosignals/crunchbase-scraper').call(
    run_input={'companyUrls': ['https://www.crunchbase.com/organization/openai'], 'maxItems': 25}
)

# Stream the scraped records from the run's default dataset
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item)
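For the DIY route, the __NEXT_DATA__ extraction described above can be sketched with the standard library alone. This is a minimal sketch, assuming the page really does embed a `<script id="__NEXT_DATA__">` tag as described. The sample HTML and the position of `short_description` inside `props.pageProps` are made up for illustration, and fetching the page is left out because it depends on the anti-bot constraints covered under rate limits.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is missing."""
    parser = NextDataParser()
    parser.feed(html)
    raw = "".join(parser.chunks).strip()
    if not raw:
        return None
    return json.loads(raw)


# Hypothetical page body; a real payload is far larger and may nest differently.
SAMPLE_HTML = """<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"short_description": "Example company"}}}
</script>
</body></html>"""

data = extract_next_data(SAMPLE_HTML)
print(data["props"]["pageProps"]["short_description"])
```

Returning None when the tag is absent keeps a "page layout changed or challenge page served" case distinct from a JSON decoding error, which still raises.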

For the funding round detail, the same __NEXT_DATA__ trick applies: request crunchbase.com/funding_round/<uuid> and read the embedded JSON instead of the rendered DOM.

Tip: The __NEXT_DATA__ payload is what hydrates the React tree on initial load. It carries the data behind the UI rather than the markup, so when Crunchbase changes the rendered layout the field names in the JSON tend to stay the same. That makes it much more stable than scraping the rendered DOM.
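If field names are more stable than their position in the tree, one defensive option is to look fields up by key name instead of hard-coding a path. The sketch below assumes nothing about the real payload layout; the helper names and the sample payload are hypothetical.

```python
def find_key(node, key):
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def first_value(node, key, default=None):
    """Return the first match in depth-first order, or `default` if none exists."""
    return next(find_key(node, key), default)


# Hypothetical payload shape, for illustration only.
payload = {
    "props": {
        "pageProps": {
            "cards": [
                {"short_description": "Example company"},
                {"last_funding_at": "2023-03-15"},
            ]
        }
    }
}

print(first_value(payload, "last_funding_at"))
```

The trade-off is that a key-name search can pick up the wrong value when the same key appears in several places, so it fits best as a fallback when a known path stops resolving.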

Handling rate limits and anti-bot measures

Crunchbase sits behind Cloudflare Bot Management. That means JA3/JA4 TLS fingerprinting, Turnstile challenges on suspicious requests, and aggressive IP reputation checks. What works in 2026:

Rate limit to ~10 requests/min per IP. Above that, expect Turnstile challenges within a few minutes.
Use residential or mobile proxies. Datacenter IPs from AWS/GCP/Azure hit a Turnstile page on almost every request.
Match TLS fingerprint to the User-Agent. A Chrome UA with Python's default TLS signature is flagged instantly. Use curl_cffi with impersonate="chrome120" or similar.
Persist cookies per session. The cf_clearance cookie, once obtained, is good for ~30 minutes of requests from the same IP.
Back off on 403/429. A 403 with a Turnstile page means the session is burned; budget ~5 minutes to rotate the IP and warm up a new session.
Cache aggressively. Company profiles change at weekly/monthly cadence, not hourly. Re-scraping the same slug 100 times a day is pure waste.
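The throttling, backoff, and caching rules above can be combined in one small wrapper. This is a sketch under stated assumptions rather than a tested production client. `PoliteFetcher`, its parameters, and the `on_block` hook are hypothetical names. The one-week cache TTL and the doubling of the backoff on each retry are illustrative choices, and the 6-second interval and 300-second backoff are taken from the ~10 requests/min and ~5 minute figures in the list. The `curl_cffi_fetch` function shows where a TLS-impersonating client would plug in; it needs the third-party curl_cffi package and is not called here.

```python
import time


class PoliteFetcher:
    """Throttle, cache, and back off around any fetch(url) -> (status, body) callable."""

    def __init__(self, fetch, min_interval=6.0, cache_ttl=7 * 24 * 3600,
                 backoff=300.0, max_retries=3, on_block=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch
        self.min_interval = min_interval  # 6 s between requests is ~10 requests/min
        self.cache_ttl = cache_ttl        # one week, since profiles change slowly
        self.backoff = backoff            # ~5 minutes after a 403/429
        self.max_retries = max_retries
        self.on_block = on_block          # hypothetical hook: rotate proxy, reset cookies
        self.clock = clock
        self.sleep = sleep
        self._cache = {}                  # url -> (fetched_at, body)
        self._last_request = None

    def _wait_for_slot(self):
        # Sleep just long enough to keep min_interval between consecutive requests.
        if self._last_request is not None:
            remaining = self.min_interval - (self.clock() - self._last_request)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request = self.clock()

    def get(self, url):
        cached = self._cache.get(url)
        if cached and self.clock() - cached[0] < self.cache_ttl:
            return cached[1]  # fresh enough, no request made
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            status, body = self.fetch(url)
            if status == 200:
                self._cache[url] = (self.clock(), body)
                return body
            if status in (403, 429) and attempt < self.max_retries:
                if self.on_block:
                    self.on_block(status)
                # Wait 300 s, then 600 s, then 1200 s with the defaults.
                self.sleep(self.backoff * (2 ** attempt))
                continue
            raise RuntimeError(f"giving up on {url}: HTTP {status}")


def curl_cffi_fetch(url):
    """Example fetch callable; requires `pip install curl_cffi`."""
    from curl_cffi import requests  # imported lazily so the class works without it
    response = requests.get(url, impersonate="chrome120", timeout=30)
    return response.status_code, response.text
```

Typical use would be `fetcher = PoliteFetcher(curl_cffi_fetch)` followed by `fetcher.get(url)` per company page. Injecting `clock` and `sleep` keeps the timing logic testable without real waits.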

Output schema

A minimal schema for most deal-tracking and research use cases:

{
  "name": "Stripe",
  "description": "Financial infrastructure for the internet.",
  "fundingTotal": 9200000000,
  "lastFundingDate": "2023-03-15",
  "lastFundingType": "series_i",
  "investors": ["Andreessen Horowitz", "Sequoia Capital", "Founders Fund"],
  "url": "https://www.crunchbase.com/organization/stripe"
}

Field	Source	Always present?
name	__NEXT_DATA__ / identifier	Yes
description	short_description	Yes
fundingTotal	funding_total.value_usd	If disclosed
lastFundingDate	last_funding_at	If any rounds
investors	investors_headline card	Top 3-5 only
full investor list	paywalled	Login required
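The field mapping in the table can be sketched as a small normalizer. This is an illustration, not the actual payload layout. It assumes the raw properties arrive as one flat dict, that the name sits under `identifier.value`, that the round type is exposed as `last_funding_type`, and that `investors_headline` is a list of objects with a `name` key. All of those shapes are assumptions to check against a real payload.

```python
def to_record(props, url):
    """Map raw properties to the minimal schema; undisclosed fields become None or []."""
    identifier = props.get("identifier") or {}
    funding = props.get("funding_total") or {}
    investors = props.get("investors_headline") or []
    return {
        "name": identifier.get("value"),
        "description": props.get("short_description"),
        "fundingTotal": funding.get("value_usd"),        # only if disclosed
        "lastFundingDate": props.get("last_funding_at"),  # only if any rounds
        "lastFundingType": props.get("last_funding_type"),
        "investors": [i["name"] for i in investors if i.get("name")],  # top 3-5 only
        "url": url,
    }


# Hypothetical raw properties, shaped to match the assumptions above.
raw = {
    "identifier": {"value": "Stripe"},
    "short_description": "Financial infrastructure for the internet.",
    "funding_total": {"value_usd": 9200000000},
    "last_funding_at": "2023-03-15",
    "last_funding_type": "series_i",
    "investors_headline": [{"name": "Sequoia Capital"}, {"name": "Founders Fund"}],
}

record = to_record(raw, "https://www.crunchbase.com/organization/stripe")
print(record["name"], record["fundingTotal"])
```

Using `.get()` with empty fallbacks means a company with no disclosed funding still produces a complete record, with `fundingTotal` and `lastFundingDate` set to None.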

Managed alternative

The DIY approach works for small-scale research -- a few hundred companies pulled once. Above that, the operational cost of proxies, TLS fingerprint maintenance, Cloudflare challenge handling, and cookie warmup becomes a real engineering effort.

Our Crunchbase Scraper on Apify handles the full operational layer. You pass a list of slugs, a search URL, or a category filter, and it returns structured JSON: name, description, funding total, last funding round, investor list, and headquarters. Pay-per-result pricing means a one-time pull of 10,000 companies does not commit you to an Enterprise contract.

Approach	Cost	Volume	Reliability / caveats
Crunchbase Enterprise API	$2k+/mo	Unlimited	High
Crunchbase Basic	$49/mo	Limited	No funding data
DIY __NEXT_DATA__ scrape	Proxy cost	Low-Medium	Brittle
Managed actor (Apify)	Pay-per-result	High	High

Conclusion

Crunchbase pulled up the ladder on free API access in 2022, but the data itself is still served as public HTML with a clean JSON payload in __NEXT_DATA__. With residential proxies, a proper TLS fingerprint, and gentle rate limiting, extracting company profiles and funding rounds is very workable in 2026.

Build defensively: pin your extraction on __NEXT_DATA__ rather than DOM selectors, persist the raw JSON when parsing fails, and treat 403 as a rotate-and-retry signal rather than an error. Or use a managed scraper and skip the operational surface entirely.
