
Instagram Profile Data Without the Meta API: Python Guide 2026

April 14, 2026 · 10 min read
Contents: The Meta API problem · What data is still public · Approach 1: web_profile_info endpoint · Approach 2: __additionalDataLoaded parsing · Extracting posts and engagement · Stories, reels, and tagged posts · Anti-bot tactics in 2026 · Managed scraper option

The Meta API problem

Meta's Instagram Graph API requires business verification, a Facebook Page connected to the Instagram account, and approval from Meta for any useful permission scope. Even after approval, you can only read data from accounts you own or manage -- not arbitrary public profiles. For competitive research, influencer analytics, or brand monitoring, the official API is effectively useless.

Instagram's public web pages, however, still render everything you need to know about a public profile. The trick is knowing which internal JSON endpoints the web app uses and how to call them without getting flagged by Meta's anti-bot systems.

What data is still public

For any non-private Instagram account, the following fields are still accessible without login: username, full name, bio and external link, follower and following counts, post count, profile picture URL, verification status, and the most recent 12 posts (shortcode, caption, like and comment counts, media URLs).

Private accounts, story views, direct messages, and analytics require auth. Don't attempt to scrape these -- it's both a ToS violation and a CFAA risk in the US.

Approach 1: the web_profile_info endpoint

Instagram's web client uses an internal endpoint at /api/v1/users/web_profile_info/. It returns the full profile JSON including the first 12 posts. No auth required if you send the right headers.

# Direct endpoint call -- mobile UA plus the web client's app-ID header; no login needed
import requests

IPHONE_UA = (
    'Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) '
    'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1'
)

resp = requests.get(
    'https://i.instagram.com/api/v1/users/web_profile_info/',
    params={'username': 'nasa'},
    # The app ID below is the one the web client sends; grab a fresh value from
    # your browser's DevTools network tab if requests start failing
    headers={'User-Agent': IPHONE_UA, 'x-ig-app-id': '936619743392459'},
    timeout=10,
)
resp.raise_for_status()
user = resp.json()['data']['user']
print(user['full_name'], user['edge_followed_by']['count'])

Why the iPhone UA? Instagram's mobile web app is older code that has less aggressive bot detection than the desktop app. Using a mobile User-Agent, even from a desktop environment, routes you through gentler rate-limit buckets.

Approach 2: __additionalDataLoaded parsing

If the internal API call starts failing (Meta rotates the endpoint paths periodically), you can fall back to parsing the public profile page HTML. The first 12 posts and basic profile data are embedded in a JavaScript payload called window.__additionalDataLoaded.

# Fallback -- fetch the profile page and pull the embedded JSON payload out of the HTML
import json
import re

import requests

html = requests.get(
    'https://www.instagram.com/nasa/',
    # Same mobile-UA trick as Approach 1
    headers={'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) '
                           'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1'},
    timeout=10,
).text

# The payload shape changes every few months -- keep this regex easy to swap out
match = re.search(r'__additionalDataLoaded\([^,]+,\s*(\{.*\})\s*\);', html)
data = json.loads(match.group(1)) if match else None

The HTML approach is brittle -- Meta changes the JSON structure on the profile page roughly every 6 months. Keep the parsing logic isolated so you can swap it out when it breaks, and always have the web_profile_info path as your primary.
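
That isolation can be as small as a wrapper that tries the endpoint first and only then falls back to HTML parsing. A sketch -- `primary` and `fallback` are hypothetical fetcher callables standing in for the two approaches above:

```python
def fetch_profile(username: str, primary, fallback):
    """Try the primary fetcher; on any failure, fall through to the fallback."""
    try:
        return primary(username)
    except Exception:
        return fallback(username)
```

Keeping both paths behind one function means the rest of your pipeline never learns which one succeeded, so swapping a broken parser touches exactly one call site.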

Extracting posts and engagement

The web_profile_info response includes the first 12 posts under edge_owner_to_timeline_media. Each post has a shortcode, caption, like count, comment count, and media URLs.
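
If you're calling web_profile_info directly rather than using an actor, those fields have to be walked out of Meta's edge structure. A sketch, assuming the response shape described above -- field names like edge_liked_by change periodically, so verify them against a live response:

```python
def extract_posts(user: dict) -> list[dict]:
    """Flatten the timeline edges of a web_profile_info 'user' object."""
    posts = []
    for edge in user['edge_owner_to_timeline_media']['edges']:
        node = edge['node']
        captions = node['edge_media_to_caption']['edges']
        posts.append({
            'shortcode': node['shortcode'],
            'caption': captions[0]['node']['text'] if captions else '',
            'likes': node['edge_liked_by']['count'],
            'comments': node['edge_media_to_comment']['count'],
            'media_url': node['display_url'],
        })
    return posts
```
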

# Managed actor returns posts already parsed — no need to walk Meta's internal JSON
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/instagram-profile-scraper').call(
    run_input={'usernames': ['nasa'], 'resultsPerUser': 12}
)

for post in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"[{post['likes']:,} likes] {post['caption'][:60]}")

Paginating beyond the first 12

The edge_owner_to_timeline_media.page_info.end_cursor field is a pagination token. To fetch the next batch you'd use the GraphQL endpoint at /graphql/query/ with the cursor and a query hash. That endpoint is heavily rate-limited for logged-out clients in 2026 -- if you need deep post history, use a managed scraper rather than building your own pagination loop.

Stories, reels, and tagged posts

Logged-out access here is thin. Stories and highlights sit behind authenticated endpoints, so they're off the table for guest scraping. Reels that are shared to the grid show up in the same timeline media edges as ordinary video posts, so the first few are already in the web_profile_info response; the tagged-posts tab is increasingly gated behind login. Treat anything beyond the main grid as requiring either an authenticated session (with the ToS risks noted above) or a managed scraper.

Anti-bot tactics in 2026

Meta's bot detection stack has improved significantly since 2023. What works now:

  1. Mobile User-Agent + mobile IP = far fewer challenges than desktop setups
  2. Residential proxies are mandatory for any volume beyond a few profiles per hour
  3. Session affinity: re-use the same proxy for a batch of requests, then rotate
  4. Rate: ~1 request per 3-5 seconds per IP is the practical ceiling
  5. Avoid datacenter IPs entirely: AWS/GCP/Hetzner ranges are instantly challenged

The 401 trap: If you hit the endpoint too fast or from a flagged IP, Meta returns a 401 and starts requiring a CSRF token + session cookie. Once this happens, that IP is burnt for that endpoint for ~30 minutes. Back off aggressively on the first 401 rather than retrying.

Legal note: Instagram's ToS prohibits automated scraping. In hiQ v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA, but the case later settled and ToS-based contract claims still apply. For commercial products, get legal advice or use a service that handles compliance.

Managed scraper option

If you're doing competitive analysis, influencer vetting, or brand monitoring and don't want to maintain a proxy pool + endpoint rotation logic, a managed actor is the right call.

Our Instagram Profile Scraper on Apify handles the full pipeline: residential proxy rotation, session management, fallback logic when internal endpoints rotate, and post pagination. You pass a list of usernames and get back structured profile data plus recent posts -- no Meta App Review, no business verification, no Page setup.

Approach                    Cost              Setup               Scope
Meta Graph API              Free              App review + Page   Own accounts only
web_profile_info (direct)   Proxy cost        None                Public profiles, first 12 posts
Playwright + login          Proxy + accounts  Account farming     More data, risk of bans
Managed actor (Apify)       Pay-per-use       API key             Full profile + posts

For a one-off research project, the web_profile_info endpoint with a mobile UA and a residential proxy is the fastest path. For anything ongoing -- weekly follower tracking, post engagement monitoring, influencer discovery -- a managed scraper absorbs the operational cost of Meta's constantly changing anti-bot measures.

Whatever you build, log raw responses when parsing fails, cache aggressively, and treat the integration as a moving target. Instagram breaks scrapers on a schedule -- roughly every 3-6 months a key field gets renamed or a rate limit tightens.
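
"Log raw responses when parsing fails" is worth making a habit rather than an afterthought. A small wrapper like this -- `parse_or_dump` is a hypothetical helper, not part of any library -- saves the exact payload you'll need for the post-mortem:

```python
import logging
import pathlib
import time

def parse_or_dump(raw: str, parse, dump_dir: str = 'failed_payloads'):
    """Parse a raw response; on failure, save the payload to disk and return None."""
    try:
        return parse(raw)
    except Exception:
        path = pathlib.Path(dump_dir)
        path.mkdir(exist_ok=True)
        fname = path / f'{int(time.time() * 1000)}.raw'
        fname.write_text(raw)
        logging.exception('parse failed; raw payload saved to %s', fname)
        return None
```

When Meta renames a field, the dumped payloads show you the new shape in minutes instead of forcing you to reproduce the failure live.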

