
Instagram Profile Data Without the Meta API: Python Guide 2026

April 14, 2026 · 10 min read
Contents: The Meta API problem · What data is still public · Approach 1: web_profile_info endpoint · Approach 2: __additionalDataLoaded parsing · Extracting posts and engagement · Stories, reels, and tagged posts · Anti-bot tactics in 2026 · Managed scraper option

The Meta API problem

Meta's Instagram Graph API requires business verification, a Facebook Page connected to the Instagram account, and approval from Meta for any useful permission scope. Even after approval, you can only read data from accounts you own or manage -- not arbitrary public profiles. For competitive research, influencer analytics, or brand monitoring, the official API is effectively useless.

Instagram's public web pages, however, still render everything you need to know about a public profile. The trick is knowing which internal JSON endpoints the web app uses and how to call them without getting flagged by Meta's anti-bot systems.

What data is still public

For any non-private Instagram account, the following fields are still accessible without login: username, full name, bio and external link, follower and following counts, post count, profile picture URL, verification status, and the most recent 12 posts (shortcode, caption, like and comment counts, media URLs).

Private accounts, story views, direct messages, and analytics require auth. Don't attempt to scrape these -- it's both a ToS violation and a CFAA risk in the US.

Approach 1: the web_profile_info endpoint

Instagram's web client uses an internal endpoint at /api/v1/users/web_profile_info/. It returns the full profile JSON including the first 12 posts. No auth required if you send the right headers.

# Direct endpoint call -- mobile UA plus the web client's app-ID header; no login needed
import requests

IPHONE_UA = (
    'Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) '
    'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1'
)

resp = requests.get(
    'https://i.instagram.com/api/v1/users/web_profile_info/',
    params={'username': 'nasa'},
    # The app ID below is the one the web client sends; grab a fresh value from
    # your browser's DevTools network tab if requests start failing
    headers={'User-Agent': IPHONE_UA, 'x-ig-app-id': '936619743392459'},
    timeout=10,
)
resp.raise_for_status()
user = resp.json()['data']['user']
print(user['full_name'], user['edge_followed_by']['count'])

Why the iPhone UA? Instagram's mobile web app is older code that has less aggressive bot detection than the desktop app. Using a mobile User-Agent, even from a desktop environment, routes you through gentler rate-limit buckets.

Approach 2: __additionalDataLoaded parsing

If the internal API call starts failing (Meta rotates the endpoint paths periodically), you can fall back to parsing the public profile page HTML. The first 12 posts and basic profile data are embedded in a JavaScript payload called window.__additionalDataLoaded.

# Fallback -- fetch the profile page and pull the embedded JSON payload out of the HTML
import json
import re

import requests

html = requests.get(
    'https://www.instagram.com/nasa/',
    # Same mobile-UA trick as Approach 1
    headers={'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) '
                           'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1'},
    timeout=10,
).text

# The payload shape changes every few months -- keep this regex easy to swap out
match = re.search(r'__additionalDataLoaded\([^,]+,\s*(\{.*\})\s*\);', html)
data = json.loads(match.group(1)) if match else None

The HTML approach is brittle -- Meta changes the JSON structure on the profile page roughly every 6 months. Keep the parsing logic isolated so you can swap it out when it breaks, and always have the web_profile_info path as your primary.
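
That isolation can be as small as a wrapper that tries the endpoint first and only then falls back to HTML parsing. A sketch -- `primary` and `fallback` are hypothetical fetcher callables standing in for the two approaches above:

```python
def fetch_profile(username: str, primary, fallback):
    """Try the primary fetcher; on any failure, fall through to the fallback."""
    try:
        return primary(username)
    except Exception:
        return fallback(username)
```

Keeping both paths behind one function means the rest of your pipeline never learns which one succeeded, so swapping a broken parser touches exactly one call site.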

Extracting posts and engagement

The web_profile_info response includes the first 12 posts under edge_owner_to_timeline_media. Each post has a shortcode, caption, like count, comment count, and media URLs.
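
If you're calling web_profile_info directly rather than using an actor, those fields have to be walked out of Meta's edge structure. A sketch, assuming the response shape described above -- field names like edge_liked_by change periodically, so verify them against a live response:

```python
def extract_posts(user: dict) -> list[dict]:
    """Flatten the timeline edges of a web_profile_info 'user' object."""
    posts = []
    for edge in user['edge_owner_to_timeline_media']['edges']:
        node = edge['node']
        captions = node['edge_media_to_caption']['edges']
        posts.append({
            'shortcode': node['shortcode'],
            'caption': captions[0]['node']['text'] if captions else '',
            'likes': node['edge_liked_by']['count'],
            'comments': node['edge_media_to_comment']['count'],
            'media_url': node['display_url'],
        })
    return posts
```
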

# Managed actor returns posts already parsed — no need to walk Meta's internal JSON
from apify_client import ApifyClient

client = ApifyClient('YOUR_APIFY_TOKEN')
run = client.actor('cryptosignals/instagram-profile-scraper').call(
    run_input={'usernames': ['nasa'], 'resultsPerUser': 12}
)

for post in client.dataset(run['defaultDatasetId']).iterate_items():
    print(f"[{post['likes']:,} likes] {post['caption'][:60]}")

Paginating beyond the first 12

The edge_owner_to_timeline_media.page_info.end_cursor field is a pagination token. To fetch the next batch you'd use the GraphQL endpoint at /graphql/query/ with the cursor and a query hash. That endpoint is heavily rate-limited for logged-out clients in 2026 -- if you need deep post history, use a managed scraper rather than building your own pagination loop.

Stories, reels, and tagged posts

Logged-out access here is thin. Stories and highlights sit behind authenticated endpoints, so they're off the table for guest scraping. Reels that are shared to the grid show up in the same timeline media edges as ordinary video posts, so the first few are already in the web_profile_info response; the tagged-posts tab is increasingly gated behind login. Treat anything beyond the main grid as requiring either an authenticated session (with the ToS risks noted above) or a managed scraper.

Anti-bot tactics in 2026

Meta's bot detection stack has improved significantly since 2023. What works now:

  1. Mobile User-Agent + mobile IP = far fewer challenges than desktop setups
  2. Residential proxies are mandatory for any volume beyond a few profiles per hour
  3. Session affinity: re-use the same proxy for a batch of requests, then rotate
  4. Rate: ~1 request per 3-5 seconds per IP is the practical ceiling
  5. Avoid datacenter IPs entirely: AWS/GCP/Hetzner ranges are instantly challenged

The 401 trap: If you hit the endpoint too fast or from a flagged IP, Meta returns a 401 and starts requiring a CSRF token + session cookie. Once this happens, that IP is burnt for that endpoint for ~30 minutes. Back off aggressively on the first 401 rather than retrying.

Legal note: Instagram's ToS prohibits automated scraping. In hiQ v. LinkedIn, the Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA, but the case later settled and ToS-based contract claims still apply. For commercial products, get legal advice or use a service that handles compliance.

Managed scraper option

If you're doing competitive analysis, influencer vetting, or brand monitoring and don't want to maintain a proxy pool + endpoint rotation logic, a managed actor is the right call.

Our Instagram Profile Scraper on Apify handles the full pipeline: residential proxy rotation, session management, fallback logic when internal endpoints rotate, and post pagination. You pass a list of usernames and get back structured profile data plus recent posts -- no Meta App Review, no business verification, no Page setup.

Approach                    Cost              Setup               Scope
Meta Graph API              Free              App review + Page   Own accounts only
web_profile_info (direct)   Proxy cost        None                Public profiles, first 12 posts
Playwright + login          Proxy + accounts  Account farming     More data, risk of bans
Managed actor (Apify)       Pay-per-use       API key             Full profile + posts

For a one-off research project, the web_profile_info endpoint with a mobile UA and a residential proxy is the fastest path. For anything ongoing -- weekly follower tracking, post engagement monitoring, influencer discovery -- a managed scraper absorbs the operational cost of Meta's constantly changing anti-bot measures.

Whatever you build, log raw responses when parsing fails, cache aggressively, and treat the integration as a moving target. Instagram breaks scrapers on a schedule -- roughly every 3-6 months a key field gets renamed or a rate limit tightens.
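
"Log raw responses when parsing fails" is worth making a habit rather than an afterthought. A small wrapper like this -- `parse_or_dump` is a hypothetical helper, not part of any library -- saves the exact payload you'll need for the post-mortem:

```python
import logging
import pathlib
import time

def parse_or_dump(raw: str, parse, dump_dir: str = 'failed_payloads'):
    """Parse a raw response; on failure, save the payload to disk and return None."""
    try:
        return parse(raw)
    except Exception:
        path = pathlib.Path(dump_dir)
        path.mkdir(exist_ok=True)
        fname = path / f'{int(time.time() * 1000)}.raw'
        fname.write_text(raw)
        logging.exception('parse failed; raw payload saved to %s', fname)
        return None
```

When Meta renames a field, the dumped payloads show you the new shape in minutes instead of forcing you to reproduce the failure live.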

