How Instagram's Mobile API Works (And How to Use It Without a Browser)

2026-03-29 scraping instagram python api data-collection

Instagram removed window._sharedData around 2024. If you've been following a scraping tutorial written before that, you're chasing a ghost. The embedded JSON blobs that older scrapers parsed are gone, and browser-based scraping now requires maintaining a full session with cookies just to get public profile data.

But Instagram's mobile app still talks to a clean JSON API. The app has to get data somehow — and it does, over HTTP, with predictable headers. Here's how to use it directly.

The Endpoint

https://i.instagram.com/api/v1/users/web_profile_info/?username={user}

Two headers are required:

User-Agent: Must look like a mobile Instagram app, not a browser
x-ig-app-id: 936619743392459

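That app ID is a public, widely documented value. It is the ID Instagram's own web client sends (which fits the web_profile_info endpoint name) rather than one specific to the Android app, and it doesn't change often. The User-Agent, on the other hand, needs to match what a real Android device would send, because Instagram validates it on the server side.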

A working User-Agent looks like:

Instagram 275.0.0.27.98 Android (33/13; 420dpi; 1080x2400; samsung; SM-G991B; o1s; exynos2100)

The format is: Instagram {app_version} Android ({android_api_level}/{android_release}; {dpi}; {resolution}; {manufacturer}; {model}; {codename}; {chipset}). In the example above, 33 is the Android API level and 13 is the Android release. You don't need to generate these dynamically; a static string works fine for low-volume use. For higher volume, rotate between a pool of realistic device strings.
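
If you would rather keep the device pool as structured data than as hand-typed strings, the format can be assembled from its parts. The sketch below is only an illustration: AndroidDevice and build_user_agent are hypothetical names, not part of any library, and the field names are my own labels for the slots in the format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AndroidDevice:
    """Device fields that appear inside the parentheses of the User-Agent."""
    api_level: int      # Android API level, e.g. 33
    release: str        # Android release, e.g. "13"
    dpi: int
    resolution: str     # "WIDTHxHEIGHT"
    manufacturer: str
    model: str
    codename: str
    chipset: str


def build_user_agent(app_version: str, device: AndroidDevice) -> str:
    """Assemble a User-Agent string in the format described above."""
    return (
        f"Instagram {app_version} Android "
        f"({device.api_level}/{device.release}; {device.dpi}dpi; "
        f"{device.resolution}; {device.manufacturer}; {device.model}; "
        f"{device.codename}; {device.chipset})"
    )


# Same values as the working example string shown earlier.
device = AndroidDevice(33, "13", 420, "1080x2400", "samsung", "SM-G991B", "o1s", "exynos2100")
ua = build_user_agent("275.0.0.27.98", device)
print(ua)
```

Keeping the fields separate makes it harder to end up with an internally inconsistent string, such as a resolution that does not belong to the stated model.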

What You Get Back

The response is clean JSON. The data.user object contains:

full_name, username, biography
edge_followed_by.count — followers
edge_follow.count — following
edge_owner_to_timeline_media.count — total posts
media_count
is_verified, is_private, is_business_account
profile_pic_url_hd
edge_owner_to_timeline_media.edges — recent posts (up to ~12)

Each post node includes like counts, comment counts, timestamps, shortcodes (for constructing post URLs), and caption text.
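
To make the nesting concrete, here is a hand-written sample in that shape and the lookups needed to reach a post. The values are made up and the sample is heavily trimmed; only the key names come from the fields listed above and from the parser further down.

```python
from datetime import datetime, timezone

# Hand-written sample in the shape described above; all values are invented.
sample = {
    "data": {
        "user": {
            "username": "example_account",
            "edge_followed_by": {"count": 1500},
            "edge_owner_to_timeline_media": {
                "count": 1,
                "edges": [
                    {
                        "node": {
                            "shortcode": "ABC123",
                            "taken_at_timestamp": 1700000000,
                            "edge_liked_by": {"count": 120},
                            "edge_media_to_comment": {"count": 8},
                            "edge_media_to_caption": {
                                "edges": [{"node": {"text": "Hello"}}]
                            },
                        }
                    }
                ],
            },
        }
    }
}

user = sample["data"]["user"]
node = user["edge_owner_to_timeline_media"]["edges"][0]["node"]

# The shortcode is the only piece needed to build the public post URL.
post_url = f"https://www.instagram.com/p/{node['shortcode']}/"
# Timestamps are Unix seconds; convert with an explicit UTC timezone.
posted_at = datetime.fromtimestamp(node["taken_at_timestamp"], tz=timezone.utc)
caption = node["edge_media_to_caption"]["edges"][0]["node"]["text"]

print(user["edge_followed_by"]["count"], post_url, posted_at.isoformat(), caption)
```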

Complete Instagram Profile Scraper

Here's a full working scraper with proxy support, rate limiting, batch processing, and CSV/JSON export:

#!/usr/bin/env python3
"""
Instagram Profile Scraper via Mobile API

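Fetches public profile data and recent posts without browser automation.
Uses Instagram's internal mobile API endpoint.

Requires Python 3.10+ (for the `dict | None` annotation) and httpx with
HTTP/2 support: pip install "httpx[http2]"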

Usage:
    python ig_scraper.py username1 username2 ...
    python ig_scraper.py --file usernames.txt --format csv --output profiles.csv
    python ig_scraper.py instagram --format json
"""

import httpx
import json
import csv
import time
import random
import argparse
import sys
from datetime import datetime, timezone
from pathlib import Path


# Instagram mobile API configuration
API_URL = "https://i.instagram.com/api/v1/users/web_profile_info/"
APP_ID = "936619743392459"

# Pool of realistic Android device User-Agent strings
# Rotate these to avoid fingerprint-based blocks
USER_AGENTS = [
    "Instagram 275.0.0.27.98 Android (33/13; 420dpi; 1080x2400; samsung; SM-G991B; o1s; exynos2100)",
    "Instagram 275.0.0.27.98 Android (34/14; 440dpi; 1080x2340; Google; Pixel 8; shiba; tensor)",
    "Instagram 275.0.0.27.98 Android (33/13; 480dpi; 1440x3200; samsung; SM-S918B; dm3q; qcom)",
    "Instagram 275.0.0.27.98 Android (34/14; 420dpi; 1080x2400; OnePlus; CPH2449; aston; qcom)",
    "Instagram 275.0.0.27.98 Android (33/13; 420dpi; 1080x2340; Xiaomi; 23049RAD8C; fuxi; qcom)",
]


class InstagramScraper:
    def __init__(self, proxy_url: str | None = None,
                 delay_range: tuple[float, float] = (2.0, 5.0)):
        self.delay_range = delay_range
        self.request_count = 0
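        self.client_kwargs = {
            "timeout": 15,
            "follow_redirects": True,
            # Instagram negotiates HTTP/2; httpx needs the optional h2
            # dependency for this (pip install "httpx[http2]").
            "http2": True,
        }
        if proxy_url:
            # httpx >= 0.26 takes `proxy`; older releases used `proxies`.
            self.client_kwargs["proxy"] = proxy_url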

    def _get_headers(self) -> dict:
        """Generate request headers with a random User-Agent."""
        return {
            "User-Agent": random.choice(USER_AGENTS),
            "x-ig-app-id": APP_ID,
            "Accept": "*/*",
            "Accept-Language": "en-US,en;q=0.9",
            "X-Requested-With": "XMLHttpRequest",
        }

    def _rate_limit(self):
        """Apply rate limiting between requests."""
        self.request_count += 1
        # Longer pause every 25 requests
        if self.request_count % 25 == 0:
            pause = random.uniform(15, 30)
            print(f"  Pausing {pause:.0f}s after {self.request_count} requests...")
            time.sleep(pause)
        else:
            time.sleep(random.uniform(*self.delay_range))

    def get_profile(self, username: str) -> dict | None:
        """
        Fetch a single Instagram profile.
        Returns structured profile data or None on failure.
        """
        headers = self._get_headers()
        params = {"username": username}

        try:
            with httpx.Client(**self.client_kwargs) as client:
                resp = client.get(API_URL, headers=headers, params=params)

                if resp.status_code == 404:
                    print(f"  @{username}: not found (404)")
                    return None
                elif resp.status_code == 429:
                    print(f"  @{username}: rate limited (429) - backing off")
                    time.sleep(random.uniform(30, 60))
                    return None
                elif resp.status_code == 401:
                    print(f"  @{username}: unauthorized - IP may be blocked")
                    return None

                resp.raise_for_status()
                data = resp.json()

        except httpx.HTTPStatusError as e:
            print(f"  @{username}: HTTP error {e.response.status_code}")
            return None
        except (httpx.RequestError, json.JSONDecodeError) as e:
            print(f"  @{username}: request failed - {e}")
            return None

        # Guard against a null "data" value as well as a missing key.
        user = (data.get("data") or {}).get("user")
        if not user:
            print(f"  @{username}: no user data in response (may be login-walled)")
            return None

        return self._parse_profile(user)

    def _parse_profile(self, user: dict) -> dict:
        """Extract structured fields from the raw API user object."""
        profile = {
            "username": user.get("username", ""),
            "full_name": user.get("full_name", ""),
            "biography": user.get("biography", ""),
            "followers": user.get("edge_followed_by", {}).get("count", 0),
            "following": user.get("edge_follow", {}).get("count", 0),
            "post_count": user.get("edge_owner_to_timeline_media", {}).get("count", 0),
            "is_verified": user.get("is_verified", False),
            "is_private": user.get("is_private", False),
            "is_business": user.get("is_business_account", False),
            "business_category": user.get("category_name", ""),
            "external_url": user.get("external_url", ""),
            "profile_pic_url": user.get("profile_pic_url_hd", ""),
            "user_id": user.get("id", ""),
            "scraped_at": datetime.now(timezone.utc).isoformat(),
        }

        # Extract recent posts
        posts = []
        edges = (
            user.get("edge_owner_to_timeline_media", {}).get("edges", [])
        )
        for edge in edges:
            node = edge.get("node", {})
            caption_edges = node.get("edge_media_to_caption", {}).get("edges", [])
            caption = caption_edges[0]["node"]["text"] if caption_edges else ""
            posts.append({
                "shortcode": node.get("shortcode", ""),
                "url": f"https://www.instagram.com/p/{node.get('shortcode', '')}/",
                "type": node.get("__typename", ""),
                "likes": node.get("edge_liked_by", {}).get("count", 0),
                "comments": node.get("edge_media_to_comment", {}).get("count", 0),
                "caption": caption,
                "timestamp": node.get("taken_at_timestamp", 0),
                "date": datetime.fromtimestamp(
                    node.get("taken_at_timestamp", 0), tz=timezone.utc
                ).isoformat() if node.get("taken_at_timestamp") else "",
                "display_url": node.get("display_url", ""),
                "is_video": node.get("is_video", False),
                "video_view_count": node.get("video_view_count", 0),
            })

        profile["recent_posts"] = posts
        profile["avg_likes"] = (
            sum(p["likes"] for p in posts) // len(posts) if posts else 0
        )
        profile["avg_comments"] = (
            sum(p["comments"] for p in posts) // len(posts) if posts else 0
        )
        profile["engagement_rate"] = 0.0
        if profile["followers"] > 0 and posts:
            avg_engagement = profile["avg_likes"] + profile["avg_comments"]
            profile["engagement_rate"] = round(
                (avg_engagement / profile["followers"]) * 100, 3
            )

        return profile

    def scrape_batch(self, usernames: list[str]) -> list[dict]:
        """Scrape multiple profiles with rate limiting."""
        results = []
        total = len(usernames)

        for i, username in enumerate(usernames):
            username = username.strip().lstrip("@")
            if not username:
                continue

            print(f"[{i+1}/{total}] Fetching @{username}...")
            profile = self.get_profile(username)

            if profile:
                results.append(profile)
                print(f"  @{username}: {profile['followers']:,} followers, "
                      f"{profile['post_count']} posts, "
                      f"{profile['engagement_rate']}% engagement")
            if i < total - 1:
                self._rate_limit()

        print(f"\nDone. Successfully scraped {len(results)}/{total} profiles.")
        return results


def export_json(profiles: list[dict], filename: str):
    """Export profiles to JSON."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(profiles, f, indent=2, ensure_ascii=False)
    print(f"Exported {len(profiles)} profiles to {filename}")


def export_csv(profiles: list[dict], filename: str):
    """Export profiles to CSV (flattened, without nested posts)."""
    if not profiles:
        return

    # Flatten: remove nested posts for CSV, keep summary stats
    flat = []
    for p in profiles:
        row = {k: v for k, v in p.items() if k != "recent_posts"}
        row["top_post_likes"] = max(
            (post["likes"] for post in p.get("recent_posts", [])), default=0
        )
        row["top_post_url"] = ""
        if p.get("recent_posts"):
            top = max(p["recent_posts"], key=lambda x: x["likes"])
            row["top_post_url"] = top["url"]
        flat.append(row)

    fieldnames = list(flat[0].keys())
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(flat)
    print(f"Exported {len(flat)} profiles to {filename}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Scrape Instagram profiles via mobile API")
    parser.add_argument("usernames", nargs="*", help="Instagram usernames to scrape")
    parser.add_argument("--file", "-f", help="File with usernames (one per line)")
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    parser.add_argument("--output", "-o", help="Output filename")
    parser.add_argument("--proxy", help="Proxy URL (http://user:pass@host:port)")
    args = parser.parse_args()

    usernames = list(args.usernames)
    if args.file:
        usernames.extend(Path(args.file).read_text().strip().splitlines())

    if not usernames:
        print("No usernames provided. Usage: python ig_scraper.py username1 username2")
        sys.exit(1)

    scraper = InstagramScraper(proxy_url=args.proxy)
    profiles = scraper.scrape_batch(usernames)

    out_file = args.output or f"ig_profiles.{args.format}"
    if args.format == "csv":
        export_csv(profiles, out_file)
    else:
        export_json(profiles, out_file)

Expected output

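Running python ig_scraper.py instagram cristiano natgeo --format json produces output of this shape (abridged to one profile, one post, and a subset of fields; the numbers are illustrative):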

[
  {
    "username": "instagram",
    "full_name": "Instagram",
    "biography": "Bringing you closer to the people and things you love.",
    "followers": 678432100,
    "following": 82,
    "post_count": 7842,
    "is_verified": true,
    "is_private": false,
    "is_business": true,
    "business_category": "Product/Service",
    "engagement_rate": 0.133,
    "avg_likes": 892340,
    "avg_comments": 12450,
    "recent_posts": [
      {
        "shortcode": "DAx7...",
        "url": "https://www.instagram.com/p/DAx7.../",
        "type": "GraphImage",
        "likes": 1245000,
        "comments": 18200,
        "caption": "Your favorites, all in one place...",
        "date": "2026-03-28T14:30:00+00:00",
        "is_video": false
      }
    ]
  }
]

CSV output (flattened):

username,full_name,followers,following,post_count,is_verified,engagement_rate,avg_likes,avg_comments,top_post_likes,top_post_url
instagram,Instagram,678432100,82,7842,True,0.133,892340,12450,1245000,https://www.instagram.com/p/DAx7.../
cristiano,Cristiano Ronaldo,642000000,580,3812,True,0.338,2100000,67000,4500000,https://www.instagram.com/p/DBm2.../

Use Case 1: Competitor Engagement Tracker

Compare engagement metrics across competitor accounts over time:

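"""
Instagram Competitive Analysis
Tracks follower counts and engagement rates across a set of accounts.
Run weekly via cron to build a time-series dataset.
"""

import csv
from datetime import datetime, timezone
from pathlib import Path

# Assumes the scraper above is saved as ig_scraper.py in the same directory.
from ig_scraper import InstagramScraper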

def competitive_analysis(accounts: list[str], output_dir: str = "."):
    """Scrape accounts and append to a time-series tracking file."""
    scraper = InstagramScraper()
    profiles = scraper.scrape_batch(accounts)

    # Append to running CSV log
    today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    log_file = Path(output_dir) / "ig_competitive_log.csv"
    file_exists = log_file.exists()

    with open(log_file, "a", newline="", encoding="utf-8") as f:
        fieldnames = [
            "date", "username", "followers", "following", "posts",
            "avg_likes", "avg_comments", "engagement_rate"
        ]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not file_exists:
            writer.writeheader()

        for p in profiles:
            writer.writerow({
                "date": today,
                "username": p["username"],
                "followers": p["followers"],
                "following": p["following"],
                "posts": p["post_count"],
                "avg_likes": p["avg_likes"],
                "avg_comments": p["avg_comments"],
                "engagement_rate": p["engagement_rate"],
            })

    print(f"\nCompetitive snapshot saved to {log_file}")
    print(f"\n{'Username':<20} {'Followers':>12} {'Eng. Rate':>10} {'Avg Likes':>12}")
    print("-" * 56)
    for p in sorted(profiles, key=lambda x: x["followers"], reverse=True):
        print(f"{p['username']:<20} {p['followers']:>12,} {p['engagement_rate']:>9.3f}% "
              f"{p['avg_likes']:>12,}")

    return profiles

# Usage: competitive_analysis(["nike", "adidas", "puma", "newbalance"])
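
Once the log holds more than one snapshot per account, the time-series part becomes useful. As a rough sketch, the hypothetical follower_growth helper below (not part of the scraper) reads the CSV written by competitive_analysis and compares each account's earliest and latest rows. It assumes the column names used in that function and one row per account per run.

```python
import csv
from collections import defaultdict


def follower_growth(log_path: str) -> dict[str, dict]:
    """Compare each account's first and latest snapshot in the log."""
    snapshots = defaultdict(list)
    with open(log_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            snapshots[row["username"]].append((row["date"], int(row["followers"])))

    growth = {}
    for username, rows in snapshots.items():
        rows.sort()  # YYYY-MM-DD dates sort chronologically as plain strings
        (first_date, first), (last_date, last) = rows[0], rows[-1]
        growth[username] = {
            "from": first_date,
            "to": last_date,
            "change": last - first,
            # Percentage change relative to the first snapshot.
            "pct": round((last - first) / first * 100, 2) if first else 0.0,
        }
    return growth


# Usage: follower_growth("ig_competitive_log.csv")
```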

Use Case 2: Influencer Discovery by Engagement

Find accounts with high engagement rates (often micro-influencers) from a seed list:

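"""
Influencer Discovery Tool
Filters profiles by engagement rate to find high-performing accounts.
"""

# Assumes the scraper above is saved as ig_scraper.py in the same directory.
from ig_scraper import InstagramScraper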

def find_influencers(usernames: list[str], min_followers: int = 10000,
                     max_followers: int = 500000, min_engagement: float = 2.0):
    """Find accounts matching influencer criteria."""
    scraper = InstagramScraper()
    profiles = scraper.scrape_batch(usernames)

    influencers = [
        p for p in profiles
        if (min_followers <= p["followers"] <= max_followers
            and p["engagement_rate"] >= min_engagement
            and not p["is_private"])
    ]

    influencers.sort(key=lambda x: x["engagement_rate"], reverse=True)

    print(f"\nInfluencer Candidates ({len(influencers)} found):")
    print(f"Criteria: {min_followers:,}-{max_followers:,} followers, "
          f">={min_engagement}% engagement\n")

    for p in influencers:
        print(f"  @{p['username']}")
        print(f"    Followers: {p['followers']:,} | Engagement: {p['engagement_rate']}%")
        print(f"    Avg likes: {p['avg_likes']:,} | Category: {p['business_category']}")
        print(f"    Bio: {p['biography'][:80]}")
        print()

    return influencers

Use Case 3: Post Performance Analyzer

Analyze what types of content perform best on any public account:

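"""
Instagram Post Performance Analyzer
Examines recent posts to identify content patterns that drive engagement.
"""

# Assumes the scraper above is saved as ig_scraper.py in the same directory.
from ig_scraper import InstagramScraper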

def analyze_post_performance(username: str):
    """Analyze recent post performance for an account."""
    scraper = InstagramScraper()
    profile = scraper.get_profile(username)

    if not profile:
        print(f"Could not fetch @{username}")
        return

    posts = profile.get("recent_posts", [])
    if not posts:
        print(f"No public posts found for @{username}")
        return

    # Separate videos and images
    videos = [p for p in posts if p["is_video"]]
    images = [p for p in posts if not p["is_video"]]

    print(f"\n=== Post Performance: @{username} ===")
    print(f"Followers: {profile['followers']:,}")
    print(f"Recent posts analyzed: {len(posts)}")

    if images:
        avg_img_likes = sum(p["likes"] for p in images) // len(images)
        print(f"\nImages ({len(images)} posts):")
        print(f"  Avg likes: {avg_img_likes:,}")

    if videos:
        avg_vid_likes = sum(p["likes"] for p in videos) // len(videos)
        avg_views = sum(p["video_view_count"] for p in videos) // len(videos)
        print(f"\nVideos ({len(videos)} posts):")
        print(f"  Avg likes: {avg_vid_likes:,}")
        print(f"  Avg views: {avg_views:,}")

    # Best and worst performing
    best = max(posts, key=lambda p: p["likes"])
    worst = min(posts, key=lambda p: p["likes"])
    print(f"\nBest post: {best['url']}")
    print(f"  {best['likes']:,} likes | {best['caption'][:60]}")
    print(f"\nWeakest post: {worst['url']}")
    print(f"  {worst['likes']:,} likes | {worst['caption'][:60]}")

    # Caption length analysis
    short_captions = [p for p in posts if len(p["caption"]) < 100]
    long_captions = [p for p in posts if len(p["caption"]) >= 100]
    if short_captions and long_captions:
        avg_short = sum(p["likes"] for p in short_captions) // len(short_captions)
        avg_long = sum(p["likes"] for p in long_captions) // len(long_captions)
        print(f"\nCaption length impact:")
        print(f"  Short (<100 chars): avg {avg_short:,} likes")
        print(f"  Long (100+ chars): avg {avg_long:,} likes")

    return profile

Limitations You Need to Know

Rate limiting. Instagram throttles aggressively after roughly 100-200 requests from the same IP within a short window. The response code is 429 or sometimes a 200 with an error body. The scraper above sleeps 30-60 seconds when it sees a 429 and then skips that username rather than retrying it; a 200 with no user object is reported as a failed lookup.
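
If you want throttled usernames retried instead of dropped, one option is to wrap the fetch call. The sketch below is a generic exponential-backoff wrapper, not something the scraper ships with: fetch_with_retry and its parameters are hypothetical, and the fetch argument is any callable that returns a dict on success and None on failure, which is the contract get_profile follows.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[str], Optional[dict]],
    username: str,
    max_attempts: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(username) until it returns data or attempts run out.

    Waits base_delay * 2**attempt seconds, plus up to 10% jitter,
    between consecutive attempts.
    """
    for attempt in range(max_attempts):
        result = fetch(username)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    return None


# Usage (assuming `scraper` is an InstagramScraper instance):
# profile = fetch_with_retry(scraper.get_profile, "nike")
```

One caveat: because get_profile returns None for every kind of failure, this wrapper also retries usernames that simply don't exist. Telling a 404 apart from a 429 would require get_profile to report the status code to the caller.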

Datacenter IPs are blocked fast. AWS, GCP, DigitalOcean, Hetzner — Instagram has all the major datacenter ranges flagged. Requests from these IPs will fail or get challenged almost immediately, even with correct headers.

Private profiles. For private accounts, the endpoint returns is_private: true and no post content: the edges list comes back empty, so the averages and engagement rate computed above are all zero. Depending on the response you get, other profile data may also be sparse or missing. You need a valid authenticated session cookie (sessionid) to see anything more. Getting and maintaining those session cookies at scale is a separate (and legally murky) problem.

No post pagination. The edge_owner_to_timeline_media field only returns the most recent ~12 posts. There's no cursor-based pagination from this endpoint. If you need a full post history, you're looking at the GraphQL API (authenticated) or Playwright-based browser automation.

Schema drift. Instagram changes response field names and nesting without notice. The edge_followed_by key has been follower_count in some API versions. Build defensively: use .get() with fallbacks, and log warnings when expected fields are missing.
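
One way to build that in is a small lookup helper that tries several key paths in order and logs when none of them resolve. This is a sketch under my own naming: first_present is hypothetical, and the follower_count fallback is just the alternate key mentioned above.

```python
import logging

logger = logging.getLogger("ig_scraper")


def first_present(obj: dict, *paths: tuple, default=None):
    """Return the value at the first key path that resolves, else default.

    Each path is a tuple of keys, e.g. ("edge_followed_by", "count").
    """
    for path in paths:
        current = obj
        for key in path:
            if not isinstance(current, dict) or key not in current:
                break  # this path is missing; try the next one
            current = current[key]
        else:
            # Every key in the path was found.
            if current is not None:
                return current
    logger.warning("none of the expected fields found: %s", paths)
    return default


old_shape = {"edge_followed_by": {"count": 1200}}
new_shape = {"follower_count": 1200}

for user in (old_shape, new_shape):
    followers = first_present(
        user, ("edge_followed_by", "count"), ("follower_count",), default=0
    )
    print(followers)
```

Both sample shapes yield the same follower count, and a response with neither key produces a warning in the log instead of a silent zero.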

Proxies for Any Real Volume

If you're running this against more than a handful of usernames, residential proxies are not optional — they're the cost of doing business. Residential IPs look like real mobile users on home ISPs, which is exactly what Instagram expects to see.

ThorData provides rotating residential proxies with good coverage of mobile ISP ranges. For Instagram specifically, you want IPs that rotate per-request or per-session, not sticky IPs that accumulate request history.

# ThorData proxy integration
# Replace USER, PASS, and PROXY_HOST with the credentials and gateway host from your account.
scraper = InstagramScraper(
    proxy_url="http://USER:PASS@PROXY_HOST:9000"
)
profiles = scraper.scrape_batch(["nike", "adidas", "puma"])

Ready-to-Use Solution

If you need retries, cookie handling, fallback logic, and structured output without building it yourself, the Instagram Profile Scraper on Apify handles all of this. It manages session rotation, rate limit backoff, and exports clean JSON or CSV. Useful if you need data now rather than an engineering project.

The Bigger Picture

This endpoint works today. It may not work the same way in six months. Instagram's anti-bot infrastructure is not static — they run detection experiments, roll out TLS fingerprinting checks, and adjust rate limits based on traffic patterns.

The headers covered here handle the application layer. But Instagram also inspects the TLS handshake itself — the cipher suites you advertise, the TLS extensions in your ClientHello, the order of everything. This is JA3 fingerprinting, and httpx with its default settings produces a fingerprint that looks nothing like a real Android app.
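
To see why ordering matters, it helps to look at how a JA3 fingerprint is derived: five fields from the ClientHello (TLS version, cipher suites, extensions, elliptic curves, point formats) are written as decimal numbers, joined into one string, and hashed with MD5. The sketch below follows that recipe with made-up values. It is a simplification that skips details such as GREASE filtering, and the function names are my own.

```python
import hashlib


def ja3_string(version: int, ciphers: list[int], extensions: list[int],
               curves: list[int], point_formats: list[int]) -> str:
    """Build the JA3 string: five comma-separated fields, values joined by '-'."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return ",".join(fields)


def ja3_hash(*args) -> str:
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*args).encode("ascii")).hexdigest()


# Made-up ClientHello values for illustration, not a capture of any real client.
hello_a = (771, [4865, 4866, 49195], [0, 10, 11], [29, 23], [0])
hello_b = (771, [4866, 4865, 49195], [0, 10, 11], [29, 23], [0])  # same ciphers, reordered

print(ja3_string(*hello_a))
print(ja3_hash(*hello_a) == ja3_hash(*hello_b))  # False: order changes the hash
```

Two clients offering exactly the same cipher suites in a different order get different fingerprints, which is why matching headers alone is not enough.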

If you start hitting blocks that don't correlate with request volume, the TLS layer is where to look next. Libraries like curl_cffi or tls-client let you spoof JA3 fingerprints, but that's a whole separate guide.