How to Scrape Apple App Store Data in 2026 (Python Guide)
The App Store is a goldmine of structured data. Whether you're doing competitor analysis, tracking review sentiment for your own app, building an ASO (App Store Optimization) tool, or training models on mobile app data — Apple's public endpoints give you access to app metadata, ratings, pricing, chart rankings, and customer reviews with minimal friction.
Most of this works without an API key. Apple has left these endpoints running for over a decade because they were originally designed for affiliate marketing and partner integrations. They're not officially supported for third-party developers, but they're stable, public, and return clean JSON.
This guide covers every endpoint, how to parse the responses, rate limit strategies, multi-country scraping, and proxy integration for volume collection.
Why Scrape the App Store?
The practical use cases are broad:
- Competitor analysis — track pricing changes, rating trends, version update frequency for rival apps
- Review monitoring — surface customer complaints and feature requests in near-real-time
- Market research — understand category dynamics, dominant players, keyword patterns in descriptions
- App Store Optimization — track keyword ranking changes over time without paying $500/month for tools
- Investment research — correlate app ratings/reviews with company growth signals
- Data science — large corpus of structured app descriptions, reviews, and metadata for NLP tasks
Apple doesn't offer an official developer API for any of this. What they provide is a set of public iTunes endpoints that have been quietly serving data since the App Store's launch in 2008.
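You can see this for yourself with a single unauthenticated GET. The sketch below just assembles a Search API URL (no key, no auth header); paste the result into a browser and you get JSON back:

```python
from urllib.parse import urlencode

def search_url(term: str, country: str = "us", limit: int = 10) -> str:
    """Build an iTunes Search API URL (no API key or auth needed)."""
    params = {"term": term, "country": country, "media": "software", "limit": limit}
    return f"https://itunes.apple.com/search?{urlencode(params)}"

print(search_url("todo list"))
```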
The Complete Endpoint Map
| Endpoint | Purpose | Rate Limit |
|---|---|---|
| itunes.apple.com/search | Search apps by keyword | ~40 req/min |
| itunes.apple.com/lookup | Metadata by app ID or bundle ID | ~40 req/min |
| rss.applemarketingtools.com/api/v2/ | Top charts (new, cleaner format) | generous |
| itunes.apple.com/{country}/rss/ | Top charts (legacy format) | generous |
| itunes.apple.com/rss/customerreviews/ | Customer reviews (paginated) | ~20 req/min |
iTunes Search API
The Search API is the front door. No authentication, free, supports all countries.
import httpx
import json
import time
import random
from typing import Optional, List
BASE_SEARCH = "https://itunes.apple.com/search"
BASE_LOOKUP = "https://itunes.apple.com/lookup"
def build_headers(country="us"):
return {
"User-Agent": (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
),
"Accept": "application/json",
"Accept-Language": f"{country}-US,{country};q=0.9,en;q=0.8",
"Referer": "https://apps.apple.com/",
}
def search_apps(
term: str,
country: str = "us",
limit: int = 50,
media: str = "software",
entity: str = "software",
proxy_url: str = None,
) -> List[dict]:
"""
Search the App Store by keyword.
entity options: software (iOS), macSoftware, iPadSoftware
"""
params = {
"term": term,
"country": country,
"media": media,
"entity": entity,
"limit": min(limit, 200), # API maximum
"lang": "en_us",
}
client_kwargs = {
"headers": build_headers(country),
"timeout": 20,
"follow_redirects": True,
}
    if proxy_url:
        # httpx >= 0.26; older httpx versions used proxies={"all://": proxy_url}
        client_kwargs["proxy"] = proxy_url
with httpx.Client(**client_kwargs) as client:
try:
resp = client.get(BASE_SEARCH, params=params)
resp.raise_for_status()
except httpx.HTTPStatusError as e:
print(f"Search error: {e.response.status_code}")
return []
data = resp.json()
apps = []
for result in data.get("results", []):
apps.append(normalize_result(result))
return apps
def normalize_result(r: dict) -> dict:
"""Normalize a raw iTunes API result into a clean dict."""
return {
"app_id": str(r.get("trackId", "")),
"bundle_id": r.get("bundleId"),
"name": r.get("trackName"),
"developer": r.get("artistName"),
"developer_id": str(r.get("artistId", "")),
"developer_url": r.get("artistViewUrl"),
"price": r.get("price", 0.0),
"formatted_price": r.get("formattedPrice", "Free"),
"currency": r.get("currency", "USD"),
"rating": r.get("averageUserRating"),
"rating_count": r.get("userRatingCount"),
"rating_current_version": r.get("averageUserRatingForCurrentVersion"),
"rating_count_current_version": r.get("userRatingCountForCurrentVersion"),
"version": r.get("version"),
"minimum_os": r.get("minimumOsVersion"),
"size_bytes": int(r.get("fileSizeBytes", 0)),
"description": r.get("description", ""),
"release_notes": r.get("releaseNotes", ""),
"primary_genre": r.get("primaryGenreName"),
"genres": r.get("genres", []),
"genre_ids": r.get("genreIds", []),
"release_date": r.get("releaseDate"),
"current_version_date": r.get("currentVersionReleaseDate"),
"screenshot_urls": r.get("screenshotUrls", []),
"ipad_screenshot_urls": r.get("ipadScreenshotUrls", []),
"icon_60": r.get("artworkUrl60"),
"icon_100": r.get("artworkUrl100"),
"icon_512": r.get("artworkUrl512"),
"content_rating": r.get("contentAdvisoryRating"),
"supported_devices": r.get("supportedDevices", []),
"features": r.get("features", []),
"languages": r.get("languageCodesISO2A", []),
"in_app_purchases": r.get("isSocialProfileEnabled"),
"seller": r.get("sellerName"),
"store_url": r.get("trackViewUrl"),
}
# Example: search for password manager apps
results = search_apps("password manager", country="us", limit=25)
for app in results[:5]:
rating_str = f"{app['rating']:.1f}★" if app["rating"] else "N/A"
price_str = app["formatted_price"] or "Free"
print(f"{app['name']} — {price_str} — {rating_str} ({app.get('rating_count', 0):,} ratings)")
iTunes Lookup API
Fetch full metadata for apps you already have IDs for. Supports batch lookups of up to 200 IDs:
def lookup_apps(
app_ids: list = None,
bundle_ids: list = None,
country: str = "us",
proxy_url: str = None,
) -> dict:
"""
Lookup app metadata by App Store ID or bundle ID.
Returns dict keyed by app_id string.
"""
if not app_ids and not bundle_ids:
raise ValueError("Provide either app_ids or bundle_ids")
params = {"country": country, "entity": "software"}
if app_ids:
params["id"] = ",".join(str(i) for i in app_ids[:200])
else:
params["bundleId"] = ",".join(bundle_ids[:200])
client_kwargs = {
"headers": build_headers(country),
"timeout": 20,
}
    if proxy_url:
        # httpx >= 0.26; older httpx versions used proxies={"all://": proxy_url}
        client_kwargs["proxy"] = proxy_url
with httpx.Client(**client_kwargs) as client:
resp = client.get(BASE_LOOKUP, params=params)
resp.raise_for_status()
results = {}
for r in resp.json().get("results", []):
app_id = str(r.get("trackId", ""))
if app_id:
results[app_id] = normalize_result(r)
return results
def lookup_by_bundle_id(bundle_id: str, country: str = "us") -> Optional[dict]:
"""Look up a single app by its bundle ID."""
results = lookup_apps(bundle_ids=[bundle_id], country=country)
return next(iter(results.values()), None)
# Single lookup
app = lookup_by_bundle_id("com.agilebits.onepassword-ios")
if app:
size_mb = app["size_bytes"] // 1_048_576
print(f"{app['name']} v{app['version']}")
print(f" Rating: {app['rating']:.2f} ({app['rating_count']:,} total, {app['rating_count_current_version']:,} current)")
print(f" Size: {size_mb} MB | Min iOS: {app['minimum_os']}")
print(f" Genres: {', '.join(app['genres'])}")
print(f" Last updated: {app['current_version_date']}")
# Batch lookup
batch_ids = ["389801252", "324684580", "835599320", "310633997"]
all_apps = lookup_apps(app_ids=batch_ids)
for app_id, meta in all_apps.items():
print(f"{meta['name']}: {meta['rating']:.2f}★")
Top Charts API
Two chart APIs are available — the new Apple Marketing Tools version and the legacy iTunes format. Both work:
def get_top_charts_new(
    country: str = "us",
    chart: str = "top-free",
    limit: int = 100,
    proxy_url: str = None,
) -> list:
    """
    Fetch top charts via the newer Apple Marketing Tools RSS API.
    Charts: top-free, top-paid, top-grossing
    Limit: 10, 25, 50, 100, or 200
    Note: this feed offers no genre filter; use the legacy feed for
    category-specific charts.
    """
    # Snap down to the nearest supported feed size
    for valid in (200, 100, 50, 25, 10):
        if limit >= valid:
            limit = valid
            break
    else:
        limit = 10
base = f"https://rss.applemarketingtools.com/api/v2/{country}/apps/{chart}/{limit}/apps.json"
client_kwargs = {"headers": build_headers(country), "timeout": 20}
    if proxy_url:
        # httpx >= 0.26; older httpx versions used proxies={"all://": proxy_url}
        client_kwargs["proxy"] = proxy_url
with httpx.Client(**client_kwargs) as client:
resp = client.get(base)
resp.raise_for_status()
results = []
for i, item in enumerate(resp.json().get("feed", {}).get("results", []), 1):
results.append({
"rank": i,
"app_id": item["id"],
"name": item["name"],
"developer": item.get("artistName"),
"category": item.get("genres", [{}])[0].get("name", ""),
"category_id": item.get("genres", [{}])[0].get("genreId", ""),
"price": item.get("offers", [{}])[0].get("price", "0") if item.get("offers") else "0",
"icon": item.get("artworkUrl100"),
"url": item.get("url"),
})
return results
def get_top_charts_legacy(
country: str = "us",
chart: str = "topfreeapplications",
limit: int = 100,
proxy_url: str = None,
) -> list:
"""
Legacy iTunes chart RSS feed.
Charts: topfreeapplications, toppaidapplications, topgrossingapplications,
topfreeipadapplications, toppaidipadapplications, newapplications
"""
url = f"https://itunes.apple.com/{country}/rss/{chart}/limit={limit}/json"
client_kwargs = {"headers": build_headers(country), "timeout": 20}
    if proxy_url:
        # httpx >= 0.26; older httpx versions used proxies={"all://": proxy_url}
        client_kwargs["proxy"] = proxy_url
with httpx.Client(**client_kwargs) as client:
resp = client.get(url)
resp.raise_for_status()
entries = resp.json().get("feed", {}).get("entry", [])
results = []
for rank, entry in enumerate(entries, 1):
results.append({
"rank": rank,
"app_id": entry["id"]["attributes"]["im:id"],
"name": entry["im:name"]["label"],
"developer": entry["im:artist"]["label"],
"developer_id": entry["im:artist"]["attributes"].get("href", "").split("/id")[-1],
"category": entry["category"]["attributes"]["label"],
"category_id": entry["category"]["attributes"]["im:id"],
"price": entry["im:price"]["label"],
"release_date": entry["im:releaseDate"]["attributes"]["label"],
"icon": entry["im:image"][-1]["label"] if entry.get("im:image") else None,
"store_url": entry["id"]["label"],
})
return results
# Compare top 10 across three chart types
free_apps = get_top_charts_new("us", "top-free", 10)
paid_apps = get_top_charts_new("us", "top-paid", 10)
grossing = get_top_charts_new("us", "top-grossing", 10)
print("Top 5 Free:")
for app in free_apps[:5]:
print(f" #{app['rank']} {app['name']} ({app['developer']})")
print("\nTop 5 Grossing:")
for app in grossing[:5]:
print(f" #{app['rank']} {app['name']} ({app['developer']})")
App Store Genre IDs
# Major category genre IDs for filtering charts
GENRE_IDS = {
"Games": 6014,
"Entertainment": 6016,
"Education": 6017,
"Utilities": 6002,
"Business": 6000,
"Productivity": 6007,
"Social Networking": 6005,
"Finance": 6015,
"Health & Fitness": 6013,
"Travel": 6003,
"Music": 6011,
"Photo & Video": 6008,
"Shopping": 6024,
"News": 6009,
"Books": 6018,
"Medical": 6020,
"Food & Drink": 6023,
"Navigation": 6010,
"Lifestyle": 6012,
"Weather": 6001,
"Sports": 6004,
"Developer Tools": 6026,
"Reference": 6006,
"Graphics & Design": 6027,
}
# Top 50 free apps, filtered to Productivity client-side
# (the legacy feed also accepts a genre= path segment for server-side
#  filtering, e.g. .../topfreeapplications/limit=50/genre=6007/json)
top_free = get_top_charts_legacy("us", "topfreeapplications", 50)
productivity_apps = [a for a in top_free if a.get("category_id") == "6007"]
Customer Reviews
The review endpoint has been running since 2008. Each page returns up to 50 reviews, and the feed stops serving after page 10, so roughly 500 reviews per app per storefront are accessible no matter how many exist:
def get_reviews(
app_id: str,
country: str = "us",
max_pages: int = 10,
sort: str = "mostRecent",
proxy_url: str = None,
) -> list:
"""
Fetch customer reviews via the iTunes RSS review endpoint.
sort: mostRecent, mostHelpful
"""
reviews = []
client_kwargs = {
"headers": build_headers(country),
"timeout": 15,
}
    if proxy_url:
        # httpx >= 0.26; older httpx versions used proxies={"all://": proxy_url}
        client_kwargs["proxy"] = proxy_url
with httpx.Client(**client_kwargs) as client:
for page in range(1, max_pages + 1):
url = (
f"https://itunes.apple.com/rss/customerreviews/"
f"id={app_id}/sortBy={sort}/page={page}/json"
)
try:
resp = client.get(url)
except httpx.TimeoutException:
print(f"Timeout on page {page}")
break
if resp.status_code == 404:
break # No more pages
if resp.status_code != 200:
print(f"Page {page}: HTTP {resp.status_code}")
break
data = resp.json()
feed = data.get("feed", {})
entries = feed.get("entry", [])
# Skip app metadata entry (first entry on page 1)
if page == 1 and entries:
entries = entries[1:]
if not entries:
break
for entry in entries:
try:
reviews.append({
"review_id": entry["id"]["label"],
"title": entry["title"]["label"],
"body": entry["content"]["label"],
"rating": int(entry["im:rating"]["label"]),
"app_version": entry["im:version"]["label"],
"author": entry["author"]["name"]["label"],
"author_url": entry["author"]["uri"]["label"],
"updated": entry["updated"]["label"],
"helpful": int(entry.get("im:voteCount", {}).get("label", 0)),
"total_votes": int(entry.get("im:voteSum", {}).get("label", 0)),
"app_id": str(app_id),
"country": country,
})
except (KeyError, ValueError) as e:
print(f" Parse error on review: {e}")
continue
print(f" Page {page}: {len(entries)} reviews")
if len(entries) < 10:
break # Partial page = last page
time.sleep(random.uniform(1.5, 3.0))
return reviews
def analyze_reviews(reviews: list) -> dict:
"""Quick statistical analysis of a review set."""
if not reviews:
return {}
ratings = [r["rating"] for r in reviews]
from collections import Counter
dist = Counter(ratings)
return {
"total": len(reviews),
"avg_rating": sum(ratings) / len(ratings),
"distribution": {f"{k}_star": v for k, v in sorted(dist.items())},
"most_helpful": sorted(reviews, key=lambda r: r["helpful"], reverse=True)[:3],
"latest": sorted(reviews, key=lambda r: r["updated"], reverse=True)[:3],
}
# Get Notion's reviews
reviews = get_reviews("1232780281", max_pages=5)
stats = analyze_reviews(reviews)
print(f"\nNotion App Store Reviews Analysis:")
print(f"Total: {stats['total']} | Avg: {stats['avg_rating']:.2f}★")
print("Distribution:", stats["distribution"])
print("\nMost helpful reviews:")
for r in stats["most_helpful"]:
print(f" [{r['rating']}★] {r['title'][:60]} ({r['helpful']} helpful)")
Multi-Country Scraping
Apple has 175+ country storefronts. Ratings, rankings, and even review content differ by country:
APPLE_COUNTRIES = {
"us": "United States",
"gb": "United Kingdom",
"ca": "Canada",
"au": "Australia",
"de": "Germany",
"fr": "France",
"jp": "Japan",
"kr": "South Korea",
"cn": "China",
"br": "Brazil",
"mx": "Mexico",
"in": "India",
"ru": "Russia",
"es": "Spain",
"it": "Italy",
"nl": "Netherlands",
"pl": "Poland",
"se": "Sweden",
"no": "Norway",
"dk": "Denmark",
}
def scrape_global_rankings(
app_id: str,
countries: list = None,
proxy_url: str = None,
) -> dict:
"""
Fetch app metadata across multiple countries.
Useful for: comparing ratings, checking availability, monitoring pricing.
"""
if countries is None:
countries = list(APPLE_COUNTRIES.keys())
results = {}
client_kwargs = {"timeout": 20}
    if proxy_url:
        # httpx >= 0.26; older httpx versions used proxies={"all://": proxy_url}
        client_kwargs["proxy"] = proxy_url
with httpx.Client(**client_kwargs) as client:
for country in countries:
params = {"id": app_id, "country": country, "entity": "software"}
headers = build_headers(country)
try:
resp = client.get(BASE_LOOKUP, params=params, headers=headers)
data = resp.json()
items = data.get("results", [])
if items:
r = items[0]
results[country] = {
"available": True,
"price": r.get("price", 0),
"currency": r.get("currency"),
"rating": r.get("averageUserRating"),
"rating_count": r.get("userRatingCount"),
"version": r.get("version"),
}
else:
results[country] = {"available": False}
except Exception as e:
results[country] = {"error": str(e)}
time.sleep(random.uniform(0.8, 1.5))
return results
# Check Spotify's ratings across markets
global_data = scrape_global_rankings("324684580", countries=["us", "gb", "de", "jp", "au"])
print("\nSpotify global ratings:")
for country, data in global_data.items():
if data.get("available") and data.get("rating"):
flag = {"us": "🇺🇸", "gb": "🇬🇧", "de": "🇩🇪", "jp": "🇯🇵", "au": "🇦🇺"}.get(country, country)
print(f" {flag} {APPLE_COUNTRIES.get(country, country)}: "
f"{data['rating']:.2f}★ ({data.get('rating_count', 0):,} ratings) | {data.get('currency')} {data.get('price', 0)}")
Anti-Bot Measures and Rate Limiting
Apple's public endpoints are relatively permissive, but there are real limits:
What Triggers Rate Limits
- More than ~40 requests/minute on Search or Lookup endpoints from a single IP
- More than ~20 requests/minute on the Review endpoint
- Python's default python-requests/2.x.x User-Agent (returns reduced data or blocks)
- Consistent machine-perfect timing between requests
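Those per-minute ceilings are easy to respect with a small client-side throttle. A minimal sketch using a rolling 60-second window (the class name is my own, not part of any library):

```python
import time
from collections import deque

class MinuteRateLimiter:
    """Blocks until a request slot is free within a rolling 60-second window."""

    def __init__(self, max_per_minute: int = 40):
        self.max_per_minute = max_per_minute
        self.timestamps = deque()

    def wait(self):
        now = time.monotonic()
        # Discard timestamps that have aged out of the window
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_per_minute:
            # Sleep until the oldest request leaves the window
            time.sleep(60 - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

# Call limiter.wait() before each Search/Lookup request
limiter = MinuteRateLimiter(max_per_minute=40)
```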
Mitigation Strategies
Use a realistic User-Agent — always:
USER_AGENTS = [
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Mobile/15E148 Safari/604.1",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"iTunes/12.12.8 (Macintosh; OS X 14.5) AppleWebKit/617.3.4",
]
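Picking from that pool per request is a one-liner. The sketch below builds fresh headers with a random User-Agent each call; a trimmed-down stand-in pool is inlined so the snippet is self-contained:

```python
import random

# Stand-in pool; in practice use the fuller USER_AGENTS list above
UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

def rotating_headers(country: str = "us") -> dict:
    """Fresh headers with a randomly chosen User-Agent per request."""
    return {
        "User-Agent": random.choice(UA_POOL),
        "Accept": "application/json",
        "Accept-Language": f"en-{country.upper()},en;q=0.9",
    }
```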
Add jitter — humans don't request at exactly 1.000-second intervals:
def polite_delay(base=1.5, variance=1.0, long_pause_chance=0.05):
"""Human-like delay with occasional longer pauses."""
if random.random() < long_pause_chance:
time.sleep(random.uniform(8, 20))
else:
time.sleep(base + random.gauss(0, variance))
Rotate proxies for volume scraping — For production monitoring of hundreds of apps across multiple countries, single-IP rate limits become a real constraint. ThorData's residential proxy network rotates through a large pool of consumer IPs, keeping each IP's request count low enough to avoid Apple's rate limiting:
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
def get_proxy(session_id=None, country="us"):
"""
ThorData proxy URL with optional sticky session.
Use sticky sessions for multi-step lookups (search -> lookup -> reviews)
on the same app to avoid looking like multiple sources.
"""
if session_id:
user = f"{THORDATA_USER}-session-{session_id}-country-{country}"
else:
user = f"{THORDATA_USER}-country-{country}"
return f"http://{user}:{THORDATA_PASS}@proxy.thordata.com:9000"
def rate_limited_lookup(app_ids, country="us", requests_per_minute=30):
"""Batch lookup with rate limiting and proxy rotation."""
results = {}
delay = 60.0 / requests_per_minute
# Process in chunks of 50
chunks = [app_ids[i:i+50] for i in range(0, len(app_ids), 50)]
for i, chunk in enumerate(chunks):
# Rotate proxy per chunk
proxy = get_proxy(session_id=i * 100 + random.randint(1, 99))
batch = lookup_apps(app_ids=chunk, country=country, proxy_url=proxy)
results.update(batch)
print(f"Chunk {i+1}/{len(chunks)}: {len(batch)} results")
time.sleep(delay + random.uniform(0, 2))
return results
Retry on 429: the helpers above return empty results on errors (including rate-limit responses), so a decorator that retries empty results with exponential backoff covers both cases:
import functools
def with_retry(max_attempts=4, base_wait=10):
def decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_attempts):
result = func(*args, **kwargs)
if result:
return result
wait = base_wait * (2 ** attempt) + random.uniform(0, 5)
print(f"Empty/failed result, waiting {wait:.1f}s (attempt {attempt+1})...")
time.sleep(wait)
return None
return wrapper
return decorator
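To see the backoff behavior without touching the network, here is the same decorator exercised against a stand-in flaky function (flaky_fetch is hypothetical, failing twice before succeeding; waits are shortened so the demo runs instantly):

```python
import functools
import random
import time

def with_retry(max_attempts=4, base_wait=10):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                result = func(*args, **kwargs)
                if result:
                    return result
                # Exponential backoff with a little jitter
                time.sleep(base_wait * (2 ** attempt) + random.uniform(0, 0.01))
            return None
        return wrapper
    return decorator

calls = {"n": 0}

@with_retry(max_attempts=3, base_wait=0.01)
def flaky_fetch():
    # Stand-in for search_apps/lookup_apps: empty twice, then data
    calls["n"] += 1
    return [] if calls["n"] < 3 else ["ok"]

print(flaky_fetch())  # ['ok'] after two retried empty responses
```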
Developer Apps and Portfolio Lookup
Fetch all apps from a developer by developer ID:
def get_developer_apps(developer_id: str, country: str = "us") -> list:
"""Get all apps published by a developer."""
params = {
"id": developer_id,
"entity": "software",
"country": country,
}
with httpx.Client(headers=build_headers(country), timeout=20) as client:
resp = client.get(BASE_LOOKUP, params=params)
resp.raise_for_status()
results = resp.json().get("results", [])
# First result is the developer, remaining are apps
apps = [normalize_result(r) for r in results if r.get("wrapperType") == "software"]
return apps
# Get all apps from a developer
# Find developer ID first via search, then lookup all their apps
search_results = search_apps("headspace meditation", limit=1)
if search_results:
dev_id = search_results[0]["developer_id"]
dev_apps = get_developer_apps(dev_id)
print(f"Developer has {len(dev_apps)} apps:")
for app in dev_apps:
print(f" {app['name']} — {app['primary_genre']} — {app['formatted_price']}")
Building a Competitor Tracker
Practical monitoring setup with SQLite:
import sqlite3
from pathlib import Path
from datetime import datetime
def init_db(db_path="appstore_tracker.db"):
conn = sqlite3.connect(db_path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS apps (
app_id TEXT PRIMARY KEY,
bundle_id TEXT,
name TEXT,
developer TEXT,
developer_id TEXT,
primary_genre TEXT,
first_seen TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
app_id TEXT,
country TEXT,
price REAL,
rating REAL,
rating_count INTEGER,
rating_current_version REAL,
version TEXT,
size_bytes INTEGER,
captured_at TEXT DEFAULT (datetime('now')),
FOREIGN KEY (app_id) REFERENCES apps(app_id)
);
CREATE TABLE IF NOT EXISTS reviews (
review_id TEXT,
app_id TEXT,
country TEXT,
rating INTEGER,
title TEXT,
body TEXT,
author TEXT,
app_version TEXT,
review_date TEXT,
helpful INTEGER DEFAULT 0,
scraped_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (review_id, country)
);
CREATE TABLE IF NOT EXISTS chart_positions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
app_id TEXT,
chart_type TEXT,
country TEXT,
rank INTEGER,
captured_at TEXT DEFAULT (datetime('now')),
FOREIGN KEY (app_id) REFERENCES apps(app_id)
);
        CREATE TABLE IF NOT EXISTS keyword_ranks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            app_id TEXT,
            keyword TEXT,
            rank INTEGER,
            country TEXT,
            captured_at TEXT DEFAULT (datetime('now'))
        );
        CREATE INDEX IF NOT EXISTS idx_snapshots ON snapshots(app_id, captured_at);
        CREATE INDEX IF NOT EXISTS idx_reviews ON reviews(app_id, review_date);
        CREATE INDEX IF NOT EXISTS idx_charts ON chart_positions(app_id, captured_at);
        CREATE INDEX IF NOT EXISTS idx_keywords ON keyword_ranks(app_id, keyword, captured_at);
""")
conn.commit()
return conn
def save_snapshot(conn, app_id, country, metadata):
"""Save a metadata snapshot."""
# Upsert app record
conn.execute("""
INSERT OR IGNORE INTO apps (app_id, bundle_id, name, developer, developer_id, primary_genre)
VALUES (?, ?, ?, ?, ?, ?)
""", (
app_id, metadata.get("bundle_id"), metadata.get("name"),
metadata.get("developer"), metadata.get("developer_id"),
metadata.get("primary_genre"),
))
conn.execute("""
INSERT INTO snapshots (app_id, country, price, rating, rating_count,
rating_current_version, version, size_bytes)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
app_id, country,
metadata.get("price"), metadata.get("rating"),
metadata.get("rating_count"), metadata.get("rating_current_version"),
metadata.get("version"), metadata.get("size_bytes"),
))
conn.commit()
def save_reviews(conn, reviews):
"""Bulk save reviews, ignore duplicates."""
conn.executemany("""
INSERT OR IGNORE INTO reviews
(review_id, app_id, country, rating, title, body,
author, app_version, review_date, helpful)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", [
(
r["review_id"], r["app_id"], r["country"],
r["rating"], r["title"], r["body"],
r["author"], r["app_version"], r["updated"],
r.get("helpful", 0),
)
for r in reviews
])
conn.commit()
def get_rating_trend(conn, app_id, country="us", days=30):
"""Get rating trend over time."""
cursor = conn.execute("""
SELECT
date(captured_at) as day,
AVG(rating) as avg_rating,
MAX(rating_count) as max_count
FROM snapshots
WHERE app_id = ? AND country = ?
AND captured_at > datetime('now', '-' || ? || ' days')
GROUP BY day
ORDER BY day
""", (app_id, country, days))
return cursor.fetchall()
def run_daily_tracker(app_ids, db_path="appstore_tracker.db"):
"""Daily tracking run: metadata + reviews for all tracked apps."""
conn = init_db(db_path)
proxy = get_proxy(country="us")
# Bulk metadata lookup
print(f"Fetching metadata for {len(app_ids)} apps...")
metadata = lookup_apps(app_ids=app_ids, proxy_url=proxy)
for app_id, meta in metadata.items():
save_snapshot(conn, app_id, "us", meta)
print(f" {meta['name']}: {meta['rating']:.2f}★ ({meta['rating_count']:,} ratings)")
time.sleep(random.uniform(2, 4))
# Reviews for each app
for app_id in app_ids:
print(f"\nFetching reviews for {app_id}...")
reviews = get_reviews(str(app_id), max_pages=3, proxy_url=proxy)
save_reviews(conn, reviews)
print(f" Saved {len(reviews)} reviews")
time.sleep(random.uniform(3, 6))
# Check chart positions
print("\nFetching chart positions...")
for chart in ["top-free", "top-paid", "top-grossing"]:
rankings = get_top_charts_new("us", chart, 200)
rank_map = {r["app_id"]: r["rank"] for r in rankings}
now = datetime.utcnow().isoformat()
for app_id in app_ids:
rank = rank_map.get(str(app_id))
if rank:
conn.execute(
"INSERT INTO chart_positions (app_id, chart_type, country, rank, captured_at) VALUES (?, ?, ?, ?, ?)",
(str(app_id), chart, "us", rank, now)
)
conn.commit()
print(f"\nTracking run complete for {len(app_ids)} apps")
# Run it
TARGET_APPS = [
"389801252", # Instagram
"324684580", # Spotify
"835599320", # Notion
"1232780281", # Notion (alt ID check)
"310633997", # WhatsApp
]
run_daily_tracker(TARGET_APPS)
Keyword Search Rank Tracking
Monitor where your app appears for target keywords:
def track_keyword_ranks(
app_id: str,
keywords: list,
country: str = "us",
conn = None,
) -> dict:
"""
Check keyword search rankings for a specific app.
Returns {keyword: rank_or_None}
"""
results = {}
proxy = get_proxy(country=country)
for keyword in keywords:
ranking_results = search_apps(keyword, country=country, limit=50, proxy_url=proxy)
app_positions = {r["app_id"]: pos+1 for pos, r in enumerate(ranking_results)}
rank = app_positions.get(str(app_id))
results[keyword] = rank
if conn:
conn.execute("""
INSERT INTO keyword_ranks (app_id, keyword, rank, country)
VALUES (?, ?, ?, ?)
""", (str(app_id), keyword, rank, country))
status = f"#{rank}" if rank else "not in top 50"
print(f" '{keyword}': {status}")
time.sleep(random.uniform(1.5, 3.0))
if conn:
conn.commit()
return results
# Track keyword positions for your app
keywords = ["meditation app", "mindfulness", "stress relief", "sleep sounds"]
positions = track_keyword_ranks("1099571240", keywords) # Headspace example
Complete Pipeline
def full_appstore_pipeline(
target_apps: list,
monitor_keywords: list,
countries: list = None,
output_db: str = "appstore_intelligence.db",
):
"""
Full App Store intelligence pipeline.
Run daily via cron or scheduler.
"""
if countries is None:
countries = ["us", "gb", "ca", "au"]
conn = init_db(output_db)
proxy = get_proxy(country="us")
print("=== App Store Intelligence Pipeline ===\n")
# 1. Chart snapshots
print("Phase 1: Top charts")
for country in countries:
for chart in ["top-free", "top-grossing"]:
rankings = get_top_charts_new(country, chart, 100)
rank_map = {r["app_id"]: r["rank"] for r in rankings}
now = datetime.utcnow().isoformat()
# Record positions for our tracked apps
for app_id in target_apps:
rank = rank_map.get(str(app_id))
conn.execute(
"INSERT INTO chart_positions (app_id, chart_type, country, rank, captured_at) VALUES (?, ?, ?, ?, ?)",
(str(app_id), chart, country, rank, now)
)
conn.commit()
print(f" {country}/{chart}: {len(rankings)} apps charted")
time.sleep(random.uniform(1, 2))
# 2. Metadata snapshots (all target apps, all countries)
print("\nPhase 2: Metadata snapshots")
for country in countries:
meta_batch = lookup_apps(app_ids=target_apps, country=country, proxy_url=proxy)
for app_id, meta in meta_batch.items():
save_snapshot(conn, app_id, country, meta)
print(f" {country}: {len(meta_batch)} apps updated")
time.sleep(random.uniform(2, 3))
# 3. Reviews (US only, recent)
print("\nPhase 3: Reviews")
for app_id in target_apps:
reviews = get_reviews(str(app_id), "us", max_pages=3, proxy_url=proxy)
save_reviews(conn, reviews)
print(f" App {app_id}: {len(reviews)} reviews")
time.sleep(random.uniform(3, 5))
# 4. Keyword positions
print("\nPhase 4: Keyword rankings")
for app_id in target_apps:
track_keyword_ranks(str(app_id), monitor_keywords, "us", conn)
time.sleep(random.uniform(2, 4))
print("\nPipeline complete.")
# Print summary stats
cursor = conn.execute("SELECT COUNT(*) FROM apps")
print(f"Database: {cursor.fetchone()[0]} apps tracked")
if __name__ == "__main__":
TARGET_APPS = ["389801252", "324684580", "835599320"]
KEYWORDS = ["social media", "photo sharing", "music streaming"]
full_appstore_pipeline(TARGET_APPS, KEYWORDS)
Closing Notes
The iTunes API endpoints have been remarkably stable for over 15 years. They're not documented as a third-party developer API, but Apple has never restricted them. Terms of service matter — don't hammer the endpoints, build in proper rate limiting from day one, and don't resell the raw data commercially.
For low-volume personal projects, User-Agent rotation and polite delays are sufficient. For multi-country monitoring of hundreds of apps with daily frequency, budget for residential proxies — ThorData integrates cleanly via standard proxy configuration and keeps each IP's request count well below Apple's rate limits.
The data is genuinely useful. A month of daily snapshots gives you rating trend data that ASO tools charge hundreds per month to provide.