← Back to blog

Scraping App Store Rankings & Reviews in 2026: iTunes RSS, Google Play, and Beyond

Scraping App Store Rankings & Reviews in 2026

If you're building a mobile app, you need competitive intelligence. What keywords are your competitors ranking for? How are their reviews trending? What's their estimated download count? Which categories are they climbing?

Tools like SensorTower and data.ai charge $500–$2,000/month for this data. The underlying sources are publicly accessible — you just need to know where to look and how to pull from them reliably.

This guide covers the full picture: Apple's iTunes RSS feeds and search API, Google Play's internal batch endpoint, review extraction, keyword rank tracking, and proxy integration for volume scraping.


The Data Sources

Apple and Google expose app data through different channels:

Apple App Store

Source Data Auth Required
iTunes RSS feeds Top charts, new apps — clean JSON None
iTunes Lookup API Full metadata for specific apps None
iTunes Search API Keyword search results None
Review RSS endpoint Customer reviews (paginated) None

Google Play

Source Data Auth Required
Play Store HTML App listings, descriptions, ratings None
Internal batch API Reviews (bulk, structured) None
Google Shopping Partial app data None

The iTunes RSS feed is the most underrated data source. It returns clean JSON with no authentication, no rate limiting headaches for moderate use, and up to 200 results per category — updated hourly.


Apple App Store: iTunes RSS Charts

The RSS feed gives you top free, top paid, and top grossing charts by country and category:

import httpx
import json
import time
import random
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class AppEntry:
    """A single ranked entry from an App Store top chart feed."""
    rank: int            # 1-based position within the chart
    app_id: str          # Apple track/app identifier (numeric string)
    name: str            # display name of the app
    developer: str       # artistName from the feed
    developer_id: str    # artistId from the feed
    category: str        # primary genre name, e.g. "Games"
    category_id: str     # primary genreId
    price_label: str     # price string from the feed; "0" when free/absent
    icon_url: str        # 100x100 artwork URL
    store_url: str       # canonical apps.apple.com listing URL

# Maps friendly chart names to the path segments used by the
# rss.applemarketingtools.com v2 feed URLs (see get_top_charts).
CHART_TYPES = {
    "top_free": "top-free",
    "top_paid": "top-paid",
    "top_grossing": "top-grossing",
    "new_free": "new-apps-we-love",
    # NOTE(review): "new-games-we-love" is Apple's new-games editorial feed,
    # not a paid-apps chart — confirm this mapping is intentional.
    "new_paid": "new-games-we-love",
}

def get_top_charts(
    country: str = "us",
    chart_type: str = "top_free",
    limit: int = 100,
    genre_id: Optional[int] = None,
) -> List[AppEntry]:
    """
    Fetch App Store top charts via the Apple Marketing Tools RSS API (v2).

    The legacy itunes.apple.com/rss endpoint still works but this is preferred.

    Args:
        country: Two-letter storefront code (e.g. "us", "gb").
        chart_type: Key from CHART_TYPES, or a raw feed path segment.
        limit: Number of entries to request (the feed caps at 200).
        genre_id: Currently ignored — the v2 marketing-tools feed has no
            genre path segment (the original code built a genre path but
            never used it). Kept for interface compatibility; use
            get_charts_legacy() for per-genre charts.

    Returns:
        AppEntry list in chart order (rank starts at 1).

    Raises:
        httpx.HTTPStatusError: on a non-2xx response.
    """
    url = (
        f"https://rss.applemarketingtools.com/api/v2/{country}/apps/"
        f"{CHART_TYPES.get(chart_type, chart_type)}/{limit}/apps.json"
    )

    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Accept": "application/json",
        "Referer": "https://apps.apple.com/",
    }

    resp = httpx.get(url, headers=headers, timeout=20)
    resp.raise_for_status()
    data = resp.json()

    entries = []
    for i, item in enumerate(data.get("feed", {}).get("results", []), 1):
        # Guard against "genres"/"offers" being present but empty, which
        # would make the original `[0]` indexing raise IndexError.
        genres = item.get("genres") or [{}]
        offers = item.get("offers") or []
        entries.append(AppEntry(
            rank=i,
            app_id=item["id"],
            name=item["name"],
            developer=item.get("artistName", ""),
            developer_id=item.get("artistId", ""),
            category=genres[0].get("name", ""),
            category_id=genres[0].get("genreId", ""),
            # The feed omits "offers" for most free entries; treat as free.
            price_label=offers[0].get("price", "0") if offers else "0",
            icon_url=item.get("artworkUrl100", ""),
            store_url=item.get("url", f"https://apps.apple.com/app/id{item['id']}"),
        ))

    return entries

# Example: snapshot the top 100 free apps in the US storefront and print
# the first five entries.
rankings = get_top_charts("us", "top_free", 100)
for app in rankings[:5]:
    print(f"#{app.rank}: {app.name} by {app.developer} (ID: {app.app_id})")

Legacy iTunes RSS Format

The older format still works and returns slightly different data:

def get_charts_legacy(country="us", chart="topfreeapplications", limit=100):
    """
    Fetch top charts from the legacy iTunes RSS feed.

    chart options: topfreeapplications, toppaidapplications,
                   topgrossingapplications, topfreeipadapplications,
                   toppaidipadapplications

    Returns a list of dicts in rank order. Raises httpx.HTTPStatusError
    on a non-2xx response.
    """
    url = f"https://itunes.apple.com/{country}/rss/{chart}/limit={limit}/json"

    resp = httpx.get(url, timeout=20)
    resp.raise_for_status()
    data = resp.json()

    entries = data.get("feed", {}).get("entry", [])
    # The legacy feed returns a bare dict (not a one-element list) when
    # there is exactly one entry — normalize so iteration always works.
    if isinstance(entries, dict):
        entries = [entries]

    results = []
    for rank, entry in enumerate(entries, 1):
        results.append({
            "rank": rank,
            "app_id": entry["id"]["attributes"]["im:id"],
            "name": entry["im:name"]["label"],
            "artist": entry["im:artist"]["label"],
            "category": entry["category"]["attributes"]["label"],
            "category_id": entry["category"]["attributes"]["im:id"],
            "price": entry["im:price"]["label"],
            "release_date": entry["im:releaseDate"]["label"],
            "store_url": entry["id"]["label"],
            "icon": entry["im:image"][-1]["label"] if entry.get("im:image") else None,
        })

    return results

Apple App Store: Metadata Lookup

Enrich chart entries with full metadata — ratings, description, screenshots, pricing:

def get_app_metadata(
    app_ids: list,
    country: str = "us",
    chunk_size: int = 100,
    proxy_url: Optional[str] = None,
) -> dict:
    """
    Fetch full metadata for multiple apps via the iTunes Lookup API.

    Splits app_ids into chunks (the API accepts up to 200 ids per request)
    and pauses briefly between chunks. Failed chunks are logged and skipped
    rather than aborting the whole batch.

    Args:
        app_ids: sequence of numeric App Store ids (str or int).
        country: storefront code.
        chunk_size: ids per request (keep <= 200).
        proxy_url: optional proxy applied to all requests.

    Returns:
        Dict keyed by app_id (str) with normalized metadata fields.
    """
    all_results = {}
    chunks = [app_ids[i:i+chunk_size] for i in range(0, len(app_ids), chunk_size)]

    client_kwargs = {"timeout": 20}
    if proxy_url:
        client_kwargs["proxies"] = {"all://": proxy_url}

    with httpx.Client(**client_kwargs) as client:
        for idx, chunk in enumerate(chunks):
            if idx:
                # Space out requests between chunks; the original also
                # slept after the final chunk for no benefit.
                time.sleep(0.5)

            url = "https://itunes.apple.com/lookup"
            params = {
                "id": ",".join(str(i) for i in chunk),
                "country": country,
                "entity": "software",
            }

            resp = client.get(url, params=params)
            if resp.status_code != 200:
                print(f"Lookup failed: {resp.status_code}")
                continue

            for result in resp.json().get("results", []):
                app_id = str(result.get("trackId", ""))
                all_results[app_id] = {
                    "app_id": app_id,
                    "bundle_id": result.get("bundleId"),
                    "name": result.get("trackName"),
                    "developer": result.get("artistName"),
                    "developer_id": str(result.get("artistId", "")),
                    "price": result.get("price", 0),
                    "formatted_price": result.get("formattedPrice"),
                    "currency": result.get("currency"),
                    "rating": result.get("averageUserRating"),
                    "rating_current_version": result.get("averageUserRatingForCurrentVersion"),
                    "rating_count": result.get("userRatingCount"),
                    "rating_count_current": result.get("userRatingCountForCurrentVersion"),
                    "version": result.get("version"),
                    "minimum_os": result.get("minimumOsVersion"),
                    "size_bytes": result.get("fileSizeBytes"),
                    "description": result.get("description", ""),
                    "release_notes": result.get("releaseNotes", ""),
                    "genres": result.get("genres", []),
                    "primary_genre": result.get("primaryGenreName"),
                    "screenshot_urls": result.get("screenshotUrls", []),
                    "ipad_screenshots": result.get("ipadScreenshotUrls", []),
                    "icon_60": result.get("artworkUrl60"),
                    "icon_512": result.get("artworkUrl512"),
                    "release_date": result.get("releaseDate"),
                    "current_version_release_date": result.get("currentVersionReleaseDate"),
                    "content_rating": result.get("contentAdvisoryRating"),
                    "supported_devices": result.get("supportedDevices", []),
                    "languages": result.get("languageCodesISO2A", []),
                    # NOTE(review): isVppDeviceBasedLicensingEnabled is Apple's
                    # Volume Purchase Program flag, not an IAP indicator —
                    # confirm downstream consumers expect this value here.
                    "in_app_purchases": result.get("isVppDeviceBasedLicensingEnabled"),
                    "store_url": result.get("trackViewUrl"),
                }

    return all_results

# Enrich top 100 with full metadata.
top_ids = [app.app_id for app in rankings[:100]]
metadata = get_app_metadata(top_ids)
for app_id, meta in list(metadata.items())[:3]:
    print(f"\n{meta['name']}:")
    # rating / rating_count are None for unrated apps; the :.2f and :,
    # format specs would raise TypeError on None, so default them first.
    rating = meta['rating'] or 0.0
    rating_count = meta['rating_count'] or 0
    print(f"  Rating: {rating:.2f} ({rating_count:,} ratings)")
    print(f"  Version: {meta['version']} | Min iOS: {meta['minimum_os']}")
    size_mb = int(meta['size_bytes'] or 0) // 1_048_576
    print(f"  Size: {size_mb} MB")

Apple App Store: Review Scraping

Reviews come through the unofficial (but stable) RSS-style JSON endpoint:

def get_app_reviews(
    app_id: str,
    country: str = "us",
    max_pages: int = 10,
    proxy_url: Optional[str] = None,
) -> list:
    """
    Fetch App Store customer reviews via the RSS-style JSON endpoint.

    Returns up to max_pages * 50 reviews (Apple caps at ~500 total per app).
    Stops early on 404, non-200, an empty page, or a short page.

    Args:
        app_id: numeric App Store id as a string.
        country: storefront code (also used for Accept-Language).
        max_pages: page cap; each page carries up to 50 reviews.
        proxy_url: optional proxy for all requests.
    """
    reviews = []
    client_kwargs = {"timeout": 15}
    if proxy_url:
        client_kwargs["proxies"] = {"all://": proxy_url}

    with httpx.Client(**client_kwargs) as client:
        for page in range(1, max_pages + 1):
            url = (
                f"https://itunes.apple.com/rss/customerreviews/"
                f"id={app_id}/sortBy=mostRecent/page={page}/json"
            )
            headers = {
                "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
                "Accept-Language": f"{country}-US,{country};q=0.9",
            }

            resp = client.get(url, headers=headers)

            if resp.status_code == 404:
                break  # No more pages

            if resp.status_code != 200:
                print(f"Page {page}: HTTP {resp.status_code}")
                break

            data = resp.json()
            feed = data.get("feed", {})
            entries = feed.get("entry", [])
            # The feed returns a bare dict instead of a list when there is
            # exactly one entry — normalize before iterating.
            if isinstance(entries, dict):
                entries = [entries]

            # First entry on page 1 is the app metadata, skip it
            # (NOTE(review): true for the legacy feed shape — confirm the
            # current feed still includes it before relying on this).
            if page == 1 and entries:
                entries = entries[1:]

            if not entries:
                break

            for entry in entries:
                reviews.append({
                    "review_id": entry["id"]["label"],
                    "title": entry["title"]["label"],
                    "body": entry["content"]["label"],
                    "rating": int(entry["im:rating"]["label"]),
                    "app_version": entry["im:version"]["label"],
                    "author": entry["author"]["name"]["label"],
                    "author_url": entry["author"]["uri"]["label"],
                    "date": entry["updated"]["label"],
                    # im:voteSum is the count of "helpful" votes; im:voteCount
                    # is the total votes cast. The original mapping had these
                    # two fields reversed.
                    "helpful_votes": int(entry.get("im:voteSum", {}).get("label", 0)),
                    "total_votes": int(entry.get("im:voteCount", {}).get("label", 0)),
                    "app_id": app_id,
                    "country": country,
                })

            print(f"  Page {page}: {len(entries)} reviews")

            if len(entries) < 10:
                break  # Heuristic: a short page means we've reached the end

            time.sleep(random.uniform(1.5, 3.0))

    return reviews

# Get reviews for Spotify (app id 324684580).
reviews = get_app_reviews("324684580", max_pages=5)
print(f"Total reviews: {len(reviews)}")

# Rating distribution histogram. max() on an empty Counter raises
# ValueError, so default the peak to 1 when no reviews came back.
from collections import Counter
dist = Counter(r["rating"] for r in reviews)
peak = max(dist.values(), default=1)
for stars in range(5, 0, -1):
    bar = "█" * (dist[stars] * 20 // peak)
    print(f"  {stars}★ {bar} {dist[stars]}")

Google Play Store: App Metadata

Google Play has no public JSON API. Scrape the HTML pages directly:

from selectolax.parser import HTMLParser
import re

def scrape_play_app(package_id: str, proxy_url: Optional[str] = None) -> dict:
    """
    Scrape app metadata from a Google Play listing page.

    Prefers the JSON-LD structured-data block (stable schema.org fields);
    falls back to DOM heuristics that may drift when Google changes markup.

    Args:
        package_id: Android package name, e.g. "com.spotify.music".
        proxy_url: optional proxy for the request.

    Returns:
        Metadata dict with a "source" key ("json_ld" or "dom"), or
        {"error": ..., "package_id": ...} on a non-200 response.
    """
    url = f"https://play.google.com/store/apps/details?id={package_id}&hl=en&gl=us"

    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml",
        "Referer": "https://play.google.com/",
    }

    client_kwargs = {"headers": headers, "timeout": 20, "follow_redirects": True}
    if proxy_url:
        client_kwargs["proxies"] = {"all://": proxy_url}

    with httpx.Client(**client_kwargs) as client:
        resp = client.get(url)

    if resp.status_code != 200:
        return {"error": f"HTTP {resp.status_code}", "package_id": package_id}

    tree = HTMLParser(resp.text)

    # Try structured data first (JSON-LD).
    for script in tree.css('script[type="application/ld+json"]'):
        try:
            data = json.loads(script.text() or "")
        except json.JSONDecodeError:
            continue
        if not isinstance(data, dict):
            continue  # JSON-LD can also be a list/@graph; only handle objects
        # "@type" may be a single string or a list of type names.
        ld_type = data.get("@type")
        ld_types = ld_type if isinstance(ld_type, list) else [ld_type]
        if any(t in ("SoftwareApplication", "MobileApplication") for t in ld_types):
            # Nested values can be null or lists in real-world JSON-LD;
            # normalize to dicts before .get() to avoid AttributeError.
            agg = data.get("aggregateRating") or {}
            offers = data.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            author = data.get("author") or {}
            return {
                "package_id": package_id,
                "name": data.get("name"),
                "description": (data.get("description") or "")[:500],
                "rating": agg.get("ratingValue"),
                "rating_count": agg.get("ratingCount"),
                "price": offers.get("price"),
                "developer": author.get("name"),
                "category": data.get("applicationCategory"),
                "content_rating": data.get("contentRating"),
                "operating_system": data.get("operatingSystem"),
                "source": "json_ld",
            }

    # Fallback: DOM parsing (selectors are heuristic and may break).
    # App name
    name_el = tree.css_first("h1")
    name = name_el.text(strip=True) if name_el else None

    # Rating lives in aria-label text such as "Rated 4.5 stars out of five".
    rating = None
    rating_count = None
    for el in tree.css("[aria-label]"):
        aria = el.attributes.get("aria-label", "") or ""
        if "Rated" in aria and "out of" in aria:
            m = re.search(r"Rated ([\d.]+)", aria)
            if m:
                rating = float(m.group(1))
        if "ratings" in aria.lower():
            m = re.search(r"([\d,]+) ratings", aria)
            if m:
                rating_count = int(m.group(1).replace(",", ""))

    # Download count badge, e.g. "1,000,000+" or "10M+".
    downloads = None
    # Google uses spans with specific text patterns for downloads
    for el in tree.css("span"):
        text = el.text(strip=True)
        if re.match(r"[\d,]+\+$", text) or re.match(r"[\d.]+[MBK]\+$", text):
            downloads = text
            break

    # Developer name via the developer-page link.
    developer = None
    dev_links = tree.css("a[href*='/store/apps/developer']")
    if dev_links:
        developer = dev_links[0].text(strip=True)

    return {
        "package_id": package_id,
        "name": name,
        "rating": rating,
        "rating_count": rating_count,
        "downloads": downloads,
        "developer": developer,
        "source": "dom",
    }

app = scrape_play_app("com.spotify.music")
if "error" in app:
    print(f"Failed: {app['error']}")
else:
    # rating_count may be None (or a string from JSON-LD); coerce before
    # applying the :, format spec, which raises on both.
    rating_count = int(app.get("rating_count") or 0)
    print(f"{app.get('name')}: {app.get('rating')}★ ({rating_count:,} ratings)")
    print(f"Downloads: {app.get('downloads')}")

Google Play Reviews: Internal Batch API

Google Play serves reviews through a batch API endpoint. This is more reliable than scraping individual pages:

import json

def get_play_reviews(
    package_id: str,
    count: int = 100,
    sort: int = 3,
    proxy_url: Optional[str] = None,
) -> list:
    """
    Fetch Google Play reviews via the internal batchexecute endpoint.

    This is the RPC the Play web UI itself calls; "UsvDTd" is the
    review-list method id. The positional payload and response layouts are
    undocumented and reverse-engineered — they can change without notice,
    in which case the index-based parsing below needs updating.

    Args:
        package_id: Android package name, e.g. "com.netflix.mediaclient".
        count: number of reviews to request in one batch.
        sort: 1=most_relevant, 2=newest, 3=rating
        proxy_url: optional proxy URL for the request.

    Returns:
        List of review dicts; empty on HTTP failure or parse failure.
    """
    url = "https://play.google.com/_/PlayStoreUi/data/batchexecute"

    # This payload structure is stable but may need updating if Google changes it
    # sort=2 for newest, sort=3 for rating
    inner_payload = json.dumps([
        None, None,
        [2, count, [None, None, None, None, None, None, None, None, [2]],
         None, None, None, None, None, None, None, None, None, None, None,
         [None, sort]],
        [package_id, 7]
    ])
    payload_str = json.dumps([[["UsvDTd", inner_payload, None, "generic"]]])

    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
        ),
        "Origin": "https://play.google.com",
        "Referer": f"https://play.google.com/store/apps/details?id={package_id}",
        "X-Same-Domain": "1",
    }

    client_kwargs = {"headers": headers, "timeout": 20}
    if proxy_url:
        client_kwargs["proxies"] = {"all://": proxy_url}

    with httpx.Client(**client_kwargs) as client:
        resp = client.post(
            url,
            data={"f.req": payload_str},
        )

    if resp.status_code != 200:
        print(f"Batch API: HTTP {resp.status_code}")
        return []

    # Parse Google's wrapped response format: batchexecute prefixes the
    # body with the anti-JSON-hijacking guard line ")]}'" (5 chars incl. \n).
    text = resp.text
    if text.startswith(")]}\'\n"):
        text = text[5:]

    reviews = []
    try:
        # outer[0][2] holds a JSON string embedded inside the outer array.
        outer = json.loads(text)
        inner_json = outer[0][2]
        if not inner_json:
            return []

        inner = json.loads(inner_json)
        review_list = inner[0] if inner else []

        for r in review_list:
            # Each review is a positional array; the index meanings below
            # are reverse-engineered — TODO confirm against live responses.
            if not isinstance(r, list) or len(r) < 5:
                continue
            try:
                reviews.append({
                    "review_id": r[0] if r[0] else None,
                    "author": r[1][0] if r[1] else "Anonymous",
                    "author_image": r[1][1][3][2] if r[1] and len(r[1]) > 1 else None,
                    "rating": r[2],
                    "body": r[4],
                    "date_timestamp": r[5][0] if r[5] else None,
                    "helpful_count": r[6] if len(r) > 6 else 0,
                    "developer_reply": r[7][1] if len(r) > 7 and r[7] else None,
                    "app_version": r[10] if len(r) > 10 else None,
                })
            except (IndexError, TypeError):
                # Malformed/shorter entries are skipped, not fatal.
                continue

    except (json.JSONDecodeError, IndexError, TypeError) as e:
        print(f"Parse error: {e}")

    return reviews

# Example: 50 newest reviews for Netflix (sort=2 → newest first).
reviews = get_play_reviews("com.netflix.mediaclient", count=50, sort=2)
print(f"Got {len(reviews)} reviews")
for r in reviews[:3]:
    print(f"  [{r['rating']}★] {(r['body'] or '')[:100]}")

Google Play: Residential Proxy Integration

Google Play aggressively blocks datacenter IPs. For scraping more than a handful of apps, residential proxies are essential.

ThorData's residential proxy network works well for Google Play — their rotating residential IPs avoid the ASN-based blocking that kills datacenter proxies on Google properties. The pool size is large enough that each IP sees only a few requests per day, staying well under Google's per-IP rate limits.

THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"

def get_proxy(session_id=None, country="us"):
    """Construct a ThorData proxy URL, optionally pinned to a sticky session."""
    # A sticky session is encoded into the proxy username; omit the segment
    # entirely to get a rotating IP per request.
    session_part = f"-session-{session_id}" if session_id else ""
    username = f"{THORDATA_USER}{session_part}-country-{country}"
    return f"http://{username}:{THORDATA_PASS}@proxy.thordata.com:9000"

def scrape_play_batch(package_ids: list, country="us") -> dict:
    """Scrape multiple Play Store apps, rotating the proxy session every 10 apps."""
    results = {}
    total = len(package_ids)

    for idx, pkg in enumerate(package_ids, 1):
        # Fresh sticky session every 10 requests; the random suffix keeps
        # session ids unique across runs.
        sid = ((idx - 1) // 10) * 1000 + random.randint(1, 999)
        proxy_url = get_proxy(session_id=sid, country=country)

        try:
            info = scrape_play_app(pkg, proxy_url=proxy_url)
        except Exception as exc:
            print(f"[{idx}/{total}] {pkg}: error — {exc}")
            results[pkg] = {"error": str(exc)}
        else:
            results[pkg] = info
            print(f"[{idx}/{total}] {pkg}: {info.get('name', '?')} {info.get('rating', '?')}★")

        time.sleep(random.uniform(2.0, 4.0))

    return results

Keyword Rank Tracking

Track where specific apps appear in search results over time:

def get_search_rankings(keyword: str, country: str = "us", limit: int = 50) -> list:
    """
    Query the iTunes Search API for a keyword and return the matching apps
    in result order (rank 1 = first result).
    """
    params = {
        "term": keyword,
        "country": country,
        "media": "software",
        "limit": limit,
        "entity": "software",
    }
    headers = {
        "User-Agent": "iTunes/12.12.8",
        "Accept-Language": f"{country}-US",
    }

    resp = httpx.get(
        "https://itunes.apple.com/search",
        params=params,
        headers=headers,
        timeout=15,
    )
    resp.raise_for_status()

    ranked = []
    for position, hit in enumerate(resp.json().get("results", []), 1):
        ranked.append({
            "rank": position,
            "app_id": str(hit["trackId"]),
            "name": hit["trackName"],
            "developer": hit["artistName"],
            "rating": hit.get("averageUserRating"),
            "rating_count": hit.get("userRatingCount"),
            "price": hit.get("price", 0),
        })
    return ranked

def track_keyword_rankings(
    app_ids: list,
    keywords: list,
    db_conn,
    country: str = "us",
):
    """
    For each keyword, record where each target app ranks in App Store search.

    Rows go into the keyword_rankings table (rank is NULL when the app is
    absent from the results), building a time series for trend analysis.
    """
    for kw in keywords:
        hits = get_search_rankings(kw, country)
        rank_by_app = {hit["app_id"]: hit["rank"] for hit in hits}

        for target in app_ids:
            position = rank_by_app.get(str(target))
            print(f"  '{kw}': App {target} rank = {position or 'not in top 50'}")

            db_conn.execute("""
                INSERT INTO keyword_rankings (app_id, keyword, rank, country, checked_at)
                VALUES (?, ?, ?, ?, datetime('now'))
            """, (str(target), kw, position, country))

        db_conn.commit()
        time.sleep(random.uniform(1.5, 3.0))

Estimating Download Counts

Neither store publishes exact download numbers, but you can estimate:

Google Play Method

def parse_play_downloads(download_str: str) -> dict:
    """
    Convert a Google Play download badge into a numeric lower bound.

    "10M+"  -> {"min": 10_000_000, "label": "10M+"}
    "500K+" -> {"min": 500_000, "label": "500K+"}

    Unparseable strings keep the original label with min=None; empty input
    yields {"min": None, "label": None}.
    """
    if not download_str:
        return {"min": None, "label": None}

    # Strip thousands separators before matching.
    cleaned = download_str.strip().replace(",", "")
    match = re.match(r"([\d.]+)([KMB]?)\+?$", cleaned, re.IGNORECASE)
    if not match:
        return {"min": None, "label": download_str}

    scale = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    value = float(match.group(1))
    suffix = match.group(2)
    factor = scale.get(suffix.upper(), 1) if suffix else 1
    return {"min": int(value * factor), "label": download_str}

Apple Estimation Formula

Apple doesn't show any download numbers. Analysts use rating count velocity as a proxy:

def estimate_ios_downloads(
    rating_count: int,
    category: str,
    chart_position: int = None,
) -> dict:
    """
    Rough iOS download estimation based on rating count.

    Reviews-to-downloads ratios by category (industry estimates):
    - Games: 1 review per 50-80 downloads
    - Utilities: 1 review per 80-120 downloads
    - Social: 1 review per 60-100 downloads

    Args:
        rating_count: userRatingCount from the Lookup API; None is treated
            as 0 (the API returns None for unrated apps).
        category: primaryGenreName; unknown categories use the default ratio.
        chart_position: accepted for interface compatibility but not yet
            factored into the estimate (was silently ignored before, too).

    Returns:
        Dict with estimated_min / estimated_max downloads and a
        methodology label.
    """
    ratios = {
        "Games": (50, 80),
        "Utilities": (80, 120),
        "Social Networking": (60, 100),
        "Productivity": (70, 100),
        "default": (60, 100),
    }

    low_mult, high_mult = ratios.get(category, ratios["default"])
    count = rating_count or 0  # guard against None from the Lookup API

    return {
        "estimated_min": count * low_mult,
        "estimated_max": count * high_mult,
        "methodology": "rating_count_multiplier",
    }

Data Storage

import sqlite3

def init_app_db(db_path="app_intelligence.db"):
    """
    Create (if needed) and return a SQLite connection with the full
    app-intelligence schema: app catalog, chart/ratings time-series
    snapshots, keyword rankings, and deduplicated reviews.

    Note: the FOREIGN KEY clauses are declarative only — this connection
    never enables PRAGMA foreign_keys, so they are not enforced at runtime.
    """
    conn = sqlite3.connect(db_path)
    # chart_snapshots / ratings_snapshots are append-only time series keyed
    # by captured_at; reviews dedupe on the (review_id, platform) PK.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS apps (
            app_id TEXT PRIMARY KEY,
            platform TEXT NOT NULL,
            bundle_id TEXT,
            name TEXT,
            developer TEXT,
            developer_id TEXT,
            category TEXT,
            primary_genre TEXT,
            price REAL,
            last_seen TEXT
        );

        CREATE TABLE IF NOT EXISTS chart_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            app_id TEXT NOT NULL,
            platform TEXT NOT NULL,
            chart_type TEXT NOT NULL,
            country TEXT NOT NULL,
            rank INTEGER,
            captured_at TEXT DEFAULT (datetime('now')),
            FOREIGN KEY (app_id) REFERENCES apps(app_id)
        );

        CREATE TABLE IF NOT EXISTS ratings_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            app_id TEXT NOT NULL,
            platform TEXT NOT NULL,
            country TEXT,
            rating REAL,
            rating_count INTEGER,
            rating_current_version REAL,
            version TEXT,
            captured_at TEXT DEFAULT (datetime('now')),
            FOREIGN KEY (app_id) REFERENCES apps(app_id)
        );

        CREATE TABLE IF NOT EXISTS keyword_rankings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            app_id TEXT NOT NULL,
            keyword TEXT NOT NULL,
            rank INTEGER,
            country TEXT NOT NULL,
            platform TEXT DEFAULT 'ios',
            checked_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS reviews (
            review_id TEXT,
            app_id TEXT,
            platform TEXT,
            rating INTEGER,
            title TEXT,
            body TEXT,
            author TEXT,
            app_version TEXT,
            review_date TEXT,
            helpful_count INTEGER DEFAULT 0,
            country TEXT,
            scraped_at TEXT DEFAULT (datetime('now')),
            PRIMARY KEY (review_id, platform)
        );

        CREATE INDEX IF NOT EXISTS idx_charts_app ON chart_snapshots(app_id, captured_at);
        CREATE INDEX IF NOT EXISTS idx_ratings_app ON ratings_snapshots(app_id, captured_at);
        CREATE INDEX IF NOT EXISTS idx_keywords ON keyword_rankings(keyword, checked_at);
        CREATE INDEX IF NOT EXISTS idx_reviews_app ON reviews(app_id, platform);
    """)
    conn.commit()
    return conn

def save_chart_snapshot(conn, rankings, platform, chart_type, country):
    """
    Persist one chart crawl: upsert each app's catalog row and append a
    rank snapshot.

    Args:
        conn: SQLite connection with the init_app_db() schema.
        rankings: iterable of AppEntry-like objects (needs app_id, name,
            developer, category, rank attributes).
        platform: "ios" or "android".
        chart_type / country: identity of the chart for the snapshot rows.
    """
    now = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())

    for app in rankings:
        # Upsert the catalog row. The previous INSERT OR REPLACE deleted
        # and re-inserted the row, nulling columns not listed here
        # (bundle_id, developer_id, primary_genre, price); ON CONFLICT ...
        # DO UPDATE preserves them.
        conn.execute("""
            INSERT INTO apps (app_id, platform, name, developer, category, last_seen)
            VALUES (?, ?, ?, ?, ?, ?)
            ON CONFLICT(app_id) DO UPDATE SET
                platform = excluded.platform,
                name = excluded.name,
                developer = excluded.developer,
                category = excluded.category,
                last_seen = excluded.last_seen
        """, (
            str(app.app_id), platform, app.name,
            app.developer, app.category, now,
        ))

        # Append the time-series rank snapshot.
        conn.execute("""
            INSERT INTO chart_snapshots (app_id, platform, chart_type, country, rank, captured_at)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (str(app.app_id), platform, chart_type, country, app.rank, now))

    conn.commit()

Building a Competitive Intelligence Monitor

Daily monitoring setup:

import json
from pathlib import Path

def run_daily_monitor(
    target_app_ids: list,
    keywords: list,
    countries: list = None,
    db_path: str = "app_intel.db",
):
    """
    Run one daily competitor-monitoring pass.

    Steps: (1) snapshot iOS top charts for the first two markets,
    (2) refresh ratings metadata for the target apps, (3) pull their most
    recent reviews, (4) record keyword search rankings. All results are
    persisted to the SQLite database at db_path.

    Args:
        target_app_ids: App Store ids of the apps to track.
        keywords: search terms to rank-check.
        countries: storefronts to snapshot (defaults to us/gb/ca/au; only
            the first two are crawled).
        db_path: SQLite file, created via init_app_db() if missing.
    """
    if countries is None:
        countries = ["us", "gb", "ca", "au"]

    conn = init_app_db(db_path)
    # (The original built a proxy URL here and never used it — removed.)

    # 1. Top chart snapshots
    print("=== Fetching chart rankings ===")
    for country in countries[:2]:  # Limit to top markets
        for chart_type in ["top_free", "top_paid", "top_grossing"]:
            rankings = get_top_charts(country, chart_type, 100)
            save_chart_snapshot(conn, rankings, "ios", chart_type, country)
            print(f"  {country}/{chart_type}: {len(rankings)} apps")
            time.sleep(random.uniform(1, 2))

    # 2. Enrich target apps with latest metadata
    print("\n=== Fetching metadata for target apps ===")
    metadata = get_app_metadata(target_app_ids)
    now = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
    for app_id, meta in metadata.items():
        conn.execute("""
            INSERT INTO ratings_snapshots
            (app_id, platform, country, rating, rating_count,
             rating_current_version, version, captured_at)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """, (
            app_id, "ios", "us",
            meta.get("rating"), meta.get("rating_count"),
            meta.get("rating_current_version"), meta.get("version"), now,
        ))
    conn.commit()
    print(f"  Updated metadata for {len(metadata)} apps")

    # 3. Latest reviews (dedup handled by INSERT OR IGNORE on the PK)
    print("\n=== Fetching recent reviews ===")
    for app_id in target_app_ids:
        reviews = get_app_reviews(str(app_id), max_pages=2)
        for r in reviews:
            conn.execute("""
                INSERT OR IGNORE INTO reviews
                (review_id, app_id, platform, rating, title, body,
                 author, app_version, review_date, helpful_count, country)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """, (
                r["review_id"], r["app_id"], "ios",
                r["rating"], r["title"], r["body"],
                r["author"], r["app_version"], r["date"],
                r.get("helpful_votes", 0), r["country"],
            ))
        conn.commit()
        print(f"  App {app_id}: {len(reviews)} reviews")
        time.sleep(random.uniform(2, 4))

    # 4. Keyword rankings
    print("\n=== Checking keyword rankings ===")
    track_keyword_rankings(
        [str(i) for i in target_app_ids],
        keywords, conn, country="us",
    )

    print("\nDaily monitor complete.")

# Example run: monitor three competitor apps across three search keywords.
target_apps = [
    "389801252",  # Instagram
    "324684580",  # Spotify
    "835599320",  # Notion
]
keywords = ["note taking", "productivity", "task manager"]
run_daily_monitor(target_apps, keywords)

Rate Limiting Strategy

Apple is generous; Google is not. Different strategies for each:

class RateLimiter:
    """Per-domain request throttle with jittered delays.

    Apple endpoints tolerate sub-second gaps; Google Play needs multi-second
    spacing. Unknown domains fall back to a 1-3 second window.
    """

    LIMITS = {
        "itunes.apple.com": {"min_delay": 0.5, "max_delay": 1.5},
        "rss.applemarketingtools.com": {"min_delay": 0.3, "max_delay": 1.0},
        "play.google.com": {"min_delay": 3.0, "max_delay": 7.0},
    }

    def __init__(self):
        # domain -> unix timestamp of the most recent request
        self._last_request = {}

    def wait(self, domain):
        """Block just long enough to honor the domain's jittered delay."""
        cfg = self.LIMITS.get(domain, {"min_delay": 1.0, "max_delay": 3.0})
        target_gap = random.uniform(cfg["min_delay"], cfg["max_delay"])
        since_last = time.time() - self._last_request.get(domain, 0)

        if since_last < target_gap:
            time.sleep(target_gap - since_last)

        self._last_request[domain] = time.time()

limiter = RateLimiter()

Summary

App store intelligence in 2026:

Apple (easy): - iTunes RSS feeds → clean JSON, hourly updates, no auth, generous rate limits - iTunes Lookup API → full metadata, batch up to 200 IDs per request - Review RSS → paginated reviews, stable for years

Google Play (harder): - HTML scraping → DOM + JSON-LD, works for single apps - Internal batch API → reviews and bulk data, changes format periodically - Always requires residential proxies — ThorData works well for consistent access without datacenter IP blocks

Build the SQLite schema from day one, save rankings as time-series snapshots, and you have a competitive intelligence platform that rivals $500/month ASO tools within 4–6 weeks of daily data collection.