Pulling World Bank Economic Data with Python (2026)

The World Bank Open Data API is one of the friendlier public data sources you'll encounter. No authentication required, clean JSON responses, well-documented indicator codes, and data spanning 200+ countries going back decades. It covers GDP, population, inflation, trade, poverty, health, and education metrics — the kind of macro dataset that underpins serious economic research.

The catch: at scale, bulk downloads and high-frequency polling will hit undocumented rate limits. The API is public and generous for moderate use, but if you're building a data pipeline that pulls hundreds of indicators across all countries and years, you'll need to handle throttling carefully and design your pipeline for idempotent incremental updates.

This guide covers the full API structure, practical data retrieval patterns, async bulk collection, rate limit handling, SQLite storage, and automated refresh pipelines.

What Data Is Available

The World Bank tracks thousands of indicators across the full development spectrum:

National accounts and output
- GDP (current USD), GDP per capita, GDP growth rate (annual %)
- GNI, GNI per capita (Atlas method and PPP)
- Gross capital formation, gross savings
- Industry/services/agriculture as % of GDP

Population and demographics
- Total population, urban and rural population
- Population growth rate, fertility rate, birth/death rates
- Age dependency ratio, median age
- Urban population % of total

Inflation and monetary
- CPI, inflation rate (annual %)
- Lending interest rate, real interest rate
- Broad money (M2) as % of GDP

Trade and balance of payments
- Exports and imports of goods and services
- Trade balance, current account balance as % of GDP
- Foreign direct investment (net inflows, % of GDP)
- External debt, total reserves

Poverty and inequality
- Poverty headcount ratio at $2.15/day (2017 PPP)
- Poverty headcount ratio at national poverty lines
- Gini coefficient (income inequality measure)
- Income share held by lowest/highest quintiles

Education
- Literacy rate (adult and youth)
- School enrollment (primary, secondary, tertiary)
- Government expenditure on education (% of GDP)
- Pupil-teacher ratios

Health
- Life expectancy at birth
- Infant mortality rate (per 1,000 live births)
- Maternal mortality ratio
- Hospital beds per 1,000 people
- Health expenditure (% of GDP, per capita)

Environment and infrastructure
- CO2 emissions per capita
- Renewable energy % of total
- Access to electricity (% of population)
- Internet users % of population
- Mobile subscriptions per 100 people

The full catalog is browsable at data.worldbank.org/indicator — over 1,600 indicators total.
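
If you'd rather search the catalog programmatically, the /v2/indicator listing endpoint pages through every indicator; here's a minimal sketch that filters names locally (the keyword matching is my own convenience, not an API feature):

import httpx

def search_indicators(keyword: str, max_pages: int = 25) -> list[dict]:
    """Page through /v2/indicator and keep entries whose name matches a keyword."""
    url = "https://api.worldbank.org/v2/indicator"
    matches = []
    for page in range(1, max_pages + 1):
        resp = httpx.get(
            url,
            params={"format": "json", "per_page": 1000, "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        if len(payload) < 2 or not payload[1]:
            break
        matches.extend(
            {"code": ind["id"], "name": ind["name"]}
            for ind in payload[1]
            if keyword.lower() in (ind.get("name") or "").lower()
        )
        # the metadata object reports the total page count
        if page >= payload[0].get("pages", 1):
            break
    return matches

# e.g. search_indicators("unemployment") surfaces SL.UEM.TOTL.ZS and related codes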

The World Bank API Structure

Base URL: https://api.worldbank.org/v2/

All responses default to XML. Always pass format=json.

Key URL patterns:

# Single indicator for one country
GET /v2/country/{code}/indicator/{indicator}?format=json

# Specific date range
GET /v2/country/{code}/indicator/{indicator}?format=json&date=2010:2023

# Multiple countries (semicolon-separated, max ~50)
GET /v2/country/US;CN;DE/indicator/NY.GDP.MKTP.CD?format=json

# All countries
GET /v2/country/all/indicator/NY.GDP.MKTP.CD?format=json

# Most recent N values
GET /v2/country/{code}/indicator/{indicator}?format=json&mrv=5

# Indicator metadata
GET /v2/indicator/{indicator}?format=json

Response structure. Every response is a two-element JSON array:
- [0] — metadata: total, page, pages, per_page, lastupdated
- [1] — data array of observation objects

Each observation has: country.id, country.value, countryiso3code, indicator.id, indicator.value, date, value, unit, obs_status, decimal.

Pagination. Default per_page is 50. Max is 1000. Use page parameter to iterate. The metadata object tells you pages (total page count) and total (total records).
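
To make the shape concrete, here's a quick probe that pulls one small page and prints the pagination metadata (the indicator and country are arbitrary picks):

import httpx

resp = httpx.get(
    "https://api.worldbank.org/v2/country/US/indicator/SP.POP.TOTL",
    params={"format": "json", "per_page": 5},
    timeout=30,
)
meta, data = resp.json()

# meta carries page, pages, per_page, total and lastupdated; iterate until page == pages
print(meta["page"], meta["pages"], meta["total"])

# each observation carries country, indicator, date (the year) and value
for obs in data:
    print(obs["country"]["id"], obs["date"], obs["value"])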

Common indicator codes:

Code Description
NY.GDP.MKTP.CD GDP (current USD)
NY.GDP.PCAP.CD GDP per capita (current USD)
NY.GDP.MKTP.KD.ZG GDP growth rate (annual %)
NY.GDP.PCAP.PP.CD GDP per capita, PPP (current international $)
SP.POP.TOTL Population, total
SP.URB.TOTL.IN.ZS Urban population (% of total)
FP.CPI.TOTL.ZG Inflation, consumer prices (annual %)
NE.TRD.GNFS.ZS Trade (% of GDP)
BX.KLT.DINV.WD.GD.ZS Foreign direct investment, net inflows (% of GDP)
SI.POV.NAHC Poverty headcount ratio at national lines
SI.POV.GINI Gini index
SE.ADT.LITR.ZS Literacy rate, adult total
SE.ENR.PRSC.FM.ZS School enrollment, primary and secondary (gross), gender parity index (GPI)
SP.DYN.LE00.IN Life expectancy at birth, total (years)
SP.DYN.IMRT.IN Mortality rate, infant (per 1,000 live births)
EN.ATM.CO2E.PC CO2 emissions (metric tons per capita)
EG.ELC.ACCS.ZS Access to electricity (% of population)
IT.NET.USER.ZS Individuals using the Internet (% of population)

Basic Data Retrieval

Fetching GDP data for a set of countries with clean error handling:

import httpx
import time

BASE_URL = "https://api.worldbank.org/v2"


def fetch_indicator(
    indicator: str,
    countries: list[str],
    start_year: int,
    end_year: int,
    per_page: int = 1000,
) -> list[dict]:
    """
    Fetch a single indicator for one or more countries over a date range.

    Returns a list of observation dicts with country_code, country_name,
    indicator, year, and value fields.
    """
    country_str = ";".join(countries)
    url = f"{BASE_URL}/country/{country_str}/indicator/{indicator}"
    params = {
        "format": "json",
        "date": f"{start_year}:{end_year}",
        "per_page": per_page,
        "page": 1,
    }

    all_rows = []

    while True:
        try:
            response = httpx.get(url, params=params, timeout=30)
            response.raise_for_status()
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 429:
                print(f"Rate limited, waiting 60s...")
                time.sleep(60)
                continue
            raise

        payload = response.json()

        # Validate response structure
        if not isinstance(payload, list) or len(payload) < 2:
            print(f"Unexpected response format for {indicator}")
            break

        meta, data = payload[0], payload[1]

        if data is None:
            break

        for entry in data:
            if entry.get("value") is None:
                continue
            all_rows.append({
                "country_code": entry["country"]["id"],
                "country_name": entry["country"]["value"],
                "indicator_code": indicator,
                "year": int(entry["date"]),
                "value": float(entry["value"]),
                "unit": entry.get("unit", ""),
            })

        # Check if there are more pages
        total_pages = meta.get("pages", 1)
        if params["page"] >= total_pages:
            break

        params["page"] += 1
        time.sleep(0.3)  # gentle pacing

    return all_rows


# Fetch GDP for G7 countries, 2015–2023
g7 = ["US", "GB", "DE", "FR", "JP", "CA", "IT"]
gdp_data = fetch_indicator("NY.GDP.MKTP.CD", g7, 2015, 2023)

for row in sorted(gdp_data, key=lambda x: (x["country_code"], x["year"])):
    gdp_trillions = row["value"] / 1e12
    print(f"{row['country_code']} {row['year']}: ${gdp_trillions:.2f}T")

Async Bulk Collection

For pulling many indicators in parallel, asyncio with httpx reduces wall-clock time dramatically:

import asyncio
import httpx
import pandas as pd

BASE_URL = "https://api.worldbank.org/v2"

INDICATORS = {
    "NY.GDP.MKTP.CD": "gdp_usd",
    "NY.GDP.PCAP.CD": "gdp_per_capita",
    "NY.GDP.MKTP.KD.ZG": "gdp_growth_pct",
    "NY.GDP.PCAP.PP.CD": "gdp_per_capita_ppp",
    "SP.POP.TOTL": "population",
    "SP.URB.TOTL.IN.ZS": "urban_pct",
    "FP.CPI.TOTL.ZG": "inflation_pct",
    "NE.TRD.GNFS.ZS": "trade_pct_gdp",
    "BX.KLT.DINV.WD.GD.ZS": "fdi_pct_gdp",
    "SP.DYN.LE00.IN": "life_expectancy",
    "SP.DYN.IMRT.IN": "infant_mortality",
    "SI.POV.GINI": "gini_index",
    "EN.ATM.CO2E.PC": "co2_per_capita",
    "IT.NET.USER.ZS": "internet_users_pct",
    "EG.ELC.ACCS.ZS": "electricity_access_pct",
}


async def fetch_all_pages_async(
    client: httpx.AsyncClient,
    url: str,
    params: dict,
) -> list[dict]:
    """Fetch all paginated results for a World Bank API endpoint."""
    all_data = []
    page = 1

    while True:
        params_copy = {**params, "page": page}

        for attempt in range(3):
            try:
                resp = await client.get(url, params=params_copy, timeout=30)
                resp.raise_for_status()
                break
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 429 and attempt < 2:
                    await asyncio.sleep(30 * (attempt + 1))
                    continue
                raise
            except httpx.TimeoutException:
                if attempt < 2:
                    await asyncio.sleep(5)
                    continue
                raise

        payload = resp.json()
        if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
            break

        meta, data = payload[0], payload[1]
        all_data.extend(data)

        if page >= meta.get("pages", 1):
            break
        page += 1

    return all_data


async def fetch_indicator_df(
    client: httpx.AsyncClient,
    indicator_code: str,
    column_name: str,
    start_year: int,
    end_year: int,
    semaphore: asyncio.Semaphore,
) -> pd.DataFrame:
    """Fetch one indicator for all countries and return as DataFrame."""
    async with semaphore:
        url = f"{BASE_URL}/country/all/indicator/{indicator_code}"
        params = {
            "format": "json",
            "date": f"{start_year}:{end_year}",
            "per_page": 1000,
        }

        data = await fetch_all_pages_async(client, url, params)

        records = []
        for entry in data:
            if entry.get("value") is None:
                continue
            # country.id here is a 2-letter code for countries and aggregates alike,
            # so this length check alone won't drop aggregates; see the tips section
            # for a stricter metadata-based filter
            country_id = entry["country"]["id"]
            if len(country_id) != 2:
                continue
            records.append({
                "country_code": country_id,
                "country_name": entry["country"]["value"],
                "year": int(entry["date"]),
                column_name: float(entry["value"]),
            })

        await asyncio.sleep(0.2)  # light pacing between indicator fetches
        return pd.DataFrame(records)


async def collect_all_indicators(
    start_year: int = 2018,
    end_year: int = 2023,
    max_concurrency: int = 4,
) -> pd.DataFrame:
    """
    Pull all configured indicators for all countries in parallel.

    max_concurrency: limit parallel requests to avoid rate limiting.
    Returns a wide DataFrame with one row per country-year.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async with httpx.AsyncClient() as client:
        tasks = [
            fetch_indicator_df(client, code, label, start_year, end_year, semaphore)
            for code, label in INDICATORS.items()
        ]
        dfs = await asyncio.gather(*tasks, return_exceptions=True)

    # Merge all indicator DataFrames on country + year
    valid_dfs = [df for df in dfs if isinstance(df, pd.DataFrame) and not df.empty]
    if not valid_dfs:
        return pd.DataFrame()

    merged = valid_dfs[0]
    for df in valid_dfs[1:]:
        merged = merged.merge(
            df, on=["country_code", "country_name", "year"], how="outer"
        )

    return merged.sort_values(["country_code", "year"]).reset_index(drop=True)


# Run the full collection
df = asyncio.run(collect_all_indicators(2018, 2023))
print(f"Collected: {len(df):,} rows, {df['country_code'].nunique()} countries, "
      f"{df.columns.tolist()}")

Rate Limiting in Practice

The World Bank API is permissive for single-user access but has undocumented rate limits that surface during bulk collection. Symptoms: 429 responses, sudden connection resets, or responses that return empty data arrays without error codes. The limits appear to be IP-based rather than key-based.

Observed thresholds:
- Single-page requests: fine up to 60/minute per IP
- Paginated bulk pulls across all countries: 3-5 concurrent requests is the safe ceiling
- Parallel requests beyond that: intermittent 429s and connection drops that require backing off

For production pipelines that poll large indicator sets across all countries on a regular schedule, IP rotation distributes the load across addresses. ThorData's residential proxies work well here — route bulk requests through the pool while keeping your direct IP for low-volume or interactive queries:

import httpx

THORDATA_PROXY = "http://USER:[email protected]:9000"


def make_proxied_client() -> httpx.AsyncClient:
    """Create an httpx async client routed through ThorData residential proxies."""
    return httpx.AsyncClient(
        transport=httpx.AsyncHTTPTransport(proxy=THORDATA_PROXY),
        timeout=30,
        headers={"User-Agent": "WorldBankDataPipeline/1.0"},
    )


async def fetch_with_proxy(indicator: str, countries: list[str], year: int) -> list[dict]:
    """Fetch a single indicator for specific countries via proxy."""
    async with make_proxied_client() as client:
        url = f"{BASE_URL}/country/{';'.join(countries)}/indicator/{indicator}"
        params = {"format": "json", "date": str(year), "per_page": 500}
        resp = await client.get(url, params=params)
        resp.raise_for_status()
        payload = resp.json()
        if len(payload) < 2 or payload[1] is None:
            return []
        return [
            {
                "country_code": e["country"]["id"],
                "year": year,
                # explicit None check so a legitimate 0.0 value isn't dropped
                indicator: float(e["value"]) if e["value"] is not None else None,
            }
            for e in payload[1]
        ]

Even without proxies, adding asyncio.sleep(0.5) between page fetches and capping concurrency at 3-4 tasks gets you through a full bulk pull without interruptions.
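
One way to bake that pacing in is a small shared limiter. Here's a sketch; the 3-concurrent / 0.5s figures are just the numbers quoted above, not API constants:

import asyncio
import httpx

class PacedClient:
    """httpx.AsyncClient wrapper with a concurrency cap and minimum request spacing."""

    def __init__(self, max_concurrency: int = 3, min_gap: float = 0.5):
        self._sem = asyncio.Semaphore(max_concurrency)
        self._gap = min_gap
        self._client = httpx.AsyncClient(timeout=30)

    async def get(self, url: str, **kwargs) -> httpx.Response:
        async with self._sem:
            resp = await self._client.get(url, **kwargs)
            # hold the slot briefly so requests stay spaced out
            await asyncio.sleep(self._gap)
            return resp

    async def aclose(self) -> None:
        await self._client.aclose()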

SQLite Storage Schema

For a persistent, queryable data store:

import sqlite3
from datetime import datetime, timezone


def init_world_bank_db(db_path: str = "world_bank.db") -> sqlite3.Connection:
    """Initialize the World Bank data SQLite database."""
    conn = sqlite3.connect(db_path)

    conn.executescript("""
        CREATE TABLE IF NOT EXISTS indicators (
            code        TEXT PRIMARY KEY,
            name        TEXT,
            description TEXT,
            unit        TEXT,
            source      TEXT,
            last_updated TEXT
        );

        CREATE TABLE IF NOT EXISTS observations (
            id              INTEGER PRIMARY KEY AUTOINCREMENT,
            country_code    TEXT NOT NULL,
            country_name    TEXT NOT NULL,
            indicator_code  TEXT NOT NULL,
            year            INTEGER NOT NULL,
            value           REAL,
            unit            TEXT,
            scraped_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (country_code, indicator_code, year)
        );

        CREATE TABLE IF NOT EXISTS countries (
            code        TEXT PRIMARY KEY,
            name        TEXT,
            region      TEXT,
            income_group TEXT,
            capital     TEXT,
            iso2        TEXT,
            iso3        TEXT
        );

        CREATE INDEX IF NOT EXISTS idx_obs_country_year
            ON observations(country_code, year);
        CREATE INDEX IF NOT EXISTS idx_obs_indicator
            ON observations(indicator_code);
        CREATE INDEX IF NOT EXISTS idx_obs_year
            ON observations(year);
    """)

    conn.commit()
    return conn


def bulk_insert_observations(conn: sqlite3.Connection, rows: list[dict]):
    """Bulk insert observations, replacing on conflict."""
    conn.executemany("""
        INSERT OR REPLACE INTO observations
            (country_code, country_name, indicator_code, year, value, unit, scraped_at)
        VALUES (?, ?, ?, ?, ?, ?, ?)
    """, [
        (
            r["country_code"],
            r["country_name"],
            r["indicator_code"],
            r["year"],
            r.get("value"),
            r.get("unit", ""),
            datetime.now(timezone.utc).isoformat(),
        )
        for r in rows
    ])
    conn.commit()


def fetch_country_snapshot(conn: sqlite3.Connection, country_code: str, year: int) -> dict:
    """Get all indicators for a country in a given year."""
    rows = conn.execute("""
        SELECT indicator_code, value
        FROM observations
        WHERE country_code = ? AND year = ?
    """, (country_code, year)).fetchall()
    return {row[0]: row[1] for row in rows}


def get_indicator_series(
    conn: sqlite3.Connection,
    indicator_code: str,
    country_codes: list[str],
    start_year: int,
    end_year: int,
) -> list[tuple]:
    """Retrieve time series data for specific countries and an indicator."""
    placeholders = ",".join("?" * len(country_codes))
    return conn.execute(f"""
        SELECT country_code, country_name, year, value
        FROM observations
        WHERE indicator_code = ?
          AND country_code IN ({placeholders})
          AND year BETWEEN ? AND ?
        ORDER BY country_code, year
    """, [indicator_code, *country_codes, start_year, end_year]).fetchall()


def get_cross_country_ranking(
    conn: sqlite3.Connection,
    indicator_code: str,
    year: int,
    top_n: int = 20,
) -> list[tuple]:
    """Rank countries by indicator value for a specific year."""
    return conn.execute("""
        SELECT country_name, country_code, value
        FROM observations
        WHERE indicator_code = ? AND year = ? AND value IS NOT NULL
        ORDER BY value DESC
        LIMIT ?
    """, (indicator_code, year, top_n)).fetchall()

Country Metadata Collection

The World Bank also provides clean metadata about countries, including region, income group, and ISO codes:

def fetch_country_metadata() -> list[dict]:
    """Fetch country metadata from the World Bank countries endpoint."""
    url = f"{BASE_URL}/country/all"
    params = {"format": "json", "per_page": 300}

    resp = httpx.get(url, params=params, timeout=30)
    payload = resp.json()

    if len(payload) < 2 or payload[1] is None:
        return []

    countries = []
    for c in payload[1]:
        # Skip aggregates/regions; this endpoint marks them with region
        # "Aggregates" (note: the id field here is the ISO-3166 alpha-3 code,
        # so a 2-character length check would skip every country)
        if c.get("region", {}).get("value") == "Aggregates":
            continue
        countries.append({
            "code": c.get("iso2Code", ""),  # 2-letter code, matches observations.country_code
            "name": c["name"],
            "region": c.get("region", {}).get("value", ""),
            "income_group": c.get("incomeLevel", {}).get("value", ""),
            "capital": c.get("capitalCity", ""),
            "iso2": c.get("iso2Code", ""),
            "iso3": c["id"],
        })

    return countries


def populate_country_table(conn: sqlite3.Connection):
    """Fetch and store all country metadata."""
    countries = fetch_country_metadata()
    conn.executemany("""
        INSERT OR REPLACE INTO countries
            (code, name, region, income_group, capital, iso2)
        VALUES (?, ?, ?, ?, ?, ?)
    """, [
        (c["code"], c["name"], c["region"], c["income_group"], c["capital"], c["iso2"])
        for c in countries
    ])
    conn.commit()
    print(f"Stored metadata for {len(countries)} countries")

Building a Country Dashboard Dataset

Combining multiple indicators into a single export-ready CSV for analysis or visualization:

import asyncio
import httpx
import pandas as pd


async def build_country_dashboard(
    country_codes: list[str],
    year: int = 2022,
    use_proxy: bool = False,
) -> pd.DataFrame:
    """
    Build a single-year wide-format snapshot for a list of countries.

    Returns DataFrame with one row per country, columns for each indicator.
    """
    indicators = {
        "NY.GDP.MKTP.CD": "gdp_usd",
        "NY.GDP.PCAP.CD": "gdp_per_capita",
        "NY.GDP.MKTP.KD.ZG": "gdp_growth_pct",
        "SP.POP.TOTL": "population",
        "FP.CPI.TOTL.ZG": "inflation_pct",
        "NE.TRD.GNFS.ZS": "trade_pct_gdp",
        "SP.DYN.LE00.IN": "life_expectancy",
        "SP.DYN.IMRT.IN": "infant_mortality",
        "SI.POV.GINI": "gini_index",
        "EN.ATM.CO2E.PC": "co2_per_capita",
        "IT.NET.USER.ZS": "internet_pct",
        "EG.ELC.ACCS.ZS": "electricity_access_pct",
    }

    country_str = ";".join(country_codes)
    frames = []

    client_kwargs = {}
    if use_proxy:
        client_kwargs["transport"] = httpx.AsyncHTTPTransport(proxy=THORDATA_PROXY)

    async with httpx.AsyncClient(timeout=30, **client_kwargs) as client:
        for code, label in indicators.items():
            url = f"{BASE_URL}/country/{country_str}/indicator/{code}"
            params = {"format": "json", "date": str(year), "per_page": 500}

            try:
                resp = await client.get(url, params=params)
                payload = resp.json()
            except Exception as e:
                print(f"Failed to fetch {code}: {e}")
                continue

            if len(payload) < 2 or payload[1] is None:
                continue

            for entry in payload[1]:
                if entry.get("value") is None:
                    continue
                frames.append({
                    "country_code": entry["country"]["id"],
                    "country_name": entry["country"]["value"],
                    "year": year,
                    label: float(entry["value"]),
                })

            await asyncio.sleep(0.4)

    if not frames:
        return pd.DataFrame()

    df = pd.DataFrame(frames)
    # Collapse to one row per country-year, keeping the first non-null value per column
    df = df.groupby(["country_code", "country_name", "year"]).first().reset_index()
    return df.sort_values("country_name")


# Build dashboard for 30 major economies
major_economies = [
    "US", "CN", "DE", "JP", "GB", "FR", "IN", "IT", "CA", "KR",
    "BR", "AU", "ES", "MX", "ID", "NL", "SA", "TR", "CH", "AR",
    "SE", "PL", "BE", "NG", "ZA", "EG", "TH", "PK", "BD", "VN",
]

dashboard = asyncio.run(build_country_dashboard(major_economies, year=2022))
dashboard.to_csv("world_bank_dashboard_2022.csv", index=False)
print(f"Saved {len(dashboard)} countries")
print(dashboard[["country_name", "gdp_per_capita", "life_expectancy", "gini_index"]].to_string(index=False))

Indicator Metadata and Documentation

Every indicator has a full metadata record with description, methodology notes, and source:

def get_indicator_metadata(indicator_code: str) -> dict:
    """Fetch documentation for a specific World Bank indicator."""
    url = f"{BASE_URL}/indicator/{indicator_code}"
    params = {"format": "json"}

    resp = httpx.get(url, params=params, timeout=20)
    payload = resp.json()

    if len(payload) < 2 or not payload[1]:
        return {}

    ind = payload[1][0]
    return {
        "code": ind.get("id"),
        "name": ind.get("name"),
        "unit": ind.get("unit"),
        "source": ind.get("source", {}).get("value"),
        "description": ind.get("sourceNote"),
        "organization": ind.get("sourceOrganization"),
        "topics": [t.get("value") for t in ind.get("topics", [])],
    }


# Example: document GDP per capita indicator
meta = get_indicator_metadata("NY.GDP.PCAP.CD")
print(meta["name"])
print(meta["description"][:200])

Automated Pipeline with Incremental Updates

For production use, track what has already been fetched and only pull new or updated data:

import sqlite3
import time

import httpx


def get_last_scraped_year(conn: sqlite3.Connection, indicator_code: str) -> int | None:
    """Return the most recent year stored for an indicator."""
    result = conn.execute("""
        SELECT MAX(year) FROM observations WHERE indicator_code = ?
    """, (indicator_code,)).fetchone()
    return result[0] if result and result[0] else None


def get_api_last_updated(indicator_code: str) -> str | None:
    """Check when the World Bank last updated a specific indicator."""
    url = f"{BASE_URL}/country/all/indicator/{indicator_code}"
    params = {"format": "json", "per_page": 1}

    try:
        resp = httpx.get(url, params=params, timeout=15)
        payload = resp.json()
        return payload[0].get("lastupdated") if payload else None
    except Exception:
        return None


def needs_refresh(conn: sqlite3.Connection, indicator_code: str) -> bool:
    """
    Determine if an indicator should be re-fetched.
    True if API has been updated more recently than our last scrape.
    """
    last_update = get_api_last_updated(indicator_code)
    if not last_update:
        return True

    result = conn.execute("""
        SELECT MAX(scraped_at) FROM observations WHERE indicator_code = ?
    """, (indicator_code,)).fetchone()

    if not result or not result[0]:
        return True

    last_scrape = result[0]
    return last_update > last_scrape[:10]  # compare YYYY-MM-DD


def incremental_refresh(db_path: str = "world_bank.db"):
    """
    Refresh all configured indicators, only pulling those updated since last scrape.
    """
    conn = init_world_bank_db(db_path)

    for indicator_code, column_name in INDICATORS.items():
        if not needs_refresh(conn, indicator_code):
            print(f"Skipping {indicator_code} (up to date)")
            continue

        print(f"Refreshing {indicator_code}...")
        rows = fetch_indicator(indicator_code, ["all"], 2000, 2023)
        bulk_insert_observations(conn, rows)
        print(f"  Stored {len(rows)} observations")
        time.sleep(1.0)

    conn.close()


# Schedule this via cron:
# 0 6 * * 1 python3 /path/to/refresh_world_bank.py

Practical Tips for Production Use

Filter out aggregates. The API returns both individual countries and regional aggregates (e.g., ZG for Sub-Saharan Africa, OED for OECD members). A length check on country.id is not enough: in indicator responses the id is a 2-letter code for countries and aggregates alike. The reliable approach is the country metadata endpoint, which marks aggregates with a region value of "Aggregates". Build a set of real country codes from it and screen observations against that set, as sketched below.
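
A minimal sketch of that metadata-based filter (per_page=400 comfortably covers the full country list):

import httpx

def real_country_codes() -> set[str]:
    """2-letter codes for actual countries, per the /country metadata endpoint."""
    resp = httpx.get(
        "https://api.worldbank.org/v2/country/all",
        params={"format": "json", "per_page": 400},
        timeout=30,
    )
    _, entries = resp.json()
    return {
        c["iso2Code"]
        for c in entries
        if c.get("region", {}).get("value") != "Aggregates"
    }

# usage: screen any observation list, e.g. the gdp_data rows from earlier
countries = real_country_codes()
clean_rows = [r for r in gdp_data if r["country_code"] in countries]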

Missing values are normal. Not every country reports every indicator every year. Poverty and inequality metrics (Gini, poverty headcount) often have 5-7 year gaps for developing countries. Always expect sparse data and design your analysis to handle None values without crashing.

Use the mrv parameter for current data. mrv=5 returns the 5 most recent values for an indicator — useful when you want current data without specifying exact years. Combine with gapfill=Y to fill gaps with the most recent available value.
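
Both are plain query parameters. For example, the five most recent inflation readings with gaps filled (country choice arbitrary):

import httpx

resp = httpx.get(
    "https://api.worldbank.org/v2/country/BR/indicator/FP.CPI.TOTL.ZG",
    params={"format": "json", "mrv": 5, "gapfill": "Y"},
    timeout=30,
)
for obs in resp.json()[1]:
    print(obs["date"], obs["value"])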

The fields parameter reduces payload size. You can request specific fields: ?fields=country,date,value — useful when you're collecting millions of observations and want to minimize bandwidth.

Quarterly and monthly data exists. Some indicators (inflation, exchange rates) have sub-annual frequency. Use date=2022Q1:2023Q4 or date=2022M01:2023M12 syntax for quarterly and monthly data.

API updates lag. Annual indicators typically appear 12-18 months after the reference year. GDP data for 2023 typically arrives in mid-2024. Don't expect current-year data — the API reflects the latest published World Bank estimates, not real-time statistics.

The World Bank API is genuinely one of the easier public data sources to work with at scale. The combination of broad coverage, no authentication requirement, and clean JSON structure makes it ideal for building economic research tools, country comparison dashboards, and data journalism pipelines. With proper async pagination and rate limit handling, you can collect the full indicator catalog for all 200+ countries in under an hour.