How to Scrape Yahoo Finance with Python (2026)
Yahoo Finance is one of the most accessible sources for financial data. While their official API was discontinued years ago, the internal endpoints that power the website are well-known and return clean JSON. You can pull real-time quotes, historical OHLCV data, financial statements, options chains, earnings estimates, and analyst recommendations — all without authentication for public market data.
This guide covers the key Yahoo Finance endpoints, practical Python code for each, SQLite storage for building a financial data pipeline, error handling for production use, and proxy rotation for high-volume fetching.
Why Yahoo Finance Over Paid Data Providers?
Yahoo Finance provides data that typically costs $30-100/month from paid providers:
- Real-time quotes (15-minute delay for free users, which is sufficient for most research)
- 5+ years of daily OHLCV history for any listed security
- Quarterly and annual financial statements (income statement, balance sheet, cash flow)
- Analyst consensus estimates and EPS forecasts
- Options chains with Greeks
- Insider transactions and institutional ownership
- ESG scores
- Earnings calendar data
The endpoints are the same ones used by the Yahoo Finance website itself. No API key, no billing, no rate limit that prevents reasonable use.
The Key Endpoints
Before writing any code, here's what Yahoo Finance exposes:
| Endpoint | URL Pattern | Purpose |
|---|---|---|
| v8 chart | /v8/finance/chart/SYMBOL | Quotes, OHLCV history, splits, dividends |
| v10 quoteSummary | /v10/finance/quoteSummary/SYMBOL | Everything else (financials, profile, earnings) |
| v7 quote | /v7/finance/quote | Batch quotes for multiple symbols |
| v7 options | /v7/finance/options/SYMBOL | Options chain |
Real-Time Stock Quotes
# yahoo_finance.py
import httpx
import time
import random
from typing import Optional
YAHOO_BASE = "https://query1.finance.yahoo.com"
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/121.0.0.0 Safari/537.36"
),
"Accept": "application/json, text/plain, */*",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Origin": "https://finance.yahoo.com",
"Referer": "https://finance.yahoo.com/",
}
def make_client(proxy_url: Optional[str] = None) -> httpx.Client:
"""Create an httpx client with optional proxy support."""
client_kwargs = {
"headers": HEADERS,
"follow_redirects": True,
"timeout": 30,
}
if proxy_url:
client_kwargs["proxy"] = proxy_url
return httpx.Client(**client_kwargs)
def get_quote(symbol: str, client: Optional[httpx.Client] = None) -> dict:
"""
Get real-time quote data for a stock symbol.
Returns price, change, volume, market cap, and trading session info.
"""
if client is None:
client = make_client()
url = f"{YAHOO_BASE}/v8/finance/chart/{symbol}"
params = {
"range": "1d",
"interval": "1m",
"includePrePost": "true",
"useYfid": "true",
}
resp = client.get(url, params=params)
if resp.status_code == 429:
raise RuntimeError(f"Rate limited fetching {symbol}")
if resp.status_code == 404:
raise ValueError(f"Symbol not found: {symbol}")
resp.raise_for_status()
data = resp.json()
if data.get("chart", {}).get("error"):
err = data["chart"]["error"]
raise ValueError(f"Yahoo Finance error for {symbol}: {err}")
result = data["chart"]["result"][0]
meta = result["meta"]
# Determine current price based on market state
market_state = meta.get("marketState", "REGULAR")
    if market_state == "PRE":
        # pre-market price when available; the chart meta may omit it, so fall back
        current_price = meta.get("preMarketPrice", meta.get("regularMarketPrice"))
elif market_state == "POST":
current_price = meta.get("postMarketPrice", meta.get("regularMarketPrice"))
else:
current_price = meta.get("regularMarketPrice")
prev_close = meta.get("previousClose") or meta.get("chartPreviousClose")
    change = (current_price - prev_close) if current_price is not None and prev_close else None
    change_pct = (change / prev_close * 100) if change is not None and prev_close else None
return {
"symbol": meta.get("symbol"),
"exchange": meta.get("exchangeName"),
"currency": meta.get("currency"),
"market_state": market_state,
"price": current_price,
"previous_close": prev_close,
        "change": round(change, 4) if change is not None else None,
        "change_pct": round(change_pct, 4) if change_pct is not None else None,
"open": meta.get("regularMarketOpen"),
"day_high": meta.get("regularMarketDayHigh"),
"day_low": meta.get("regularMarketDayLow"),
"volume": meta.get("regularMarketVolume"),
"avg_volume": meta.get("averageDailyVolume10Day"),
"market_cap": meta.get("marketCap"),
"52w_high": meta.get("fiftyTwoWeekHigh"),
"52w_low": meta.get("fiftyTwoWeekLow"),
"timezone": meta.get("timezone"),
}
quote = get_quote("AAPL")
direction = "+" if (quote["change"] or 0) >= 0 else ""
print(f"{quote['symbol']}: ${quote['price']:.2f} ({direction}{quote['change_pct']:.2f}%)")
print(f"Volume: {quote['volume']:,}")
print(f"Market Cap: ${quote['market_cap']:,.0f}")
print(f"52-Week Range: ${quote['52w_low']:.2f} - ${quote['52w_high']:.2f}")
Batch Quotes for Multiple Tickers
For efficiency, query multiple symbols in one request using the v7 quote endpoint:
def get_batch_quotes(symbols: list, client: Optional[httpx.Client] = None) -> list:
"""
Get quotes for multiple symbols in a single request.
More efficient than individual calls when monitoring a portfolio.
"""
if client is None:
client = make_client()
url = f"{YAHOO_BASE}/v7/finance/quote"
params = {
"symbols": ",".join(symbols),
"fields": ",".join([
"regularMarketPrice",
"regularMarketChange",
"regularMarketChangePercent",
"regularMarketVolume",
"regularMarketDayHigh",
"regularMarketDayLow",
"marketCap",
"trailingPE",
"forwardPE",
"dividendYield",
"fiftyTwoWeekHigh",
"fiftyTwoWeekLow",
"shortName",
"longName",
]),
}
resp = client.get(url, params=params)
resp.raise_for_status()
data = resp.json()
results = []
for q in data.get("quoteResponse", {}).get("result", []):
change = q.get("regularMarketChange", 0)
results.append({
"symbol": q.get("symbol"),
"name": q.get("shortName") or q.get("longName"),
"price": q.get("regularMarketPrice"),
"change": round(change, 4),
"change_pct": round(q.get("regularMarketChangePercent", 0), 4),
"volume": q.get("regularMarketVolume"),
"market_cap": q.get("marketCap"),
"pe_trailing": q.get("trailingPE"),
"pe_forward": q.get("forwardPE"),
"dividend_yield": q.get("dividendYield"),
"52w_high": q.get("fiftyTwoWeekHigh"),
"52w_low": q.get("fiftyTwoWeekLow"),
"day_high": q.get("regularMarketDayHigh"),
"day_low": q.get("regularMarketDayLow"),
})
return results
# Get the S&P 500 megacaps in one request
megacaps = get_batch_quotes(["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA", "BRK-B"])
print(f"{'Symbol':<8} {'Price':>10} {'Change':>10} {'Mkt Cap':>15}")
for q in sorted(megacaps, key=lambda x: x["change_pct"] or 0, reverse=True):
mc = f"${q['market_cap']/1e12:.2f}T" if q['market_cap'] and q['market_cap'] > 1e12 else ""
chg = f"{q['change_pct']:+.2f}%" if q['change_pct'] else "N/A"
print(f"{q['symbol']:<8} ${q['price']:>9.2f} {chg:>10} {mc:>15}")
Historical OHLCV Data
The v8 chart endpoint handles historical data. Change the range and interval parameters:
import datetime
def get_historical(
symbol: str,
period: str = "1y",
interval: str = "1d",
client: Optional[httpx.Client] = None,
) -> list:
"""
Fetch historical OHLCV data.
period: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
interval: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo
Note: Intraday intervals (1m-90m) only available for recent periods.
1m: max 7 days, 5m: max 60 days, 1h: max 730 days
"""
if client is None:
client = make_client()
url = f"{YAHOO_BASE}/v8/finance/chart/{symbol}"
params = {
"range": period,
"interval": interval,
"includeAdjustedClose": "true",
"events": "div,split", # include dividend and split events
}
resp = client.get(url, params=params)
resp.raise_for_status()
data = resp.json()
if data.get("chart", {}).get("error"):
raise ValueError(f"Error fetching {symbol}: {data['chart']['error']}")
result = data["chart"]["result"][0]
timestamps = result.get("timestamp", [])
ohlcv = result.get("indicators", {}).get("quote", [{}])[0]
adj_close_data = result.get("indicators", {}).get("adjclose", [{}])[0]
adj_close = adj_close_data.get("adjclose", []) if adj_close_data else []
# Extract dividend and split events
events = result.get("events", {})
dividends = events.get("dividends", {})
splits = events.get("splits", {})
    def safe_round(val, digits=4):
        return round(val, digits) if val is not None else None

    def col(series, i):
        # guard against missing or short series instead of indexing blindly
        return series[i] if series and i < len(series) else None

    rows = []
    for i, ts in enumerate(timestamps):
        if ts is None:
            continue
        date_str = datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
        row = {
            "date": date_str,
            "timestamp": ts,
            "open": safe_round(col(ohlcv.get("open"), i)),
            "high": safe_round(col(ohlcv.get("high"), i)),
            "low": safe_round(col(ohlcv.get("low"), i)),
            "close": safe_round(col(ohlcv.get("close"), i)),
            "adj_close": safe_round(col(adj_close, i)),
            "volume": col(ohlcv.get("volume"), i),
            "dividend": None,
            "split_ratio": None,
        }
# Annotate dividend and split events by timestamp
for div_ts, div_data in dividends.items():
if abs(int(div_ts) - ts) < 86400:
row["dividend"] = div_data.get("amount")
for split_ts, split_data in splits.items():
if abs(int(split_ts) - ts) < 86400:
numerator = split_data.get("numerator", 1)
denominator = split_data.get("denominator", 1)
row["split_ratio"] = f"{numerator}:{denominator}"
rows.append(row)
return rows
history = get_historical("MSFT", period="3mo", interval="1d")
print(f"Got {len(history)} data points for MSFT")
print(f"\nLast 5 days:")
for row in history[-5:]:
print(f" {row['date']}: O={row['open']} H={row['high']} L={row['low']} C={row['close']} Vol={row['volume']:,}")
Financial Statements
The v10 quoteSummary endpoint returns detailed financial data — income statements, balance sheets, cash flow, and more. The modules parameter controls what sections to fetch:
def get_financials(
symbol: str,
modules: list = None,
client: Optional[httpx.Client] = None,
) -> dict:
"""
Fetch financial statements and company summary via quoteSummary.
Available modules (request only what you need):
- incomeStatementHistory, incomeStatementHistoryQuarterly
- balanceSheetHistory, balanceSheetHistoryQuarterly
- cashflowStatementHistory, cashflowStatementHistoryQuarterly
- earnings, earningsTrend
- financialData (key financial ratios)
- defaultKeyStatistics (valuation metrics)
- assetProfile (company description, sector, officers)
- recommendationTrend (analyst buy/sell/hold breakdown)
- earningsHistory (historical EPS vs estimates)
- institutionOwnership
- insiderHolders, insiderTransactions
- esgScores
"""
if modules is None:
modules = [
"incomeStatementHistory",
"balanceSheetHistory",
"cashflowStatementHistory",
"financialData",
"defaultKeyStatistics",
"assetProfile",
"earningsTrend",
]
if client is None:
client = make_client()
url = f"{YAHOO_BASE}/v10/finance/quoteSummary/{symbol}"
params = {"modules": ",".join(modules)}
resp = client.get(url, params=params)
resp.raise_for_status()
data = resp.json()
if data.get("quoteSummary", {}).get("error"):
raise ValueError(f"quoteSummary error for {symbol}: {data['quoteSummary']['error']}")
result = data["quoteSummary"]["result"]
if not result:
return {}
return result[0]
def extract_key_metrics(fin_data: dict) -> dict:
"""Extract and flatten key financial metrics from quoteSummary response."""
def fmt(section: str, key: str, field: str = "fmt") -> str:
return fin_data.get(section, {}).get(key, {}).get(field)
def raw(section: str, key: str) -> float:
return fin_data.get(section, {}).get(key, {}).get("raw")
profile = fin_data.get("assetProfile", {})
return {
# Company info
"company": profile.get("longBusinessSummary", "")[:200],
"sector": profile.get("sector"),
"industry": profile.get("industry"),
"employees": profile.get("fullTimeEmployees"),
"country": profile.get("country"),
# Valuation
"revenue": fmt("financialData", "totalRevenue"),
"revenue_raw": raw("financialData", "totalRevenue"),
"gross_profit": fmt("financialData", "grossProfits"),
"ebitda": fmt("financialData", "ebitda"),
"net_income": fmt("financialData", "netIncomeToCommon"),
"profit_margin": fmt("financialData", "profitMargins"),
"operating_margin": fmt("financialData", "operatingMargins"),
"return_on_equity": fmt("financialData", "returnOnEquity"),
"return_on_assets": fmt("financialData", "returnOnAssets"),
"debt_to_equity": fmt("financialData", "debtToEquity"),
"current_ratio": fmt("financialData", "currentRatio"),
"free_cash_flow": fmt("financialData", "freeCashflow"),
# Per-share and valuation metrics
"pe_ratio": fmt("defaultKeyStatistics", "forwardPE"),
"peg_ratio": fmt("defaultKeyStatistics", "pegRatio"),
"price_to_book": fmt("defaultKeyStatistics", "priceToBook"),
"eps_trailing": fmt("defaultKeyStatistics", "trailingEps"),
"eps_forward": fmt("defaultKeyStatistics", "forwardEps"),
"beta": fmt("defaultKeyStatistics", "beta"),
"shares_outstanding": fmt("defaultKeyStatistics", "sharesOutstanding"),
"float_shares": fmt("defaultKeyStatistics", "floatShares"),
"short_ratio": fmt("defaultKeyStatistics", "shortRatio"),
"short_pct_float": fmt("defaultKeyStatistics", "shortPercentOfFloat"),
"dividend_yield": fmt("defaultKeyStatistics", "dividendYield"),
"payout_ratio": fmt("defaultKeyStatistics", "payoutRatio"),
"52w_high": fmt("defaultKeyStatistics", "fiftyTwoWeekHigh"),
"52w_low": fmt("defaultKeyStatistics", "fiftyTwoWeekLow"),
"50d_avg": fmt("defaultKeyStatistics", "fiftyDayAverage"),
"200d_avg": fmt("defaultKeyStatistics", "twoHundredDayAverage"),
# Income statement history (last 4 annual)
"income_statements": [
{
"date": stmt.get("endDate", {}).get("fmt"),
"revenue": stmt.get("totalRevenue", {}).get("fmt"),
"gross_profit": stmt.get("grossProfit", {}).get("fmt"),
"net_income": stmt.get("netIncome", {}).get("fmt"),
"ebitda": stmt.get("ebitda", {}).get("fmt"),
"eps_basic": stmt.get("basicEPS", {}).get("fmt"),
"eps_diluted": stmt.get("dilutedEPS", {}).get("fmt"),
}
for stmt in fin_data.get("incomeStatementHistory", {})
.get("incomeStatementHistory", [])
],
# Analyst recommendations
"analyst_target_price": fmt("financialData", "targetMeanPrice"),
"analyst_recommendation": fin_data.get("financialData", {}).get("recommendationKey"),
"analyst_count": raw("financialData", "numberOfAnalystOpinions"),
}
# Usage
fin = get_financials("NVDA")
metrics = extract_key_metrics(fin)
print(f"Sector: {metrics['sector']} | Industry: {metrics['industry']}")
print(f"Revenue: {metrics['revenue']} | EBITDA: {metrics['ebitda']}")
print(f"P/E (fwd): {metrics['pe_ratio']} | EPS (fwd): {metrics['eps_forward']}")
print(f"Profit Margin: {metrics['profit_margin']} | ROE: {metrics['return_on_equity']}")
print(f"Analyst Target: {metrics['analyst_target_price']} ({metrics['analyst_recommendation']})")
Options Chains
def get_options(
symbol: str,
expiration_date: int = None,
client: Optional[httpx.Client] = None,
) -> dict:
"""
Fetch options chain for a symbol.
expiration_date: Unix timestamp (get available dates from the response first).
"""
if client is None:
client = make_client()
url = f"{YAHOO_BASE}/v7/finance/options/{symbol}"
params = {}
if expiration_date:
params["date"] = expiration_date
resp = client.get(url, params=params)
resp.raise_for_status()
data = resp.json()
result = data["optionChain"]["result"][0]
expirations = result.get("expirationDates", [])
options = result.get("options", [{}])[0]
def parse_contract(c: dict) -> dict:
return {
"contract_symbol": c.get("contractSymbol"),
"strike": c.get("strike"),
"expiration": datetime.datetime.fromtimestamp(
c.get("expiration", 0)
).strftime("%Y-%m-%d") if c.get("expiration") else None,
"last_price": c.get("lastPrice"),
"bid": c.get("bid"),
"ask": c.get("ask"),
"change": c.get("change"),
"pct_change": c.get("percentChange"),
"volume": c.get("volume"),
"open_interest": c.get("openInterest"),
"implied_volatility": round(c.get("impliedVolatility", 0), 4) if c.get("impliedVolatility") else None,
"in_the_money": c.get("inTheMoney", False),
}
quote = result.get("quote", {})
underlying_price = quote.get("regularMarketPrice")
return {
"symbol": symbol,
"underlying_price": underlying_price,
"expiration_dates": [
datetime.datetime.fromtimestamp(ts).strftime("%Y-%m-%d")
for ts in expirations
],
"calls": sorted(
[parse_contract(c) for c in options.get("calls", [])],
key=lambda x: x["strike"] or 0
),
"puts": sorted(
[parse_contract(c) for c in options.get("puts", [])],
key=lambda x: x["strike"] or 0
),
}
# Get options for AAPL
opts = get_options("AAPL")
underlying = opts["underlying_price"]
print(f"AAPL @ ${underlying:.2f}")
print(f"Available expirations: {', '.join(opts['expiration_dates'][:5])}")
print(f"\nATM calls (strikes near ${underlying:.0f}):")
atm_calls = [c for c in opts["calls"] if abs((c["strike"] or 0) - underlying) < 10]
for c in atm_calls[:5]:
print(f" ${c['strike']:.0f}: last=${c['last_price']}, IV={c['implied_volatility']}, OI={c['open_interest']:,}")
Anti-Bot Measures and Proxy Rotation
Yahoo Finance is more lenient than most sites, but there are still real limits.
Rate limiting. The v8 and v10 endpoints allow roughly 100 requests per minute per IP before throttling. You'll get 429 responses or temporary blocks. The v7 batch endpoint is more efficient — one request for 500 symbols beats 500 individual requests.
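When a 429 does come back, the standard remedy is exponential backoff with jitter before retrying. A minimal sketch (the `backoff_delay` and `retry_on_429` helpers are illustrative, not part of any Yahoo API; the wrapper works with any client whose responses expose a `.status_code`, such as httpx or requests):

```python
import random
import time

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

def retry_on_429(do_request, max_retries: int = 5, base: float = 2.0):
    """Call do_request() until the response status is not 429/5xx,
    sleeping a jittered, exponentially growing delay between attempts.
    Returns the last response so the caller can still inspect failures."""
    resp = None
    for attempt in range(max_retries):
        resp = do_request()
        if resp.status_code not in (429, 500, 502, 503):
            break
        time.sleep(backoff_delay(attempt, base=base))
    return resp
```

Wrap any existing call site, e.g. `resp = retry_on_429(lambda: client.get(url, params=params))`.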
IP-based blocking. Datacenter IPs get lower rate limits than residential IPs. If you're scraping from a cloud server, you'll hit limits much faster than from a home connection.
Consent cookies in EU regions. Yahoo requires cookie consent for users in EU-regulated regions and may redirect non-US IPs to localized versions that behave differently.
Crumb authentication. Some endpoints occasionally require a "crumb" token tied to your session cookie. If you see unexpected 401 responses, fetch the crumb first and pass it along as the crumb query parameter.
def get_yahoo_crumb(client: httpx.Client) -> str:
"""
Fetch a Yahoo Finance crumb token.
Required by some endpoints when cookie consent is enforced.
"""
# Visit the main Yahoo Finance page to get the cookie
client.get("https://finance.yahoo.com/")
# Fetch the crumb from the dedicated endpoint
resp = client.get(
"https://query1.finance.yahoo.com/v1/test/getcrumb",
headers={**HEADERS, "Referer": "https://finance.yahoo.com/"},
)
if resp.status_code == 200 and resp.text.strip():
return resp.text.strip()
return ""
For high-volume financial data collection — tracking 500+ tickers with intraday data — proxy rotation distributes requests across IPs. ThorData's residential proxy pool handles this well: residential IPs carry cleaner reputation scores, so Yahoo throttles them far less aggressively than datacenter ranges:
def get_tickers_with_proxy(
tickers: list,
proxy_url: str,
delay_range: tuple = (0.5, 1.5),
) -> dict:
"""Fetch quotes for many tickers using a rotating proxy."""
client = make_client(proxy_url=proxy_url)
results = {}
# Use batch endpoint for efficiency (up to 100 symbols per request)
batch_size = 50
for i in range(0, len(tickers), batch_size):
batch = tickers[i:i+batch_size]
try:
quotes = get_batch_quotes(batch, client=client)
for q in quotes:
results[q["symbol"]] = q
print(f"Batch {i//batch_size + 1}: {len(quotes)} quotes fetched")
except Exception as e:
print(f"Batch {i//batch_size + 1} error: {e}")
if i + batch_size < len(tickers):
time.sleep(random.uniform(*delay_range))
return results
# Fetch all S&P 500 components
sp500_tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"] # add full list
proxy = "http://user:[email protected]:9000"
all_quotes = get_tickers_with_proxy(sp500_tickers, proxy_url=proxy)
print(f"Fetched {len(all_quotes)} quotes")
SQLite Storage Schema
import sqlite3
import json
def init_finance_db(db_path: str = "yahoo_finance.db") -> sqlite3.Connection:
"""Initialize SQLite database for financial data."""
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
conn.executescript("""
CREATE TABLE IF NOT EXISTS tickers (
symbol TEXT PRIMARY KEY,
name TEXT,
exchange TEXT,
sector TEXT,
industry TEXT,
country TEXT,
currency TEXT,
employees INTEGER,
description TEXT,
added_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS price_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol TEXT NOT NULL,
price REAL,
change REAL,
change_pct REAL,
volume INTEGER,
market_cap REAL,
market_state TEXT,
snapshot_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (symbol) REFERENCES tickers(symbol)
);
CREATE TABLE IF NOT EXISTS ohlcv (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol TEXT NOT NULL,
date TEXT,
interval_type TEXT DEFAULT '1d',
open REAL,
high REAL,
low REAL,
close REAL,
adj_close REAL,
volume INTEGER,
dividend REAL,
split_ratio TEXT,
UNIQUE(symbol, date, interval_type),
FOREIGN KEY (symbol) REFERENCES tickers(symbol)
);
CREATE TABLE IF NOT EXISTS financials (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol TEXT NOT NULL,
period_type TEXT DEFAULT 'annual',
period_end TEXT,
revenue REAL,
gross_profit REAL,
net_income REAL,
ebitda REAL,
eps_basic TEXT,
eps_diluted TEXT,
fetched_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (symbol) REFERENCES tickers(symbol)
);
CREATE TABLE IF NOT EXISTS key_metrics (
symbol TEXT PRIMARY KEY,
pe_ratio TEXT,
peg_ratio TEXT,
price_to_book TEXT,
profit_margin TEXT,
operating_margin TEXT,
roe TEXT,
roa TEXT,
debt_to_equity TEXT,
current_ratio TEXT,
beta TEXT,
dividend_yield TEXT,
short_pct_float TEXT,
analyst_target TEXT,
analyst_reco TEXT,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_price_symbol ON price_snapshots (symbol);
CREATE INDEX IF NOT EXISTS idx_price_at ON price_snapshots (snapshot_at);
CREATE INDEX IF NOT EXISTS idx_ohlcv_symbol_date ON ohlcv (symbol, date);
""")
conn.commit()
return conn
def save_ticker(conn: sqlite3.Connection, symbol: str, profile: dict = None):
"""Upsert a ticker record with optional profile data."""
conn.execute(
"""INSERT OR IGNORE INTO tickers (symbol) VALUES (?)""",
(symbol,)
)
if profile:
conn.execute(
"""UPDATE tickers SET name=?, exchange=?, sector=?, industry=?,
country=?, currency=?, employees=?, description=?
WHERE symbol=?""",
(
profile.get("shortName") or profile.get("longName"),
profile.get("exchange"),
profile.get("sector"),
profile.get("industry"),
profile.get("country"),
profile.get("currency"),
profile.get("fullTimeEmployees"),
profile.get("longBusinessSummary", "")[:500],
symbol,
),
)
conn.commit()
def save_quote_snapshot(conn: sqlite3.Connection, quote: dict):
"""Record a price snapshot."""
conn.execute(
"""INSERT INTO price_snapshots
(symbol, price, change, change_pct, volume, market_cap, market_state)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
(
quote.get("symbol"),
quote.get("price"),
quote.get("change"),
quote.get("change_pct"),
quote.get("volume"),
quote.get("market_cap"),
quote.get("market_state"),
),
)
conn.commit()
def save_ohlcv_history(conn: sqlite3.Connection, symbol: str, history: list, interval: str = "1d"):
    """Bulk insert OHLCV history. INSERT OR IGNORE skips duplicate (symbol, date, interval) rows."""
    conn.executemany(
        """INSERT OR IGNORE INTO ohlcv
           (symbol, date, interval_type, open, high, low, close, adj_close, volume, dividend, split_ratio)
           VALUES (?,?,?,?,?,?,?,?,?,?,?)""",
        [
            (
                symbol, row.get("date"), interval,
                row.get("open"), row.get("high"), row.get("low"),
                row.get("close"), row.get("adj_close"), row.get("volume"),
                row.get("dividend"), row.get("split_ratio"),
            )
            for row in history
        ],
    )
    conn.commit()
def save_key_metrics(conn: sqlite3.Connection, symbol: str, metrics: dict):
"""Upsert key financial metrics for a symbol."""
conn.execute(
"""INSERT OR REPLACE INTO key_metrics
(symbol, pe_ratio, peg_ratio, price_to_book, profit_margin, operating_margin,
roe, roa, debt_to_equity, current_ratio, beta, dividend_yield,
short_pct_float, analyst_target, analyst_reco, updated_at)
VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,CURRENT_TIMESTAMP)""",
(
symbol,
metrics.get("pe_ratio"), metrics.get("peg_ratio"), metrics.get("price_to_book"),
metrics.get("profit_margin"), metrics.get("operating_margin"),
metrics.get("return_on_equity"), metrics.get("return_on_assets"),
metrics.get("debt_to_equity"), metrics.get("current_ratio"),
metrics.get("beta"), metrics.get("dividend_yield"),
metrics.get("short_pct_float"),
metrics.get("analyst_target_price"), metrics.get("analyst_recommendation"),
),
)
conn.commit()
Complete Monitoring Pipeline
def build_financial_database(
tickers: list,
db_path: str = "yahoo_finance.db",
proxy_url: str = None,
fetch_history: bool = True,
history_period: str = "2y",
fetch_financials: bool = True,
delay_range: tuple = (1.0, 2.5),
):
"""
Full pipeline: quotes + history + financials -> SQLite storage.
Designed for daily or weekly runs to build a financial research database.
"""
conn = init_finance_db(db_path)
client = make_client(proxy_url=proxy_url)
print(f"Processing {len(tickers)} tickers...")
# Phase 1: Batch quotes (efficient)
print("\nFetching batch quotes...")
batch_size = 50
for i in range(0, len(tickers), batch_size):
batch = tickers[i:i+batch_size]
try:
quotes = get_batch_quotes(batch, client=client)
for q in quotes:
save_ticker(conn, q["symbol"])
save_quote_snapshot(conn, q)
print(f" Batch {i//batch_size + 1}: {len(quotes)} quotes")
time.sleep(random.uniform(0.5, 1.0))
except Exception as e:
print(f" Batch error: {e}")
# Phase 2: History and financials (per-ticker)
for i, symbol in enumerate(tickers):
print(f"\n[{i+1}/{len(tickers)}] {symbol}")
if fetch_history:
try:
history = get_historical(symbol, period=history_period, client=client)
save_ohlcv_history(conn, symbol, history)
print(f" History: {len(history)} candles")
except Exception as e:
print(f" History error: {e}")
time.sleep(random.uniform(*delay_range))
if fetch_financials:
try:
fin = get_financials(symbol, client=client)
metrics = extract_key_metrics(fin)
save_key_metrics(conn, symbol, metrics)
print(f" Financials: P/E={metrics.get('pe_ratio')} Margin={metrics.get('profit_margin')}")
# Save income statement history
for stmt in metrics.get("income_statements", []):
if stmt.get("date"):
conn.execute(
"""INSERT OR IGNORE INTO financials
(symbol, period_type, period_end, revenue, gross_profit, net_income)
VALUES (?,?,?,?,?,?)""",
(symbol, "annual", stmt["date"],
stmt.get("revenue"), stmt.get("gross_profit"), stmt.get("net_income"))
)
conn.commit()
except Exception as e:
print(f" Financials error: {e}")
time.sleep(random.uniform(*delay_range))
# Report
snap_count = conn.execute("SELECT COUNT(*) FROM price_snapshots").fetchone()[0]
ohlcv_count = conn.execute("SELECT COUNT(*) FROM ohlcv").fetchone()[0]
print(f"\nDatabase summary: {snap_count:,} snapshots, {ohlcv_count:,} OHLCV rows")
conn.close()
# Run it
WATCHLIST = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA", "BRK-B", "JPM", "V"]
PROXY = "http://user:[email protected]:9000"
build_financial_database(
WATCHLIST,
proxy_url=PROXY,
fetch_history=True,
history_period="1y",
fetch_financials=True,
)
Practical Tips
Use query1 vs query2. Both query1.finance.yahoo.com and query2.finance.yahoo.com serve the same data. If one is slow or returning errors, switch to the other.
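That failover can be a short loop. A sketch (`fetch_first_ok` is a hypothetical helper; `get` is any callable mapping a URL to a response with a `.status_code`, e.g. `client.get`):

```python
YAHOO_HOSTS = [
    "https://query1.finance.yahoo.com",
    "https://query2.finance.yahoo.com",
]

def fetch_first_ok(get, path: str, hosts=YAHOO_HOSTS):
    """Try the same API path on each host in order; return the first
    response with status 200, or the last response/None if all fail."""
    last = None
    for host in hosts:
        try:
            last = get(f"{host}{path}")
        except Exception:
            continue  # connection error: fall through to the next host
        if last.status_code == 200:
            return last
    return last
```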
Handle None in OHLCV data. Yahoo sometimes returns null for individual OHLCV values for weekends, market holidays, and trading suspensions. Always check for None before doing arithmetic.
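A small guard helps here. A sketch (`clean_rows` is a hypothetical helper over the row dicts produced by `get_historical` above):

```python
def clean_rows(rows: list) -> list:
    """Drop candles where any OHLC field is None (holidays, halts)
    so downstream arithmetic never hits a None."""
    return [
        r for r in rows
        if None not in (r.get("open"), r.get("high"), r.get("low"), r.get("close"))
    ]
```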
Intraday limits. 1-minute interval data only goes back about 7 days. For longer intraday history, you need a paid data provider. For daily OHLCV, Yahoo provides data going back to the IPO date.
Crumb token errors. If you get 401 or unexpected authentication errors, visit finance.yahoo.com first in your HTTP session and extract the crumb from the cookie/page source. Some server-side configurations require it.
The v7 batch endpoint for portfolios. If you monitor a fixed set of tickers (a portfolio watchlist), the batch v7 endpoint in 50-100 symbol chunks is dramatically more efficient than per-symbol calls.
Prefer adjusted close for return calculations. The adj_close field accounts for splits and dividends. Always use it for return calculations — unadjusted close will give wrong results for stocks with splits.
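For example, period-over-period returns from the adj_close series (a sketch; `simple_returns` is a hypothetical helper that skips pairs with missing data):

```python
def simple_returns(adj_closes: list) -> list:
    """Simple period-over-period returns from adjusted closes;
    pairs containing None (or a zero previous close) are skipped."""
    return [
        (curr - prev) / prev
        for prev, curr in zip(adj_closes, adj_closes[1:])
        if prev and curr is not None
    ]
```

Feed it the history rows from earlier, e.g. `simple_returns([r["adj_close"] for r in history])`.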
Legal Notes
Yahoo Finance's Terms of Service prohibit scraping and automated access. Their API was officially shut down in 2017, and the unofficial endpoints described here are not sanctioned by Yahoo.
In practice, the financial data returned is widely accessed through the yfinance Python library (which uses the same endpoints under the hood) and considered generally acceptable for personal, research, and non-commercial use. No enforcement actions against individual researchers using these endpoints have been reported.
For commercial applications, licensed data providers like Alpha Vantage (free tier available), Quandl/Nasdaq Data Link, or Polygon.io provide the same data with proper usage rights and SLAs. These are worth the cost if you're building a product rather than doing research.