← Back to blog

How to Scrape Pinterest Boards and Pins in 2026: The Definitive Python Guide

Pinterest is one of the most data-rich platforms on the internet, and yet it's one of the least discussed when it comes to programmatic data extraction. That's partly because its unofficial API is not documented, and partly because scraping it well requires solving a cluster of interlocking problems — session management, fingerprint consistency, pagination, CSRF tokens, and IP reputation — all at once.

This guide solves all of them. It is a definitive, working reference for 2026, written for engineers who need real data, not toy examples.

Why Scrape Pinterest?

Before the code, here is why people actually build Pinterest scrapers — because the use case shapes what data you need and how aggressively you need to collect it.

Trend research and forecasting. Pinterest users save content months before they buy it, which makes the platform a consistently reliable early signal for seasonal trends in home decor, fashion, food, and travel. A board that picks up 10,000 new saves in a week for "quiet luxury interiors" tells you something that no search volume tool can, because it reflects intentional curation, not passive browsing. Retailers, trend agencies, and CPG brands monitor boards at scale to get a 2-6 month lead time on consumer demand shifts.

E-commerce competitive intelligence. Product pins expose competitor inventory, pricing strategy, image creative, and which SKUs are gaining traction in the visual search layer. If a competing brand's product pins are accumulating saves across dozens of influencer boards, that's a market signal. Shopping pins also expose domain attribution, so you can see where traffic is flowing even if you can't see the GA data.

Fashion and visual content marketing. Style boards curated by top pinners function as editorial taste-making. Tracking which visual aesthetics (color palettes, composition styles, lifestyle contexts) accumulate the most engagement on fashion-adjacent boards gives content teams a data-backed brief rather than a vibes-based creative direction.

SEO and content strategy. Pinterest ranks in Google image search and has its own internal search engine. Understanding which pin descriptions, keywords, and image styles perform in Pinterest search tells you something about how Google processes visual content at scale — and gives content marketers a second traffic channel to optimize for.

Academic research on visual culture. Scholars studying how aesthetic movements spread through social networks, how visual norms shift across communities, and how platforms shape cultural production need bulk data. Pinterest's curation layer — where people explicitly organize images into named categories — makes it uniquely useful as a structured dataset for visual culture research.

Influencer and audience research. Board composition, save counts, and follower data let you evaluate whether an influencer's audience is real and engaged, or whether their follow metrics are gamed. A pinner with 500K followers but boards averaging 3 saves per pin is a different story than someone with 50K followers whose pins consistently get 200+ saves.
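
The saves-per-pin comparison above is easy to turn into a concrete metric. A minimal sketch, where the function name and the per-1,000-follower normalization are my own choices rather than any Pinterest convention:

```python
def engagement_ratio(avg_saves_per_pin: float, follower_count: int) -> float:
    """Average saves per pin, normalized per 1,000 followers."""
    return 1000.0 * avg_saves_per_pin / follower_count

# The two pinners from the example above:
inflated = engagement_ratio(3, 500_000)   # large but inert audience
genuine = engagement_ratio(200, 50_000)   # small but highly engaged
```

On this scale the 50K-follower pinner scores several hundred times higher than the 500K-follower one, which is the point of the comparison.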


How Pinterest's Internal API Works

Pinterest has no public API worth using in 2026. The official v1 API was shuttered for most developers, and the v3 endpoint is heavily restricted. The good news is that the browser has to get data from somewhere, and that somewhere is a clean, JSON-based internal REST API.

Open Chrome DevTools on any Pinterest page, go to the Network tab, filter by Fetch/XHR, and watch the requests. You will see calls to two main patterns:

https://www.pinterest.com/resource/<ResourceName>/get/
https://api.pinterest.com/v3/...

The resource/ API is the one you want. Each resource call sends a data query parameter containing a JSON-encoded options object. The server responds with a consistent envelope:

{
  "resource": { "name": "BoardFeedResource", "options": {...} },
  "resource_response": {
    "status": "success",
    "http_status": 200,
    "data": [...],
    "bookmark": "SomeOpaqueBase64String=="
  },
  "client_context": {...}
}

The bookmark field is how pagination works. Pass it back in the next request inside options.bookmarks as a single-item array. When bookmark is null or "-end-", you have reached the last page.
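
The whole protocol fits in a short loop. Here is a minimal, transport-agnostic sketch, where `fetch_page` stands in for any resource call that returns `(items, bookmark)`:

```python
from typing import Callable, Iterator, Optional, Tuple

# A page fetcher takes the previous bookmark (None on the first call)
# and returns the items on that page plus the next bookmark.
PageFetcher = Callable[[Optional[str]], Tuple[list, Optional[str]]]

def paginate(fetch_page: PageFetcher) -> Iterator[dict]:
    """Drive bookmark pagination until the server signals the end."""
    bookmark: Optional[str] = None
    while True:
        items, bookmark = fetch_page(bookmark)
        yield from items
        if not bookmark or bookmark == "-end-":
            break
```

Every script later in this guide is an instance of this loop with a different resource name and options payload.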

The resource names you will use most:

- BoardFeedResource — pins on a board
- BoardsResource — boards belonging to a user
- UserResource — user profile data
- BaseSearchResource — keyword search
- RelatedPinsResource — related/visual search
- PinResource — single pin detail
- AggregatedCommentResource — comments on a pin
- InterestFeedResource — trending pins by category
- ShoppingSpotlightFeedResource — shopping/product pins


Setup and Shared Infrastructure

Install dependencies:

pip install httpx tenacity sqlite-utils

The following module is imported by all scripts in this guide. Save it as pinterest_base.py.

"""
pinterest_base.py
Shared session setup, retry logic, and dataclass models for Pinterest scraping.
"""
from __future__ import annotations

import json
import random
import time
from dataclasses import dataclass, field
from typing import Any, Optional

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type


# ---------------------------------------------------------------------------
# Dataclass models
# ---------------------------------------------------------------------------

@dataclass
class Pin:
    id: str
    description: str
    image_url: Optional[str]
    save_count: int
    comment_count: int
    source_url: Optional[str]
    domain: Optional[str]
    created_at: Optional[str]
    board_id: Optional[str]
    pinner_username: Optional[str]
    is_shopping: bool = False
    price: Optional[str] = None
    currency: Optional[str] = None
    product_name: Optional[str] = None
    rich_metadata: dict = field(default_factory=dict)

    @classmethod
    def from_raw(cls, raw: dict) -> "Pin":
        images = raw.get("images") or {}
        orig = images.get("orig") or images.get("736x") or {}
        rich = raw.get("rich_metadata") or {}
        pinner = raw.get("pinner") or {}
        return cls(
            id=str(raw.get("id", "")),
            description=raw.get("description") or raw.get("grid_title") or "",
            image_url=orig.get("url"),
            save_count=raw.get("save_count") or raw.get("repin_count") or 0,
            comment_count=raw.get("comment_count") or 0,
            source_url=raw.get("link"),
            domain=raw.get("domain"),
            created_at=raw.get("created_at"),
            board_id=str(raw["board"]["id"]) if raw.get("board") else None,
            pinner_username=pinner.get("username"),
            is_shopping=bool(raw.get("shopping_rec_count") or raw.get("is_shopping_ad")),
            price=rich.get("price"),
            currency=rich.get("currency"),
            product_name=rich.get("name"),
            rich_metadata=rich,
        )


@dataclass
class Board:
    id: str
    name: str
    slug: str
    url: str
    description: str
    pin_count: int
    follower_count: int
    owner_username: str
    cover_image_url: Optional[str]
    category: Optional[str]
    created_at: Optional[str]

    @classmethod
    def from_raw(cls, raw: dict) -> "Board":
        owner = raw.get("owner") or {}
        cover = raw.get("cover_images") or {}
        cover_url = None
        for size in ("736x", "400x300", "200x150"):
            if size in cover:
                cover_url = cover[size].get("url")
                break
        url = raw.get("url", "")
        slug = url.strip("/").split("/")[-1] if url else raw.get("slug", "")
        return cls(
            id=str(raw.get("id", "")),
            name=raw.get("name", ""),
            slug=slug,
            url=url,
            description=raw.get("description") or "",
            pin_count=raw.get("pin_count") or 0,
            follower_count=raw.get("follower_count") or 0,
            owner_username=owner.get("username", ""),
            cover_image_url=cover_url,
            category=raw.get("category"),
            created_at=raw.get("created_at"),
        )


@dataclass
class UserProfile:
    id: str
    username: str
    full_name: str
    bio: str
    follower_count: int
    following_count: int
    board_count: int
    pin_count: int
    monthly_views: int
    website_url: Optional[str]
    profile_image_url: Optional[str]
    is_verified_merchant: bool

    @classmethod
    def from_raw(cls, raw: dict) -> "UserProfile":
        return cls(
            id=str(raw.get("id", "")),
            username=raw.get("username", ""),
            full_name=raw.get("full_name") or "",
            bio=raw.get("about") or "",
            follower_count=raw.get("follower_count") or 0,
            following_count=raw.get("following_count") or 0,
            board_count=raw.get("board_count") or 0,
            pin_count=raw.get("pin_count") or 0,
            monthly_views=raw.get("monthly_views") or 0,
            website_url=raw.get("website_url"),
            profile_image_url=(raw.get("image_medium_url") or raw.get("image_large_url")),
            is_verified_merchant=bool(raw.get("is_verified_merchant")),
        )


@dataclass
class Comment:
    id: str
    pin_id: str
    text: str
    author_username: str
    author_id: str
    created_at: str
    like_count: int

    @classmethod
    def from_raw(cls, raw: dict, pin_id: str) -> "Comment":
        user = raw.get("user") or {}
        return cls(
            id=str(raw.get("id", "")),
            pin_id=pin_id,
            text=raw.get("text", ""),
            author_username=user.get("username", ""),
            author_id=str(user.get("id", "")),
            created_at=raw.get("created_at", ""),
            like_count=raw.get("like_count") or 0,
        )


# ---------------------------------------------------------------------------
# Session factory
# ---------------------------------------------------------------------------

CHROME_VERSION = "124.0.0.0"

HEADERS: dict[str, str] = {
    "User-Agent": (
        f"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        f"AppleWebKit/537.36 (KHTML, like Gecko) "
        f"Chrome/{CHROME_VERSION} Safari/537.36"
    ),
    "Accept": "application/json, text/javascript, */*, q=0.01",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-CH-UA": f'"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"macOS"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "X-Requested-With": "XMLHttpRequest",
    "X-APP-VERSION": "b1e66c1",
    "X-Pinterest-AppState": "active",
    "Referer": "https://www.pinterest.com/",
}


def make_session(proxy_url: Optional[str] = None) -> httpx.Client:
    """
    Build a warmed-up httpx session.
    Loads the homepage to acquire cookies (including csrftoken) before
    any API calls are made.
    """
    client = httpx.Client(
        headers=HEADERS,
        follow_redirects=True,
        timeout=httpx.Timeout(30.0),
        proxy=proxy_url,  # modern httpx takes `proxy=`; the old `proxies` dict argument was removed
    )
    # Cookie warming — load homepage to acquire csrftoken
    try:
        resp = client.get("https://www.pinterest.com/")
        resp.raise_for_status()
        # Promote the CSRF cookie to a header. Write endpoints require it,
        # and sending it on reads matches what the real web client does.
        csrf = client.cookies.get("csrftoken")
        if csrf:
            client.headers["X-CSRFToken"] = csrf
        # Small pause to look like a human landing on the page
        time.sleep(random.uniform(1.2, 2.5))
    except httpx.HTTPError as exc:
        print(f"[warn] Homepage warm-up failed: {exc}")
    return client


# ---------------------------------------------------------------------------
# Retry decorator for transient failures
# ---------------------------------------------------------------------------

pinterest_retry = retry(
    retry=retry_if_exception_type((httpx.HTTPStatusError, httpx.TimeoutException)),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1.5, min=2, max=45),
    reraise=True,
)


# ---------------------------------------------------------------------------
# Rate limiting helpers
# ---------------------------------------------------------------------------

def polite_delay(base: float = 1.5, jitter: float = 1.2) -> None:
    """Sleep for base + random jitter seconds."""
    time.sleep(base + random.uniform(0.0, jitter))


def build_params(source_url: str, options: dict[str, Any]) -> dict[str, str]:
    """Encode resource API query parameters."""
    return {
        "source_url": source_url,
        "data": json.dumps({"options": options, "context": {}}, separators=(",", ":")),
    }
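
To sanity-check what build_params puts on the wire, you can reproduce the final request URL with the standard library. The percent-escaped data parameter is the shape you will see in DevTools; the options below are illustrative, not a complete payload:

```python
import json
import urllib.parse

options = {"board_url": "/anthropologie/home/", "page_size": 25}
# Compact separators match what build_params emits.
data = json.dumps({"options": options, "context": {}}, separators=(",", ":"))
query = urllib.parse.urlencode({"source_url": "/anthropologie/home/", "data": data})
url = "https://www.pinterest.com/resource/BoardFeedResource/get/?" + query
# The JSON rides inside the query string, percent-escaped:
#   ...&data=%7B%22options%22%3A%7B%22board_url%22...
```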

Section 1: Board Pin Extraction with Full Pagination

This script fetches every pin from a board, handles the bookmark-based pagination, and writes results to JSON.

"""
scrape_board_pins.py
Extract all pins from a Pinterest board with full pagination.
Usage: python3 scrape_board_pins.py <username> <board_slug> [output.json]
"""
from __future__ import annotations

import json
import sys
from typing import Iterator

import httpx

from pinterest_base import (
    Pin, make_session, polite_delay, build_params, pinterest_retry
)

RESOURCE_URL = "https://www.pinterest.com/resource/{resource}/get/"


@pinterest_retry
def _fetch_board_pins_page(
    client: httpx.Client,
    username: str,
    board_slug: str,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    """Fetch a single page of board pins. Returns (raw_pins, next_bookmark)."""
    options: dict = {
        "board_url": f"/{username}/{board_slug}/",
        "board_id": None,
        "currentFilter": -1,
        "field_set_key": "react_grid_pin",
        "filter_section_pins": True,
        "layout": "default",
        "page_size": 25,
        "redux_normalize_feed": True,
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    params = build_params(f"/{username}/{board_slug}/", options)
    resp = client.get(
        RESOURCE_URL.format(resource="BoardFeedResource"),
        params=params,
    )
    resp.raise_for_status()
    body = resp.json()
    resource_response = body["resource_response"]
    raw_pins = resource_response.get("data") or []
    next_bookmark = resource_response.get("bookmark")
    return raw_pins, next_bookmark


def iter_board_pins(
    client: httpx.Client,
    username: str,
    board_slug: str,
) -> Iterator[Pin]:
    """Yield Pin objects for every pin on the board."""
    bookmark: str | None = None
    page = 0
    total = 0

    while True:
        page += 1
        raw_pins, bookmark = _fetch_board_pins_page(client, username, board_slug, bookmark)

        for raw in raw_pins:
            if not raw or not isinstance(raw, dict):
                continue
            try:
                yield Pin.from_raw(raw)
                total += 1
            except Exception as exc:
                print(f"[warn] Could not parse pin: {exc}")

        print(f"  Page {page}: fetched {len(raw_pins)} pins (total so far: {total})")

        if not bookmark or bookmark == "-end-":
            break

        polite_delay()


def scrape_board(username: str, board_slug: str, output_path: str = "pins.json") -> list[Pin]:
    client = make_session()
    print(f"Scraping board: pinterest.com/{username}/{board_slug}/")

    pins: list[Pin] = []
    for pin in iter_board_pins(client, username, board_slug):
        pins.append(pin)

    with open(output_path, "w") as fh:
        json.dump([vars(p) for p in pins], fh, indent=2, default=str)

    print(f"\nDone. {len(pins)} pins written to {output_path}")
    client.close()
    return pins


if __name__ == "__main__":
    username = sys.argv[1] if len(sys.argv) > 1 else "anthropologie"
    board_slug = sys.argv[2] if len(sys.argv) > 2 else "home"
    out = sys.argv[3] if len(sys.argv) > 3 else "pins.json"
    scrape_board(username, board_slug, out)

Sample output (single pin object):

{
  "id": "982374651820394756",
  "description": "Linen duvet cover in warm sand — perfect for that quiet luxury bedroom look",
  "image_url": "https://i.pinimg.com/originals/4a/b2/cc/4ab2cc9e3a1db55f1c2e837612facb9d.jpg",
  "save_count": 4821,
  "comment_count": 12,
  "source_url": "https://www.anthropologie.com/shop/linen-duvet-cover",
  "domain": "anthropologie.com",
  "created_at": "2025-11-03T14:22:11",
  "board_id": "771209384756",
  "pinner_username": "homeaesthetics_daily",
  "is_shopping": true,
  "price": "148.00",
  "currency": "USD",
  "product_name": "Washed Linen Duvet Cover"
}
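
A common follow-up is downloading the images themselves. This is a hedged sketch using only the standard library; in production you would reuse the warmed httpx session and its headers rather than bare urllib calls, and both helper names here are my own:

```python
import json
import urllib.request
from pathlib import Path

def local_name(pin: dict) -> str:
    """Derive a stable local filename from the pin id and image extension."""
    tail = (pin.get("image_url") or "").rsplit("/", 1)[-1]
    ext = tail.rsplit(".", 1)[-1] if "." in tail else "jpg"
    return f"{pin['id']}.{ext}"

def download_images(pins_path: str = "pins.json", out_dir: str = "images") -> None:
    """Fetch every image referenced in a pins.json produced by scrape_board."""
    Path(out_dir).mkdir(exist_ok=True)
    for pin in json.load(open(pins_path)):
        if not pin.get("image_url"):
            continue
        target = Path(out_dir) / local_name(pin)
        if not target.exists():  # resumable: skip files already on disk
            urllib.request.urlretrieve(pin["image_url"], target)
```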

Section 2: User Profile and Board Listing

"""
scrape_user_profile.py
Fetch a Pinterest user's profile and all their public boards.
Usage: python3 scrape_user_profile.py <username> [output.json]
"""
from __future__ import annotations

import json
import sys
from dataclasses import asdict

import httpx

from pinterest_base import (
    UserProfile, Board, make_session, polite_delay, build_params, pinterest_retry
)

RESOURCE_URL = "https://www.pinterest.com/resource/{resource}/get/"


@pinterest_retry
def fetch_user_profile(client: httpx.Client, username: str) -> UserProfile:
    """Fetch full profile metadata for a Pinterest user."""
    options = {
        "username": username,
        "field_set_key": "profile",
    }
    params = build_params(f"/{username}/", options)
    resp = client.get(RESOURCE_URL.format(resource="UserResource"), params=params)
    resp.raise_for_status()
    raw = resp.json()["resource_response"]["data"]
    return UserProfile.from_raw(raw)


@pinterest_retry
def _fetch_boards_page(
    client: httpx.Client,
    username: str,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    options: dict = {
        "username": username,
        "field_set_key": "profile_grid_item",
        "page_size": 50,
        "privacy_filter": "all",
        "sort": "last_pinned_to",
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    params = build_params(f"/{username}/boards/", options)
    resp = client.get(RESOURCE_URL.format(resource="BoardsResource"), params=params)
    resp.raise_for_status()
    body = resp.json()["resource_response"]
    return body.get("data") or [], body.get("bookmark")


def fetch_all_boards(client: httpx.Client, username: str) -> list[Board]:
    """Return all public boards for a user."""
    boards: list[Board] = []
    bookmark: str | None = None

    while True:
        raw_boards, bookmark = _fetch_boards_page(client, username, bookmark)
        for raw in raw_boards:
            if not raw or not isinstance(raw, dict):
                continue
            try:
                boards.append(Board.from_raw(raw))
            except Exception as exc:
                print(f"[warn] Board parse error: {exc}")

        if not bookmark or bookmark == "-end-":
            break
        polite_delay(base=1.0, jitter=0.8)

    return boards


def scrape_user(username: str, output_path: str = "user_profile.json") -> dict:
    client = make_session()
    print(f"Fetching profile: pinterest.com/{username}/")

    profile = fetch_user_profile(client, username)
    polite_delay()

    print(f"Found {profile.board_count} boards, {profile.monthly_views:,} monthly views")
    boards = fetch_all_boards(client, username)
    print(f"Fetched {len(boards)} boards")

    output = {
        "profile": asdict(profile),
        "boards": [asdict(b) for b in boards],
    }
    with open(output_path, "w") as fh:
        json.dump(output, fh, indent=2, default=str)

    print(f"Written to {output_path}")
    client.close()
    return output


if __name__ == "__main__":
    username = sys.argv[1] if len(sys.argv) > 1 else "anthropologie"
    out = sys.argv[2] if len(sys.argv) > 2 else "user_profile.json"
    scrape_user(username, out)

Sample profile output:

{
  "profile": {
    "id": "502814736284",
    "username": "anthropologie",
    "full_name": "Anthropologie",
    "bio": "Inspiring the free-spirited lifestyle.",
    "follower_count": 2847391,
    "following_count": 148,
    "board_count": 94,
    "pin_count": 38201,
    "monthly_views": 12400000,
    "website_url": "https://www.anthropologie.com",
    "profile_image_url": "https://i.pinimg.com/280x280_RS/8b/6e/...",
    "is_verified_merchant": true
  },
  "boards": [
    {
      "id": "771209384756",
      "name": "Home Decor",
      "slug": "home-decor",
      "url": "/anthropologie/home-decor/",
      "description": "Curated home goods and interior inspiration.",
      "pin_count": 4203,
      "follower_count": 189421,
      "owner_username": "anthropologie",
      "cover_image_url": "https://i.pinimg.com/400x300/...",
      "category": "home_decor",
      "created_at": "2012-04-18T09:14:00"
    }
  ]
}
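
This output composes directly with Section 1: the board list gives you every (username, slug) pair the board scraper needs, so the two scripts together can archive a user's entire public presence. A small glue sketch, where `board_targets` is my own helper operating on the JSON written by scrape_user:

```python
def board_targets(profile_json: dict) -> list[tuple[str, str]]:
    """Turn scrape_user output into (username, board_slug) pairs for scrape_board."""
    owner = profile_json["profile"]["username"]
    return [(owner, b["slug"]) for b in profile_json["boards"] if b["pin_count"] > 0]

# for user, slug in board_targets(data):
#     scrape_board(user, slug, f"{slug}.json")  # from scrape_board_pins.py
```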

Section 3: Pin Search by Keyword

"""
scrape_pin_search.py
Search Pinterest pins by keyword with paginated results.
Usage: python3 scrape_pin_search.py "quiet luxury bedroom" [--pages 5] [--output search_results.json]
"""
from __future__ import annotations

import argparse
import json
import urllib.parse
from dataclasses import asdict
from typing import Iterator

import httpx

from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry

RESOURCE_URL = "https://www.pinterest.com/resource/BaseSearchResource/get/"


@pinterest_retry
def _fetch_search_page(
    client: httpx.Client,
    query: str,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    options: dict = {
        "query": query,
        "scope": "pins",
        "no_fetch_context_on_resource": False,
        "page_size": 25,
        "redux_normalize_feed": True,
        "rs": "typed",
        "auto_correction_disabled": False,
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    encoded_query = urllib.parse.quote(query)
    params = build_params(
        f"/search/pins/?q={encoded_query}&rs=typed&term_meta%5B%5D=typed",
        options,
    )
    resp = client.get(RESOURCE_URL, params=params)
    resp.raise_for_status()
    body = resp.json()["resource_response"]
    data = body.get("data") or {}
    results = data.get("results") or []
    pins = [r for r in results if isinstance(r, dict) and r.get("type") == "pin"]
    return pins, body.get("bookmark")


def iter_search_pins(
    client: httpx.Client,
    query: str,
    max_pages: int = 10,
) -> Iterator[Pin]:
    """Yield Pin objects from keyword search results."""
    bookmark: str | None = None
    for page in range(1, max_pages + 1):
        raw_pins, bookmark = _fetch_search_page(client, query, bookmark)
        yielded = 0
        for raw in raw_pins:
            try:
                yield Pin.from_raw(raw)
                yielded += 1
            except Exception as exc:
                print(f"[warn] Search pin parse error: {exc}")
        print(f"  Search page {page}: {yielded} pins")
        if not bookmark or bookmark == "-end-":
            break
        polite_delay()


def scrape_search(query: str, max_pages: int = 5, output_path: str = "search_results.json") -> list[Pin]:
    client = make_session()
    print(f"Searching Pinterest for: '{query}' (up to {max_pages} pages)")

    pins: list[Pin] = []
    for pin in iter_search_pins(client, query, max_pages):
        pins.append(pin)

    with open(output_path, "w") as fh:
        json.dump([asdict(p) for p in pins], fh, indent=2, default=str)

    print(f"\nDone. {len(pins)} results written to {output_path}")
    client.close()
    return pins


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("query", help="Search query")
    parser.add_argument("--pages", type=int, default=5, help="Max pages to fetch")
    parser.add_argument("--output", default="search_results.json")
    args = parser.parse_args()
    scrape_search(args.query, args.pages, args.output)
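
Search sweeps across related query variants ("quiet luxury bedroom", "quiet luxury interior", and so on) overlap heavily, so dedupe by pin id before analysis. A small sketch of my own, operating on the pin dicts the script writes to JSON:

```python
def dedupe_by_id(result_sets: list[list[dict]]) -> list[dict]:
    """Merge several search result lists, keeping the first copy of each pin id."""
    seen: dict[str, dict] = {}
    for results in result_sets:
        for pin in results:
            seen.setdefault(pin["id"], pin)
    return list(seen.values())
```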

Section 4: Related and Visually Similar Pins

Pinterest's visual search system powers the "More like this" section on any pin page. This is one of the most valuable endpoints for trend research because it reveals the platform's internal understanding of visual similarity.

"""
scrape_related_pins.py
Fetch visually related pins for a given pin ID (Pinterest's "More like this").
Usage: python3 scrape_related_pins.py <pin_id> [output.json]
"""
from __future__ import annotations

import json
import sys
from dataclasses import asdict
from typing import Iterator

import httpx

from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry

RESOURCE_URL = "https://www.pinterest.com/resource/RelatedPinsResource/get/"


@pinterest_retry
def _fetch_related_page(
    client: httpx.Client,
    pin_id: str,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    options: dict = {
        "pin_id": pin_id,
        "add_vase": True,
        "count": 25,
        "field_set_key": "react_grid_pin",
        "redux_normalize_feed": True,
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    params = build_params(f"/pin/{pin_id}/", options)
    resp = client.get(RESOURCE_URL, params=params)
    resp.raise_for_status()
    body = resp.json()["resource_response"]
    return body.get("data") or [], body.get("bookmark")


def iter_related_pins(
    client: httpx.Client,
    pin_id: str,
    max_pages: int = 3,
) -> Iterator[Pin]:
    bookmark: str | None = None
    for page in range(1, max_pages + 1):
        raw_pins, bookmark = _fetch_related_page(client, pin_id, bookmark)
        yielded = 0
        for raw in raw_pins:
            if not isinstance(raw, dict):
                continue
            try:
                yield Pin.from_raw(raw)
                yielded += 1
            except Exception as exc:
                print(f"[warn] Related pin parse error: {exc}")
        print(f"  Related pins page {page}: {yielded} pins")
        if not bookmark or bookmark == "-end-":
            break
        polite_delay()


def scrape_related(pin_id: str, max_pages: int = 3, output_path: str = "related_pins.json") -> list[Pin]:
    client = make_session()
    print(f"Fetching related pins for: {pin_id}")

    pins: list[Pin] = []
    for pin in iter_related_pins(client, pin_id, max_pages):
        pins.append(pin)

    with open(output_path, "w") as fh:
        json.dump([asdict(p) for p in pins], fh, indent=2, default=str)

    print(f"\nDone. {len(pins)} related pins written to {output_path}")
    client.close()
    return pins


if __name__ == "__main__":
    pin_id = sys.argv[1] if len(sys.argv) > 1 else "982374651820394756"
    out = sys.argv[2] if len(sys.argv) > 2 else "related_pins.json"
    scrape_related(pin_id, output_path=out)

Section 5: Pin Comment Extraction

"""
scrape_pin_comments.py
Extract all comments from a Pinterest pin.
Usage: python3 scrape_pin_comments.py <pin_id> [output.json]
"""
from __future__ import annotations

import json
import sys
from dataclasses import asdict
from typing import Iterator

import httpx

from pinterest_base import Comment, make_session, polite_delay, build_params, pinterest_retry

RESOURCE_URL = "https://www.pinterest.com/resource/AggregatedCommentResource/get/"


@pinterest_retry
def _fetch_comments_page(
    client: httpx.Client,
    pin_id: str,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    options: dict = {
        "objectId": pin_id,
        "objectType": "pin",
        "page_size": 50,
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    params = build_params(f"/pin/{pin_id}/", options)
    resp = client.get(RESOURCE_URL, params=params)
    resp.raise_for_status()
    body = resp.json()["resource_response"]
    return body.get("data") or [], body.get("bookmark")


def iter_pin_comments(
    client: httpx.Client,
    pin_id: str,
) -> Iterator[Comment]:
    bookmark: str | None = None
    while True:
        raw_comments, bookmark = _fetch_comments_page(client, pin_id, bookmark)
        for raw in raw_comments:
            if not isinstance(raw, dict):
                continue
            try:
                yield Comment.from_raw(raw, pin_id)
            except Exception as exc:
                print(f"[warn] Comment parse error: {exc}")

        if not bookmark or bookmark == "-end-":
            break
        polite_delay(base=1.0, jitter=0.5)


def scrape_comments(pin_id: str, output_path: str = "comments.json") -> list[Comment]:
    client = make_session()
    print(f"Fetching comments for pin: {pin_id}")

    comments: list[Comment] = []
    for comment in iter_pin_comments(client, pin_id):
        comments.append(comment)

    with open(output_path, "w") as fh:
        json.dump([asdict(c) for c in comments], fh, indent=2, default=str)

    print(f"Done. {len(comments)} comments written to {output_path}")
    client.close()
    return comments


if __name__ == "__main__":
    pin_id = sys.argv[1] if len(sys.argv) > 1 else "982374651820394756"
    out = sys.argv[2] if len(sys.argv) > 2 else "comments.json"
    scrape_comments(pin_id, out)

Sample comment output:

[
  {
    "id": "6029483756102938471",
    "pin_id": "982374651820394756",
    "text": "Love this! Where can I find that throw blanket?",
    "author_username": "interiorinspo_daily",
    "author_id": "830194827364",
    "created_at": "2025-11-05T09:43:17",
    "like_count": 7
  }
]
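
Comments like the one above ("Where can I find that throw blanket?") are purchase-intent signals. A rough heuristic sketch of my own for surfacing them, most-liked first:

```python
def purchase_intent_comments(comments: list[dict]) -> list[dict]:
    """Question comments sorted by likes - a crude purchase-intent filter."""
    questions = [c for c in comments if "?" in c.get("text", "")]
    return sorted(questions, key=lambda c: c.get("like_count", 0), reverse=True)
```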

Section 6: Trending Pins by Category

Pinterest exposes category-level trending feeds through InterestFeedResource. Categories include home_decor, fashion, food_drink, beauty, travel, art, diy_crafts, and more.

"""
scrape_trending.py
Fetch trending pins for a Pinterest interest/category.
Usage: python3 scrape_trending.py <category> [--pages 3] [--output trending.json]
"""
from __future__ import annotations

import argparse
import json
from dataclasses import asdict
from typing import Iterator

import httpx

from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry

RESOURCE_URL = "https://www.pinterest.com/resource/InterestFeedResource/get/"

# Known valid category slugs
VALID_CATEGORIES = [
    "home_decor", "fashion", "food_drink", "beauty", "travel",
    "art", "diy_crafts", "photography", "architecture", "cars_motorcycles",
    "film_music_books", "fitness", "gardening", "kids_parenting",
    "mens_fashion", "womens_fashion", "outdoor", "pets", "sports",
    "tattoos", "technology", "weddings",
]


@pinterest_retry
def _fetch_trending_page(
    client: httpx.Client,
    category: str,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    options: dict = {
        "interest_id": category,
        "field_set_key": "react_grid_pin",
        "is_own_profile_pins": False,
        "page_size": 25,
        "redux_normalize_feed": True,
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    params = build_params(f"/ideas/{category}/", options)
    resp = client.get(RESOURCE_URL, params=params)
    resp.raise_for_status()
    body = resp.json()["resource_response"]
    return body.get("data") or [], body.get("bookmark")


def iter_trending_pins(
    client: httpx.Client,
    category: str,
    max_pages: int = 3,
) -> Iterator[Pin]:
    bookmark: str | None = None
    for page in range(1, max_pages + 1):
        raw_pins, bookmark = _fetch_trending_page(client, category, bookmark)
        yielded = 0
        for raw in raw_pins:
            if not isinstance(raw, dict):
                continue
            try:
                yield Pin.from_raw(raw)
                yielded += 1
            except Exception as exc:
                print(f"[warn] Trending pin parse error: {exc}")
        print(f"  Trending [{category}] page {page}: {yielded} pins")
        if not bookmark or bookmark == "-end-":
            break
        polite_delay()


def scrape_trending(
    category: str,
    max_pages: int = 3,
    output_path: str = "trending.json",
) -> list[Pin]:
    if category not in VALID_CATEGORIES:
        print(f"[warn] Unknown category '{category}'. Valid options: {VALID_CATEGORIES}")

    client = make_session()
    print(f"Fetching trending pins for category: {category}")

    pins: list[Pin] = []
    for pin in iter_trending_pins(client, category, max_pages):
        pins.append(pin)

    with open(output_path, "w") as fh:
        json.dump([asdict(p) for p in pins], fh, indent=2, default=str)

    print(f"\nDone. {len(pins)} trending pins written to {output_path}")
    client.close()
    return pins


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("category", help=f"One of: {', '.join(VALID_CATEGORIES)}")
    parser.add_argument("--pages", type=int, default=3)
    parser.add_argument("--output", default="trending.json")
    args = parser.parse_args()
    scrape_trending(args.category, args.pages, args.output)

Section 7: Shopping and Product Pin Extraction

Shopping pins contain rich structured metadata including price, retailer, product name, and availability. This makes them the highest-value target for e-commerce competitive intelligence.

"""
scrape_shopping_pins.py
Extract shopping/product pins from a board or search query.
Includes price, retailer, and product metadata.
Usage: python3 scrape_shopping_pins.py --board <user>/<slug> [--search "keyword"] [output.json]
"""
from __future__ import annotations

import argparse
import json
from dataclasses import asdict, dataclass
from typing import Iterator, Optional

import httpx

from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry

SHOPPING_RESOURCE_URL = "https://www.pinterest.com/resource/ShoppingSpotlightFeedResource/get/"
BOARD_RESOURCE_URL = "https://www.pinterest.com/resource/BoardFeedResource/get/"  # unused here; handy if you extend this script to per-board product scans


@dataclass
class ShoppingPin:
    pin_id: str
    product_name: str
    description: str
    price: Optional[str]
    currency: Optional[str]
    retailer: Optional[str]
    retailer_domain: Optional[str]
    buy_url: Optional[str]
    image_url: Optional[str]
    save_count: int
    availability: Optional[str]
    condition: Optional[str]
    brand: Optional[str]

    @classmethod
    def from_pin(cls, pin: Pin) -> "ShoppingPin":
        rich = pin.rich_metadata
        return cls(
            pin_id=pin.id,
            product_name=pin.product_name or pin.description[:120],
            description=pin.description,
            price=pin.price,
            currency=pin.currency,
            retailer=rich.get("site_name") or rich.get("site"),
            retailer_domain=pin.domain,
            buy_url=pin.source_url,
            image_url=pin.image_url,
            save_count=pin.save_count,
            availability=rich.get("availability"),
            condition=rich.get("condition"),
            brand=rich.get("brand"),
        )


@pinterest_retry
def _fetch_shopping_spotlight_page(
    client: httpx.Client,
    bookmark: str | None,
) -> tuple[list[dict], str | None]:
    """Fetch from the shopping spotlight feed (curated product discovery)."""
    options: dict = {
        "field_set_key": "react_grid_pin",
        "page_size": 25,
        "redux_normalize_feed": True,
    }
    if bookmark:
        options["bookmarks"] = [bookmark]

    params = build_params("/shop/", options)
    resp = client.get(SHOPPING_RESOURCE_URL, params=params)
    resp.raise_for_status()
    body = resp.json()["resource_response"]
    return body.get("data") or [], body.get("bookmark")


def iter_shopping_spotlight(
    client: httpx.Client,
    max_pages: int = 5,
) -> Iterator[ShoppingPin]:
    """Yield ShoppingPin objects from the global shopping spotlight feed."""
    bookmark: str | None = None
    for page in range(1, max_pages + 1):
        raw_pins, bookmark = _fetch_shopping_spotlight_page(client, bookmark)
        count = 0
        for raw in raw_pins:
            if not isinstance(raw, dict):
                continue
            try:
                pin = Pin.from_raw(raw)
                if pin.is_shopping or pin.price or pin.rich_metadata:
                    yield ShoppingPin.from_pin(pin)
                    count += 1
            except Exception as exc:
                print(f"[warn] Shopping pin parse error: {exc}")
        print(f"  Shopping spotlight page {page}: {count} product pins")
        if not bookmark or bookmark == "-end-":
            break
        polite_delay()


def scrape_shopping(
    max_pages: int = 5,
    output_path: str = "shopping_pins.json",
) -> list[ShoppingPin]:
    client = make_session()
    print("Fetching Pinterest shopping spotlight feed...")

    items: list[ShoppingPin] = []
    for item in iter_shopping_spotlight(client, max_pages):
        items.append(item)

    with open(output_path, "w") as fh:
        json.dump([asdict(i) for i in items], fh, indent=2, default=str)

    print(f"\nDone. {len(items)} shopping pins written to {output_path}")
    client.close()
    return items


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--pages", type=int, default=5)
    parser.add_argument("--output", default="shopping_pins.json")
    args = parser.parse_args()
    scrape_shopping(args.pages, args.output)

Sample shopping pin output:

{
  "pin_id": "729384756102938",
  "product_name": "Merino Wool Crew Neck Sweater — Camel",
  "description": "The coziest winter staple. 100% Merino wool, relaxed fit.",
  "price": "98.00",
  "currency": "USD",
  "retailer": "Everlane",
  "retailer_domain": "everlane.com",
  "buy_url": "https://www.everlane.com/products/mens-merino-crew-camel",
  "image_url": "https://i.pinimg.com/originals/7c/2a/...",
  "save_count": 3102,
  "availability": "in stock",
  "condition": "new",
  "brand": "Everlane"
}
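With the JSON on disk, a few lines of aggregation turn raw product pins into a competitive snapshot. The helper below is a hypothetical add-on, keyed to the output shape shown above, that rolls pins up by retailer:

```python
from collections import defaultdict

def summarize_by_retailer(pins: list[dict]) -> dict[str, dict]:
    """Roll up shopping pins by retailer: pin count, total saves, prices seen."""
    out: dict[str, dict] = defaultdict(lambda: {"pins": 0, "saves": 0, "prices": []})
    for p in pins:
        key = p.get("retailer") or p.get("retailer_domain") or "unknown"
        out[key]["pins"] += 1
        out[key]["saves"] += p.get("save_count") or 0
        if p.get("price"):
            out[key]["prices"].append(float(p["price"]))
    return dict(out)

# Inline sample rows in the shape above; in practice you would
# json.load() the shopping_pins.json file written by scrape_shopping().
sample = [
    {"retailer": "Everlane", "save_count": 3102, "price": "98.00"},
    {"retailer": "Everlane", "save_count": 451, "price": "64.00"},
    {"retailer": None, "retailer_domain": "zara.com", "save_count": 88, "price": None},
]
report = summarize_by_retailer(sample)
print(report["Everlane"])  # {'pins': 2, 'saves': 3553, 'prices': [98.0, 64.0]}
```

Sorting the result by total saves surfaces which retailers dominate the feed you scraped.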

Section 8: Bulk Board Comparison (Pin Overlap Analysis)

This script takes two or more boards and computes overlap — pins that appear on multiple boards. Useful for understanding whether curators are drawing from the same source content, or for identifying "cornerstone" pins that spread across communities.
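The similarity metric used below is the Jaccard index: intersection size over union size of the two boards' pin-ID sets. In miniature:

```python
a = {"p1", "p2", "p3", "p4"}       # pin IDs on board A
b = {"p3", "p4", "p5"}             # pin IDs on board B

jaccard = len(a & b) / len(a | b)  # 2 shared pins / 5 unique pins total
print(jaccard)  # 0.4
```

A Jaccard score near 1.0 means the curators are effectively mirroring each other; near 0.0 means the boards draw from disjoint source content.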

"""
scrape_board_compare.py
Compare pin overlap between two or more Pinterest boards.
Usage: python3 scrape_board_compare.py user1/board-a user2/board-b [user3/board-c ...] [--output report.json]
"""
from __future__ import annotations

import argparse
import json
from collections import Counter, defaultdict
from dataclasses import asdict
from typing import NamedTuple

from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry
from scrape_board_pins import iter_board_pins
import httpx


class BoardSpec(NamedTuple):
    username: str
    board_slug: str

    @classmethod
    def parse(cls, spec: str) -> "BoardSpec":
        parts = spec.strip("/").split("/")
        if len(parts) < 2:
            raise ValueError(f"Expected user/board-slug, got: {spec}")
        return cls(username=parts[0], board_slug=parts[1])

    def __str__(self) -> str:
        return f"{self.username}/{self.board_slug}"


def compute_overlap(
    client: httpx.Client,
    boards: list[BoardSpec],
) -> dict:
    """
    For each board, collect all pin IDs. Then compute pairwise overlap.
    Returns a report dict.
    """
    board_pin_map: dict[str, set[str]] = {}
    board_pins_full: dict[str, list[Pin]] = {}
    pin_registry: dict[str, Pin] = {}

    for spec in boards:
        label = str(spec)
        print(f"\nFetching board: {label}")
        pin_ids: set[str] = set()
        pins_list: list[Pin] = []
        for pin in iter_board_pins(client, spec.username, spec.board_slug):
            pin_ids.add(pin.id)
            pins_list.append(pin)
            pin_registry[pin.id] = pin
            polite_delay(base=0.3, jitter=0.3)
        board_pin_map[label] = pin_ids
        board_pins_full[label] = pins_list

    # Pairwise overlap
    labels = list(board_pin_map.keys())
    pairwise: list[dict] = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            a, b = labels[i], labels[j]
            overlap = board_pin_map[a] & board_pin_map[b]
            union = board_pin_map[a] | board_pin_map[b]
            jaccard = len(overlap) / len(union) if union else 0.0
            pairwise.append({
                "board_a": a,
                "board_b": b,
                "overlap_count": len(overlap),
                "board_a_total": len(board_pin_map[a]),
                "board_b_total": len(board_pin_map[b]),
                "jaccard_similarity": round(jaccard, 4),
                "overlap_pin_ids": sorted(overlap),
                "shared_pins": [asdict(pin_registry[pid]) for pid in sorted(overlap)],
            })

    # Pins appearing across the most boards
    pin_board_count = Counter()
    pin_board_names: dict[str, list[str]] = defaultdict(list)
    for label, ids in board_pin_map.items():
        for pid in ids:
            pin_board_count[pid] += 1
            pin_board_names[pid].append(label)

    most_shared = [
        {
            "pin_id": pid,
            "board_count": count,
            "boards": pin_board_names[pid],
            "pin": asdict(pin_registry[pid]) if pid in pin_registry else None,
        }
        for pid, count in pin_board_count.most_common(20)
        if count > 1
    ]

    return {
        "boards_analyzed": labels,
        "board_sizes": {label: len(ids) for label, ids in board_pin_map.items()},
        "pairwise_overlap": pairwise,
        "most_shared_pins": most_shared,
        "summary": {
            "total_unique_pins": len(pin_registry),
            "pins_on_multiple_boards": sum(1 for c in pin_board_count.values() if c > 1),
        },
    }


def scrape_compare(
    board_specs: list[str],
    output_path: str = "board_comparison.json",
) -> dict:
    boards = [BoardSpec.parse(s) for s in board_specs]
    client = make_session()

    print(f"Comparing {len(boards)} boards: {', '.join(str(b) for b in boards)}")
    report = compute_overlap(client, boards)

    with open(output_path, "w") as fh:
        json.dump(report, fh, indent=2, default=str)

    print(f"\nComparison written to {output_path}")
    print(f"  Total unique pins: {report['summary']['total_unique_pins']}")
    print(f"  Pins on multiple boards: {report['summary']['pins_on_multiple_boards']}")
    client.close()
    return report


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compare Pinterest board pin overlap")
    parser.add_argument("boards", nargs="+", help="Board specs in user/board-slug format")
    parser.add_argument("--output", default="board_comparison.json")
    args = parser.parse_args()
    scrape_compare(args.boards, args.output)

Anti-Detection Deep Dive

Getting blocked by Pinterest is frustrating, and largely avoidable with the right approach. Here is a systematic breakdown of each layer of detection and how to handle it.

CSRF Token Management

Pinterest uses a CSRF token for all state-modifying requests and for some authenticated reads. The token is delivered via Set-Cookie when you load any Pinterest page. Your session handles this automatically as long as you warm it up:

"""
csrf_manager.py
Manage Pinterest CSRF tokens for authenticated scraping.
"""
from __future__ import annotations

import re
import time
import httpx


def get_csrf_token(client: httpx.Client) -> str | None:
    """
    Load the Pinterest homepage and extract the CSRF token from cookies
    or the page HTML. Returns the token string or None.
    """
    resp = client.get("https://www.pinterest.com/")
    resp.raise_for_status()

    # Method 1: from Set-Cookie (most reliable)
    csrf = client.cookies.get("csrftoken")
    if csrf:
        return csrf

    # Method 2: from page HTML (fallback)
    match = re.search(r'"csrftoken"\s*:\s*"([^"]+)"', resp.text)
    if match:
        return match.group(1)

    # Method 3: from response headers
    for cookie in resp.headers.get_list("set-cookie"):
        if "csrftoken=" in cookie:
            token = cookie.split("csrftoken=")[1].split(";")[0]
            return token

    return None


def inject_csrf_header(client: httpx.Client) -> httpx.Client:
    """
    Fetch a CSRF token and inject it as the X-CSRFToken header.
    Call this before any write/authenticated operation.
    """
    token = get_csrf_token(client)
    if token:
        client.headers["X-CSRFToken"] = token
        print(f"[csrf] Token acquired: {token[:12]}...")
    else:
        print("[csrf] Warning: could not acquire CSRF token")
    return client


def refresh_csrf_if_needed(client: httpx.Client, response: httpx.Response) -> bool:
    """
    Call after a failed request. Returns True if token was refreshed.
    403 responses often indicate an expired or missing CSRF token.
    """
    if response.status_code == 403:
        print("[csrf] 403 received — refreshing CSRF token")
        time.sleep(2.0)
        inject_csrf_header(client)
        return True
    return False

Session Warming and Persistence

A fresh session that hits the API immediately is a strong bot signal; real users browse several pages before any API call fires. The pattern below mimics that, and persists cookies to disk so later runs can reuse an already-warmed session:

"""
session_warmer.py
Warm up a Pinterest session to reduce bot detection rate.
"""
from __future__ import annotations

import json
import random
import time
from pathlib import Path

import httpx

from pinterest_base import HEADERS


WARM_UP_URLS = [
    "https://www.pinterest.com/",
    "https://www.pinterest.com/ideas/",
    "https://www.pinterest.com/ideas/home-decor/",
]


def warm_session(
    client: httpx.Client,
    extra_pages: int = 2,
) -> httpx.Client:
    """Load several pages to build cookie state before scraping."""
    pages = WARM_UP_URLS[:extra_pages + 1]
    for url in pages:
        try:
            resp = client.get(url)
            resp.raise_for_status()
            print(f"[warm] Loaded {url} — cookies: {len(client.cookies)}")
        except httpx.HTTPError as exc:
            print(f"[warm] Failed to load {url}: {exc}")
        time.sleep(random.uniform(1.5, 3.0))
    return client


def save_session(client: httpx.Client, path: str = "session_cookies.json") -> None:
    """Persist cookies to disk for reuse across runs."""
    cookies = {name: value for name, value in client.cookies.items()}
    with open(path, "w") as fh:
        json.dump(cookies, fh, indent=2)
    print(f"[session] Saved {len(cookies)} cookies to {path}")


def load_session(path: str = "session_cookies.json") -> httpx.Client | None:
    """Restore a previously saved session. Returns None if file not found."""
    cookie_path = Path(path)
    if not cookie_path.exists():
        return None

    with open(cookie_path) as fh:
        cookies = json.load(fh)

    client = httpx.Client(
        headers=HEADERS,
        follow_redirects=True,
        timeout=httpx.Timeout(30.0),
        cookies=cookies,
    )
    print(f"[session] Restored session with {len(cookies)} cookies from {path}")
    return client

Browser Fingerprint Headers

The header stack you send is checked for internal consistency. Chrome 124 on macOS sends specific Sec-CH-* headers that older or non-browser clients omit. Missing these is a common detection vector:

"""
fingerprint_headers.py
Browser-consistent header sets for anti-detection scraping.
"""
from __future__ import annotations

import random


# Realistic Chrome 124 on macOS header set
CHROME_MACOS_HEADERS: dict[str, str] = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Sec-CH-UA": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"macOS"',
    "Sec-CH-UA-Platform-Version": '"14.4.0"',
    "Sec-CH-UA-Arch": '"arm"',
    "Sec-CH-UA-Full-Version": f'"124.0.{random.randint(6300, 6500)}.{random.randint(50, 200)}"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "Cache-Control": "max-age=0",
}

# Overrides for XHR/Fetch API calls (not page navigations)
API_HEADER_OVERRIDES: dict[str, str] = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "X-Requested-With": "XMLHttpRequest",
    "X-APP-VERSION": "b1e66c1",
    "X-Pinterest-AppState": "active",
}


def get_api_headers(referer: str = "https://www.pinterest.com/") -> dict[str, str]:
    """Return consistent API headers for Pinterest resource calls."""
    headers = {**CHROME_MACOS_HEADERS, **API_HEADER_OVERRIDES}
    headers["Referer"] = referer
    return headers

IP Reputation: Residential vs Datacenter Proxies

This is the most consequential factor in whether Pinterest blocks you. Datacenter IP ranges (AWS EC2, Google Cloud, DigitalOcean, Hetzner, Vultr) are well-known and aggressively flagged. Pinterest applies much stricter rate limits — often blocking after fewer than 10 requests per IP within a session — to datacenter addresses.

Residential IPs, by contrast, are assigned to real ISP subscribers and carry far less suspicion. Pinterest's systems treat them as potential real users.

For any scraping volume beyond casual testing, residential proxy rotation is not optional. ThorData offers a residential pool with geographic targeting, which matters because Pinterest also applies per-region rate limits — rotating through the same country repeatedly can still trigger flags. Using geo-diverse residential IPs from ThorData's pool distributes the request fingerprint across genuinely distinct network paths.

"""
proxy_rotation.py
Proxy configuration and rotation for Pinterest scraping.
"""
from __future__ import annotations

import itertools
import random
from typing import Optional


class ProxyRotator:
    """
    Rotate through a list of proxy URLs, cycling on each request.
    Supports sticky sessions (same proxy for N requests) for cookie coherence.
    """

    def __init__(
        self,
        proxy_urls: list[str],
        sticky_count: int = 10,
    ) -> None:
        if not proxy_urls:
            raise ValueError("Must provide at least one proxy URL")
        self._proxies = proxy_urls
        self._cycle = itertools.cycle(proxy_urls)
        self._sticky_count = sticky_count
        self._current: str = next(self._cycle)
        self._uses = 0

    def get(self) -> str:
        """Return current proxy, rotating after sticky_count uses."""
        if self._uses >= self._sticky_count:
            self._current = next(self._cycle)
            self._uses = 0
        self._uses += 1
        return self._current

    def rotate(self) -> str:
        """Force immediate rotation to next proxy."""
        self._current = next(self._cycle)
        self._uses = 0
        return self._current

    def as_httpx_proxies(self) -> dict[str, str]:
        # Mapping form for httpx's legacy proxies= argument. Newer httpx
        # releases take a single proxy= URL or a mounts= dict instead.
        return {"all://": self.get()}


def build_thordata_url(
    username: str,
    password: str,
    country: str = "US",
    session_id: Optional[str] = None,
) -> str:
    """
    Build a ThorData residential proxy URL.
    ThorData: https://thordata.partnerstack.com/partner/0a0x4nzh

    session_id: if set, uses sticky session routing (same exit IP for the session)
    """
    host = "proxy.thordata.com"
    port = 7777
    user_part = f"{username}-country-{country}"
    if session_id:
        user_part += f"-session-{session_id}"
    return f"http://{user_part}:{password}@{host}:{port}"


# Example usage:
#   from proxy_rotation import build_thordata_url, ProxyRotator
#
#   # Geo-diverse pool
#   proxy_urls = [
#       build_thordata_url("myuser", "mypass", "US"),
#       build_thordata_url("myuser", "mypass", "GB"),
#       build_thordata_url("myuser", "mypass", "CA"),
#       build_thordata_url("myuser", "mypass", "AU"),
#   ]
#   rotator = ProxyRotator(proxy_urls, sticky_count=15)

Rate Limiting Detection and Adaptive Backoff

Pinterest returns HTTP 429 for rate limiting and HTTP 403 for session/bot flags. The following decorator handles both gracefully:

"""
adaptive_backoff.py
Detect rate limiting signals and back off adaptively.
"""
from __future__ import annotations

import random
import time
from functools import wraps
from typing import Callable, TypeVar

import httpx

T = TypeVar("T")


def adaptive_retry(
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 120.0,
    jitter: float = 0.4,
) -> Callable:
    """
    Decorator that retries on 429/503 with exponential backoff + jitter.
    On 403, rotates proxy and retries once.
    """
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    result = fn(*args, **kwargs)
                    return result
                except httpx.HTTPStatusError as exc:
                    status = exc.response.status_code
                    if attempt == max_attempts:
                        raise  # out of retries: surface the last error without sleeping
                    if status == 429:
                        retry_after = exc.response.headers.get("Retry-After")
                        wait = float(retry_after) if retry_after else delay
                        jittered = wait + random.uniform(0, jitter * wait)
                        print(f"[rate-limit] 429 on attempt {attempt}. Waiting {jittered:.1f}s...")
                        time.sleep(min(jittered, max_delay))
                        delay = min(delay * 2, max_delay)
                    elif status == 503:
                        print(f"[rate-limit] 503 on attempt {attempt}. Waiting {delay:.1f}s...")
                        time.sleep(delay + random.uniform(0, jitter * delay))
                        delay = min(delay * 2, max_delay)
                    elif status == 403:
                        print(f"[rate-limit] 403 on attempt {attempt}. Possible bot flag.")
                        if attempt > 1:
                            raise  # a second 403 means the session is burned
                        time.sleep(delay * 3)
                    else:
                        raise
        return wrapper
    return decorator


def request_with_jitter(
    client: httpx.Client,
    url: str,
    params: dict,
    base_delay: float = 1.5,
    jitter_range: tuple[float, float] = (0.5, 2.0),
) -> httpx.Response:
    """
    Make a GET request with pre-request jitter delay to mimic human timing.
    """
    sleep_time = base_delay + random.uniform(*jitter_range)
    time.sleep(sleep_time)
    return client.get(url, params=params)
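The decorator's worst-case waits are easy to reason about if you tabulate the deterministic part of the schedule (base_delay doubling up to max_delay, jitter excluded):

```python
def backoff_schedule(base: float = 2.0, cap: float = 120.0, attempts: int = 8) -> list[float]:
    """Deterministic part of exponential backoff: base doubling, capped."""
    delays, d = [], base
    for _ in range(attempts):
        delays.append(d)
        d = min(d * 2, cap)
    return delays

print(backoff_schedule())  # [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0]
```

With the defaults above, a fully exhausted retry sequence waits roughly six minutes in total before giving up, plus jitter.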

Data Storage: JSON, CSV, and SQLite

JSON dumps are fine for one-off runs, but SQLite buys you deduplication across runs (upserts keyed on pin ID), incremental collection, and ad-hoc SQL analysis. The schema below uses the sqlite-utils library for concise table management.

SQLite Schema

"""
pinterest_db.py
SQLite storage for Pinterest scraping results using sqlite-utils.
"""
from __future__ import annotations

from dataclasses import asdict
from pathlib import Path
from typing import Iterable

import sqlite_utils

from pinterest_base import Pin, Board, UserProfile, Comment


def get_db(db_path: str = "pinterest.db") -> sqlite_utils.Database:
    """Open or create the Pinterest SQLite database with full schema."""
    db = sqlite_utils.Database(db_path)

    # Pins table
    if "pins" not in db.table_names():
        db["pins"].create({
            "id": str,
            "description": str,
            "image_url": str,
            "save_count": int,
            "comment_count": int,
            "source_url": str,
            "domain": str,
            "created_at": str,
            "board_id": str,
            "pinner_username": str,
            "is_shopping": int,  # SQLite has no bool
            "price": str,
            "currency": str,
            "product_name": str,
            "scraped_at": str,
        }, pk="id", not_null={"id"})

    # Boards table
    if "boards" not in db.table_names():
        db["boards"].create({
            "id": str,
            "name": str,
            "slug": str,
            "url": str,
            "description": str,
            "pin_count": int,
            "follower_count": int,
            "owner_username": str,
            "cover_image_url": str,
            "category": str,
            "created_at": str,
            "scraped_at": str,
        }, pk="id")

    # Users table
    if "users" not in db.table_names():
        db["users"].create({
            "id": str,
            "username": str,
            "full_name": str,
            "bio": str,
            "follower_count": int,
            "following_count": int,
            "board_count": int,
            "pin_count": int,
            "monthly_views": int,
            "website_url": str,
            "profile_image_url": str,
            "is_verified_merchant": int,
            "scraped_at": str,
        }, pk="id")

    # Comments table
    if "comments" not in db.table_names():
        db["comments"].create({
            "id": str,
            "pin_id": str,
            "text": str,
            "author_username": str,
            "author_id": str,
            "created_at": str,
            "like_count": int,
        }, pk="id", foreign_keys=[("pin_id", "pins", "id")])

    return db


def insert_pins(db: sqlite_utils.Database, pins: Iterable[Pin]) -> int:
    """Insert or replace pins into the database. Returns count inserted."""
    from datetime import datetime, timezone
    now = datetime.now(timezone.utc).isoformat()
    records = []
    for pin in pins:
        row = asdict(pin)
        row.pop("rich_metadata", None)  # not stored in flat table
        row["is_shopping"] = int(row["is_shopping"])
        row["scraped_at"] = now
        records.append(row)
    if records:
        db["pins"].upsert_all(records, pk="id")
    return len(records)


def insert_boards(db: sqlite_utils.Database, boards: Iterable[Board]) -> int:
    from datetime import datetime, timezone
    now = datetime.now(timezone.utc).isoformat()
    records = [{"scraped_at": now, **asdict(b)} for b in boards]
    if records:
        db["boards"].upsert_all(records, pk="id")
    return len(records)


def insert_user(db: sqlite_utils.Database, user: UserProfile) -> None:
    from datetime import datetime, timezone
    row = asdict(user)
    row["is_verified_merchant"] = int(row["is_verified_merchant"])
    row["scraped_at"] = datetime.now(timezone.utc).isoformat()
    db["users"].upsert(row, pk="id")


def insert_comments(db: sqlite_utils.Database, comments: Iterable[Comment]) -> int:
    records = [asdict(c) for c in comments]
    if records:
        db["comments"].upsert_all(records, pk="id")
    return len(records)


def export_csv(db: sqlite_utils.Database, table: str, output_path: str) -> None:
    """Export a table to CSV."""
    import csv
    rows = list(db[table].rows)
    if not rows:
        print(f"[export] Table '{table}' is empty")
        return
    with open(output_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
    print(f"[export] {len(rows)} rows written to {output_path}")
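Once pins are in SQLite, rollups are one query away. A self-contained sketch using stdlib sqlite3, with inline sample rows standing in for the pins table above:

```python
import sqlite3

# In-memory database with a trimmed-down pins table for illustration;
# against the real pinterest.db you would connect to the file instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pins (id TEXT PRIMARY KEY, domain TEXT, save_count INTEGER)")
conn.executemany(
    "INSERT INTO pins VALUES (?, ?, ?)",
    [("1", "everlane.com", 3102), ("2", "zara.com", 88), ("3", "everlane.com", 451)],
)

# Top source domains by total saves: the kind of rollup the pipeline enables
rows = conn.execute(
    "SELECT domain, COUNT(*) AS pins, SUM(save_count) AS saves "
    "FROM pins GROUP BY domain ORDER BY saves DESC"
).fetchall()
print(rows)  # [('everlane.com', 2, 3553), ('zara.com', 1, 88)]
```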

Complete End-to-End Pipeline Script

This script orchestrates everything: profile fetch, board listing, pin extraction for all boards, comment sampling, and storage to SQLite and CSV.

"""
pinterest_pipeline.py
End-to-end Pinterest data collection pipeline.
Fetches a user's profile, all boards, all pins, and sampled comments.
Stores everything in SQLite and exports CSVs.

Usage:
    python3 pinterest_pipeline.py <username> [--proxy http://user:pass@host:port]
    python3 pinterest_pipeline.py anthropologie --max-boards 5 --output-dir ./output
"""
from __future__ import annotations

import argparse
import os
from datetime import datetime, timezone
from pathlib import Path

from pinterest_base import make_session, polite_delay
from pinterest_db import get_db, insert_pins, insert_boards, insert_user, insert_comments, export_csv
from scrape_user_profile import fetch_user_profile, fetch_all_boards
from scrape_board_pins import iter_board_pins
from scrape_pin_comments import iter_pin_comments


def run_pipeline(
    username: str,
    proxy_url: str | None = None,
    max_boards: int | None = None,
    comment_sample_size: int = 3,
    output_dir: str = "./output",
) -> None:
    os.makedirs(output_dir, exist_ok=True)
    db_path = os.path.join(output_dir, "pinterest.db")
    db = get_db(db_path)
    client = make_session(proxy_url=proxy_url)

    started_at = datetime.now(timezone.utc).isoformat()
    print(f"\n=== Pinterest Pipeline ===")
    print(f"Target: @{username}")
    print(f"Started: {started_at}")
    print(f"Output: {output_dir}")
    print()

    # --- Step 1: User profile ---
    print("[1/4] Fetching user profile...")
    profile = fetch_user_profile(client, username)
    insert_user(db, profile)
    print(f"  @{profile.username}: {profile.follower_count:,} followers, {profile.monthly_views:,} monthly views")
    polite_delay()

    # --- Step 2: Board listing ---
    print("\n[2/4] Fetching boards...")
    boards = fetch_all_boards(client, username)
    if max_boards:
        boards = boards[:max_boards]
    insert_boards(db, boards)
    print(f"  {len(boards)} boards fetched")
    polite_delay()

    # --- Step 3: Pins for each board ---
    print(f"\n[3/4] Fetching pins for {len(boards)} boards...")
    all_pin_ids: list[str] = []

    for i, board in enumerate(boards, 1):
        print(f"\n  Board {i}/{len(boards)}: '{board.name}' ({board.pin_count} pins expected)")
        pins = []
        for pin in iter_board_pins(client, board.owner_username, board.slug):
            pins.append(pin)
            all_pin_ids.append(pin.id)
        n = insert_pins(db, pins)
        print(f"  Inserted {n} pins from '{board.name}'")
        polite_delay(base=2.0, jitter=1.5)

    # --- Step 4: Comment sampling ---
    print(f"\n[4/4] Sampling comments from {comment_sample_size} high-save pins...")
    top_pins = sorted(
        (row for row in db["pins"].rows if (row.get("comment_count") or 0) > 0),
        key=lambda r: r.get("comment_count") or 0,  # SQLite NULL comes back as None
        reverse=True,
    )[:comment_sample_size]

    total_comments = 0
    for pin_row in top_pins:
        pin_id = pin_row["id"]
        print(f"  Fetching comments for pin {pin_id} ({pin_row.get('comment_count', 0)} comments)...")
        comments = list(iter_pin_comments(client, pin_id))
        n = insert_comments(db, comments)
        total_comments += n
        polite_delay()

    # --- Export CSVs ---
    print("\n[export] Writing CSVs...")
    export_csv(db, "pins", os.path.join(output_dir, "pins.csv"))
    export_csv(db, "boards", os.path.join(output_dir, "boards.csv"))
    export_csv(db, "users", os.path.join(output_dir, "users.csv"))
    export_csv(db, "comments", os.path.join(output_dir, "comments.csv"))

    # --- Summary ---
    print(f"\n=== Pipeline Complete ===")
    print(f"  Profile: @{profile.username}")
    print(f"  Boards: {len(boards)}")
    print(f"  Pins: {len(all_pin_ids)}")
    print(f"  Comments: {total_comments}")
    print(f"  Database: {db_path}")

    client.close()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Pinterest full-profile data pipeline")
    parser.add_argument("username", help="Pinterest username to scrape")
    parser.add_argument("--proxy", help="Proxy URL (e.g. http://user:pass@host:port)")
    parser.add_argument("--max-boards", type=int, help="Limit number of boards to scrape")
    parser.add_argument("--comment-sample", type=int, default=3, help="Number of pins to fetch comments from")
    parser.add_argument("--output-dir", default="./output", help="Directory for output files")
    args = parser.parse_args()

    run_pipeline(
        username=args.username,
        proxy_url=args.proxy,
        max_boards=args.max_boards,
        comment_sample_size=args.comment_sample,
        output_dir=args.output_dir,
    )

Troubleshooting Common Errors

401 Unauthorized on every request

Your session has no valid cookies. The homepage warm-up in make_session() is failing, likely because your IP is blocked or the proxy is not working. Verify connectivity: python3 -c "import httpx; print(httpx.get('https://www.pinterest.com/', follow_redirects=True).status_code)" (httpx does not follow redirects by default, so a bare get() can report a 3xx even when the site is reachable). If you get a non-200, change your IP or proxy.

403 Forbidden after N requests

This is Pinterest's bot detection triggering on your session. Common causes:

1. Missing or inconsistent Sec-CH-UA headers (add them via fingerprint_headers.py)
2. CSRF token expired (call inject_csrf_header() to refresh)
3. Datacenter IP (switch to residential — see ThorData)
4. Too many requests too fast (increase delays and add jitter)
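For the first cause, the key point is that the client-hint headers must agree with the User-Agent. A sketch of one way to build a mutually consistent pair (the Chrome version and brand list here are illustrative assumptions, not values Pinterest mandates):

```python
# Sketch only: what matters is that Sec-CH-UA agrees with the User-Agent
# major version -- a mismatch between the two is an easy fingerprinting tell.
def consistent_chrome_headers(major: int) -> dict[str, str]:
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            f"Chrome/{major}.0.0.0 Safari/537.36"
        ),
        "Sec-CH-UA": (
            f'"Chromium";v="{major}", "Google Chrome";v="{major}", '
            '"Not-A.Brand";v="99"'
        ),
        "Sec-CH-UA-Mobile": "?0",
        "Sec-CH-UA-Platform": '"Windows"',
    }
```

Merge the result into your client's default headers once, at session creation, rather than per request, so the fingerprint stays stable across the session.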

429 Too Many Requests

You have exceeded Pinterest's rate limit for your IP. The adaptive_retry decorator handles this with backoff, but if 429s are constant, you need to slow down your request rate significantly or rotate to a fresh residential IP. A sustained rate of more than 1 request per second from a single IP will reliably trigger 429s.
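If you are adapting this pattern outside the guide's code, a minimal stand-in for such a decorator is exponential backoff with full jitter (the parameter names below are illustrative, not the guide's exact implementation):

```python
import random
import time
from functools import wraps

def adaptive_retry(max_attempts: int = 5, base_delay: float = 2.0):
    """Retry a function with exponential backoff and full jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                # In real use, narrow this to your rate-limit error type.
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    # Full jitter: sleep a random amount up to the cap so
                    # parallel workers do not retry in lockstep.
                    cap = base_delay * (2 ** attempt)
                    time.sleep(random.uniform(0, cap))
        return wrapper
    return decorator
```

Full jitter matters more than the exact base delay: if every worker backs off by the same deterministic amount, they all hit the rate limiter again at the same instant.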

KeyError: 'resource_response'

The response JSON does not match the expected envelope. This happens when:

- Pinterest returns an error page (HTML) instead of JSON — check response.headers["content-type"]
- The endpoint path has changed (Pinterest occasionally shuffles API versions)
- You are being served a CAPTCHA challenge page

Add this check to any response handler:

if response.headers.get("content-type", "").startswith("text/html"):
    print("[error] Got HTML instead of JSON — possible CAPTCHA or block")
    print(response.text[:500])
    raise RuntimeError("Non-JSON response from Pinterest")

Bookmark loops (infinite pagination)

Occasionally Pinterest returns the same bookmark repeatedly, causing an infinite loop. Protect against this:

seen_bookmarks: set[str] = set()
bookmark: str | None = None  # the first request carries no bookmark

while True:
    pins, bookmark = fetch_page(client, bookmark)
    # ... process pins ...
    if not bookmark or bookmark == "-end-" or bookmark in seen_bookmarks:
        break
    seen_bookmarks.add(bookmark)

json.JSONDecodeError on API responses

Pinterest sometimes returns 204 No Content or empty bodies for boards with no pins. Check response.content before calling response.json():

if not response.content:
    return [], None
body = response.json()

Image URLs returning 403

Pinterest image URLs are tied to session cookies for some content. If you are downloading images, do it within the same client session that fetched the pin data, not in a separate plain requests session.


Is It Legal to Scrape Pinterest?

Pinterest's Terms of Service prohibit automated data collection. That is a contract between you and Pinterest, not a law. Whether scraping its public data is legal depends on jurisdiction and use case.

The US legal landscape. In hiQ v. LinkedIn, the Ninth Circuit held (and reaffirmed in 2022) that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act (CFAA), because accessing data that requires no login involves no unauthorized access. Public Pinterest boards — visible to anyone without an account — sit in similar territory. The ruling is narrow, however: it covers only the CFAA, not breach-of-contract or copyright claims, and it does not extend to every scraping context.

EU considerations. The GDPR applies when you collect personal data about EU residents. Pinterest usernames, profile images, and biographical data are personal data under GDPR. Collect only what you need, store it securely, do not publish or resell it without a legal basis, and have a documented purpose.

Copyright. Images on Pinterest are mostly copyrighted by their original creators. Collecting image URLs is fine; downloading, rehosting, or republishing pin images without a license is not.

Practical ethics. Beyond legality: do not scrape in a way that degrades the service for other users. Cache data aggressively so you fetch each piece once. Respect robots.txt signals even if they are not legally binding. If you are building something commercial on top of Pinterest data, consider whether you should be using an official data partnership instead.

The code in this guide is for research, analysis, and personal use. Production commercial applications built on scraped data carry legal and reputational risk that you own entirely.


Quick Reference: Resource API Cheat Sheet

Use Case         Resource Name                   Key Options
Board pins       BoardFeedResource               board_url, page_size, bookmarks
User profile     UserResource                    username, field_set_key: "profile"
User's boards    BoardsResource                  username, sort, privacy_filter
Keyword search   BaseSearchResource              query, scope: "pins"
Related pins     RelatedPinsResource             pin_id, count
Single pin       PinResource                     id, field_set_key: "detailed"
Comments         AggregatedCommentResource       objectId, objectType: "pin"
Trending         InterestFeedResource            interest_id (category slug)
Shopping         ShoppingSpotlightFeedResource   field_set_key

All endpoints follow the same base pattern: GET https://www.pinterest.com/resource/<ResourceName>/get/?source_url=<path>&data=<json>

The data parameter is a JSON-encoded object with options and context keys. Options always include pagination via bookmarks: [<token>]. The response always wraps data in resource_response.data with the next page in resource_response.bookmark.
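Putting that pattern together, a request URL can be built like this. The BoardFeedResource options mirror the cheat sheet above; treat the exact option set as an assumption to verify against DevTools:

```python
import json
from urllib.parse import urlencode

def resource_url(resource: str, source_url: str, options: dict) -> str:
    # The data parameter is a JSON object with "options" and "context"
    # keys; pagination rides along as options["bookmarks"].
    data = json.dumps({"options": options, "context": {}})
    query = urlencode({"source_url": source_url, "data": data})
    return f"https://www.pinterest.com/resource/{resource}/get/?{query}"

url = resource_url(
    "BoardFeedResource",
    "/someuser/some-board/",
    {"board_url": "/someuser/some-board/", "page_size": 25, "bookmarks": []},
)
```

The same builder serves every resource in the table; only the resource name and options change.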


Keeping Your Scraper Working Over Time

Pinterest changes its internal API without notice. Endpoints that work today may return 404s next month, and new header requirements appear without warning. Here are habits that keep your scraper resilient:

Log raw responses. When a scraper breaks, the first thing you need is the actual server response. Log raw JSON to a file for at least a few days when running in production.

Monitor with a canary request. Before any large scraping run, do a single test fetch for one known-good board. If it fails, abort and debug before burning through your proxy quota.
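A canary helper can be as small as this. The envelope check mirrors the resource API pattern described in the cheat sheet; client is any httpx-style session, and which board you probe is up to you:

```python
import json
from urllib.parse import urlencode

def canary_check(client, board_url: str) -> bool:
    """Fetch a single page of one known-good board and verify the
    response envelope before committing to a large run."""
    data = json.dumps({
        "options": {"board_url": board_url, "page_size": 1, "bookmarks": []},
        "context": {},
    })
    query = urlencode({"source_url": board_url, "data": data})
    url = f"https://www.pinterest.com/resource/BoardFeedResource/get/?{query}"
    resp = client.get(url)
    if resp.status_code != 200:
        return False
    # An HTML body here usually means a CAPTCHA or block page.
    if not resp.headers.get("content-type", "").startswith("application/json"):
        return False
    return "resource_response" in resp.json()
```

If the canary fails, abort before burning proxy quota on a run that cannot succeed.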

Use DevTools as a reference. When something breaks, open Pinterest in Chrome, load the same page you are trying to scrape, and compare what headers and parameters the browser actually sends to what your script is sending. The browser is always right.

Track endpoint versions. The X-APP-VERSION header (b1e66c1 in the examples) is pinned to a specific Pinterest frontend build. Pinterest sometimes checks this. If you start seeing unexpected failures, open the Pinterest source and search for the current app version string.

Rotate user agents occasionally. Chrome ships a new major version roughly every four weeks. A Chrome 124 user agent in late 2027 is a red flag. Keep your Sec-CH-UA and User-Agent headers in sync with a current Chrome release.