How to Scrape Pinterest Boards and Pins in 2026: The Definitive Python Guide
Pinterest is one of the most data-rich platforms on the internet, and yet it's one of the least discussed when it comes to programmatic data extraction. That's partly because its unofficial API is not documented, and partly because scraping it well requires solving a cluster of interlocking problems — session management, fingerprint consistency, pagination, CSRF tokens, and IP reputation — all at once.
This guide solves all of them. It is a definitive, working reference for 2026, written for engineers who need real data, not toy examples.
Why Scrape Pinterest?
Before the code, here is why people actually build Pinterest scrapers — because the use case shapes what data you need and how aggressively you need to collect it.
Trend research and forecasting. Pinterest users save content months before they buy it. The platform has been consistently accurate as an early signal for seasonal trends in home decor, fashion, food, and travel. A board that picks up 10,000 new saves in a week for "quiet luxury interiors" tells you something that no search volume tool can, because it reflects intentional curation, not passive browsing. Retailers, trend agencies, and CPG brands monitor boards at scale to get 2-6 month lead time on consumer demand shifts.
E-commerce competitive intelligence. Product pins expose competitor inventory, pricing strategy, image creative, and which SKUs are gaining traction in the visual search layer. If a competing brand's product pins are accumulating saves across dozens of influencer boards, that's a market signal. Shopping pins also expose domain attribution, so you can see where traffic is flowing even if you can't see the GA data.
Fashion and visual content marketing. Style boards curated by top pinners function as editorial taste-making. Tracking which visual aesthetics (color palettes, composition styles, lifestyle contexts) accumulate the most engagement on fashion-adjacent boards gives content teams a data-backed brief rather than a vibes-based creative direction.
SEO and content strategy. Pinterest ranks in Google image search and has its own internal search engine. Understanding which pin descriptions, keywords, and image styles perform in Pinterest search tells you something about how Google processes visual content at scale — and gives content marketers a second traffic channel to optimize for.
Academic research on visual culture. Scholars studying how aesthetic movements spread through social networks, how visual norms shift across communities, and how platforms shape cultural production need bulk data. Pinterest's curation layer — where people explicitly organize images into named categories — makes it uniquely useful as a structured dataset for visual culture research.
Influencer and audience research. Board composition, save counts, and follower data let you evaluate whether an influencer's audience is real and engaged, or whether their follow metrics are gamed. A pinner with 500K followers but boards averaging 3 saves per pin is a different story than someone with 50K followers whose pins consistently get 200+ saves.
How Pinterest's Internal API Works
Pinterest's official API is not a practical option for data collection in 2026. The legacy v1 and v3 endpoints were shuttered or locked down for most developers, and the current v5 API requires app approval and is largely limited to content you own. The good news is that the browser has to get data from somewhere, and that somewhere is a clean, JSON-based internal REST API.
Open Chrome DevTools on any Pinterest page, go to the Network tab, filter by Fetch/XHR, and watch the requests. You will see calls to two main patterns:
https://www.pinterest.com/resource/<ResourceName>/get/
https://api.pinterest.com/v3/...
The resource/ API is the one you want. Each resource call sends a data query parameter containing a JSON-encoded options object. The server responds with a consistent envelope:
{
"resource": { "name": "BoardFeedResource", "options": {...} },
"resource_response": {
"status": "success",
"http_status": 200,
"data": [...],
"bookmark": "SomeOpaqueBase64String=="
},
"client_context": {...}
}
The bookmark field is how pagination works. Pass it back in the next request inside options.bookmarks as a single-item array. When bookmark is null or "-end-", you have reached the last page.
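In loop form, the pattern looks like this. `paginate` is a sketch, with `fetch_page` standing in for any resource call (the real fetchers appear in the sections below); the stubbed three-page feed just demonstrates the termination behavior:

```python
from typing import Callable, Iterator, Optional

def paginate(fetch_page: Callable[[Optional[str]], tuple[list, Optional[str]]]) -> Iterator:
    """Drive any bookmark-paginated resource until the feed is exhausted.

    `fetch_page` takes the previous bookmark (None on the first call) and
    returns (items, next_bookmark).
    """
    bookmark: Optional[str] = None
    while True:
        items, bookmark = fetch_page(bookmark)
        yield from items
        # Pinterest signals the last page with a null bookmark or "-end-".
        if not bookmark or bookmark == "-end-":
            break

# Simulated three-page feed: bookmark chains None -> "bm1" -> "bm2" -> "-end-".
pages = {None: ([1, 2], "bm1"), "bm1": ([3], "bm2"), "bm2": ([4], "-end-")}
print(list(paginate(lambda bm: pages[bm])))  # [1, 2, 3, 4]
```

Every scraper in this guide is an instance of this loop with a different resource name and options object.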
The resource names you will use most:
- BoardFeedResource — pins on a board
- BoardsResource — boards belonging to a user
- UserResource — user profile data
- BaseSearchResource — keyword search
- RelatedPinsResource — related/visual search
- PinResource — single pin detail
- AggregatedCommentResource — comments on a pin
- InterestFeedResource — trending pins by category
- ShoppingSpotlightFeedResource — shopping/product pins
Setup and Shared Infrastructure
Install dependencies:
pip install httpx tenacity sqlite-utils
The following module is imported by all scripts in this guide. Save it as pinterest_base.py.
"""
pinterest_base.py
Shared session setup, retry logic, and dataclass models for Pinterest scraping.
"""
from __future__ import annotations
import json
import random
import time
from dataclasses import dataclass, field
from typing import Any, Optional
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
# ---------------------------------------------------------------------------
# Dataclass models
# ---------------------------------------------------------------------------
@dataclass
class Pin:
id: str
description: str
image_url: Optional[str]
save_count: int
comment_count: int
source_url: Optional[str]
domain: Optional[str]
created_at: Optional[str]
board_id: Optional[str]
pinner_username: Optional[str]
is_shopping: bool = False
price: Optional[str] = None
currency: Optional[str] = None
product_name: Optional[str] = None
rich_metadata: dict = field(default_factory=dict)
@classmethod
def from_raw(cls, raw: dict) -> "Pin":
images = raw.get("images") or {}
orig = images.get("orig") or images.get("736x") or {}
rich = raw.get("rich_metadata") or {}
pinner = raw.get("pinner") or {}
return cls(
id=str(raw.get("id", "")),
description=raw.get("description") or raw.get("grid_title") or "",
image_url=orig.get("url"),
save_count=raw.get("save_count") or raw.get("repin_count") or 0,
comment_count=raw.get("comment_count") or 0,
source_url=raw.get("link"),
domain=raw.get("domain"),
created_at=raw.get("created_at"),
board_id=str(raw["board"]["id"]) if raw.get("board") else None,
pinner_username=pinner.get("username"),
is_shopping=bool(raw.get("shopping_rec_count") or raw.get("is_shopping_ad")),
price=rich.get("price"),
currency=rich.get("currency"),
product_name=rich.get("name"),
rich_metadata=rich,
)
@dataclass
class Board:
id: str
name: str
slug: str
url: str
description: str
pin_count: int
follower_count: int
owner_username: str
cover_image_url: Optional[str]
category: Optional[str]
created_at: Optional[str]
@classmethod
def from_raw(cls, raw: dict) -> "Board":
owner = raw.get("owner") or {}
cover = raw.get("cover_images") or {}
cover_url = None
for size in ("736x", "400x300", "200x150"):
if size in cover:
cover_url = cover[size].get("url")
break
url = raw.get("url", "")
slug = url.strip("/").split("/")[-1] if url else raw.get("slug", "")
return cls(
id=str(raw.get("id", "")),
name=raw.get("name", ""),
slug=slug,
url=url,
description=raw.get("description") or "",
pin_count=raw.get("pin_count") or 0,
follower_count=raw.get("follower_count") or 0,
owner_username=owner.get("username", ""),
cover_image_url=cover_url,
category=raw.get("category"),
created_at=raw.get("created_at"),
)
@dataclass
class UserProfile:
id: str
username: str
full_name: str
bio: str
follower_count: int
following_count: int
board_count: int
pin_count: int
monthly_views: int
website_url: Optional[str]
profile_image_url: Optional[str]
is_verified_merchant: bool
@classmethod
def from_raw(cls, raw: dict) -> "UserProfile":
return cls(
id=str(raw.get("id", "")),
username=raw.get("username", ""),
full_name=raw.get("full_name") or "",
bio=raw.get("about") or "",
follower_count=raw.get("follower_count") or 0,
following_count=raw.get("following_count") or 0,
board_count=raw.get("board_count") or 0,
pin_count=raw.get("pin_count") or 0,
monthly_views=raw.get("monthly_views") or 0,
website_url=raw.get("website_url"),
profile_image_url=(raw.get("image_medium_url") or raw.get("image_large_url")),
is_verified_merchant=bool(raw.get("is_verified_merchant")),
)
@dataclass
class Comment:
id: str
pin_id: str
text: str
author_username: str
author_id: str
created_at: str
like_count: int
@classmethod
def from_raw(cls, raw: dict, pin_id: str) -> "Comment":
user = raw.get("user") or {}
return cls(
id=str(raw.get("id", "")),
pin_id=pin_id,
text=raw.get("text", ""),
author_username=user.get("username", ""),
author_id=str(user.get("id", "")),
created_at=raw.get("created_at", ""),
like_count=raw.get("like_count") or 0,
)
# ---------------------------------------------------------------------------
# Session factory
# ---------------------------------------------------------------------------
CHROME_VERSION = "124.0.0.0"
HEADERS: dict[str, str] = {
"User-Agent": (
f"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
f"AppleWebKit/537.36 (KHTML, like Gecko) "
f"Chrome/{CHROME_VERSION} Safari/537.36"
),
"Accept": "application/json, text/javascript, */*, q=0.01",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Sec-CH-UA": f'"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
"Sec-CH-UA-Mobile": "?0",
"Sec-CH-UA-Platform": '"macOS"',
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"X-Requested-With": "XMLHttpRequest",
"X-APP-VERSION": "b1e66c1",
"X-Pinterest-AppState": "active",
"Referer": "https://www.pinterest.com/",
}
def make_session(proxy_url: Optional[str] = None) -> httpx.Client:
"""
Build a warmed-up httpx session.
Loads the homepage to acquire cookies (including csrftoken) before
any API calls are made.
"""
client = httpx.Client(
headers=HEADERS,
follow_redirects=True,
timeout=httpx.Timeout(30.0),
proxies={"all://": proxy_url} if proxy_url else None,
)
# Cookie warming — load homepage to acquire csrftoken
try:
resp = client.get("https://www.pinterest.com/")
resp.raise_for_status()
# Small pause to look like a human landing on the page
time.sleep(random.uniform(1.2, 2.5))
except httpx.HTTPError as exc:
print(f"[warn] Homepage warm-up failed: {exc}")
return client
# ---------------------------------------------------------------------------
# Retry decorator for transient failures
# ---------------------------------------------------------------------------
pinterest_retry = retry(
retry=retry_if_exception_type((httpx.HTTPStatusError, httpx.TimeoutException)),
stop=stop_after_attempt(5),
wait=wait_exponential(multiplier=1.5, min=2, max=45),
reraise=True,
)
# ---------------------------------------------------------------------------
# Rate limiting helpers
# ---------------------------------------------------------------------------
def polite_delay(base: float = 1.5, jitter: float = 1.2) -> None:
"""Sleep for base + random jitter seconds."""
time.sleep(base + random.uniform(0.0, jitter))
def build_params(source_url: str, options: dict[str, Any]) -> dict[str, str]:
"""Encode resource API query parameters."""
return {
"source_url": source_url,
"data": json.dumps({"options": options, "context": {}}, separators=(",", ":")),
}
Section 1: Board Pin Extraction with Full Pagination
This script fetches every pin from a board, handles the bookmark-based pagination, and writes results to JSON.
"""
scrape_board_pins.py
Extract all pins from a Pinterest board with full pagination.
Usage: python3 scrape_board_pins.py <username> <board_slug> [output.json]
"""
from __future__ import annotations
import json
import sys
from typing import Iterator
import httpx
from pinterest_base import (
Pin, Board, make_session, polite_delay, build_params, pinterest_retry
)
RESOURCE_URL = "https://www.pinterest.com/resource/{resource}/get/"
@pinterest_retry
def _fetch_board_pins_page(
client: httpx.Client,
username: str,
board_slug: str,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
"""Fetch a single page of board pins. Returns (raw_pins, next_bookmark)."""
options: dict = {
"board_url": f"/{username}/{board_slug}/",
"board_id": None,
"currentFilter": -1,
"field_set_key": "react_grid_pin",
"filter_section_pins": True,
"layout": "default",
"page_size": 25,
"redux_normalize_feed": True,
}
if bookmark:
options["bookmarks"] = [bookmark]
params = build_params(f"/{username}/{board_slug}/", options)
resp = client.get(
RESOURCE_URL.format(resource="BoardFeedResource"),
params=params,
)
resp.raise_for_status()
body = resp.json()
resource_response = body["resource_response"]
raw_pins = resource_response.get("data") or []
next_bookmark = resource_response.get("bookmark")
return raw_pins, next_bookmark
def iter_board_pins(
client: httpx.Client,
username: str,
board_slug: str,
) -> Iterator[Pin]:
"""Yield Pin objects for every pin on the board."""
bookmark: str | None = None
page = 0
total = 0
while True:
page += 1
raw_pins, bookmark = _fetch_board_pins_page(client, username, board_slug, bookmark)
for raw in raw_pins:
if not raw or not isinstance(raw, dict):
continue
try:
yield Pin.from_raw(raw)
total += 1
except Exception as exc:
print(f"[warn] Could not parse pin: {exc}")
print(f" Page {page}: fetched {len(raw_pins)} pins (total so far: {total})")
if not bookmark or bookmark == "-end-":
break
polite_delay()
def scrape_board(username: str, board_slug: str, output_path: str = "pins.json") -> list[Pin]:
client = make_session()
print(f"Scraping board: pinterest.com/{username}/{board_slug}/")
pins: list[Pin] = []
for pin in iter_board_pins(client, username, board_slug):
pins.append(pin)
with open(output_path, "w") as fh:
json.dump([vars(p) for p in pins], fh, indent=2, default=str)
print(f"\nDone. {len(pins)} pins written to {output_path}")
client.close()
return pins
if __name__ == "__main__":
username = sys.argv[1] if len(sys.argv) > 1 else "anthropologie"
board_slug = sys.argv[2] if len(sys.argv) > 2 else "home"
out = sys.argv[3] if len(sys.argv) > 3 else "pins.json"
scrape_board(username, board_slug, out)
Sample output (single pin object):
{
"id": "982374651820394756",
"description": "Linen duvet cover in warm sand — perfect for that quiet luxury bedroom look",
"image_url": "https://i.pinimg.com/originals/4a/b2/cc/4ab2cc9e3a1db55f1c2e837612facb9d.jpg",
"save_count": 4821,
"comment_count": 12,
"source_url": "https://www.anthropologie.com/shop/linen-duvet-cover",
"domain": "anthropologie.com",
"created_at": "2025-11-03T14:22:11",
"board_id": "771209384756",
"pinner_username": "homeaesthetics_daily",
"is_shopping": true,
"price": "148.00",
"currency": "USD",
"product_name": "Washed Linen Duvet Cover"
}
Section 2: User Profile and Board Listing
"""
scrape_user_profile.py
Fetch a Pinterest user's profile and all their public boards.
Usage: python3 scrape_user_profile.py <username> [output.json]
"""
from __future__ import annotations
import json
import sys
from dataclasses import asdict
import httpx
from pinterest_base import (
UserProfile, Board, make_session, polite_delay, build_params, pinterest_retry
)
RESOURCE_URL = "https://www.pinterest.com/resource/{resource}/get/"
@pinterest_retry
def fetch_user_profile(client: httpx.Client, username: str) -> UserProfile:
"""Fetch full profile metadata for a Pinterest user."""
options = {
"username": username,
"field_set_key": "profile",
}
params = build_params(f"/{username}/", options)
resp = client.get(RESOURCE_URL.format(resource="UserResource"), params=params)
resp.raise_for_status()
raw = resp.json()["resource_response"]["data"]
return UserProfile.from_raw(raw)
@pinterest_retry
def _fetch_boards_page(
client: httpx.Client,
username: str,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
options: dict = {
"username": username,
"field_set_key": "profile_grid_item",
"page_size": 50,
"privacy_filter": "all",
"sort": "last_pinned_to",
}
if bookmark:
options["bookmarks"] = [bookmark]
params = build_params(f"/{username}/boards/", options)
resp = client.get(RESOURCE_URL.format(resource="BoardsResource"), params=params)
resp.raise_for_status()
body = resp.json()["resource_response"]
return body.get("data") or [], body.get("bookmark")
def fetch_all_boards(client: httpx.Client, username: str) -> list[Board]:
"""Return all public boards for a user."""
boards: list[Board] = []
bookmark: str | None = None
while True:
raw_boards, bookmark = _fetch_boards_page(client, username, bookmark)
for raw in raw_boards:
if not raw or not isinstance(raw, dict):
continue
try:
boards.append(Board.from_raw(raw))
except Exception as exc:
print(f"[warn] Board parse error: {exc}")
if not bookmark or bookmark == "-end-":
break
polite_delay(base=1.0, jitter=0.8)
return boards
def scrape_user(username: str, output_path: str = "user_profile.json") -> dict:
client = make_session()
print(f"Fetching profile: pinterest.com/{username}/")
profile = fetch_user_profile(client, username)
polite_delay()
print(f"Found {profile.board_count} boards, {profile.monthly_views:,} monthly views")
boards = fetch_all_boards(client, username)
print(f"Fetched {len(boards)} boards")
output = {
"profile": asdict(profile),
"boards": [asdict(b) for b in boards],
}
with open(output_path, "w") as fh:
json.dump(output, fh, indent=2, default=str)
print(f"Written to {output_path}")
client.close()
return output
if __name__ == "__main__":
username = sys.argv[1] if len(sys.argv) > 1 else "anthropologie"
out = sys.argv[2] if len(sys.argv) > 2 else "user_profile.json"
scrape_user(username, out)
Sample profile output:
{
"profile": {
"id": "502814736284",
"username": "anthropologie",
"full_name": "Anthropologie",
"bio": "Inspiring the free-spirited lifestyle.",
"follower_count": 2847391,
"following_count": 148,
"board_count": 94,
"pin_count": 38201,
"monthly_views": 12400000,
"website_url": "https://www.anthropologie.com",
"profile_image_url": "https://i.pinimg.com/280x280_RS/8b/6e/...",
"is_verified_merchant": true
},
"boards": [
{
"id": "771209384756",
"name": "Home Decor",
"slug": "home-decor",
"url": "/anthropologie/home-decor/",
"description": "Curated home goods and interior inspiration.",
"pin_count": 4203,
"follower_count": 189421,
"owner_username": "anthropologie",
"cover_image_url": "https://i.pinimg.com/400x300/...",
"category": "home_decor",
"created_at": "2012-04-18T09:14:00"
}
]
}
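For influencer vetting, one useful derived metric is followers per pin: a board with a small, heavily followed pin set suggests a denser audience than a sprawling archive. A sketch over the board records above (sample data inlined):

```python
boards = [
    {"name": "Home Decor", "pin_count": 4203, "follower_count": 189421},
    {"name": "Gift Guides", "pin_count": 120, "follower_count": 30000},
    {"name": "Archive", "pin_count": 9000, "follower_count": 900},
]

def followers_per_pin(board: dict) -> float:
    # Guard against empty boards to avoid ZeroDivisionError.
    return board["follower_count"] / max(board["pin_count"], 1)

ranked = sorted(boards, key=followers_per_pin, reverse=True)
print([b["name"] for b in ranked])  # ['Gift Guides', 'Home Decor', 'Archive']
```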
Section 3: Pin Search by Keyword
"""
scrape_pin_search.py
Search Pinterest pins by keyword with paginated results.
Usage: python3 scrape_pin_search.py "quiet luxury bedroom" [--pages 5] [output.json]
"""
from __future__ import annotations
import argparse
import json
import urllib.parse
from dataclasses import asdict
from typing import Iterator
import httpx
from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry
RESOURCE_URL = "https://www.pinterest.com/resource/BaseSearchResource/get/"
@pinterest_retry
def _fetch_search_page(
client: httpx.Client,
query: str,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
options: dict = {
"query": query,
"scope": "pins",
"no_fetch_context_on_resource": False,
"page_size": 25,
"redux_normalize_feed": True,
"rs": "typed",
"auto_correction_disabled": False,
}
if bookmark:
options["bookmarks"] = [bookmark]
encoded_query = urllib.parse.quote(query)
params = build_params(
f"/search/pins/?q={encoded_query}&rs=typed&term_meta%5B%5D=typed",
options,
)
resp = client.get(RESOURCE_URL, params=params)
resp.raise_for_status()
body = resp.json()["resource_response"]
data = body.get("data") or {}
results = data.get("results") or []
pins = [r for r in results if isinstance(r, dict) and r.get("type") == "pin"]
return pins, body.get("bookmark")
def iter_search_pins(
client: httpx.Client,
query: str,
max_pages: int = 10,
) -> Iterator[Pin]:
"""Yield Pin objects from keyword search results."""
bookmark: str | None = None
for page in range(1, max_pages + 1):
raw_pins, bookmark = _fetch_search_page(client, query, bookmark)
yielded = 0
for raw in raw_pins:
try:
yield Pin.from_raw(raw)
yielded += 1
except Exception as exc:
print(f"[warn] Search pin parse error: {exc}")
print(f" Search page {page}: {yielded} pins")
if not bookmark or bookmark == "-end-":
break
polite_delay()
def scrape_search(query: str, max_pages: int = 5, output_path: str = "search_results.json") -> list[Pin]:
client = make_session()
print(f"Searching Pinterest for: '{query}' (up to {max_pages} pages)")
pins: list[Pin] = []
for pin in iter_search_pins(client, query, max_pages):
pins.append(pin)
with open(output_path, "w") as fh:
json.dump([asdict(p) for p in pins], fh, indent=2, default=str)
print(f"\nDone. {len(pins)} results written to {output_path}")
client.close()
return pins
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("query", help="Search query")
parser.add_argument("--pages", type=int, default=5, help="Max pages to fetch")
parser.add_argument("--output", default="search_results.json")
args = parser.parse_args()
scrape_search(args.query, args.pages, args.output)
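Search feeds occasionally repeat pins across pages, so it is worth deduplicating by pin ID before analysis. A minimal sketch over dict records:

```python
def dedupe_by_id(pins: list[dict]) -> list[dict]:
    """Drop repeated pin IDs while preserving first-seen order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for pin in pins:
        if pin["id"] not in seen:
            seen.add(pin["id"])
            unique.append(pin)
    return unique

sample = [{"id": "1"}, {"id": "2"}, {"id": "1"}, {"id": "3"}]
print([p["id"] for p in dedupe_by_id(sample)])  # ['1', '2', '3']
```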
Section 4: Related Pins and Visual Search Recommendations
Pinterest's visual search system powers the "More like this" section on any pin page. This is one of the most valuable endpoints for trend research because it reveals the platform's internal understanding of visual similarity.
"""
scrape_related_pins.py
Fetch visually related pins for a given pin ID (Pinterest's "More like this").
Usage: python3 scrape_related_pins.py <pin_id> [output.json]
"""
from __future__ import annotations
import json
import sys
from dataclasses import asdict
from typing import Iterator
import httpx
from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry
RESOURCE_URL = "https://www.pinterest.com/resource/RelatedPinsResource/get/"
@pinterest_retry
def _fetch_related_page(
client: httpx.Client,
pin_id: str,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
options: dict = {
"pin_id": pin_id,
"add_vase": True,
"count": 25,
"field_set_key": "react_grid_pin",
"redux_normalize_feed": True,
}
if bookmark:
options["bookmarks"] = [bookmark]
params = build_params(f"/pin/{pin_id}/", options)
resp = client.get(RESOURCE_URL, params=params)
resp.raise_for_status()
body = resp.json()["resource_response"]
return body.get("data") or [], body.get("bookmark")
def iter_related_pins(
client: httpx.Client,
pin_id: str,
max_pages: int = 3,
) -> Iterator[Pin]:
bookmark: str | None = None
for page in range(1, max_pages + 1):
raw_pins, bookmark = _fetch_related_page(client, pin_id, bookmark)
yielded = 0
for raw in raw_pins:
if not isinstance(raw, dict):
continue
try:
yield Pin.from_raw(raw)
yielded += 1
except Exception as exc:
print(f"[warn] Related pin parse error: {exc}")
print(f" Related pins page {page}: {yielded} pins")
if not bookmark or bookmark == "-end-":
break
polite_delay()
def scrape_related(pin_id: str, max_pages: int = 3, output_path: str = "related_pins.json") -> list[Pin]:
client = make_session()
print(f"Fetching related pins for: {pin_id}")
pins: list[Pin] = []
for pin in iter_related_pins(client, pin_id, max_pages):
pins.append(pin)
with open(output_path, "w") as fh:
json.dump([asdict(p) for p in pins], fh, indent=2, default=str)
print(f"\nDone. {len(pins)} related pins written to {output_path}")
client.close()
return pins
if __name__ == "__main__":
pin_id = sys.argv[1] if len(sys.argv) > 1 else "982374651820394756"
out = sys.argv[2] if len(sys.argv) > 2 else "related_pins.json"
scrape_related(pin_id, output_path=out)
Section 5: Pin Comment Extraction
"""
scrape_pin_comments.py
Extract all comments from a Pinterest pin.
Usage: python3 scrape_pin_comments.py <pin_id> [output.json]
"""
from __future__ import annotations
import json
import sys
from dataclasses import asdict
from typing import Iterator
import httpx
from pinterest_base import Comment, make_session, polite_delay, build_params, pinterest_retry
RESOURCE_URL = "https://www.pinterest.com/resource/AggregatedCommentResource/get/"
@pinterest_retry
def _fetch_comments_page(
client: httpx.Client,
pin_id: str,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
options: dict = {
"objectId": pin_id,
"objectType": "pin",
"page_size": 50,
}
if bookmark:
options["bookmarks"] = [bookmark]
params = build_params(f"/pin/{pin_id}/", options)
resp = client.get(RESOURCE_URL, params=params)
resp.raise_for_status()
body = resp.json()["resource_response"]
return body.get("data") or [], body.get("bookmark")
def iter_pin_comments(
client: httpx.Client,
pin_id: str,
) -> Iterator[Comment]:
bookmark: str | None = None
while True:
raw_comments, bookmark = _fetch_comments_page(client, pin_id, bookmark)
for raw in raw_comments:
if not isinstance(raw, dict):
continue
try:
yield Comment.from_raw(raw, pin_id)
except Exception as exc:
print(f"[warn] Comment parse error: {exc}")
if not bookmark or bookmark == "-end-":
break
polite_delay(base=1.0, jitter=0.5)
def scrape_comments(pin_id: str, output_path: str = "comments.json") -> list[Comment]:
client = make_session()
print(f"Fetching comments for pin: {pin_id}")
comments: list[Comment] = []
for comment in iter_pin_comments(client, pin_id):
comments.append(comment)
with open(output_path, "w") as fh:
json.dump([asdict(c) for c in comments], fh, indent=2, default=str)
print(f"Done. {len(comments)} comments written to {output_path}")
client.close()
return comments
if __name__ == "__main__":
pin_id = sys.argv[1] if len(sys.argv) > 1 else "982374651820394756"
out = sys.argv[2] if len(sys.argv) > 2 else "comments.json"
scrape_comments(pin_id, out)
Sample comment output:
[
{
"id": "6029483756102938471",
"pin_id": "982374651820394756",
"text": "Love this! Where can I find that throw blanket?",
"author_username": "interiorinspo_daily",
"author_id": "830194827364",
"created_at": "2025-11-05T09:43:17",
"like_count": 7
}
]
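Comments like the sample above are a cheap purchase-intent signal. A naive filter is often enough for a first pass — the marker list here is illustrative, so tune it for your domain:

```python
INTENT_MARKERS = ("where can i", "where do you", "link?", "how much", "price")

def has_purchase_intent(text: str) -> bool:
    """Flag question-style comments that ask about a product or source."""
    lowered = text.lower()
    return "?" in text and any(marker in lowered for marker in INTENT_MARKERS)

comments = [
    "Love this! Where can I find that throw blanket?",
    "Gorgeous palette.",
    "How much is the rug?",
]
print([has_purchase_intent(c) for c in comments])  # [True, False, True]
```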
Section 6: Trending Pins by Category
Pinterest exposes category-level trending feeds through InterestFeedResource. Categories include home_decor, fashion, food_drink, beauty, travel, art, diy_crafts, and more.
"""
scrape_trending.py
Fetch trending pins for a Pinterest interest/category.
Usage: python3 scrape_trending.py <category> [--pages 3] [output.json]
"""
from __future__ import annotations
import argparse
import json
from dataclasses import asdict
from typing import Iterator
import httpx
from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry
RESOURCE_URL = "https://www.pinterest.com/resource/InterestFeedResource/get/"
# Known valid category slugs
VALID_CATEGORIES = [
"home_decor", "fashion", "food_drink", "beauty", "travel",
"art", "diy_crafts", "photography", "architecture", "cars_motorcycles",
"film_music_books", "fitness", "gardening", "kids_parenting",
"mens_fashion", "womens_fashion", "outdoor", "pets", "sports",
"tattoos", "technology", "weddings",
]
@pinterest_retry
def _fetch_trending_page(
client: httpx.Client,
category: str,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
options: dict = {
"interest_id": category,
"field_set_key": "react_grid_pin",
"is_own_profile_pins": False,
"page_size": 25,
"redux_normalize_feed": True,
}
if bookmark:
options["bookmarks"] = [bookmark]
params = build_params(f"/ideas/{category}/", options)
resp = client.get(RESOURCE_URL, params=params)
resp.raise_for_status()
body = resp.json()["resource_response"]
return body.get("data") or [], body.get("bookmark")
def iter_trending_pins(
client: httpx.Client,
category: str,
max_pages: int = 3,
) -> Iterator[Pin]:
bookmark: str | None = None
for page in range(1, max_pages + 1):
raw_pins, bookmark = _fetch_trending_page(client, category, bookmark)
yielded = 0
for raw in raw_pins:
if not isinstance(raw, dict):
continue
try:
yield Pin.from_raw(raw)
yielded += 1
except Exception as exc:
print(f"[warn] Trending pin parse error: {exc}")
print(f" Trending [{category}] page {page}: {yielded} pins")
if not bookmark or bookmark == "-end-":
break
polite_delay()
def scrape_trending(
category: str,
max_pages: int = 3,
output_path: str = "trending.json",
) -> list[Pin]:
if category not in VALID_CATEGORIES:
print(f"[warn] Unknown category '{category}'. Valid options: {VALID_CATEGORIES}")
client = make_session()
print(f"Fetching trending pins for category: {category}")
pins: list[Pin] = []
for pin in iter_trending_pins(client, category, max_pages):
pins.append(pin)
with open(output_path, "w") as fh:
json.dump([asdict(p) for p in pins], fh, indent=2, default=str)
print(f"\nDone. {len(pins)} trending pins written to {output_path}")
client.close()
return pins
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("category", help=f"One of: {', '.join(VALID_CATEGORIES)}")
parser.add_argument("--pages", type=int, default=3)
parser.add_argument("--output", default="trending.json")
args = parser.parse_args()
scrape_trending(args.category, args.pages, args.output)
Section 7: Shopping and Product Pin Extraction
Shopping pins contain rich structured metadata including price, retailer, product name, and availability. This makes them the highest-value target for e-commerce competitive intelligence.
"""
scrape_shopping_pins.py
Extract shopping/product pins from Pinterest's shopping spotlight feed,
including price, retailer, and product metadata.
Usage: python3 scrape_shopping_pins.py [--pages 5] [--output shopping_pins.json]
"""
from __future__ import annotations
import argparse
import json
from dataclasses import asdict, dataclass
from typing import Iterator, Optional
import httpx
from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry
SHOPPING_RESOURCE_URL = "https://www.pinterest.com/resource/ShoppingSpotlightFeedResource/get/"
BOARD_RESOURCE_URL = "https://www.pinterest.com/resource/BoardFeedResource/get/"
@dataclass
class ShoppingPin:
pin_id: str
product_name: str
description: str
price: Optional[str]
currency: Optional[str]
retailer: Optional[str]
retailer_domain: Optional[str]
buy_url: Optional[str]
image_url: Optional[str]
save_count: int
availability: Optional[str]
condition: Optional[str]
brand: Optional[str]
@classmethod
def from_pin(cls, pin: Pin) -> "ShoppingPin":
rich = pin.rich_metadata
return cls(
pin_id=pin.id,
product_name=pin.product_name or pin.description[:120],
description=pin.description,
price=pin.price,
currency=pin.currency,
retailer=rich.get("site_name") or rich.get("site"),
retailer_domain=pin.domain,
buy_url=pin.source_url,
image_url=pin.image_url,
save_count=pin.save_count,
availability=rich.get("availability"),
condition=rich.get("condition"),
brand=rich.get("brand"),
)
@pinterest_retry
def _fetch_shopping_spotlight_page(
client: httpx.Client,
bookmark: str | None,
) -> tuple[list[dict], str | None]:
"""Fetch from the shopping spotlight feed (curated product discovery)."""
options: dict = {
"field_set_key": "react_grid_pin",
"page_size": 25,
"redux_normalize_feed": True,
}
if bookmark:
options["bookmarks"] = [bookmark]
params = build_params("/shop/", options)
resp = client.get(SHOPPING_RESOURCE_URL, params=params)
resp.raise_for_status()
body = resp.json()["resource_response"]
return body.get("data") or [], body.get("bookmark")
def iter_shopping_spotlight(
client: httpx.Client,
max_pages: int = 5,
) -> Iterator[ShoppingPin]:
"""Yield ShoppingPin objects from the global shopping spotlight feed."""
bookmark: str | None = None
for page in range(1, max_pages + 1):
raw_pins, bookmark = _fetch_shopping_spotlight_page(client, bookmark)
count = 0
for raw in raw_pins:
if not isinstance(raw, dict):
continue
try:
pin = Pin.from_raw(raw)
if pin.is_shopping or pin.price or pin.rich_metadata:
yield ShoppingPin.from_pin(pin)
count += 1
except Exception as exc:
print(f"[warn] Shopping pin parse error: {exc}")
print(f" Shopping spotlight page {page}: {count} product pins")
if not bookmark or bookmark == "-end-":
break
polite_delay()
def scrape_shopping(
max_pages: int = 5,
output_path: str = "shopping_pins.json",
) -> list[ShoppingPin]:
client = make_session()
print("Fetching Pinterest shopping spotlight feed...")
items: list[ShoppingPin] = []
for item in iter_shopping_spotlight(client, max_pages):
items.append(item)
with open(output_path, "w") as fh:
json.dump([asdict(i) for i in items], fh, indent=2, default=str)
print(f"\nDone. {len(items)} shopping pins written to {output_path}")
client.close()
return items
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--pages", type=int, default=5)
parser.add_argument("--output", default="shopping_pins.json")
args = parser.parse_args()
scrape_shopping(args.pages, args.output)
Sample shopping pin output:
{
"pin_id": "729384756102938",
"product_name": "Merino Wool Crew Neck Sweater — Camel",
"description": "The coziest winter staple. 100% Merino wool, relaxed fit.",
"price": "98.00",
"currency": "USD",
"retailer": "Everlane",
"retailer_domain": "everlane.com",
"buy_url": "https://www.everlane.com/products/mens-merino-crew-camel",
"image_url": "https://i.pinimg.com/originals/7c/2a/...",
"save_count": 3102,
"availability": "in stock",
"condition": "new",
"brand": "Everlane"
}
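Downstream filtering of the exported JSON is plain Python. A minimal sketch (stdlib only; `affordable_in_stock` is a hypothetical helper, and the record shape matches the sample output above) that keeps in-stock pins at or under a price threshold:

```python
def affordable_in_stock(pins: list[dict], max_price: float) -> list[dict]:
    """Filter shopping-pin records to in-stock items at or under max_price."""
    result = []
    for pin in pins:
        try:
            # Missing prices become infinity, so they never pass the threshold
            price = float(pin.get("price") or "inf")
        except ValueError:
            continue
        if pin.get("availability") == "in stock" and price <= max_price:
            result.append(pin)
    return result

sample = [
    {"pin_id": "1", "price": "98.00", "availability": "in stock"},
    {"pin_id": "2", "price": "250.00", "availability": "in stock"},
    {"pin_id": "3", "price": "40.00", "availability": "out of stock"},
]
cheap = affordable_in_stock(sample, max_price=100.0)  # only pin "1" survives
```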
Section 8: Bulk Board Comparison (Pin Overlap Analysis)
This script takes two or more boards and computes overlap — pins that appear on multiple boards. Useful for understanding whether curators are drawing from the same source content, or for identifying "cornerstone" pins that spread across communities.
"""
scrape_board_compare.py
Compare pin overlap between two or more Pinterest boards.
Usage: python3 scrape_board_compare.py user1/board-a user2/board-b [user3/board-c ...] [--output report.json]
"""
from __future__ import annotations
import argparse
import json
from collections import Counter, defaultdict
from dataclasses import asdict
from typing import NamedTuple
from pinterest_base import Pin, make_session, polite_delay, build_params, pinterest_retry
from scrape_board_pins import iter_board_pins
import httpx
class BoardSpec(NamedTuple):
username: str
board_slug: str
@classmethod
def parse(cls, spec: str) -> "BoardSpec":
parts = spec.strip("/").split("/")
if len(parts) < 2:
raise ValueError(f"Expected user/board-slug, got: {spec}")
return cls(username=parts[0], board_slug=parts[1])
def __str__(self) -> str:
return f"{self.username}/{self.board_slug}"
def compute_overlap(
client: httpx.Client,
boards: list[BoardSpec],
) -> dict:
"""
For each board, collect all pin IDs. Then compute pairwise overlap.
Returns a report dict.
"""
board_pin_map: dict[str, set[str]] = {}
board_pins_full: dict[str, list[Pin]] = {}
pin_registry: dict[str, Pin] = {}
for spec in boards:
label = str(spec)
print(f"\nFetching board: {label}")
pin_ids: set[str] = set()
pins_list: list[Pin] = []
for pin in iter_board_pins(client, spec.username, spec.board_slug):
pin_ids.add(pin.id)
pins_list.append(pin)
pin_registry[pin.id] = pin
polite_delay(base=0.3, jitter=0.3)
board_pin_map[label] = pin_ids
board_pins_full[label] = pins_list
# Pairwise overlap
labels = list(board_pin_map.keys())
pairwise: list[dict] = []
for i in range(len(labels)):
for j in range(i + 1, len(labels)):
a, b = labels[i], labels[j]
overlap = board_pin_map[a] & board_pin_map[b]
union = board_pin_map[a] | board_pin_map[b]
jaccard = len(overlap) / len(union) if union else 0.0
pairwise.append({
"board_a": a,
"board_b": b,
"overlap_count": len(overlap),
"board_a_total": len(board_pin_map[a]),
"board_b_total": len(board_pin_map[b]),
"jaccard_similarity": round(jaccard, 4),
"overlap_pin_ids": sorted(overlap),
"shared_pins": [asdict(pin_registry[pid]) for pid in sorted(overlap)],
})
# Pins appearing across the most boards
pin_board_count = Counter()
pin_board_names: dict[str, list[str]] = defaultdict(list)
for label, ids in board_pin_map.items():
for pid in ids:
pin_board_count[pid] += 1
pin_board_names[pid].append(label)
most_shared = [
{
"pin_id": pid,
"board_count": count,
"boards": pin_board_names[pid],
"pin": asdict(pin_registry[pid]) if pid in pin_registry else None,
}
for pid, count in pin_board_count.most_common(20)
if count > 1
]
return {
"boards_analyzed": labels,
"board_sizes": {label: len(ids) for label, ids in board_pin_map.items()},
"pairwise_overlap": pairwise,
"most_shared_pins": most_shared,
"summary": {
"total_unique_pins": len(pin_registry),
"pins_on_multiple_boards": sum(1 for c in pin_board_count.values() if c > 1),
},
}
def scrape_compare(
board_specs: list[str],
output_path: str = "board_comparison.json",
) -> dict:
boards = [BoardSpec.parse(s) for s in board_specs]
client = make_session()
print(f"Comparing {len(boards)} boards: {', '.join(str(b) for b in boards)}")
report = compute_overlap(client, boards)
with open(output_path, "w") as fh:
json.dump(report, fh, indent=2, default=str)
print(f"\nComparison written to {output_path}")
print(f" Total unique pins: {report['summary']['total_unique_pins']}")
print(f" Pins on multiple boards: {report['summary']['pins_on_multiple_boards']}")
client.close()
return report
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Compare Pinterest board pin overlap")
parser.add_argument("boards", nargs="+", help="Board specs in user/board-slug format")
parser.add_argument("--output", default="board_comparison.json")
args = parser.parse_args()
scrape_compare(args.boards, args.output)
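The Jaccard similarity that `compute_overlap` reports is just intersection over union of the pin-ID sets. A standalone sketch with toy IDs, useful for sanity-checking the numbers in a report:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (defined as 0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

board_a = {"p1", "p2", "p3", "p4"}
board_b = {"p3", "p4", "p5", "p6"}
# 2 shared pins out of 6 unique pins -> 2/6 ≈ 0.333
sim = jaccard(board_a, board_b)
```

A Jaccard score near 1.0 means two boards are near-duplicates; scores in the 0.05-0.2 range are more typical for boards in the same niche curated by different people.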
Anti-Detection Deep Dive
Getting blocked by Pinterest is frustrating, but with the right approach it is largely avoidable. Here is a systematic breakdown of the main detection layers and how to handle each one.
CSRF Token Management
Pinterest uses a CSRF token for all state-modifying requests and for some authenticated reads. The token is delivered via Set-Cookie when you load any Pinterest page. Your session handles this automatically as long as you warm it up:
"""
csrf_manager.py
Manage Pinterest CSRF tokens for authenticated scraping.
"""
from __future__ import annotations
import re
import time
import httpx
def get_csrf_token(client: httpx.Client) -> str | None:
"""
Load the Pinterest homepage and extract the CSRF token from cookies
or the page HTML. Returns the token string or None.
"""
resp = client.get("https://www.pinterest.com/")
resp.raise_for_status()
# Method 1: from Set-Cookie (most reliable)
csrf = client.cookies.get("csrftoken")
if csrf:
return csrf
# Method 2: from page HTML (fallback)
match = re.search(r'"csrftoken"\s*:\s*"([^"]+)"', resp.text)
if match:
return match.group(1)
# Method 3: from response headers
for cookie in resp.headers.get_list("set-cookie"):
if "csrftoken=" in cookie:
token = cookie.split("csrftoken=")[1].split(";")[0]
return token
return None
def inject_csrf_header(client: httpx.Client) -> httpx.Client:
"""
Fetch a CSRF token and inject it as the X-CSRFToken header.
Call this before any write/authenticated operation.
"""
token = get_csrf_token(client)
if token:
client.headers["X-CSRFToken"] = token
print(f"[csrf] Token acquired: {token[:12]}...")
else:
print("[csrf] Warning: could not acquire CSRF token")
return client
def refresh_csrf_if_needed(client: httpx.Client, response: httpx.Response) -> bool:
"""
Call after a failed request. Returns True if token was refreshed.
403 responses often indicate an expired or missing CSRF token.
"""
if response.status_code == 403:
print("[csrf] 403 received — refreshing CSRF token")
time.sleep(2.0)
inject_csrf_header(client)
return True
return False
Cookie Warming and Session Persistence
A fresh session hitting the API immediately is a strong bot signal. Real users browse several pages before triggering API calls. The following pattern mimics that:
"""
session_warmer.py
Warm up a Pinterest session to reduce bot detection rate.
"""
from __future__ import annotations
import json
import random
import time
from pathlib import Path
import httpx
from pinterest_base import HEADERS
WARM_UP_URLS = [
"https://www.pinterest.com/",
"https://www.pinterest.com/ideas/",
"https://www.pinterest.com/ideas/home-decor/",
]
def warm_session(
client: httpx.Client,
extra_pages: int = 2,
) -> httpx.Client:
"""Load several pages to build cookie state before scraping."""
pages = WARM_UP_URLS[:extra_pages + 1]
for url in pages:
try:
resp = client.get(url)
resp.raise_for_status()
print(f"[warm] Loaded {url} — cookies: {len(client.cookies)}")
except httpx.HTTPError as exc:
print(f"[warm] Failed to load {url}: {exc}")
time.sleep(random.uniform(1.5, 3.0))
return client
def save_session(client: httpx.Client, path: str = "session_cookies.json") -> None:
"""Persist cookies to disk for reuse across runs."""
cookies = {name: value for name, value in client.cookies.items()}
with open(path, "w") as fh:
json.dump(cookies, fh, indent=2)
print(f"[session] Saved {len(cookies)} cookies to {path}")
def load_session(path: str = "session_cookies.json") -> httpx.Client | None:
"""Restore a previously saved session. Returns None if file not found."""
cookie_path = Path(path)
if not cookie_path.exists():
return None
with open(cookie_path) as fh:
cookies = json.load(fh)
client = httpx.Client(
headers=HEADERS,
follow_redirects=True,
timeout=httpx.Timeout(30.0),
cookies=cookies,
)
print(f"[session] Restored session with {len(cookies)} cookies from {path}")
return client
Browser Fingerprint Headers
The header stack you send is checked for internal consistency. Chrome 124 on macOS sends specific Sec-CH-* headers that older or non-browser clients omit. Missing these is a common detection vector:
"""
fingerprint_headers.py
Browser-consistent header sets for anti-detection scraping.
"""
from __future__ import annotations
import random
# Realistic Chrome 124 on macOS header set
CHROME_MACOS_HEADERS: dict[str, str] = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br, zstd",
"Sec-CH-UA": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
"Sec-CH-UA-Mobile": "?0",
"Sec-CH-UA-Platform": '"macOS"',
"Sec-CH-UA-Platform-Version": '"14.4.0"',
"Sec-CH-UA-Arch": '"arm"',
"Sec-CH-UA-Full-Version": f'"124.0.{random.randint(6300, 6500)}.{random.randint(50, 200)}"',
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Sec-Fetch-User": "?1",
"Upgrade-Insecure-Requests": "1",
"Cache-Control": "max-age=0",
}
# Overrides for XHR/Fetch API calls (not page navigations)
API_HEADER_OVERRIDES: dict[str, str] = {
"Accept": "application/json, text/javascript, */*;q=0.01",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "same-origin",
"X-Requested-With": "XMLHttpRequest",
"X-APP-VERSION": "b1e66c1",
"X-Pinterest-AppState": "active",
}
def get_api_headers(referer: str = "https://www.pinterest.com/") -> dict[str, str]:
"""Return consistent API headers for Pinterest resource calls."""
headers = {**CHROME_MACOS_HEADERS, **API_HEADER_OVERRIDES}
headers["Referer"] = referer
return headers
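One cheap self-check before a run: confirm that the Chrome major version in User-Agent matches the one claimed in Sec-CH-UA, since a mismatch between the two is exactly the kind of inconsistency detection systems look for. A minimal regex-based sketch (the function name is an assumption, not part of `fingerprint_headers.py`):

```python
import re

def ua_versions_consistent(headers: dict[str, str]) -> bool:
    """True if the Chrome major version in User-Agent also appears in Sec-CH-UA."""
    ua_match = re.search(r"Chrome/(\d+)\.", headers.get("User-Agent", ""))
    if not ua_match:
        return False
    major = ua_match.group(1)
    return f'v="{major}"' in headers.get("Sec-CH-UA", "")

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Sec-CH-UA": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
}
ok = ua_versions_consistent(headers)  # True for the matched pair above
```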
IP Reputation: Residential vs Datacenter Proxies
This is the most consequential factor in whether Pinterest blocks you. Datacenter IP ranges (AWS EC2, Google Cloud, DigitalOcean, Hetzner, Vultr) are well-known and aggressively flagged. Pinterest applies much stricter rate limits — often blocking after fewer than 10 requests per IP within a session — to datacenter addresses.
Residential IPs, by contrast, are assigned to real ISP subscribers and carry far less suspicion. Pinterest's systems treat them as potential real users.
For any scraping volume beyond casual testing, residential proxy rotation is not optional. ThorData offers a residential pool with geographic targeting, which matters because Pinterest also applies per-region rate limits — rotating through the same country repeatedly can still trigger flags. Using geo-diverse residential IPs from ThorData's pool distributes the request fingerprint across genuinely distinct network paths.
"""
proxy_rotation.py
Proxy configuration and rotation for Pinterest scraping.
"""
from __future__ import annotations
import itertools
import random
from typing import Optional
class ProxyRotator:
"""
Rotate through a list of proxy URLs, cycling on each request.
Supports sticky sessions (same proxy for N requests) for cookie coherence.
"""
def __init__(
self,
proxy_urls: list[str],
sticky_count: int = 10,
) -> None:
if not proxy_urls:
raise ValueError("Must provide at least one proxy URL")
self._proxies = proxy_urls
self._cycle = itertools.cycle(proxy_urls)
self._sticky_count = sticky_count
self._current: str = next(self._cycle)
self._uses = 0
def get(self) -> str:
"""Return current proxy, rotating after sticky_count uses."""
if self._uses >= self._sticky_count:
self._current = next(self._cycle)
self._uses = 0
self._uses += 1
return self._current
def rotate(self) -> str:
"""Force immediate rotation to next proxy."""
self._current = next(self._cycle)
self._uses = 0
return self._current
def as_httpx_proxies(self) -> dict[str, str]:
return {"all://": self.get()}
def build_thordata_url(
username: str,
password: str,
country: str = "US",
session_id: Optional[str] = None,
) -> str:
"""
Build a ThorData residential proxy URL.
ThorData: https://thordata.partnerstack.com/partner/0a0x4nzh
session_id: if set, uses sticky session routing (same exit IP for the session)
"""
host = "proxy.thordata.com"
port = 7777
user_part = f"{username}-country-{country}"
if session_id:
user_part += f"-session-{session_id}"
return f"http://{user_part}:{password}@{host}:{port}"
# Example usage:
# from proxy_rotation import build_thordata_url, ProxyRotator
#
# # Geo-diverse pool
# proxy_urls = [
# build_thordata_url("myuser", "mypass", "US"),
# build_thordata_url("myuser", "mypass", "GB"),
# build_thordata_url("myuser", "mypass", "CA"),
# build_thordata_url("myuser", "mypass", "AU"),
# ]
# rotator = ProxyRotator(proxy_urls, sticky_count=15)
Rate Limiting Detection and Adaptive Backoff
Pinterest returns HTTP 429 for rate limiting and HTTP 403 for session/bot flags. The following decorator handles both gracefully:
"""
adaptive_backoff.py
Detect rate limiting signals and back off adaptively.
"""
from __future__ import annotations
import random
import time
from functools import wraps
from typing import Callable, TypeVar
import httpx
T = TypeVar("T")
def adaptive_retry(
max_attempts: int = 5,
base_delay: float = 2.0,
max_delay: float = 120.0,
jitter: float = 0.4,
) -> Callable:
    """
    Decorator that retries on 429/503 with exponential backoff plus jitter.
    On 403 (a possible bot/session flag), it backs off hard and retries once.
    """
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except httpx.HTTPStatusError as exc:
                    # Out of attempts: propagate the last error
                    if attempt == max_attempts:
                        raise
                    status = exc.response.status_code
                    if status == 429:
                        # Honor Retry-After when the server provides it
                        retry_after = exc.response.headers.get("Retry-After")
                        wait = float(retry_after) if retry_after else delay
                        jittered = wait + random.uniform(0, jitter * wait)
                        print(f"[rate-limit] 429 on attempt {attempt}. Waiting {jittered:.1f}s...")
                        time.sleep(min(jittered, max_delay))
                        delay = min(delay * 2, max_delay)
                    elif status == 503:
                        print(f"[rate-limit] 503 on attempt {attempt}. Waiting {delay:.1f}s...")
                        time.sleep(delay + random.uniform(0, jitter * delay))
                        delay = min(delay * 2, max_delay)
                    elif status == 403:
                        print(f"[rate-limit] 403 on attempt {attempt}. Possible bot flag.")
                        if attempt > 1:
                            raise  # one long-backoff retry is enough for 403s
                        time.sleep(delay * 3)
                    else:
                        raise
return wrapper
return decorator
def request_with_jitter(
client: httpx.Client,
url: str,
params: dict,
base_delay: float = 1.5,
jitter_range: tuple[float, float] = (0.5, 2.0),
) -> httpx.Response:
"""
Make a GET request with pre-request jitter delay to mimic human timing.
"""
sleep_time = base_delay + random.uniform(*jitter_range)
time.sleep(sleep_time)
return client.get(url, params=params)
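To see what the decorator's timing actually looks like, here is the deterministic part of the schedule with jitter removed (an illustrative helper, not part of `adaptive_backoff.py`): each retry doubles the delay until it hits the cap.

```python
def backoff_schedule(base: float = 2.0, cap: float = 120.0, attempts: int = 8) -> list[float]:
    """Delays before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    delays, delay = [], base
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # exponential growth, clipped at the cap
    return delays

sched = backoff_schedule()
# [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0] — cap reached on the 7th attempt
```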
Data Storage: JSON, CSV, and SQLite
SQLite Schema
"""
pinterest_db.py
SQLite storage for Pinterest scraping results using sqlite-utils.
"""
from __future__ import annotations
from dataclasses import asdict
from pathlib import Path
from typing import Iterable
import sqlite_utils
from pinterest_base import Pin, Board, UserProfile, Comment
def get_db(db_path: str = "pinterest.db") -> sqlite_utils.Database:
"""Open or create the Pinterest SQLite database with full schema."""
db = sqlite_utils.Database(db_path)
# Pins table
if "pins" not in db.table_names():
db["pins"].create({
"id": str,
"description": str,
"image_url": str,
"save_count": int,
"comment_count": int,
"source_url": str,
"domain": str,
"created_at": str,
"board_id": str,
"pinner_username": str,
"is_shopping": int, # SQLite has no bool
"price": str,
"currency": str,
"product_name": str,
"scraped_at": str,
}, pk="id", not_null={"id"})
# Boards table
if "boards" not in db.table_names():
db["boards"].create({
"id": str,
"name": str,
"slug": str,
"url": str,
"description": str,
"pin_count": int,
"follower_count": int,
"owner_username": str,
"cover_image_url": str,
"category": str,
"created_at": str,
"scraped_at": str,
}, pk="id")
# Users table
if "users" not in db.table_names():
db["users"].create({
"id": str,
"username": str,
"full_name": str,
"bio": str,
"follower_count": int,
"following_count": int,
"board_count": int,
"pin_count": int,
"monthly_views": int,
"website_url": str,
"profile_image_url": str,
"is_verified_merchant": int,
"scraped_at": str,
}, pk="id")
# Comments table
if "comments" not in db.table_names():
db["comments"].create({
"id": str,
"pin_id": str,
"text": str,
"author_username": str,
"author_id": str,
"created_at": str,
"like_count": int,
}, pk="id", foreign_keys=[("pin_id", "pins", "id")])
return db
def insert_pins(db: sqlite_utils.Database, pins: Iterable[Pin]) -> int:
"""Insert or replace pins into the database. Returns count inserted."""
from datetime import datetime, timezone
now = datetime.now(timezone.utc).isoformat()
records = []
for pin in pins:
row = asdict(pin)
row.pop("rich_metadata", None) # not stored in flat table
row["is_shopping"] = int(row["is_shopping"])
row["scraped_at"] = now
records.append(row)
if records:
db["pins"].upsert_all(records, pk="id")
return len(records)
def insert_boards(db: sqlite_utils.Database, boards: Iterable[Board]) -> int:
from datetime import datetime, timezone
now = datetime.now(timezone.utc).isoformat()
records = [{"scraped_at": now, **asdict(b)} for b in boards]
if records:
db["boards"].upsert_all(records, pk="id")
return len(records)
def insert_user(db: sqlite_utils.Database, user: UserProfile) -> None:
from datetime import datetime, timezone
row = asdict(user)
row["is_verified_merchant"] = int(row["is_verified_merchant"])
row["scraped_at"] = datetime.now(timezone.utc).isoformat()
db["users"].upsert(row, pk="id")
def insert_comments(db: sqlite_utils.Database, comments: Iterable[Comment]) -> int:
records = [asdict(c) for c in comments]
if records:
db["comments"].upsert_all(records, pk="id")
return len(records)
def export_csv(db: sqlite_utils.Database, table: str, output_path: str) -> None:
"""Export a table to CSV."""
import csv
rows = list(db[table].rows)
if not rows:
print(f"[export] Table '{table}' is empty")
return
with open(output_path, "w", newline="") as fh:
writer = csv.DictWriter(fh, fieldnames=rows[0].keys())
writer.writeheader()
writer.writerows(rows)
print(f"[export] {len(rows)} rows written to {output_path}")
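Once pins are in SQLite, analysis is plain SQL. A minimal sketch using the stdlib `sqlite3` module and an in-memory database with hypothetical sample rows (a subset of the columns `pinterest_db.py` creates) showing the kind of query you would run against the pins table:

```python
import sqlite3

def top_pins_by_saves(conn: sqlite3.Connection, limit: int = 3) -> list[tuple[str, int]]:
    """Return (pin_id, save_count) for the most-saved pins."""
    cur = conn.execute(
        "SELECT id, save_count FROM pins ORDER BY save_count DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()

# Demo schema mirrors a slice of the pins table from pinterest_db.py
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pins (id TEXT PRIMARY KEY, description TEXT, save_count INTEGER)")
conn.executemany(
    "INSERT INTO pins VALUES (?, ?, ?)",
    [("a", "desk setup", 120), ("b", "camel sweater", 3102), ("c", "kitchen reno", 540)],
)
top = top_pins_by_saves(conn, limit=2)  # [("b", 3102), ("c", 540)]
```

The same query works unchanged against the on-disk `pinterest.db` that the pipeline produces.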
Complete End-to-End Pipeline Script
This script orchestrates everything: profile fetch, board listing, pin extraction for all boards, comment sampling, and storage to SQLite and CSV.
"""
pinterest_pipeline.py
End-to-end Pinterest data collection pipeline.
Fetches a user's profile, all boards, all pins, and sampled comments.
Stores everything in SQLite and exports CSVs.
Usage:
python3 pinterest_pipeline.py <username> [--proxy http://user:pass@host:port]
python3 pinterest_pipeline.py anthropologie --max-boards 5 --output-dir ./output
"""
from __future__ import annotations
import argparse
import os
from datetime import datetime, timezone
from pathlib import Path
from pinterest_base import make_session, polite_delay
from pinterest_db import get_db, insert_pins, insert_boards, insert_user, insert_comments, export_csv
from scrape_user_profile import fetch_user_profile, fetch_all_boards
from scrape_board_pins import iter_board_pins
from scrape_pin_comments import iter_pin_comments
def run_pipeline(
username: str,
proxy_url: str | None = None,
max_boards: int | None = None,
comment_sample_size: int = 3,
output_dir: str = "./output",
) -> None:
os.makedirs(output_dir, exist_ok=True)
db_path = os.path.join(output_dir, "pinterest.db")
db = get_db(db_path)
client = make_session(proxy_url=proxy_url)
started_at = datetime.now(timezone.utc).isoformat()
print(f"\n=== Pinterest Pipeline ===")
print(f"Target: @{username}")
print(f"Started: {started_at}")
print(f"Output: {output_dir}")
print()
# --- Step 1: User profile ---
print("[1/4] Fetching user profile...")
profile = fetch_user_profile(client, username)
insert_user(db, profile)
print(f" @{profile.username}: {profile.follower_count:,} followers, {profile.monthly_views:,} monthly views")
polite_delay()
# --- Step 2: Board listing ---
print("\n[2/4] Fetching boards...")
boards = fetch_all_boards(client, username)
if max_boards:
boards = boards[:max_boards]
insert_boards(db, boards)
print(f" {len(boards)} boards fetched")
polite_delay()
# --- Step 3: Pins for each board ---
print(f"\n[3/4] Fetching pins for {len(boards)} boards...")
all_pin_ids: list[str] = []
for i, board in enumerate(boards, 1):
print(f"\n Board {i}/{len(boards)}: '{board.name}' ({board.pin_count} pins expected)")
pins = []
for pin in iter_board_pins(client, board.owner_username, board.slug):
pins.append(pin)
all_pin_ids.append(pin.id)
n = insert_pins(db, pins)
print(f" Inserted {n} pins from '{board.name}'")
polite_delay(base=2.0, jitter=1.5)
# --- Step 4: Comment sampling ---
    print(f"\n[4/4] Sampling comments from the {comment_sample_size} most-commented pins...")
    top_pins = sorted(
        (row for row in db["pins"].rows if (row.get("comment_count") or 0) > 0),
        key=lambda r: r.get("comment_count") or 0,
        reverse=True,
    )[:comment_sample_size]
total_comments = 0
for pin_row in top_pins:
pin_id = pin_row["id"]
print(f" Fetching comments for pin {pin_id} ({pin_row.get('comment_count', 0)} comments)...")
comments = list(iter_pin_comments(client, pin_id))
n = insert_comments(db, comments)
total_comments += n
polite_delay()
# --- Export CSVs ---
print("\n[export] Writing CSVs...")
export_csv(db, "pins", os.path.join(output_dir, "pins.csv"))
export_csv(db, "boards", os.path.join(output_dir, "boards.csv"))
export_csv(db, "users", os.path.join(output_dir, "users.csv"))
export_csv(db, "comments", os.path.join(output_dir, "comments.csv"))
# --- Summary ---
print(f"\n=== Pipeline Complete ===")
print(f" Profile: @{profile.username}")
print(f" Boards: {len(boards)}")
print(f" Pins: {len(all_pin_ids)}")
print(f" Comments: {total_comments}")
print(f" Database: {db_path}")
client.close()
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Pinterest full-profile data pipeline")
parser.add_argument("username", help="Pinterest username to scrape")
parser.add_argument("--proxy", help="Proxy URL (e.g. http://user:pass@host:port)")
parser.add_argument("--max-boards", type=int, help="Limit number of boards to scrape")
parser.add_argument("--comment-sample", type=int, default=3, help="Number of pins to fetch comments from")
parser.add_argument("--output-dir", default="./output", help="Directory for output files")
args = parser.parse_args()
run_pipeline(
username=args.username,
proxy_url=args.proxy,
max_boards=args.max_boards,
comment_sample_size=args.comment_sample,
output_dir=args.output_dir,
)
Troubleshooting Common Errors
401 Unauthorized on every request
Your session has no valid cookies. The homepage warm-up in make_session() is failing, most likely because your IP is blocked or the proxy is down. Verify connectivity: python3 -c "import httpx; print(httpx.get('https://www.pinterest.com/', follow_redirects=True).status_code)" (httpx does not follow redirects by default, so without the flag a 3xx can masquerade as a failure). Anything other than 200 means your IP or proxy needs to change.
403 Forbidden after N requests
This is Pinterest's bot detection triggering on your session. Common causes:
1. Missing or inconsistent Sec-CH-UA headers (add them via fingerprint_headers.py)
2. CSRF token expired (call inject_csrf_header() to refresh)
3. Datacenter IP (switch to residential — see ThorData)
4. Too many requests too fast (increase delays and add jitter)
429 Too Many Requests
You have exceeded Pinterest's rate limit for your IP. The adaptive_retry decorator handles this with backoff, but if 429s are constant, you need to slow down your request rate significantly or rotate to a fresh residential IP. A sustained rate of more than 1 request per second from a single IP will reliably trigger 429s.
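If 429s are constant, it is usually better to enforce a budget client-side than to keep reacting to the server. A minimal token-bucket sketch with an injectable clock so the logic is testable without sleeping; the class name and parameters are illustrative, not part of the earlier scripts:

```python
class TokenBucket:
    """Allow roughly `rate` requests per second, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=0.5, capacity=2)  # ~1 request every 2 seconds, burst of 2
burst = [bucket.allow(now=0.0) for _ in range(3)]  # [True, True, False]
later = bucket.allow(now=4.0)                      # bucket has refilled: True
```

In a real scraper you would pass `time.monotonic()` as `now` and sleep briefly whenever `allow` returns False.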
KeyError: 'resource_response'
The response JSON does not match the expected envelope. This happens when:
- Pinterest returns an error page (HTML) instead of JSON — check response.headers["content-type"]
- The endpoint path has changed (Pinterest occasionally shuffles API versions)
- You are being served a CAPTCHA challenge page
Add this check to any response handler:
if response.headers.get("content-type", "").startswith("text/html"):
print("[error] Got HTML instead of JSON — possible CAPTCHA or block")
print(response.text[:500])
raise RuntimeError("Non-JSON response from Pinterest")
Bookmark loops (infinite pagination)
Occasionally Pinterest returns the same bookmark repeatedly, causing an infinite loop. Protect against this:
bookmark: str | None = None
seen_bookmarks: set[str] = set()
while True:
    pins, bookmark = fetch_page(client, bookmark)
    # ... process pins ...
    if not bookmark or bookmark == "-end-" or bookmark in seen_bookmarks:
        break
    seen_bookmarks.add(bookmark)
json.JSONDecodeError on API responses
Pinterest sometimes returns 204 No Content or empty bodies for boards with no pins. Check response.content before calling response.json():
if not response.content:
return [], None
body = response.json()
Image URLs returning 403
Pinterest image URLs are tied to session cookies for some content. If you are downloading images, do it with the same client session that fetched the pin data, not with a separate, cookie-less client.
Ethics and Legal Considerations
Pinterest's Terms of Service prohibit automated data collection. That is a contract between you and Pinterest, not a law. Whether scraping their public data is legal depends on jurisdiction and use case.
The US legal landscape. In hiQ v. LinkedIn, the Ninth Circuit held (and reaffirmed in 2022) that scraping publicly accessible data does not violate the Computer Fraud and Abuse Act (CFAA), because accessing public data involves no authorization to bypass. Public Pinterest boards, visible to anyone without an account, sit in similar territory. However, the ruling is narrow: the case ultimately ended in a settlement after LinkedIn prevailed on breach-of-contract claims, and the CFAA holding does not extend to every scraping context.
EU considerations. The GDPR applies when you collect personal data about EU residents. Pinterest usernames, profile images, and biographical data are personal data under GDPR. Collect only what you need, store it securely, do not publish or resell it without a legal basis, and have a documented purpose.
Copyright. The images on Pinterest are mostly copyrighted by their original creators. Collecting image URLs is fine. Downloading, rehosting, or republishing pin images without license is not.
Practical ethics. Beyond legality: do not scrape in a way that degrades the service for other users. Cache data aggressively so you fetch each piece once. Respect robots.txt signals even if they are not legally binding. If you are building something commercial on top of Pinterest data, consider whether you should be using an official data partnership instead.
The code in this guide is for research, analysis, and personal use. Production commercial applications built on scraped data carry legal and reputational risk that you own entirely.
Quick Reference: Resource API Cheat Sheet
| Use Case | Resource Name | Key Options |
|---|---|---|
| Board pins | BoardFeedResource | board_url, page_size, bookmarks |
| User profile | UserResource | username, field_set_key: "profile" |
| User's boards | BoardsResource | username, sort, privacy_filter |
| Keyword search | BaseSearchResource | query, scope: "pins" |
| Related pins | RelatedPinsResource | pin_id, count |
| Single pin | PinResource | id, field_set_key: "detailed" |
| Comments | AggregatedCommentResource | objectId, objectType: "pin" |
| Trending | InterestFeedResource | interest_id (category slug) |
| Shopping | ShoppingSpotlightFeedResource | field_set_key |
All endpoints follow the same base pattern: GET https://www.pinterest.com/resource/<ResourceName>/get/?source_url=<path>&data=<json>
The data parameter is a JSON-encoded object with options and context keys. Options always include pagination via bookmarks: [<token>]. The response always wraps data in resource_response.data with the next page in resource_response.bookmark.
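As a concrete illustration of that envelope, here is how a board-feed request URL would be assembled. This is a hedged sketch: `build_params` in the earlier scripts does the equivalent, and the option values here are illustrative placeholders:

```python
import json
from urllib.parse import urlencode

def build_resource_url(resource: str, source_url: str, options: dict) -> str:
    """Assemble a Pinterest resource GET URL with the JSON-encoded `data` parameter."""
    # Compact separators keep the encoded query string short
    data = json.dumps({"options": options, "context": {}}, separators=(",", ":"))
    query = urlencode({"source_url": source_url, "data": data})
    return f"https://www.pinterest.com/resource/{resource}/get/?{query}"

url = build_resource_url(
    "BoardFeedResource",
    "/someuser/some-board/",
    {"board_url": "/someuser/some-board/", "page_size": 25, "bookmarks": ["abc123"]},
)
```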
Keeping Your Scraper Working Over Time
Pinterest changes its internal API without notice. Endpoints that work today may return 404s next month, and new header requirements appear without warning. Here are habits that keep your scraper resilient:
Log raw responses. When a scraper breaks, the first thing you need is the actual server response. Log raw JSON to a file for at least a few days when running in production.
Monitor with a canary request. Before any large scraping run, do a single test fetch for one known-good board. If it fails, abort and debug before burning through your proxy quota.
Use DevTools as a reference. When something breaks, open Pinterest in Chrome, load the same page you are trying to scrape, and compare what headers and parameters the browser actually sends to what your script is sending. The browser is always right.
Track endpoint versions. The X-APP-VERSION header (b1e66c1 in the examples) is pinned to a specific Pinterest frontend build. Pinterest sometimes checks this. If you start seeing unexpected failures, open the Pinterest source and search for the current app version string.
Rotate user agents occasionally. Chrome releases a new major version every 6-8 weeks. A Chrome 124 user agent in late 2027 is a red flag. Keep your Sec-CH-UA and User-Agent headers in sync with a current Chrome release.
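A sketch of an automated staleness check you could run at startup: flag the scraper's pinned Chrome version once it falls too far behind an expected current major version. The helper name, the lag threshold, and the "current" value are all assumptions you would tune or fetch yourself:

```python
import re

def ua_is_stale(user_agent: str, current_major: int, max_lag: int = 3) -> bool:
    """True if the UA's Chrome major version trails current_major by more than max_lag."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    if not match:
        return True  # an unparseable UA is its own red flag
    return current_major - int(match.group(1)) > max_lag

ua = "Mozilla/5.0 ... Chrome/124.0.0.0 Safari/537.36"
stale_now = ua_is_stale(ua, current_major=125)   # one version behind: fine
stale_2027 = ua_is_stale(ua, current_major=140)  # sixteen behind: rotate
```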