Scraping Vimeo Video Data with Python (2026)
Vimeo is the platform of choice for creative professionals — filmmakers, agencies, and production studios use it for client deliverables, portfolio pieces, and premium video hosting. That makes it a useful data source for market research on the creative industry, competitive analysis of video performance, and building tools that track video engagement across creative niches.
Unlike YouTube, Vimeo has a well-documented, stable API that covers most of what you'd want from video metadata. The challenge is knowing which endpoint to use for which task, understanding the rate limit structure, and knowing where the API leaves off and direct scraping begins.
What Data Is Available
The Vimeo API provides clean access to:
Video metadata:
- Title, description, duration (seconds), upload date
- Privacy setting (privacy.view values such as anybody, nobody, contacts, users, password, unlisted, disable)
- Tags and categories
- Content rating
- Download availability flag
Engagement statistics:
- View count (plays)
- Like count (via metadata.connections.likes.total)
- Comment count (via metadata.connections.comments.total)
- Save/bookmark count (via metadata.connections.saves.total)
Technical specifications:
- Width and height of the source file
- Frame rate
- Encoding status
- File sizes (if download enabled)
- Available quality versions (360p through 4K)
Channel and showcase data:
- Channel name, description, subscriber count
- Video count in channel
- Featured video selections
- Channel creation date and URL
User and portfolio data:
- Display name, bio, location
- Follower and following count
- Total video count
- Membership tier (basic, plus, pro, premium, enterprise)
- Account creation date
Embed configuration:
- Full embed HTML snippet
- Player color, autoplay, loop settings
- Allowed embedding domains
- Privacy embed settings
Comments:
- Commenter display name and profile
- Comment text (plain and formatted)
- Reply threading
- Timestamp
What is NOT in the API:
- Analytics beyond view/like/comment counts (impressions and reach require creator authentication)
- Email addresses or direct contact info
- Revenue or monetization data
- Private video content without explicit sharing
- Real-time viewer counts for live streams (separate Live API)
API vs Scraping: Decision Framework
Use the API when:
- You have a Vimeo account and can get a personal access token
- You need view counts, likes, comments, or technical specs
- You're doing bulk collection across many videos or channels
- You want structured, reliable data without parsing HTML
Fall back to scraping when:
- You need aggregate stats visible on showcase/channel pages but not in API endpoints
- You're doing quick oEmbed lookups for embed metadata without building an OAuth flow
- You need to check if a video is accessible in a specific region
- The API rate limits are constraining your collection speed
Setting Up the API Client
Vimeo uses OAuth 2.0. For personal scripts and read-only data, a Personal Access Token skips the full OAuth flow:
- Log into Vimeo, go to developer.vimeo.com/apps
- Create an app (name it anything)
- Under "Personal access tokens", click "Generate"
- Select the "Public" scope (add "Private" only if you need private videos your account can access); read-only data collection needs nothing more
- Copy the token — it's shown only once
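Hardcoding the token, as the client code that follows does for brevity, is fine for a quick test. For anything you commit or share, a minimal sketch that reads it from an environment variable instead; the variable name VIMEO_TOKEN and the helper load_vimeo_token are our own conventions, not something Vimeo requires:

```python
import os


def load_vimeo_token(env_var: str = "VIMEO_TOKEN") -> str:
    """Read a Vimeo personal access token from the environment.

    The environment variable name is a local convention (an assumption),
    not a Vimeo requirement. Surrounding whitespace is stripped, and a
    missing or blank value raises a RuntimeError with a hint.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to the personal access token generated at "
            "developer.vimeo.com/apps"
        )
    return token
```

You would then pass it explicitly, e.g. vimeo_client(load_vimeo_token()), so the token never appears in source control.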
```python
import httpx
import time
from typing import Optional

VIMEO_TOKEN = "your_personal_access_token"


def vimeo_client(token: str = VIMEO_TOKEN) -> httpx.Client:
    """
    Create an httpx client configured for Vimeo API v3.4.

    The Accept header version is important: without it some endpoints
    return deprecated response formats.
    """
    return httpx.Client(
        base_url="https://api.vimeo.com",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.vimeo.*+json;version=3.4",
            "Content-Type": "application/json",
        },
        timeout=20,
    )


def vimeo_async_client(token: str = VIMEO_TOKEN) -> httpx.AsyncClient:
    """Async version of the Vimeo API client."""
    return httpx.AsyncClient(
        base_url="https://api.vimeo.com",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.vimeo.*+json;version=3.4",
        },
        timeout=20,
    )
```
Fetching Individual Video Metadata
The /videos/{video_id} endpoint returns complete metadata. The video ID is the numeric part of any Vimeo URL:
```python
import time
from dataclasses import dataclass, asdict
from typing import Optional

import httpx


@dataclass
class VimeoVideo:
    """Structured representation of a Vimeo video."""
    video_id: str
    title: str
    description: str
    duration_sec: int
    view_count: int
    like_count: int
    comment_count: int
    save_count: int
    upload_date: str
    tags: list
    categories: list
    privacy: str
    embed_url: str
    thumbnail_url: str
    width: Optional[int] = None
    height: Optional[int] = None
    content_rating: Optional[str] = None


def get_video(
    video_id: str,
    client: httpx.Client,
    fields: Optional[str] = None,
) -> VimeoVideo:
    """
    Fetch full metadata for a single Vimeo video.

    video_id: numeric ID from vimeo.com/NNNNNNNN
    fields: optional comma-separated field selector to reduce response size,
        e.g. "uri,name,description,stats,created_time,tags,privacy,pictures".
        Like/comment/save totals live under metadata.connections; if your
        selector omits them, those counts silently default to 0.
    """
    params = {"fields": fields} if fields else {}
    resp = client.get(f"/videos/{video_id}", params=params)
    resp.raise_for_status()
    d = resp.json()

    meta = (d.get("metadata") or {}).get("connections", {})
    pictures = (d.get("pictures") or {}).get("sizes", [])
    # pictures.sizes is typically ordered smallest to largest; take the last
    thumbnail = pictures[-1].get("link", "") if pictures else ""

    return VimeoVideo(
        video_id=d["uri"].split("/")[-1],
        title=d["name"],
        description=d.get("description") or "",
        duration_sec=d.get("duration", 0),
        view_count=(d.get("stats") or {}).get("plays") or 0,
        like_count=meta.get("likes", {}).get("total", 0),
        comment_count=meta.get("comments", {}).get("total", 0),
        save_count=meta.get("saves", {}).get("total", 0),
        upload_date=d.get("created_time", ""),
        tags=[t["name"] for t in d.get("tags") or []],
        categories=[c["name"] for c in d.get("categories") or []],
        privacy=(d.get("privacy") or {}).get("view", ""),
        # player_embed_url is the iframe src; link is the public page URL
        embed_url=d.get("player_embed_url") or d.get("link", ""),
        thumbnail_url=thumbnail,
        width=d.get("width"),
        height=d.get("height"),
        content_rating=(d.get("content_rating") or [None])[0],
    )


def scrape_video_list(
    video_ids: list[str],
    delay: float = 1.0,
    fields: Optional[str] = None,
) -> list[dict]:
    """
    Scrape metadata for a list of video IDs.

    delay: seconds to wait between requests (respect rate limits)
    """
    results = []
    with vimeo_client() as client:
        for i, vid_id in enumerate(video_ids):
            try:
                video = get_video(vid_id, client, fields=fields)
                results.append(asdict(video))
                print(f"  [{i+1}/{len(video_ids)}] {video.title[:50]} "
                      f"- {video.view_count:,} views")
            except httpx.HTTPStatusError as e:
                code = e.response.status_code
                if code == 404:
                    print(f"  {vid_id}: not found (deleted or never existed)")
                elif code == 403:
                    print(f"  {vid_id}: access denied (private or restricted)")
                elif code == 429:
                    print("  Rate limited, waiting 60s")
                    time.sleep(60)
                    # Retry once, and report a failed retry instead of hiding it
                    try:
                        video = get_video(vid_id, client, fields=fields)
                        results.append(asdict(video))
                    except Exception as retry_err:
                        print(f"  {vid_id}: retry failed ({retry_err})")
                else:
                    print(f"  {vid_id}: HTTP {code}")
            except Exception as e:
                print(f"  {vid_id}: {e}")
            time.sleep(delay)
    return results
```
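get_video expects a bare numeric ID, but links arrive in several shapes: vimeo.com/ID, unlisted links with a trailing hash segment, player.vimeo.com/video/ID, showcase, group, and channel URLs. A minimal sketch of a normalizer, assuming the URL patterns listed in its docstring; extract_video_id is a hypothetical helper and a heuristic, not an official parser:

```python
from typing import Optional
from urllib.parse import urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Pull the numeric video ID out of common Vimeo URL shapes (heuristic).

    Assumed patterns:
      vimeo.com/ID, vimeo.com/ID/HASH (unlisted links),
      player.vimeo.com/video/ID, vimeo.com/showcase/SID/video/ID,
      vimeo.com/groups/NAME/videos/ID, vimeo.com/channels/NAME/ID
    Returns None when no ID can be found.
    """
    url = url.strip()
    if url.isdigit():  # already a bare ID
        return url
    if "://" not in url:  # tolerate links pasted without a scheme
        url = "https://" + url
    parts = [p for p in urlparse(url).path.split("/") if p]
    # /video/ID (player, showcase) and /videos/ID (groups)
    for marker in ("video", "videos"):
        if marker in parts:
            i = parts.index(marker)
            if i + 1 < len(parts) and parts[i + 1].isdigit():
                return parts[i + 1]
    # /channels/NAME/ID
    if len(parts) >= 3 and parts[0] == "channels" and parts[2].isdigit():
        return parts[2]
    # /ID or /ID/HASH
    if parts and parts[0].isdigit():
        return parts[0]
    return None
```

Feeding extract_video_id's output into scrape_video_list lets you accept whatever URLs users paste; anything that returns None is worth logging rather than requesting.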
Channel and User Data
Channels are curated video collections. Users are the account holders:
```python
def get_channel(channel_id: str, client: httpx.Client) -> dict:
    """
    Fetch metadata for a Vimeo channel.

    channel_id: numeric ID or channel name slug
    """
    resp = client.get(f"/channels/{channel_id}")
    resp.raise_for_status()
    d = resp.json()
    meta = (d.get("metadata") or {}).get("connections", {})
    return {
        "channel_id": d["uri"].split("/")[-1],
        "name": d["name"],
        "description": d.get("description") or "",
        # Channel followers are exposed as the "users" connection
        "subscriber_count": meta.get("users", {}).get("total", 0),
        "video_count": meta.get("videos", {}).get("total", 0),
        "created_time": d.get("created_time", ""),
        "url": d.get("link", ""),
        "privacy": (d.get("privacy") or {}).get("view", ""),
    }


def get_channel_videos(
    channel_id: str,
    client: httpx.Client,
    max_videos: int = 100,
    sort: str = "date",
    fields: str = "uri,name,stats,created_time,duration,tags",
) -> list[dict]:
    """
    Fetch video listing from a Vimeo channel.

    sort: 'date', 'alphabetical', 'plays', 'likes', 'added', 'modified'
    """
    videos = []
    page = 1
    per_page = min(25, max_videos)
    while len(videos) < max_videos:
        resp = client.get(
            f"/channels/{channel_id}/videos",
            params={
                "page": page,
                "per_page": per_page,
                "sort": sort,
                "direction": "desc",
                "fields": fields,
            },
        )
        resp.raise_for_status()
        data = resp.json()
        batch = data.get("data", [])
        if not batch:
            break
        for v in batch:
            videos.append({
                "video_id": v["uri"].split("/")[-1],
                "title": v["name"],
                "views": (v.get("stats") or {}).get("plays") or 0,
                "duration_sec": v.get("duration", 0),
                "created_time": v.get("created_time", ""),
                "tags": [t["name"] for t in v.get("tags") or []],
            })
        # paging.next is null on the last page
        if not (data.get("paging") or {}).get("next"):
            break
        page += 1
        time.sleep(0.8)
    return videos[:max_videos]


def get_user_videos(
    user_id: str,
    client: httpx.Client,
    max_videos: int = 100,
    sort: str = "date",
) -> list[dict]:
    """
    Fetch public videos from a user's portfolio.

    user_id: numeric Vimeo user ID or 'me' for the authenticated user
    """
    videos = []
    next_url = f"/users/{user_id}/videos"
    while len(videos) < max_videos and next_url:
        # Only the first request needs explicit params; paging.next URLs
        # already carry their own query string.
        params = {} if "?" in next_url else {
            "per_page": 25,
            "sort": sort,
            "direction": "desc",
            "fields": "uri,name,stats,created_time,duration,privacy,tags",
        }
        resp = client.get(next_url, params=params)
        resp.raise_for_status()
        data = resp.json()
        for v in data.get("data", []):
            privacy = (v.get("privacy") or {}).get("view")
            # Keep only fully public videos ("anybody" is Vimeo's public value)
            if privacy != "anybody":
                continue
            videos.append({
                "video_id": v["uri"].split("/")[-1],
                "title": v["name"],
                "views": (v.get("stats") or {}).get("plays") or 0,
                "duration_sec": v.get("duration", 0),
                "created_time": v.get("created_time", ""),
                "privacy": privacy,
            })
        next_url = (data.get("paging") or {}).get("next")
        time.sleep(0.5)
    return videos[:max_videos]
```
Async Bulk Collection
For collecting data across many videos simultaneously:
```python
import asyncio
from typing import Optional

import httpx

# Like/comment totals live under metadata.connections, so request those
# nested paths explicitly; without them both counts would always be 0.
ASYNC_FIELDS = (
    "uri,name,description,stats,created_time,tags,privacy,pictures,duration,"
    "metadata.connections.likes.total,metadata.connections.comments.total"
)


async def get_video_async(
    video_id: str,
    client: httpx.AsyncClient,
    semaphore: asyncio.Semaphore,
    max_attempts: int = 3,
) -> Optional[dict]:
    """Fetch a single video with semaphore-controlled concurrency."""
    async with semaphore:
        for attempt in range(max_attempts):
            try:
                resp = await client.get(
                    f"/videos/{video_id}", params={"fields": ASYNC_FIELDS}
                )
                resp.raise_for_status()
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 429 and attempt < max_attempts - 1:
                    await asyncio.sleep(60)  # back off, then retry this video
                    continue
                return None  # 403/404 (inaccessible) or repeated failures
            except httpx.HTTPError:
                return None  # network errors and timeouts
            d = resp.json()
            meta = (d.get("metadata") or {}).get("connections", {})
            pictures = (d.get("pictures") or {}).get("sizes", [])
            return {
                "video_id": video_id,
                "title": d["name"],
                "description": (d.get("description") or "")[:500],
                "duration_sec": d.get("duration", 0),
                "view_count": (d.get("stats") or {}).get("plays") or 0,
                "like_count": meta.get("likes", {}).get("total", 0),
                "comment_count": meta.get("comments", {}).get("total", 0),
                "upload_date": d.get("created_time", ""),
                "tags": [t["name"] for t in d.get("tags") or []],
                "privacy": (d.get("privacy") or {}).get("view", ""),
                "thumbnail_url": pictures[-1].get("link", "") if pictures else "",
            }
    return None


async def bulk_collect_videos(
    video_ids: list[str],
    token: str = VIMEO_TOKEN,
    max_concurrency: int = 5,
) -> list[dict]:
    """
    Collect metadata for many videos asynchronously.

    max_concurrency: keep below 10 to stay within Vimeo's rate limits.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    async with vimeo_async_client(token) as client:
        tasks = [get_video_async(vid_id, client, semaphore) for vid_id in video_ids]
        completed = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop failed lookups (None) and any unexpected exceptions
    return [r for r in completed if isinstance(r, dict)]


# Collect a few videos asynchronously (the same call scales to hundreds of IDs)
video_ids = ["76979871", "225408806", "366214187"]  # example IDs
data = asyncio.run(bulk_collect_videos(video_ids, max_concurrency=5))
print(f"Collected {len(data)} videos")
```
oEmbed Endpoint for Embed Data
The oEmbed endpoint requires no authentication and is the fastest path to embed HTML and basic metadata:
```python
import httpx


def get_oembed(
    video_url: str,
    width: int = 1280,
    autoplay: bool = False,
) -> dict:
    """
    Fetch oEmbed data for a public Vimeo video.

    No authentication required. Returns embed HTML and basic metadata.
    Does NOT include view counts or engagement stats.
    """
    params = {"url": video_url, "width": width}
    if autoplay:
        # Applied to the player settings in the returned embed HTML
        params["autoplay"] = "true"
    # httpx URL-encodes query parameters, so no manual quote() is needed
    resp = httpx.get("https://vimeo.com/api/oembed.json", params=params, timeout=15)
    resp.raise_for_status()
    d = resp.json()
    return {
        "title": d.get("title"),
        "author_name": d.get("author_name"),
        "author_url": d.get("author_url"),
        "duration_sec": d.get("duration"),
        "thumbnail_url": d.get("thumbnail_url"),
        "thumbnail_width": d.get("thumbnail_width"),
        "thumbnail_height": d.get("thumbnail_height"),
        "video_width": d.get("width"),
        "video_height": d.get("height"),
        "embed_html": d.get("html"),
        "video_id": d.get("video_id"),
    }


# No token needed
data = get_oembed("https://vimeo.com/76979871")
print(data["title"])
print(data["embed_html"][:100])
```
Anti-Bot Measures
API rate limits. Vimeo throttles requests per access token, not per IP. The limit for standard API tiers is 1,000 requests per 15-minute window. For large-scale collection, distribute requests across multiple app registrations with different personal access tokens. The X-RateLimit-Remaining and X-RateLimit-Reset headers tell you your current status.
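Using the figure above, 1,000 requests per 900 seconds works out to one request every 900 / 1000 = 0.9 s if you want to spread the budget evenly instead of bursting into a 429. A minimal pacing sketch under that assumption; RequestPacer is a hypothetical helper, it does not read the X-RateLimit headers, so check those for your token's real limit:

```python
import time
from typing import Callable


class RequestPacer:
    """Space calls evenly so `limit` requests per `window_sec` is never exceeded.

    With limit=1000 and window_sec=900 the minimum gap is 0.9 s. The clock
    and sleep functions are injectable so the pacing can be tested.
    """

    def __init__(self, limit: int, window_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = window_sec / limit
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # The following slot starts one interval after this request's slot
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Call pacer.wait() immediately before each client.get(...). Because the schedule only advances from the later of "now" and the previous slot, idle periods don't bank up extra burst capacity.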
Cloudflare on web pages. Vimeo's web interface runs behind Cloudflare. Plain httpx or requests calls to vimeo.com/NNNNNN will often hit JavaScript challenges. For web page scraping (as opposed to API calls), use Playwright.
Residential proxies for bulk page scraping. If you're scraping Vimeo web pages at scale rather than using the API, datacenter IPs trigger Cloudflare challenges immediately. ThorData's rotating residential proxies work well for this — they have clean residential IPs that pass Cloudflare's reputation checks, and the geo-targeting is useful for verifying embed availability in specific regions.
```python
import time
from datetime import datetime

import httpx

# Placeholder credentials: substitute your ThorData username, password and gateway host
THORDATA_PROXY = "http://USER:PASS@PROXY_HOST:9000"


def create_scraping_client(proxy: str = THORDATA_PROXY) -> httpx.Client:
    """
    Create an httpx client for scraping Vimeo web pages.

    Routes through a residential proxy to reduce Cloudflare blocks. httpx does
    not run JavaScript, so pages that serve a JS challenge still need Playwright.
    """
    return httpx.Client(
        transport=httpx.HTTPTransport(proxy=proxy),
        headers={
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            # No "br": httpx only decodes Brotli if the brotli package is installed
            "Accept-Encoding": "gzip, deflate",
            "Referer": "https://vimeo.com/",
        },
        timeout=25,
        follow_redirects=True,
    )


def _reset_epoch(value: str) -> float:
    """Parse X-RateLimit-Reset as either a Unix timestamp or an ISO 8601 date."""
    try:
        return float(value)
    except ValueError:
        return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()


def rate_limit_handler(resp: httpx.Response) -> None:
    """
    Handle Vimeo API rate limit responses.

    Check X-RateLimit headers and back off proactively.
    """
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "999"))
    reset_ts = _reset_epoch(resp.headers.get("X-RateLimit-Reset", "0"))
    if resp.status_code == 429:
        wait = max(reset_ts - time.time(), 15)
        print(f"Rate limited. Waiting {wait:.0f}s for reset...")
        time.sleep(wait)
    elif remaining < 50:
        wait = max(reset_ts - time.time(), 0) + 5
        print(f"Low rate limit ({remaining} remaining). Pausing {wait:.0f}s...")
        time.sleep(wait)
```
SQLite Storage Schema
```python
import sqlite3
import json
from datetime import datetime, timezone


def init_vimeo_db(db_path: str = "vimeo_data.db") -> sqlite3.Connection:
    """Initialize the Vimeo data SQLite database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS videos (
            video_id TEXT PRIMARY KEY,
            title TEXT,
            description TEXT,
            duration_sec INTEGER,
            view_count INTEGER DEFAULT 0,
            like_count INTEGER DEFAULT 0,
            comment_count INTEGER DEFAULT 0,
            save_count INTEGER DEFAULT 0,
            upload_date TEXT,
            tags TEXT,        -- JSON array
            categories TEXT,  -- JSON array
            privacy TEXT,
            embed_url TEXT,
            thumbnail_url TEXT,
            width INTEGER,
            height INTEGER,
            content_rating TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS channels (
            channel_id TEXT PRIMARY KEY,
            name TEXT,
            description TEXT,
            subscriber_count INTEGER DEFAULT 0,
            video_count INTEGER DEFAULT 0,
            created_time TEXT,
            url TEXT,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE TABLE IF NOT EXISTS channel_videos (
            channel_id TEXT NOT NULL,
            video_id TEXT NOT NULL,
            added_at TEXT,
            PRIMARY KEY (channel_id, video_id)
        );

        CREATE TABLE IF NOT EXISTS view_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            video_id TEXT NOT NULL,
            view_count INTEGER,
            like_count INTEGER,
            comment_count INTEGER,
            recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        CREATE INDEX IF NOT EXISTS idx_videos_views ON videos(view_count DESC);
        CREATE INDEX IF NOT EXISTS idx_videos_date ON videos(upload_date);
        CREATE INDEX IF NOT EXISTS idx_view_history_video ON view_history(video_id, recorded_at);
        CREATE INDEX IF NOT EXISTS idx_channel_videos ON channel_videos(channel_id);
    """)
    conn.commit()
    return conn


def upsert_video(conn: sqlite3.Connection, video: dict) -> None:
    """Insert or update a video record, tracking view count history."""
    now = datetime.now(timezone.utc).isoformat()

    # Record a view snapshot for trending analysis. recorded_at keeps SQLite's
    # CURRENT_TIMESTAMP format so get_view_growth_rate's date filter matches.
    if video.get("view_count"):
        conn.execute("""
            INSERT INTO view_history (video_id, view_count, like_count, comment_count)
            VALUES (?, ?, ?, ?)
        """, (
            video["video_id"],
            video.get("view_count", 0),
            video.get("like_count", 0),
            video.get("comment_count", 0),
        ))

    conn.execute("""
        INSERT INTO videos
            (video_id, title, description, duration_sec, view_count, like_count,
             comment_count, save_count, upload_date, tags, categories, privacy,
             embed_url, thumbnail_url, width, height, content_rating,
             scraped_at, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
        ON CONFLICT(video_id) DO UPDATE SET
            title = excluded.title,
            view_count = excluded.view_count,
            like_count = excluded.like_count,
            comment_count = excluded.comment_count,
            save_count = excluded.save_count,
            updated_at = excluded.updated_at
    """, (
        video.get("video_id"),
        video.get("title"),
        video.get("description"),
        video.get("duration_sec"),
        video.get("view_count", 0),
        video.get("like_count", 0),
        video.get("comment_count", 0),
        video.get("save_count", 0),
        video.get("upload_date"),
        json.dumps(video.get("tags", [])),
        json.dumps(video.get("categories", [])),
        video.get("privacy"),
        video.get("embed_url"),
        video.get("thumbnail_url"),
        video.get("width"),
        video.get("height"),
        video.get("content_rating"),
        now,
        now,
    ))
    conn.commit()


def get_top_videos(
    conn: sqlite3.Connection,
    limit: int = 20,
    metric: str = "view_count",
    min_duration: int = 0,
) -> list:
    """Get top videos sorted by metric."""
    # Whitelist the column name: it is interpolated into the SQL string
    valid_metrics = {"view_count", "like_count", "comment_count", "save_count"}
    if metric not in valid_metrics:
        metric = "view_count"
    return conn.execute(f"""
        SELECT video_id, title, {metric}, duration_sec, upload_date, tags
        FROM videos
        WHERE duration_sec >= ?
        ORDER BY {metric} DESC
        LIMIT ?
    """, (min_duration, limit)).fetchall()


def get_view_growth_rate(
    conn: sqlite3.Connection,
    video_id: str,
    days: int = 30,
) -> dict:
    """Calculate view growth rate for a video over the last N days."""
    rows = conn.execute("""
        SELECT view_count, recorded_at
        FROM view_history
        WHERE video_id = ?
          AND recorded_at >= datetime('now', ? || ' days')
        ORDER BY recorded_at
    """, (video_id, f"-{days}")).fetchall()

    if len(rows) < 2:
        return {"video_id": video_id, "insufficient_data": True}

    first_count = rows[0][0]
    last_count = rows[-1][0]
    growth = last_count - first_count
    return {
        "video_id": video_id,
        "views_start": first_count,
        "views_end": last_count,
        "growth": growth,
        "growth_pct": round((growth / first_count) * 100, 2) if first_count > 0 else 0,
        "data_points": len(rows),
    }
```
Bulk Export: JSON and CSV
```python
import csv
import json
from typing import Optional


def export_videos(
    videos: list[dict],
    filename: str = "vimeo_data",
    formats: Optional[list[str]] = None,
) -> None:
    """
    Export video data to JSON and/or CSV.

    formats: list containing 'json' and/or 'csv'. Defaults to both.
    """
    if not formats:
        formats = ["json", "csv"]

    if "json" in formats:
        with open(f"{filename}.json", "w", encoding="utf-8") as f:
            json.dump(videos, f, indent=2, default=str)
        print(f"Saved {len(videos)} videos to {filename}.json")

    if "csv" in formats and videos:
        # Flatten list fields to strings, and collect the union of keys so
        # rows with extra fields don't make DictWriter raise
        flat = []
        fieldnames: list[str] = []
        for v in videos:
            row = {}
            for k, val in v.items():
                row[k] = ", ".join(str(x) for x in val) if isinstance(val, list) else val
                if k not in fieldnames:
                    fieldnames.append(k)
            flat.append(row)
        with open(f"{filename}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing keys -> ""
            writer.writeheader()
            writer.writerows(flat)
        print(f"Saved {len(flat)} videos to {filename}.csv")
```
Practical API Tips
Use the fields parameter. Vimeo's API supports field selection that dramatically reduces response payload size and speeds up bulk collection:
```python
# Only fetch the fields you actually need
resp = client.get(
    f"/videos/{video_id}",
    params={"fields": "uri,name,stats,created_time,tags,privacy,pictures"},
)
```
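With field selection, nested values such as metadata.connections.likes.total come back inside several layers of objects, and any step can be missing (you didn't request it) or null (Vimeo hides it). A minimal sketch of a dotted-path accessor for that; get_path is a hypothetical helper, not part of any Vimeo SDK:

```python
from typing import Any


def get_path(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path like "metadata.connections.likes.total" through
    nested dicts, returning `default` if any step is missing or null."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current
```

Using the same dotted strings for the fields parameter and for get_path keeps the request and the extraction in sync.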
Follow the paging.next URL. When paginating channel or user video lists, follow the next URL from the paging object directly instead of computing page numbers yourself. It is null on the last page, so your loop knows exactly when to stop. Page-based listings can still skip or repeat an item if videos are added or removed mid-collection, so deduplicate by video ID.
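A generator that walks any paged listing by following paging.next and drops duplicate items by uri. This is a sketch: iter_paged is a hypothetical name, and fetch is any callable you supply that takes a path, performs the GET (for example with the vimeo_client above), raises on HTTP errors, and returns the decoded JSON body:

```python
from typing import Callable, Iterator, Optional


def iter_paged(
    fetch: Callable[[str], dict],
    first_path: str,
    max_items: Optional[int] = None,
) -> Iterator[dict]:
    """Yield items from a Vimeo-style paged listing by following paging.next.

    Items are deduplicated by their "uri" because page-based listings can
    shift while you iterate. Stops when paging.next is null or after
    max_items items have been yielded.
    """
    seen: set = set()
    path: Optional[str] = first_path
    count = 0
    while path:
        body = fetch(path)
        for item in body.get("data", []):
            uri = item.get("uri")
            if uri is not None:
                if uri in seen:
                    continue
                seen.add(uri)
            yield item
            count += 1
            if max_items is not None and count >= max_items:
                return
        path = (body.get("paging") or {}).get("next")
```

Keeping the HTTP call outside the generator makes the pagination logic easy to test and lets you slot in a rate limiter inside fetch.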
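Refresh with a minimal field set. The /videos/{id} endpoint always returns current data, so to check whether a video's stats have changed since your last scrape you don't need to re-fetch everything: request only the counts you track (for example fields=uri,stats) and compare them with the values already in your database. Don't rely on the video's modified_time for this; it reflects edits to the video's metadata, not new plays or likes.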
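A minimal sketch of that comparison, assuming both the stored row and the fresh result use the count keys produced by get_video; diff_counts and COUNT_FIELDS are hypothetical names:

```python
COUNT_FIELDS = ("view_count", "like_count", "comment_count", "save_count")


def diff_counts(stored: dict, fresh: dict, fields: tuple = COUNT_FIELDS) -> dict:
    """Return {field: delta} for every count that changed since the last scrape.

    Missing or None values are treated as 0. An empty result means nothing
    moved, so the database write can be skipped.
    """
    changes = {}
    for name in fields:
        old = stored.get(name) or 0
        new = fresh.get(name) or 0
        if new != old:
            changes[name] = new - old
    return changes
```

Only calling upsert_video when diff_counts returns something non-empty also keeps view_history from filling up with identical snapshots.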
Handle 403 vs 404 carefully. A 404 means the video doesn't exist or has been deleted. A 403 means it exists but you don't have permission (private, password, or domain-restricted). Track these separately — a 403 video might become accessible later if privacy settings change.
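One way to track the two failure modes separately is a small SQLite table next to the schema from the storage section. A sketch, assuming a hypothetical access_failures table that is our own addition (not something Vimeo defines); only the 403 rows come back as recheck candidates:

```python
import sqlite3


def init_failure_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical access_failures table if it doesn't exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS access_failures (
               video_id TEXT PRIMARY KEY,
               status INTEGER,
               category TEXT,
               last_checked TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()


def record_access_failure(conn: sqlite3.Connection, video_id: str, status: int) -> str:
    """Store a failed lookup, keeping 404s (gone) apart from 403s (restricted)."""
    category = {404: "missing", 403: "restricted"}.get(status, "other")
    conn.execute(
        """INSERT INTO access_failures (video_id, status, category)
           VALUES (?, ?, ?)
           ON CONFLICT(video_id) DO UPDATE SET
               status = excluded.status,
               category = excluded.category,
               last_checked = CURRENT_TIMESTAMP""",
        (video_id, status, category),
    )
    conn.commit()
    return category


def videos_to_recheck(conn: sqlite3.Connection) -> list:
    """403s can become accessible later, so they are worth re-fetching."""
    rows = conn.execute(
        "SELECT video_id FROM access_failures "
        "WHERE category = 'restricted' ORDER BY video_id"
    ).fetchall()
    return [row[0] for row in rows]
```

When a recheck succeeds, delete the video's row from access_failures and store it through upsert_video as usual.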
Password-protected videos. Pass the password as a query parameter: /videos/{id}?password=thepassword. This only works if you actually know the password.
Private videos with explicit sharing. If a video owner has shared a private video directly with you (by Vimeo account), that video will be accessible with your personal access token even though it's private.
Use Cases
Creative agency competitive analysis. Track the top 50 production agencies on Vimeo. Monitor their video publication rate, view growth, and engagement trends. Identify which types of work (commercials, documentaries, short films) generate the most views, which informs content strategy.
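Publication rate falls out of the created_time values that get_channel_videos and get_user_videos already return. A minimal sketch, assuming ISO 8601 timestamps such as "2024-03-15T10:20:30+00:00"; monthly_publication_counts is a hypothetical helper:

```python
from collections import Counter
from datetime import datetime


def monthly_publication_counts(created_times: list) -> dict:
    """Count uploads per calendar month ("YYYY-MM") from created_time strings.

    Blank values are skipped, and months with no uploads simply don't
    appear in the result. Months follow each timestamp's own UTC offset.
    """
    months = Counter(
        datetime.fromisoformat(ts.replace("Z", "+00:00")).strftime("%Y-%m")
        for ts in created_times
        if ts
    )
    return dict(sorted(months.items()))
```

For one agency, pass [v["created_time"] for v in get_user_videos(user_id, client)] and compare the resulting series across competitors.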
Video performance benchmarking. Collect data across a category (e.g., all videos tagged "motion graphics") and build a distribution of view counts, like rates, and comment rates. This gives you a benchmark against which to evaluate any individual video's performance.
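A minimal sketch of such a benchmark using only the standard library, assuming video dicts with the view_count and like_count keys produced by get_video; view_benchmark, like_rate and percentile_rank are hypothetical helpers, and the quartiles use statistics.quantiles' default "exclusive" method:

```python
from bisect import bisect_left
from statistics import quantiles


def view_benchmark(view_counts: list) -> dict:
    """Quartiles of a category's view counts (needs at least two videos)."""
    p25, median, p75 = quantiles(view_counts, n=4)
    return {"n": len(view_counts), "p25": p25, "median": median, "p75": p75}


def like_rate(video: dict) -> float:
    """Likes per view; 0.0 for videos with no recorded views."""
    views = video.get("view_count") or 0
    return (video.get("like_count") or 0) / views if views else 0.0


def percentile_rank(view_counts: list, value: int) -> float:
    """Percentage of benchmark videos with strictly fewer views than `value`."""
    ordered = sorted(view_counts)
    return 100 * bisect_left(ordered, value) / len(ordered)
```

For example, a video with 350 views against a benchmark of [100, 200, 300, 400, 500] sits at the 60th percentile: three of the five videos have fewer views.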
Embed availability checking. Use the oEmbed endpoint to verify that specific videos are publicly accessible and embeddable before building integrations. Useful for content curation tools that aggregate Vimeo videos from multiple creators.
Creator talent identification. Find emerging creators by tracking channels with rapidly growing subscriber counts and high per-video engagement rates relative to their current audience size.
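"High engagement relative to audience size" needs a concrete formula. One simple choice, which is our own heuristic rather than a Vimeo metric: average likes plus comments per video, divided by the follower count. A sketch, assuming video dicts with like_count and comment_count keys; engagement_per_follower is a hypothetical helper:

```python
def engagement_per_follower(videos: list, follower_count: int) -> float:
    """Average (likes + comments) per video, divided by follower count.

    Hypothetical scoring heuristic: higher values mean the creator's work
    draws more interaction than their audience size would suggest.
    Returns 0.0 when there are no videos or no followers.
    """
    if not videos or follower_count <= 0:
        return 0.0
    interactions = sum(
        (v.get("like_count") or 0) + (v.get("comment_count") or 0) for v in videos
    )
    return interactions / len(videos) / follower_count
```

For channels, get_channel's subscriber_count can serve as follower_count; tracking the score alongside subscriber growth over several scrapes separates one viral video from a consistently outperforming creator.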
Vimeo's API Is One of the Good Ones
Vimeo's API is developer-friendly in ways that YouTube's has never been: accurate documentation, consistent response structures, sane rate limits, and a personal access token flow that doesn't require a full OAuth redirect dance for simple read-only scripts. The oEmbed endpoint is especially useful for quick integrations.
Use the API for view counts and bulk metadata collection. Fall back to page scraping with Playwright and ThorData residential proxies only when you genuinely need data the API doesn't expose — regional availability checks, showcase page aggregates, or embed configurations visible only in the rendered page. For the vast majority of Vimeo data collection tasks, the API is the right tool.