Scraping Dailymotion Video Data and Channel Stats with Python (2026)
Dailymotion flies under the radar compared to YouTube, but it remains one of the world's largest video platforms with over 400 million unique monthly visitors. For data projects — competitive analysis in the video space, content trend tracking, media monitoring, journalism research — Dailymotion is an underexploited source. The platform hosts significant news and sports content that doesn't appear on YouTube due to licensing agreements, particularly European broadcast media.
The good news: Dailymotion still maintains a public Data API that's more generous than most video platforms. The less good news: the API has quirks, undocumented rate limits, and some endpoints return inconsistent data. Geographic content restrictions are a significant complication for cross-market research. This guide covers how to work with the API effectively, handle its limitations, and supplement it with direct page scraping where the API falls short.
What Data Is Available
Through the Dailymotion Data API and page scraping combined:
Video-level data: - Title, description, tags, duration (seconds) - Creation date, last publication date - View counts: total, last hour, last 24 hours, last week - Like counts, bookmark counts - Thumbnail URLs in multiple resolutions - Language and country tags - Channel name and owner info - Content moderation status, adult content flag - Direct embed code and player URL
Channel data: - Subscriber (fan) count - Total video count - Total view count across all videos - Channel description and creation date - Verified badge status - Country and language preferences
Trending data: - Videos trending in each country - Category-level trending - New video feeds by category
Search: - Full text search with sort and filter options - Filter by duration, upload date, language, country - Sort by relevance, date, views, likes
Why Geographic Data Varies So Much
Dailymotion has significant content licensing agreements with major media companies. A soccer highlights clip from a French broadcaster might be available on Dailymotion in France but blocked everywhere else. American news broadcasts may geo-restrict their Dailymotion uploads to US viewers.
This isn't just about available videos — it affects the view count data too. Dailymotion historically reports country-regional view counts for some content categories, and the trending lists vary substantially by country. If you're doing cross-market research, you need to collect data from multiple geographic vantage points.
Anti-Bot Measures
Dailymotion's defenses are moderate compared to platforms like YouTube or Instagram, but they exist:
Rate limiting. The API enforces rate limits per IP and per API key. Unauthenticated requests are limited to roughly 600 requests per 10-minute window. Authenticated requests get approximately 5,000 per 10-minute window. Exceeding the limit returns HTTP 403 with error code limit_reached.
API key requirement for bulk operations. While individual video lookups work without an API key, search and listing endpoints require one for paginated results beyond the first page. Registration at the Dailymotion Partner HQ is free.
Cloudflare protection on the website. The Data API itself sits behind light protection, but the main Dailymotion website uses Cloudflare with JavaScript challenges. Direct HTML scraping of dailymotion.com requires browser automation or challenge-solving.
Geographic restrictions. Some videos and API responses are filtered by the IP's geographic location. A video visible from France may return 404 from a US IP. Particularly common for sports and news content.
Referrer checking. Certain embed and player endpoints validate the referrer header. Requests without a plausible referrer get empty responses.
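Those windowed limits translate directly into request pacing. A minimal sketch, using the approximate figures cited above (600 unauthenticated, 5,000 authenticated per 10-minute window) to compute the steady delay that stays under the cap:

```python
def min_delay_seconds(requests_per_window: int, window_seconds: int = 600) -> float:
    """Smallest delay between requests that keeps a steady stream under a windowed limit."""
    return window_seconds / requests_per_window

# Approximate Dailymotion limits as noted above:
print(min_delay_seconds(600))    # unauthenticated: 1.0s between requests
print(min_delay_seconds(5000))   # authenticated: 0.12s between requests
```

Spacing requests evenly is the simplest strategy; a token bucket would let you burst, but even pacing makes rate-limit 403s rare enough that the retry logic below becomes a safety net rather than a hot path.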
Setting Up API Access
Register an application at the Dailymotion Partner HQ (https://developer.dailymotion.com/api/) to get credentials:
import httpx
import time
import json
import sqlite3
import random
from dataclasses import dataclass, field
from typing import Optional
import threading
API_KEY = "your_api_key"
API_SECRET = "your_api_secret"
BASE_URL = "https://api.dailymotion.com"
class DailymotionClient:
"""
Thread-safe Dailymotion API client with automatic token refresh.
"""
def __init__(self, api_key: str, api_secret: str):
self.api_key = api_key
self.api_secret = api_secret
self._token: Optional[str] = None
self._token_expires: float = 0
self._lock = threading.Lock()
self._session = httpx.Client(
base_url=BASE_URL,
timeout=30,
headers={"User-Agent": "DailymotionDataBot/1.0"},
)
def _refresh_token(self):
resp = httpx.post(
f"{BASE_URL}/oauth/token",
data={
"grant_type": "client_credentials",
"client_id": self.api_key,
"client_secret": self.api_secret,
},
timeout=15,
)
resp.raise_for_status()
data = resp.json()
self._token = data["access_token"]
self._token_expires = time.time() + data.get("expires_in", 3600) - 60
def _get_token(self) -> str:
with self._lock:
if not self._token or time.time() >= self._token_expires:
self._refresh_token()
return self._token
def get(
self,
path: str,
params: Optional[dict] = None,
use_auth: bool = True,
proxy: Optional[str] = None,
retries: int = 3,
) -> dict:
"""Make an authenticated GET request with retry logic."""
full_params = dict(params or {})
if use_auth:
full_params["access_token"] = self._get_token()
for attempt in range(retries):
try:
if proxy:
with httpx.Client(
base_url=BASE_URL,
proxy=proxy,
timeout=30,
) as proxy_client:
resp = proxy_client.get(path, params=full_params)
else:
resp = self._session.get(path, params=full_params)
if resp.status_code == 200:
return resp.json()
elif resp.status_code == 403:
error_data = resp.json()
# Error details nest under the "error" key; match on the payload
# text since the exact field holding the reason varies.
if "limit_reached" in str(error_data.get("error", "")):
wait = 60 * (attempt + 1)
print(f"Rate limit reached. Waiting {wait}s...")
time.sleep(wait)
continue
elif "oauth2" in str(error_data).lower():
# Token expired, refresh and retry
self._token = None
full_params["access_token"] = self._get_token()
continue
else:
return {} # Video not accessible from this region
elif resp.status_code == 404:
return {}
elif resp.status_code == 503:
time.sleep(10 * (attempt + 1))
continue
else:
resp.raise_for_status()
except httpx.TimeoutException:
if attempt == retries - 1:
raise
time.sleep(5 * (attempt + 1))
return {}
def close(self):
self._session.close()
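The token-refresh guard above is worth isolating to see the timing logic. A standalone sketch (the `needs_refresh` helper is illustrative, not part of the client) showing why `_refresh_token` stores `issue_time + expires_in - 60`:

```python
def needs_refresh(token_deadline: float, now: float) -> bool:
    """Mirrors the check in _get_token: refresh once 'now' passes the deadline."""
    return now >= token_deadline

# _refresh_token stores issue_time + expires_in - 60: a 60-second safety
# margin so a token is never sent during its final minute of validity.
issued = 1_000_000.0
deadline = issued + 3600 - 60          # token valid for 1h, refreshed 60s early
print(needs_refresh(deadline, now=issued + 3000))  # False: still comfortably valid
print(needs_refresh(deadline, now=issued + 3541))  # True: inside the safety margin
```

Without the margin, a request could leave your machine with a valid token and arrive at the API after expiry, producing spurious oauth errors under load.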
Fetching Video Metadata
The /video/{id} endpoint accepts a fields parameter:
@dataclass
class DailymotionVideo:
video_id: str
title: str
description: str
tags: list
duration_seconds: int
views_total: int
views_last_hour: int
views_last_24h: int
views_last_week: int
likes_total: int
bookmarks_total: int
created_time: int
updated_time: int
thumbnail_url: str
thumbnail_720_url: str
language: str
country: str
channel_name: str
owner_screenname: str
owner_fans_total: int
is_explicit: bool = False
is_private: bool = False
status: str = "published"
VIDEO_FIELDS = [
"id", "title", "description", "tags", "duration",
"views_total", "views_last_hour", "views_last_24h", "views_last_week",
"likes_total", "bookmarks_total",
"created_time", "updated_time",
"thumbnail_url", "thumbnail_720_url",
"language", "country", "channel.name",
"owner.screenname", "owner.fans_total",
"explicit", "private", "status",
"embed_url", "short_url",
]
def get_video(client: DailymotionClient, video_id: str, proxy: Optional[str] = None) -> Optional[DailymotionVideo]:
"""Fetch full metadata for a single video."""
data = client.get(
f"/video/{video_id}",
params={"fields": ",".join(VIDEO_FIELDS)},
proxy=proxy,
)
if not data or "id" not in data:
return None
return DailymotionVideo(
video_id=data.get("id", video_id),
title=data.get("title", ""),
description=(data.get("description") or "")[:1000],
tags=data.get("tags", []),
duration_seconds=data.get("duration", 0),
views_total=data.get("views_total", 0),
views_last_hour=data.get("views_last_hour", 0),
views_last_24h=data.get("views_last_24h", 0),
views_last_week=data.get("views_last_week", 0),
likes_total=data.get("likes_total", 0),
bookmarks_total=data.get("bookmarks_total", 0),
created_time=data.get("created_time", 0),
updated_time=data.get("updated_time", 0),
thumbnail_url=data.get("thumbnail_url", ""),
thumbnail_720_url=data.get("thumbnail_720_url", ""),
language=data.get("language", ""),
country=data.get("country", ""),
channel_name=data.get("channel.name", "") or data.get("channel", {}).get("name", ""),
owner_screenname=data.get("owner.screenname", "") or data.get("owner", {}).get("screenname", ""),
owner_fans_total=data.get("owner.fans_total", 0) or 0,
is_explicit=data.get("explicit", False),
is_private=data.get("private", False),
status=data.get("status", "published"),
)
def search_videos(
client: DailymotionClient,
query: str,
sort: str = "relevance",
limit: int = 100,
language: Optional[str] = None,
country: Optional[str] = None,
created_after: Optional[int] = None,
proxy: Optional[str] = None,
delay: float = 1.2,
) -> list[DailymotionVideo]:
"""
Search Dailymotion for videos.
sort: 'relevance', 'recent', 'visited' (views), 'rating', 'trending'
"""
fields = [
"id", "title", "duration", "views_total", "views_last_24h",
"created_time", "thumbnail_url", "owner.screenname", "channel.name",
"language", "country", "tags",
]
results = []
page = 1
while len(results) < limit:
params = {
"search": query,
"fields": ",".join(fields),
"sort": sort,
"page": page,
"limit": min(100, limit - len(results)),
}
if language:
params["language"] = language
if country:
params["country"] = country
if created_after:
params["created_after"] = created_after
data = client.get("/videos", params=params, proxy=proxy)
items = data.get("list", [])
if not items:
break
for item in items:
video = DailymotionVideo(
video_id=item.get("id", ""),
title=item.get("title", ""),
description="",
tags=item.get("tags", []),
duration_seconds=item.get("duration", 0),
views_total=item.get("views_total", 0),
views_last_hour=0,
views_last_24h=item.get("views_last_24h", 0),
views_last_week=0,
likes_total=0,
bookmarks_total=0,
created_time=item.get("created_time", 0),
updated_time=0,
thumbnail_url=item.get("thumbnail_url", ""),
thumbnail_720_url="",
language=item.get("language", ""),
country=item.get("country", ""),
channel_name=item.get("channel.name", "") or "",
owner_screenname=item.get("owner.screenname", "") or "",
owner_fans_total=0,
)
results.append(video)
if not data.get("has_more"):
break
page += 1
time.sleep(delay)
return results[:limit]
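The parsing in `get_video` and `search_videos` guards against two response shapes for dotted fields like `channel.name`: a flat key containing a dot, or a nested object. A small helper (hypothetical, not part of the API) centralizes that fallback instead of repeating the `or`-chain:

```python
def get_dotted(data: dict, dotted_key: str, default=""):
    """Read a field like 'channel.name' whether the API returned it as a
    flat dotted key or as a nested object."""
    if dotted_key in data:
        return data[dotted_key]
    node = data
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

print(get_dotted({"channel.name": "news"}, "channel.name"))       # news
print(get_dotted({"channel": {"name": "news"}}, "channel.name"))  # news
print(get_dotted({}, "channel.name", default="unknown"))          # unknown
```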
Channel Scraping
Pull a channel's metadata and iterate through its video library:
@dataclass
class DailymotionChannel:
screenname: str
fans_total: int
videos_total: int
views_total: int
description: str
created_time: int
verified: bool
country: str
url: str
CHANNEL_FIELDS = [
"screenname", "fans_total", "videos_total",
"views_total", "description", "created_time",
"verified", "country", "url",
"avatar_720_url", "cover_url",
]
def get_channel_stats(
client: DailymotionClient,
channel_name: str,
proxy: Optional[str] = None,
) -> Optional[DailymotionChannel]:
"""Get statistics for a Dailymotion channel."""
data = client.get(
f"/user/{channel_name}",
params={"fields": ",".join(CHANNEL_FIELDS)},
proxy=proxy,
)
if not data or "screenname" not in data:
return None
return DailymotionChannel(
screenname=data.get("screenname", channel_name),
fans_total=data.get("fans_total", 0),
videos_total=data.get("videos_total", 0),
views_total=data.get("views_total", 0),
description=(data.get("description") or "")[:500],
created_time=data.get("created_time", 0),
verified=data.get("verified", False),
country=data.get("country", ""),
url=data.get("url", ""),
)
def get_channel_videos(
client: DailymotionClient,
channel_name: str,
max_videos: int = 500,
sort: str = "recent",
proxy: Optional[str] = None,
delay: float = 1.0,
) -> list[DailymotionVideo]:
"""
Retrieve all videos from a channel.
sort: 'recent', 'visited', 'rating', 'relevance', 'random'
"""
fields = [
"id", "title", "duration", "views_total", "views_last_24h",
"likes_total", "created_time", "tags",
"thumbnail_url", "language", "country", "status",
]
videos = []
page = 1
while len(videos) < max_videos:
data = client.get(
f"/user/{channel_name}/videos",
params={
"fields": ",".join(fields),
"page": page,
"limit": 100,
"sort": sort,
},
proxy=proxy,
)
if not data or data.get("error"):
break
items = data.get("list", [])
if not items:
break
for item in items:
videos.append(DailymotionVideo(
video_id=item.get("id", ""),
title=item.get("title", ""),
description="",
tags=item.get("tags", []),
duration_seconds=item.get("duration", 0),
views_total=item.get("views_total", 0),
views_last_hour=0,
views_last_24h=item.get("views_last_24h", 0),
views_last_week=0,
likes_total=item.get("likes_total", 0),
bookmarks_total=0,
created_time=item.get("created_time", 0),
updated_time=0,
thumbnail_url=item.get("thumbnail_url", ""),
thumbnail_720_url="",
language=item.get("language", ""),
country=item.get("country", ""),
channel_name=channel_name,
owner_screenname=channel_name,
owner_fans_total=0,
status=item.get("status", "published"),
))
if not data.get("has_more"):
break
page += 1
time.sleep(delay)
return videos[:max_videos]
def get_similar_channels(
client: DailymotionClient,
channel_name: str,
limit: int = 10,
) -> list[dict]:
"""Find channels similar to a given one."""
data = client.get(
f"/user/{channel_name}/subscriptions",
params={"fields": "screenname,fans_total,videos_total,verified", "limit": limit},
)
return data.get("list", [])
Trending Videos by Country
Dailymotion's trending endpoint is essential for media monitoring:
AVAILABLE_COUNTRIES = [
"us", "fr", "de", "gb", "es", "it", "br", "mx", "ar", "in",
"au", "ca", "nl", "be", "ch", "pl", "ru", "jp", "kr", "tr",
]
AVAILABLE_CHANNELS = [
"news", "sport", "fun", "music", "videogames",
"tech", "travel", "animals", "auto", "film",
]
def get_trending_videos(
client: DailymotionClient,
country: str = "us",
channel: Optional[str] = None,
limit: int = 50,
proxy: Optional[str] = None,
) -> list[dict]:
"""
Get trending videos for a specific country and optional category.
country: ISO 3166-1 alpha-2 code in lowercase
channel: category name (see AVAILABLE_CHANNELS)
"""
fields = [
"id", "title", "views_total", "views_last_24h", "views_last_week",
"duration", "created_time", "owner.screenname",
"thumbnail_720_url", "language", "channel.name",
]
params = {
"fields": ",".join(fields),
"country": country,
"limit": min(100, limit),
"sort": "trending",
}
if channel:
params["channel"] = channel
data = client.get("/videos", params=params, proxy=proxy)
results = []
for i, item in enumerate(data.get("list", [])[:limit]):
results.append({
"rank": i + 1,
"country": country,
"channel": channel or "all",
"video_id": item.get("id", ""),
"title": item.get("title", ""),
"views_total": item.get("views_total", 0),
"views_last_24h": item.get("views_last_24h", 0),
"views_last_week": item.get("views_last_week", 0),
"duration_seconds": item.get("duration", 0),
"created_time": item.get("created_time", 0),
"owner": item.get("owner.screenname", ""),
"channel_name": item.get("channel.name", ""),
"thumbnail_url": item.get("thumbnail_720_url", ""),
})
return results
def get_multi_country_trending(
client: DailymotionClient,
countries: Optional[list[str]] = None,
channel: Optional[str] = None,
limit_per_country: int = 20,
delay: float = 1.5,
proxy_map: Optional[dict[str, str]] = None,
) -> dict[str, list[dict]]:
"""
Collect trending videos across multiple countries.
proxy_map: dict mapping country code -> proxy URL for geo-targeting.
"""
if countries is None:
countries = ["us", "fr", "de", "gb", "es", "it", "br"]
all_trending = {}
for country in countries:
proxy = proxy_map.get(country) if proxy_map else None
print(f"Fetching trending for {country.upper()}...")
try:
trending = get_trending_videos(
client,
country=country,
channel=channel,
limit=limit_per_country,
proxy=proxy,
)
all_trending[country] = trending
print(f" Got {len(trending)} trending videos")
except Exception as e:
print(f" Error for {country}: {e}")
all_trending[country] = []
time.sleep(delay)
return all_trending
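Once you have per-country lists from `get_multi_country_trending`, a natural follow-up question is how much two markets' trending content overlaps. A sketch using Jaccard similarity over video IDs (the `trending_overlap` helper is an assumption, not part of the API):

```python
def trending_overlap(a: list[dict], b: list[dict]) -> float:
    """Jaccard overlap of two trending lists, compared by video_id."""
    ids_a = {v["video_id"] for v in a}
    ids_b = {v["video_id"] for v in b}
    if not ids_a and not ids_b:
        return 0.0
    return len(ids_a & ids_b) / len(ids_a | ids_b)

fr = [{"video_id": "a"}, {"video_id": "b"}, {"video_id": "c"}]
de = [{"video_id": "b"}, {"video_id": "c"}, {"video_id": "d"}]
print(trending_overlap(fr, de))  # 0.5 (2 shared out of 4 distinct)
```

Low overlap between neighboring markets is usually a sign of the licensing-driven catalog differences described earlier, not a data collection problem.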
Proxy Configuration for Geo-Restricted Content
Geographic restrictions are the primary reason you'd need proxies for Dailymotion scraping. Content libraries vary significantly by country, especially for news and sports.
ThorData's residential proxies offer country-level targeting — route French content requests through French IPs, German content through German IPs. This isn't about evading bot detection (the API is fairly permissive); it's about seeing the same content catalog that users in each country see.
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9001
def make_country_proxy(country_code: str) -> str:
"""
Create a ThorData proxy URL targeting a specific country.
country_code: ISO 3166-1 alpha-2 (e.g., 'fr', 'de', 'us')
"""
# ThorData country targeting via session label
session_label = f"dm-{country_code}-{random.randint(1000, 9999)}"
return (
f"http://{THORDATA_USER}-country-{country_code}-session-{session_label}"
f":{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
)
# Pre-build proxy map for target countries
COUNTRY_PROXIES = {
country: make_country_proxy(country)
for country in ["us", "fr", "de", "gb", "es", "it", "br", "au"]
}
def get_video_with_geo(
client: DailymotionClient,
video_id: str,
target_country: str = "us",
) -> Optional[DailymotionVideo]:
"""
Fetch video metadata using a country-specific proxy.
Useful for checking whether geo-restricted content is available.
"""
proxy = make_country_proxy(target_country)
video = get_video(client, video_id, proxy=proxy)
return video
def check_video_geo_availability(
client: DailymotionClient,
video_id: str,
countries: Optional[list[str]] = None,
) -> dict[str, bool]:
"""
Check video availability across multiple countries.
Returns dict: country_code -> available (bool)
"""
if countries is None:
countries = ["us", "fr", "de", "gb", "es"]
availability = {}
for country in countries:
proxy = make_country_proxy(country)
try:
data = client.get(
f"/video/{video_id}",
params={"fields": "id,status"},
proxy=proxy,
)
availability[country] = bool(data.get("id"))
except Exception:
availability[country] = False
time.sleep(0.5)
return availability
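The raw `{country: bool}` map from `check_video_geo_availability` is easier to act on once summarized. A small sketch (the `summarize_availability` helper is illustrative) that classifies a video as geo-restricted when it is visible from some vantage points but not others:

```python
def summarize_availability(avail: dict) -> dict:
    """Condense {country: bool} into available/blocked lists plus a flag."""
    available = sorted(c for c, ok in avail.items() if ok)
    blocked = sorted(c for c, ok in avail.items() if not ok)
    return {
        "available_in": available,
        "blocked_in": blocked,
        # Visible somewhere but not everywhere => geo-restricted
        "geo_restricted": bool(available) and bool(blocked),
    }

print(summarize_availability({"fr": True, "us": False, "de": True}))
# {'available_in': ['de', 'fr'], 'blocked_in': ['us'], 'geo_restricted': True}
```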
Storage Schema and Analytics
def setup_dailymotion_db(db_path: str = "dailymotion.db") -> sqlite3.Connection:
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")
conn.executescript("""
CREATE TABLE IF NOT EXISTS videos (
video_id TEXT PRIMARY KEY,
title TEXT,
description TEXT,
tags TEXT,
channel_name TEXT,
owner_screenname TEXT,
owner_fans_total INTEGER,
duration_seconds INTEGER,
created_time INTEGER,
language TEXT,
country TEXT,
is_explicit INTEGER DEFAULT 0,
status TEXT DEFAULT 'published',
scraped_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS video_stats (
id INTEGER PRIMARY KEY AUTOINCREMENT,
video_id TEXT NOT NULL,
views_total INTEGER,
views_last_hour INTEGER,
views_last_24h INTEGER,
views_last_week INTEGER,
likes_total INTEGER,
bookmarks_total INTEGER,
snapshot_date TEXT DEFAULT (date('now')),
captured_at TEXT DEFAULT (datetime('now')),
FOREIGN KEY (video_id) REFERENCES videos(video_id),
UNIQUE (video_id, snapshot_date)
);
CREATE TABLE IF NOT EXISTS channels (
screenname TEXT PRIMARY KEY,
fans_total INTEGER,
videos_total INTEGER,
views_total INTEGER,
description TEXT,
created_time INTEGER,
verified INTEGER DEFAULT 0,
country TEXT,
url TEXT,
scraped_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS trending_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
country TEXT,
channel_name TEXT,
rank_position INTEGER,
video_id TEXT,
title TEXT,
views_last_24h INTEGER,
owner_screenname TEXT,
snapshot_date TEXT DEFAULT (date('now')),
captured_at TEXT DEFAULT (datetime('now')),
FOREIGN KEY (video_id) REFERENCES videos(video_id)
);
CREATE INDEX IF NOT EXISTS idx_videos_channel ON videos(channel_name);
CREATE INDEX IF NOT EXISTS idx_videos_created ON videos(created_time);
CREATE INDEX IF NOT EXISTS idx_stats_video ON video_stats(video_id);
CREATE INDEX IF NOT EXISTS idx_stats_date ON video_stats(snapshot_date);
CREATE INDEX IF NOT EXISTS idx_trending_country ON trending_snapshots(country, snapshot_date);
""")
conn.commit()
return conn
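The `UNIQUE (video_id, snapshot_date)` constraint in `video_stats` is what makes daily snapshots idempotent: paired with `INSERT OR REPLACE`, repeated runs on the same day overwrite rather than duplicate. A trimmed-down, self-contained demonstration of that behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_stats (
        video_id TEXT NOT NULL,
        views_total INTEGER,
        snapshot_date TEXT DEFAULT (date('now')),
        UNIQUE (video_id, snapshot_date)
    )
""")
# Two writes for the same video on the same day: the UNIQUE constraint
# makes the second INSERT OR REPLACE overwrite the first, so each video
# keeps exactly one row per calendar day.
for views in (100, 150):
    conn.execute(
        "INSERT OR REPLACE INTO video_stats (video_id, views_total) VALUES (?, ?)",
        ("x1", views),
    )
rows = conn.execute(
    "SELECT views_total FROM video_stats WHERE video_id = 'x1'"
).fetchall()
print(rows)  # [(150,)]
```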
def save_video(conn: sqlite3.Connection, video: DailymotionVideo):
"""Save video metadata and a stats snapshot."""
conn.execute("""
INSERT OR IGNORE INTO videos
(video_id, title, description, tags, channel_name, owner_screenname,
owner_fans_total, duration_seconds, created_time, language, country,
is_explicit, status)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
video.video_id, video.title, video.description[:500] if video.description else "",
json.dumps(video.tags), video.channel_name, video.owner_screenname,
video.owner_fans_total, video.duration_seconds, video.created_time,
video.language, video.country, int(video.is_explicit), video.status,
))
# Record stats snapshot (one per day)
conn.execute("""
INSERT OR REPLACE INTO video_stats
(video_id, views_total, views_last_hour, views_last_24h, views_last_week,
likes_total, bookmarks_total)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (
video.video_id, video.views_total, video.views_last_hour,
video.views_last_24h, video.views_last_week,
video.likes_total, video.bookmarks_total,
))
conn.commit()
def save_channel(conn: sqlite3.Connection, channel: DailymotionChannel):
conn.execute("""
INSERT OR REPLACE INTO channels
(screenname, fans_total, videos_total, views_total, description,
created_time, verified, country, url)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
channel.screenname, channel.fans_total, channel.videos_total,
channel.views_total, channel.description[:500] if channel.description else "",
channel.created_time, int(channel.verified), channel.country, channel.url,
))
conn.commit()
def compute_video_growth(conn: sqlite3.Connection, video_id: str, days: int = 7) -> dict:
"""Calculate view growth rate over the last N days."""
rows = conn.execute("""
SELECT snapshot_date, views_total, views_last_24h
FROM video_stats
WHERE video_id = ?
ORDER BY snapshot_date DESC
LIMIT ?
""", (video_id, days)).fetchall()
if len(rows) < 2:
return {"video_id": video_id, "insufficient_data": True}
latest = rows[0]
oldest = rows[-1]
views_gained = latest[1] - oldest[1] if latest[1] and oldest[1] else 0
days_span = len(rows) - 1
return {
"video_id": video_id,
"total_views_now": latest[1],
"total_views_then": oldest[1],
"views_gained": views_gained,
"days_tracked": days_span,
"avg_daily_views": round(views_gained / days_span, 0) if days_span else 0,
"latest_24h_views": latest[2],
}
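Note one subtlety in `compute_video_growth`: `days_tracked` counts snapshot intervals, not calendar days, so if collection skipped days, `avg_daily_views` is really an average per interval. The core arithmetic, isolated as a sketch over plain tuples (newest first, mirroring the query's `ORDER BY snapshot_date DESC`):

```python
def growth_from_snapshots(snapshots: list) -> dict:
    """snapshots: (snapshot_date, views_total) tuples, newest first."""
    if len(snapshots) < 2:
        return {"insufficient_data": True}
    gained = snapshots[0][1] - snapshots[-1][1]
    intervals = len(snapshots) - 1  # snapshot intervals, not calendar days
    return {"views_gained": gained, "avg_daily_views": gained / intervals}

# Three snapshots spanning a week, with gaps in collection:
snaps = [("2026-01-08", 5400), ("2026-01-07", 5100), ("2026-01-01", 3000)]
print(growth_from_snapshots(snaps))
# {'views_gained': 2400, 'avg_daily_views': 1200.0}
```

With a 7-day calendar span but only 2 intervals, the per-interval average (1200) overstates true daily growth (~343); if that distinction matters, divide by the date difference instead.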
Error Handling and Production Patterns
For production monitoring pipelines, you need comprehensive error handling:
def run_channel_monitoring_pipeline(
channels: list[str],
db_path: str = "dailymotion.db",
country: str = "us",
max_videos_per_channel: int = 200,
) -> dict:
"""
Monitor a list of channels: collect stats and new videos.
"""
client = DailymotionClient(API_KEY, API_SECRET)
conn = setup_dailymotion_db(db_path)
proxy = make_country_proxy(country) if country != "us" else None  # assumes default egress is US
stats = {
"channels_processed": 0,
"channels_failed": 0,
"videos_saved": 0,
"errors": [],
}
for channel_name in channels:
print(f"\nProcessing channel: {channel_name}")
# Get channel stats
try:
channel = get_channel_stats(client, channel_name, proxy=proxy)
if channel:
save_channel(conn, channel)
print(f" Fans: {channel.fans_total:,} | Videos: {channel.videos_total:,}")
except Exception as e:
print(f" Channel stats failed: {e}")
stats["errors"].append({"channel": channel_name, "error": str(e), "stage": "stats"})
time.sleep(1.0)
# Get recent videos
try:
videos = get_channel_videos(
client, channel_name,
max_videos=max_videos_per_channel,
proxy=proxy,
)
print(f" Got {len(videos)} videos")
for video in videos:
try:
save_video(conn, video)
stats["videos_saved"] += 1
except Exception:
pass # Non-fatal; skip this video and continue
except Exception as e:
print(f" Video collection failed: {e}")
stats["errors"].append({"channel": channel_name, "error": str(e), "stage": "videos"})
stats["channels_failed"] += 1
continue
stats["channels_processed"] += 1
# Run trending snapshot
print("\nCapturing trending snapshot...")
try:
trending = get_trending_videos(client, country=country, limit=50, proxy=proxy)
for item in trending:
conn.execute("""
INSERT INTO trending_snapshots
(country, channel_name, rank_position, video_id, title,
views_last_24h, owner_screenname)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (
country, item.get("channel"), item.get("rank"),
item.get("video_id"), item.get("title"),
item.get("views_last_24h"), item.get("owner"),
))
conn.commit()
print(f" Saved {len(trending)} trending videos")
except Exception as e:
print(f" Trending failed: {e}")
client.close()
conn.close()
print(f"\n=== Pipeline complete ===")
print(f"Channels processed: {stats['channels_processed']}")
print(f"Videos saved: {stats['videos_saved']}")
print(f"Errors: {len(stats['errors'])}")
return stats
def enrich_video_details(
video_ids: list[str],
db_path: str = "dailymotion.db",
delay: float = 1.0,
) -> int:
"""
Enrich video records with full metadata (not just search snippet fields).
Returns count of successfully enriched videos.
"""
client = DailymotionClient(API_KEY, API_SECRET)
conn = sqlite3.connect(db_path)
enriched = 0
for video_id in video_ids:
try:
video = get_video(client, video_id)
if video:
save_video(conn, video)
enriched += 1
except Exception as e:
print(f"Enrichment failed for {video_id}: {e}")
time.sleep(delay)
client.close()
conn.close()
return enriched
Legal Note
Dailymotion's API Terms of Use permit data collection for non-commercial analysis and application development, provided you:
- Don't cache data beyond 24 hours without refreshing
- Don't redistribute video content or build a competing video platform
- Attribute Dailymotion as the source in public-facing uses
- Don't use the data to circumvent monetization systems
The API is more permissive than most video platforms. Review current terms at developer.dailymotion.com before deploying anything commercial. Direct HTML scraping of the website (rather than the API) is explicitly prohibited in their ToS.
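The 24-hour caching rule has a direct operational consequence: rows in the `videos` table need a refresh pass once `scraped_at` ages out. A deterministic sketch of the stale-row query (fixed timestamps here so the example is reproducible; a live pipeline would compare against `datetime('now', '-1 day')`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id TEXT PRIMARY KEY, scraped_at TEXT)")
conn.executemany("INSERT INTO videos VALUES (?, ?)", [
    ("fresh", "2026-01-02 10:00:00"),
    ("stale", "2025-12-30 10:00:00"),
])
# Rows last scraped more than 24 hours before the reference time are
# due for a re-fetch to stay within the caching terms.
stale = conn.execute("""
    SELECT video_id FROM videos
    WHERE scraped_at < datetime('2026-01-02 12:00:00', '-1 day')
""").fetchall()
print(stale)  # [('stale',)]
```

Feeding those IDs into `enrich_video_details` on a daily schedule keeps the cache compliant without re-fetching everything.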
Key Takeaways
- Dailymotion's Data API is one of the more accessible video platform APIs — free registration, reasonably generous rate limits, and granular field selection.
- Always use the fields parameter to request only what you need. Expensive computed fields (like views_last_hour) increase server processing time and likely contribute to rate limit triggers.
- Geographic restrictions are the biggest practical challenge — content libraries vary significantly by country, especially for news and sports.
- ThorData residential proxies with country targeting let you see each market's content catalog as local users see it — essential for cross-market media analysis and comprehensive trending data.
- Authenticate even if you don't strictly need to — the rate limit increase from 600 to 5,000 requests per 10-minute window makes it worthwhile for any sustained collection.
- Build in daily snapshot logic from the start: view counts that seemed stale yesterday become valuable trend data after a month of collection.