In June 2023 Reddit made its API prohibitively expensive, killing every third-party client -- Apollo, Reddit is Fun, Sync -- and making historical archives like Pushshift unusable for the general public. Researchers who had built their work on free data lost access overnight.
The paid tier now charges $0.24 per 1,000 API calls. For anything beyond a hobbyist script, that adds up fast. But here is the thing: Reddit's public web pages still expose a free, no-auth JSON API on every URL. This post covers the practical methods that still work in 2026.
Three routes remain viable in 2026:

1. **The `.json` suffix trick** -- append `.json` to any Reddit URL to get structured data. No auth, no API key.
2. **The official paid API** -- $0.24 per 1,000 calls, OAuth app setup required.
3. **A managed scraper** -- e.g. an Apify actor; pay-per-use, no infrastructure to maintain.

We'll focus on route 1 because it's the simplest and covers ~90% of typical scraping use cases without any account setup.
Every Reddit page has a JSON twin. Add .json to the URL and you get the same data the web client uses, rendered as structured JSON. This is not an officially documented API, but it has been stable since 2012.
```
https://www.reddit.com/r/python/hot.json
https://www.reddit.com/r/python/comments/abc123/some_post.json
https://www.reddit.com/user/spez/submitted.json
https://www.reddit.com/search.json?q=web+scraping
```
Always send a descriptive User-Agent header, e.g. `MyResearchBot/1.0 (contact: [email protected])`. Python's default `python-requests/2.x` is rate-limited so aggressively it's effectively banned.
Grab the top posts from any subreddit. The response includes every field the web UI shows plus some it doesn't (upvote ratio, gilded status, flair metadata).
```python
# Fetch the current hot posts for a subreddit via the .json endpoint
import requests

headers = {"User-Agent": "MyResearchBot/1.0 (contact: [email protected])"}
resp = requests.get("https://www.reddit.com/r/python/hot.json",
                    headers=headers, params={"limit": 25}, timeout=10)
resp.raise_for_status()

for child in resp.json()["data"]["children"]:
    post = child["data"]
    print(post["score"], post["upvote_ratio"], post["title"])
```
Reddit paginates via an opaque `after` token, not page numbers. Each response includes `data.after`; pass it back as the `after` parameter to get the next page.
```python
# Page through a listing by chaining the after token
import requests, time

headers = {"User-Agent": "MyResearchBot/1.0 (contact: [email protected])"}
posts, after = [], None

while len(posts) < 300:
    params = {"limit": 100, "after": after} if after else {"limit": 100}
    data = requests.get("https://www.reddit.com/r/python/new.json",
                        headers=headers, params=params, timeout=10).json()["data"]
    posts.extend(child["data"] for child in data["children"])
    after = data["after"]
    if after is None:   # end of the listing
        break
    time.sleep(2)       # stay polite between pages

print(f"Fetched {len(posts)} posts")
```
Listings cap out at roughly 1,000 items, even with `after` chaining. Reddit does not expose posts older than that via the listing endpoint. For deeper history you need the /search endpoint with a time filter, or the now-restricted Pushshift.
Appending .json to a post URL returns a two-element array: the post itself, and the comment tree.
```python
# A post URL's .json twin returns [post listing, comment tree]
import requests

headers = {"User-Agent": "MyResearchBot/1.0 (contact: [email protected])"}
url = "https://www.reddit.com/r/python/comments/abc123/some_post.json"
post_listing, comment_listing = requests.get(url, headers=headers, timeout=10).json()

post = post_listing["data"]["children"][0]["data"]
print(post["title"], post["upvote_ratio"])

for child in comment_listing["data"]["children"]:
    if child["kind"] == "t1":   # t1 = comment; "more" = collapsed branch
        print(child["data"]["author"], child["data"]["body"][:80])
```
The "more" items you encounter (kind == "more") are continuation tokens -- Reddit collapses very long comment threads and requires a separate /api/morechildren call to expand them. That endpoint requires OAuth, so for most purposes just ignore collapsed branches or fetch them via the official API.
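Since replies nest recursively and collapsed branches appear as `kind == "more"` placeholders, a tree walk that simply skips them can look like this (a minimal sketch; `walk_comments` is a hypothetical helper operating on the parsed comment-tree JSON):

```python
def walk_comments(listing, depth=0):
    """Yield (depth, comment_data) pairs, skipping 'more' placeholders."""
    for child in listing["data"]["children"]:
        if child["kind"] != "t1":   # "more" = collapsed branch; needs OAuth to expand
            continue
        yield depth, child["data"]
        replies = child["data"].get("replies")
        if replies:                 # Reddit sends "" (empty string) when no replies
            yield from walk_comments(replies, depth + 1)
```

Pass it the second element of the two-element array and you get a flat, depth-annotated stream of comments.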
Any username's submission and comment history is public unless the account is suspended.
```python
# Public user history via the /submitted and /comments listings
import requests

headers = {"User-Agent": "MyResearchBot/1.0 (contact: [email protected])"}

for listing in ("submitted", "comments"):
    url = f"https://www.reddit.com/user/spez/{listing}.json"
    resp = requests.get(url, headers=headers, params={"limit": 25}, timeout=10)
    resp.raise_for_status()
    for child in resp.json()["data"]["children"]:
        item = child["data"]
        print(listing, item.get("title") or item.get("body", "")[:60])
```
Reddit search is quirky but functional. It supports time windows, sort order, and subreddit filters.
https://www.reddit.com/search.json?q=web+scraping&sort=relevance&t=month&limit=25
Parameters: q (query), sort (relevance|hot|top|new|comments), t (hour|day|week|month|year|all), restrict_sr (true for within one subreddit).
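Those parameters compose into a query string; a small URL builder keeps them straight (a sketch; `search_url` is a hypothetical helper, the parameter names come from the list above):

```python
from urllib.parse import urlencode

def search_url(query, sort="relevance", t="all", limit=25, subreddit=None):
    """Build a Reddit search.json URL, site-wide or scoped to one subreddit."""
    base = (f"https://www.reddit.com/r/{subreddit}/search.json"
            if subreddit else "https://www.reddit.com/search.json")
    params = {"q": query, "sort": sort, "t": t, "limit": limit}
    if subreddit:
        params["restrict_sr"] = "true"   # only meaningful on a subreddit-scoped URL
    return f"{base}?{urlencode(params)}"

print(search_url("web scraping", t="month"))
```

Fetch the resulting URL with the same User-Agent header as the earlier examples.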
Unauthenticated JSON endpoints are governed by implicit limits. In practice:

- Roughly one request every two seconds per IP; push faster and you start seeing 429 responses.
- Default library User-Agents (e.g. `python-requests/2.x`) are throttled to the point of being unusable.
- Datacenter IPs trip Cloudflare checks quickly; residential IPs fare much better.
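A simple client-side throttle is enough to respect these limits (a minimal sketch; the 2-second default matches the delay this post recommends, and `clock`/`sleep` are injectable only to keep the class testable):

```python
import time

class Throttle:
    """Enforce a minimum gap between successive requests."""
    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock, self._sleep = clock, sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        if self._last is not None:
            gap = self.min_interval - (self._clock() - self._last)
            if gap > 0:
                self._sleep(gap)
        self._last = self._clock()
```

Call `throttle.wait()` immediately before each `requests.get(...)` and pagination loops stay under the implicit rate limit automatically.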
For production scraping you need proxies. Residential rotating proxies are the gold standard; mobile proxies work even better but cost more. Avoid datacenter proxies -- Reddit's Cloudflare rules flag them within a handful of requests.
If you need reliable, high-volume Reddit data and don't want to manage proxies, User-Agent rotation, and pagination edge cases, use a managed actor.
Our Reddit Scraper Fast actor on Apify handles proxy rotation, rate limiting, and comment tree expansion automatically. You pass a subreddit, post URL, or search query and get clean JSON back -- no OAuth setup, no User-Agent management, no Cloudflare headaches.
For academic research, dataset collection, or monitoring workflows, it's cheaper than the official API's $0.24/1k calls and returns richer data (comment trees, user history, flair metadata) in one shot.
| Approach | Cost | Setup | Historical depth |
|---|---|---|---|
| Official Reddit API | $0.24 / 1k calls | OAuth app | ~1k per endpoint |
| .json endpoint (direct) | Free | None | ~1k per endpoint |
| Pushshift (mod-only) | Gated | Mod verification | Full archive |
| Managed actor (Apify) | Pay-per-use | API key | Recent + search |
For one-off research and scripts, hit the .json endpoints directly with a good User-Agent and 2-second delays. When you graduate to production -- monitoring hundreds of subreddits, building a live dataset -- the managed route saves you from maintaining proxy pools and rate-limit logic yourself.
Try Apify free — the platform powering these scrapers. Get started →