
Reddit Data API in 2026: Free Alternatives After the Pricing Controversy

April 14, 2026 · 9 min read
Contents

  What changed with the Reddit API
  What still works for free
  The .json endpoint trick
  Scraping a subreddit
  Extracting comments and threads
  User profile and history
  Search across Reddit
  Rate limits and User-Agent rules
  Managed scraper option

In June 2023 Reddit made its API prohibitively expensive, killing every third-party client -- Apollo, Reddit is Fun, Sync -- and making historical archives like Pushshift unusable for the general public. Researchers who had built their work on free data lost access overnight.

The paid tier now charges $0.24 per 1,000 API calls. For anything beyond a hobbyist script, that adds up fast. But here is the thing: Reddit's public web pages still expose a free, no-auth JSON API on every URL. This post covers the practical methods that still work in 2026.
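To see how quickly the metered rate adds up, here's the arithmetic (the rate is the one quoted above; the monitoring-job scenario is an illustrative example):

```python
def monthly_api_cost(calls: int, rate_per_thousand: float = 0.24) -> float:
    """Cost in USD at Reddit's metered rate of $0.24 per 1,000 calls."""
    return calls / 1000 * rate_per_thousand

# A modest monitoring job making one call every 5 seconds runs
# ~518,400 calls in a 30-day month -- about $124 at the metered rate.
```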

What changed with the Reddit API

In short: per-call pricing ended the era of free bulk access. Third-party clients shut down rather than pay, Pushshift's archive went behind moderator-only verification, and even the official API's free tier now requires registering an OAuth app. The rest of this post covers the access paths that survived that change.

What still works for free

Three routes remain viable in 2026:

  1. The .json suffix trick -- append .json to any Reddit URL to get structured data. No auth, no API key.
  2. OAuth with a personal app -- the official way, 100 requests/min, free for non-commercial use (registration + 2FA required).
  3. Old.reddit.com HTML parsing -- noisier but works when JSON endpoints are Cloudflare-challenged.

We'll focus on route 1 because it's the simplest and covers ~90% of typical scraping use cases without any account setup.

The .json endpoint trick

Every Reddit page has a JSON twin. Add .json to the URL and you get the same data the web client uses, rendered as structured JSON. This is not an officially documented API, but it has been stable since 2012.

https://www.reddit.com/r/python/hot.json
https://www.reddit.com/r/python/comments/abc123/some_post.json
https://www.reddit.com/user/spez/submitted.json
https://www.reddit.com/search.json?q=web+scraping
User-Agent is mandatory: Reddit will hard-block any request with a generic or missing User-Agent. Always send a unique, descriptive UA like MyResearchBot/1.0 (contact: [email protected]). Python's default python-requests/2.x is rate-limited so aggressively it's effectively banned.
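The suffix rule is mechanical enough to wrap in a small helper. A sketch (the function name is ours, not a Reddit convention) that also handles trailing slashes and query strings:

```python
def to_json_url(url: str) -> str:
    """Append Reddit's .json suffix to a page URL, preserving any query string."""
    base, _, query = url.partition('?')
    base = base.rstrip('/')                 # '/r/python/hot/' -> '/r/python/hot'
    if not base.endswith('.json'):
        base += '.json'
    return base + ('?' + query if query else '')

# to_json_url('https://www.reddit.com/r/python/hot/')
#   -> 'https://www.reddit.com/r/python/hot.json'
```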

Scraping a subreddit

Grab the top posts from any subreddit. The response includes every field the web UI shows plus some it doesn't (upvote ratio, gilded status, flair metadata).

# Route 1: fetch a subreddit's hot listing straight from the public .json endpoint
import requests

headers = {'User-Agent': 'MyResearchBot/1.0 (contact: [email protected])'}
url = 'https://www.reddit.com/r/dataisbeautiful/hot.json'
resp = requests.get(url, headers=headers, params={'limit': 100}, timeout=10)
resp.raise_for_status()

for child in resp.json()['data']['children']:
    post = child['data']
    print(post['score'], post['upvote_ratio'], post['title'])

Pagination with the after token

Reddit paginates via an opaque after token, not page numbers. Each response includes data.after; pass it back as the after parameter to get the next page.

# Follow the `after` cursor until the listing runs dry
import time
import requests

headers = {'User-Agent': 'MyResearchBot/1.0 (contact: [email protected])'}
url = 'https://www.reddit.com/r/dataisbeautiful/new.json'
after = None

while True:
    resp = requests.get(url, headers=headers,
                        params={'limit': 100, 'after': after}, timeout=10)
    data = resp.json()['data']
    for child in data['children']:
        print(child['data']['title'])
    after = data['after']       # None once the listing is exhausted
    if after is None:
        break
    time.sleep(2)               # stay under the implicit rate limits
Depth limit: Pagination tops out around 1,000 posts regardless of after chaining. Reddit does not expose posts older than that via the listing endpoint. For deeper history you need the /search endpoint with a time filter, or the now-restricted Pushshift.

Extracting comments and threads

Appending .json to a post URL returns a two-element array: the post itself, and the comment tree.

# A post's .json is a two-element array: [post listing, comment tree]
import requests

headers = {'User-Agent': 'MyResearchBot/1.0 (contact: [email protected])'}
url = 'https://www.reddit.com/r/python/comments/abc123/some_post.json'  # any post URL
post_listing, comment_listing = requests.get(url, headers=headers, timeout=10).json()

def walk(children, depth=0):
    for child in children:
        if child['kind'] != 't1':        # 't1' = comment; skip 'more' stubs here
            continue
        comment = child['data']
        print('  ' * depth + f"{comment['author']}: {comment['body'][:80]}")
        if comment.get('replies'):       # replies is an empty string when there are none
            walk(comment['replies']['data']['children'], depth + 1)

walk(comment_listing['data']['children'])

The "more" items you encounter (kind == "more") are continuation tokens -- Reddit collapses very long comment threads and requires a separate /api/morechildren call to expand them. That endpoint requires OAuth, so for most purposes just ignore collapsed branches or fetch them via the official API.

User profile and history

Any username's submission and comment history is public unless the account is suspended.

# A user's submissions; swap 'submitted' for 'comments' to get their comment history
import requests

headers = {'User-Agent': 'MyResearchBot/1.0 (contact: [email protected])'}
url = 'https://www.reddit.com/user/spez/submitted.json'
data = requests.get(url, headers=headers, params={'limit': 25}, timeout=10).json()['data']

for child in data['children']:
    item = child['data']
    print(item['created_utc'], item['subreddit'], item.get('title', ''))

Search across Reddit

Reddit search is quirky but functional. It supports time windows, sort order, and subreddit filters.

https://www.reddit.com/search.json?q=web+scraping&sort=relevance&t=month&limit=25

Parameters: q (query), sort (relevance|hot|top|new|comments), t (hour|day|week|month|year|all), restrict_sr (true for within one subreddit).
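Those parameters compose cleanly into a URL builder. A sketch (the function name is ours; the parameters are the ones listed above):

```python
import urllib.parse

def build_search_url(q, sort='relevance', t='all', limit=25, subreddit=None):
    """Compose a Reddit search.json URL from a query and the standard filters."""
    if subreddit:
        base = f'https://www.reddit.com/r/{subreddit}/search.json'
    else:
        base = 'https://www.reddit.com/search.json'
    params = {'q': q, 'sort': sort, 't': t, 'limit': limit}
    if subreddit:
        params['restrict_sr'] = 'true'    # keep results inside that subreddit
    return base + '?' + urllib.parse.urlencode(params)
```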

Rate limits and User-Agent rules

Unauthenticated JSON endpoints are governed by implicit, undocumented limits; exceed them and you get HTTP 429 responses or a Cloudflare challenge rather than a clear quota message.

Tip: Reddit's throttling looks at (IP, User-Agent) pairs. If you run multiple scrapers from the same server, give each a distinct UA string. Adding 1-2 seconds of jitter between requests avoids most rate-limit issues for low-volume work.
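The delay-with-jitter pattern is a few lines of Python. A sketch (the helper name is ours; pass any fetch callable, e.g. a `requests.get` wrapper):

```python
import random
import time

def polite_get(fetch, urls, base_delay=2.0, jitter=1.0):
    """Call fetch(url) for each URL, sleeping base_delay plus random jitter between calls."""
    results = []
    for i, url in enumerate(urls):
        results.append(fetch(url))
        if i < len(urls) - 1:                          # no sleep after the last request
            time.sleep(base_delay + random.uniform(0, jitter))
    return results
```

Usage: `polite_get(lambda u: requests.get(u, headers=headers).json(), urls)`.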

For production scraping you need proxies. Residential rotating proxies are the gold standard; mobile proxies work even better but cost more. Avoid datacenter proxies -- Reddit's Cloudflare rules flag them within a handful of requests.

Managed scraper option

If you need reliable, high-volume Reddit data and don't want to manage proxies, User-Agent rotation, and pagination edge cases, use a managed actor.

Our Reddit Scraper Fast actor on Apify handles proxy rotation, rate limiting, and comment tree expansion automatically. You pass a subreddit, post URL, or search query and get clean JSON back -- no OAuth setup, no User-Agent management, no Cloudflare headaches.
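A minimal client call, using the actor input shape this post uses (requires `pip install apify-client` and an Apify API token; the import is deferred so the helpers load without the package installed):

```python
def build_run_input(subreddits, sort='hot', max_items=100):
    """Actor input in the shape this post uses: subreddits, sort order, item cap."""
    return {'subreddits': list(subreddits), 'sort': sort, 'maxItems': max_items}

def fetch_reddit_items(token, run_input):
    """Run the actor and yield dataset items as dicts."""
    from apify_client import ApifyClient          # third-party; lazy import
    client = ApifyClient(token)
    run = client.actor('cryptosignals/reddit-scraper-fast').call(run_input=run_input)
    yield from client.dataset(run['defaultDatasetId']).iterate_items()
```

Usage: `for item in fetch_reddit_items('YOUR_APIFY_TOKEN', build_run_input(['dataisbeautiful'])): print(item)`.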

For academic research, dataset collection, or monitoring workflows, it's cheaper than the official API's $0.24/1k calls and returns richer data (comment trees, user history, flair metadata) in one shot.

Approach                   Cost               Setup              Historical depth
Official Reddit API        $0.24 / 1k calls   OAuth app          ~1k per endpoint
.json endpoint (direct)    Free               None               ~1k per endpoint
Pushshift (mod-only)       Gated              Mod verification   Full archive
Managed actor (Apify)      Pay-per-use        API key            Recent + search

For one-off research and scripts, hit the .json endpoints directly with a good User-Agent and 2-second delays. When you graduate to production -- monitoring hundreds of subreddits, building a live dataset -- the managed route saves you from maintaining proxy pools and rate-limit logic yourself.


Try Apify free — the platform powering these scrapers.