Your Scraper's TLS Handshake Is Ratting You Out: A Complete Guide to JA3/JA4+ Fingerprinting
You have set up perfect headers. Your User-Agent rotates between Chrome, Firefox, and Safari. You are sending sec-ch-ua hints, Accept-Language headers, even Sec-Fetch-Dest metadata. Your request looks identical to what a real browser sends. And you are still getting blocked on the very first request.
Here is what is actually happening: the site never looked at your HTTP headers. It fingerprinted your TLS handshake before your HTTP request even started. Your beautifully crafted headers are riding inside an encrypted tunnel that already screamed "I am a Python script" during the initial handshake. The anti-bot system made its decision at the transport layer — your application-layer disguise was irrelevant.
This is one of the most common and least understood reasons scrapers fail. I have spent months investigating TLS fingerprinting after debugging a scraper that had perfect headers, rotating residential proxies, randomized timing, and still got blocked 100% of the time on Cloudflare-protected sites. The moment I understood the TLS layer, my success rate went from zero to over 90%. This guide covers everything I learned.
If you are building any kind of web scraper, data collection tool, or automated HTTP client that targets sites with bot protection, understanding TLS fingerprinting is not optional — it is the single most important anti-detection concept you need to master in 2025.
What Is a TLS Fingerprint?
Every HTTPS connection begins with a TLS handshake. Before a single byte of your HTTP request is transmitted, your client and the server negotiate encryption parameters through a series of messages. The very first message your client sends — the ClientHello — contains a wealth of identifying information.
The ClientHello Message
When your HTTP library initiates a TLS connection, it sends a ClientHello that contains:
- TLS version: The maximum TLS version your client supports (typically TLS 1.3)
- Cipher suites: An ordered list of encryption algorithms your client can use
- Extensions: Additional capabilities like Server Name Indication (SNI), supported groups, signature algorithms, ALPN protocols
- Supported groups: The elliptic curves your client supports for key exchange
- Signature algorithms: Which signing algorithms your client accepts
- Compression methods: Usually just "null" in modern clients
The critical insight is that every HTTP library has a unique combination of these parameters. The cipher suite order, extension list, and supported groups are determined by the underlying TLS implementation, not by your application code. You cannot change them by setting HTTP headers.
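You can see this directly in Python: the cipher list a `ssl`-based client offers is set by OpenSSL's defaults, not by your application code. A quick inspection:

```python
import ssl

# The cipher suites a Python client offers come from OpenSSL's defaults,
# shared by requests, httpx, and aiohttp alike. Inspect them directly:
ctx = ssl.create_default_context()
ciphers = [c["name"] for c in ctx.get_ciphers()]

print(f"{len(ciphers)} cipher suites offered by default, e.g.:")
for name in ciphers[:5]:
    print(" ", name)
```

Nothing in `requests` or `httpx` ever touches this list, which is why swapping libraries never changes the fingerprint.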
How JA3 Hashing Works
JA3 (developed by John Althouse, Jeff Atkinson, and Josh Atkins at Salesforce) creates a fingerprint by concatenating five fields from the ClientHello:
JA3 = MD5(TLSVersion,Ciphers,Extensions,EllipticCurves,EllipticCurvePointFormats)
For example, a Python requests library ClientHello might produce:
TLSVersion: 771
Ciphers: 4866-4867-4865-49196-49200-159-52393-52392-52394-49195-49199-158-49188-49192-107-49187-49191-103-49162-49172-57-49161-49171-51-157-156-61-60-53-47-255
Extensions: 0-11-10-35-22-23-13-43-45-51
Curves: 29-23-30-25-24
Point: 0
JA3 Hash: e7d705a3286e19ea42f587b344ee6865
Meanwhile, Chrome 131 produces a completely different hash because it uses BoringSSL (not OpenSSL) and has different cipher preferences:
TLSVersion: 771
Ciphers: 4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53
Extensions: 0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-21
Curves: 29-23-24
Point: 0
JA3 Hash: cd08e31494f9531f560d64c695473da9
These two hashes are completely different. Any server that maintains a database of known JA3 fingerprints can instantly tell that the first connection is a Python script and the second is Chrome.
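The hashing step itself is easy to reproduce. Here is a minimal sketch (field values borrowed from the examples above; a real implementation extracts them from a captured ClientHello):

```python
import hashlib

def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """JA3: MD5 over the five comma-separated fields, each dash-joined."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Truncated field values from the Python example above, for illustration
print(ja3_hash(771, [4866, 4867, 4865], [0, 11, 10], [29, 23], [0]))
```

Note that JA3 hashes the fields in the order the client sent them, so merely reordering the same cipher suites yields a completely different hash.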
JA3S: Server Fingerprinting
JA3S is the server-side counterpart. It fingerprints the server's ServerHello response. Since servers often respond differently depending on the client's capabilities, the JA3/JA3S pair together provides an even more specific identification of both endpoints. This is used more for network security monitoring than bot detection, but it is worth understanding the full picture.
The Fingerprint Landscape in 2025
Here is what common tools and libraries look like to a modern Web Application Firewall:
| Tool / Library | TLS Backend | JA3 Looks Like | Detection Risk |
|---|---|---|---|
| Python requests | OpenSSL (via urllib3) | Python/OpenSSL | Instant block |
| Python httpx | OpenSSL (via h11/httpcore) | Python/OpenSSL | Instant block |
| Python aiohttp | OpenSSL | Python/OpenSSL | Instant block |
| Python urllib3 | OpenSSL | Python/OpenSSL | Instant block |
| Node.js axios/node-fetch | Node.js OpenSSL | Node.js-specific | High |
| Go net/http | Go crypto/tls | Go stdlib | High |
| Rust reqwest | rustls or OpenSSL | Rust-specific | High |
| Java HttpClient | JSSE | Java-specific | High |
| curl (default) | OpenSSL/LibreSSL | curl-specific | High |
| Headless Chrome (Puppeteer) | BoringSSL | Chrome-ish | Medium |
| Headless Chrome (Playwright) | BoringSSL | Chrome-ish | Medium |
| curl_cffi (impersonating) | BoringSSL | Matches target browser | Low |
| Real Chrome browser | BoringSSL | Chrome (authentic) | Very Low |
| Real Firefox browser | NSS | Firefox (authentic) | Very Low |
Notice the pattern: anything built on OpenSSL's default settings gets caught immediately. It is not that OpenSSL is a bad TLS library — it is that its default cipher suite ordering and extension configuration are well-documented and distinctive. Anti-bot vendors maintain databases of thousands of JA3 hashes mapped to specific tools and library versions.
Why All Python HTTP Libraries Look the Same
Python's requests, httpx, and aiohttp all ultimately use OpenSSL for TLS through Python's ssl module. Even though they are different libraries with different APIs, their TLS behavior is identical because they all delegate to the same underlying C library. The JA3 hash is determined by OpenSSL's default configuration, not by the Python library wrapping it.
This means that switching from requests to httpx does not change your TLS fingerprint. You are swapping the HTTP-layer code while keeping the same TLS-layer identity. From a fingerprinting perspective, they are the same client.
# All three produce the SAME JA3 fingerprint:
import requests
import httpx
import aiohttp
# requests uses urllib3 -> OpenSSL
requests.get("https://target.com")
# httpx uses httpcore -> h11/h2 -> OpenSSL
httpx.get("https://target.com")
# aiohttp uses its own connector -> OpenSSL
# async with aiohttp.ClientSession() as session:
# await session.get("https://target.com")
# From the server's perspective, all three connections
# have identical TLS ClientHello messages
Why Headless Chrome Is Not Safe Either
You might think switching to Puppeteer or Playwright solves the problem. After all, they launch a real Chrome binary with a real BoringSSL TLS stack. Better, but not bulletproof.
The Headless Fingerprint Difference
Headless Chrome's JA3 fingerprint is almost identical to regular Chrome, but there are subtle differences:
- TLS extension ordering: In some Chrome versions, headless mode produces a slightly different extension order in the ClientHello. The difference is too small for basic JA3 to catch, but JA4+ and custom fingerprinting pick it up.
- GREASE values: Chrome uses GREASE (Generate Random Extensions And Sustain Extensibility) to insert random cipher suite and extension values. The GREASE patterns can differ between headed and headless instances, and some fingerprinting systems track them.
- ALPN preferences: The Application-Layer Protocol Negotiation extension can differ between headed and headless modes, particularly regarding HTTP/2 vs HTTP/3 preferences.
Version Mismatch Detection
Chrome's JA3 hash changes between versions because cipher preferences and extensions evolve. If your headless Chrome is version 120 but real users are on 131, the version mismatch in the TLS fingerprint is a signal:
# Chrome 120 JA3: abc123...
# Chrome 124 JA3: def456...
# Chrome 131 JA3: ghi789...
# If Cloudflare sees Chrome/131 User-Agent but Chrome/120 JA3,
# that mismatch is a detection signal
This means you need to keep your headless browser updated. A three-month-old Chrome binary has a JA3 that no real user is sending anymore.
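A cheap guard against this class of mismatch is to check that the Chrome major version in your User-Agent agrees with your impersonation target before deploying. A minimal sketch (the function and the target string format are illustrative):

```python
import re

def versions_consistent(user_agent: str, impersonate_target: str) -> bool:
    """Check that the Chrome major version in the User-Agent matches the
    impersonation target (e.g. 'chrome131'), to avoid UA/JA3 mismatches."""
    ua_match = re.search(r"Chrome/(\d+)", user_agent)
    target_match = re.search(r"chrome(\d+)", impersonate_target)
    if not ua_match or not target_match:
        return False
    return ua_match.group(1) == target_match.group(1)

ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36")
print(versions_consistent(ua, "chrome131"))  # True
print(versions_consistent(ua, "chrome120"))  # False: the mismatch described above
```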
Beyond JA3: HTTP/2 Fingerprinting
Modern anti-bot systems do not stop at TLS fingerprinting. They also fingerprint your HTTP/2 behavior. When an HTTP/2 connection is established, the client sends a SETTINGS frame with configuration parameters:
SETTINGS Frame:
HEADER_TABLE_SIZE: 65536
ENABLE_PUSH: 0 (or 1)
MAX_CONCURRENT_STREAMS: 1000
INITIAL_WINDOW_SIZE: 6291456
MAX_FRAME_SIZE: 16384
MAX_HEADER_LIST_SIZE: 262144
These values differ between browsers. Chrome, Firefox, and Safari each send distinct SETTINGS values. A Python HTTP/2 client (like httpx with h2) sends different values than any browser. This creates a secondary fingerprint that can be checked alongside JA3.
# Chrome's HTTP/2 SETTINGS (typical):
# HEADER_TABLE_SIZE=65536, INITIAL_WINDOW_SIZE=6291456, MAX_HEADER_LIST_SIZE=262144
# httpx with h2 (typical):
# HEADER_TABLE_SIZE=4096, INITIAL_WINDOW_SIZE=65535, MAX_HEADER_LIST_SIZE=16384
# These differences are detectable and logged by CDN providers
Additionally, browsers send HTTP/2 frames in a specific order with specific priority values (PRIORITY frames, WINDOW_UPDATE timing). This behavior is called the HTTP/2 fingerprint and is tracked by Cloudflare, Akamai, and other CDN/anti-bot providers.
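Conceptually, this kind of detection is just profile matching. A hedged sketch, using the SETTINGS values from the comparison above as a hypothetical profile table:

```python
# Hypothetical profile table; values taken from the comparison above.
KNOWN_H2_PROFILES = {
    "chrome": {"HEADER_TABLE_SIZE": 65536, "INITIAL_WINDOW_SIZE": 6291456,
               "MAX_HEADER_LIST_SIZE": 262144},
    "httpx_h2": {"HEADER_TABLE_SIZE": 4096, "INITIAL_WINDOW_SIZE": 65535,
                 "MAX_HEADER_LIST_SIZE": 16384},
}

def classify_h2_settings(settings: dict) -> str:
    """Match an observed SETTINGS frame against known client profiles."""
    for name, profile in KNOWN_H2_PROFILES.items():
        if all(settings.get(k) == v for k, v in profile.items()):
            return name
    return "unknown"

observed = {"HEADER_TABLE_SIZE": 4096, "INITIAL_WINDOW_SIZE": 65535,
            "MAX_HEADER_LIST_SIZE": 16384}
print(classify_h2_settings(observed))  # httpx_h2
```

Real systems also weigh frame ordering and window-update timing, but the lookup-table core is the same.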
JA4+: The Next Generation
JA4+ is a suite of fingerprinting methods developed by FoxIO that provides much more granular identification than JA3. It includes:
JA4: Improved TLS Client Fingerprinting
JA4 improves on JA3 by:
- Separating cipher suites from extensions, so they can be analyzed independently
- Sorting cipher suites and extensions before hashing, removing ordering as a variable that changes between library versions
- Including the negotiated ALPN protocol
- Using truncated SHA-256 instead of MD5
JA4 format: [type][version][SNI][ciphers_count][extensions_count]_[sorted_ciphers_hash]_[sorted_extensions_hash]
Example: t13d1517h2_8daaf6152771_b0da82dd1658
t = TCP
13 = TLS 1.3
d = destination: domain (vs IP)
15 = 15 cipher suites
17 = 17 extensions
h2 = ALPN: HTTP/2
8daaf6152771 = truncated hash of sorted cipher suites
b0da82dd1658 = truncated hash of sorted extensions
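The construction can be sketched in a few lines. This is a simplified illustration, not a spec-complete JA4 implementation (real JA4 also strips GREASE values and has precise sorting and formatting rules):

```python
import hashlib

def ja4_sketch(tls_version: str, sni_type: str, ciphers: list[int],
               extensions: list[int], alpn: str) -> str:
    """Simplified JA4-style fingerprint: a header of counts, then truncated
    SHA-256 hashes of the SORTED cipher and extension lists."""
    def trunc_sha256(values: list[int]) -> str:
        joined = ",".join(format(v, "04x") for v in sorted(values))
        return hashlib.sha256(joined.encode()).hexdigest()[:12]

    header = f"t{tls_version}{sni_type}{len(ciphers):02d}{len(extensions):02d}{alpn}"
    return f"{header}_{trunc_sha256(ciphers)}_{trunc_sha256(extensions)}"

fp = ja4_sketch("13", "d", [4865, 4866, 4867], [0, 10, 11, 43, 51], "h2")
print(fp)  # header part: t13d0305h2
```

Because the lists are sorted before hashing, two clients offering the same suites in different orders produce the same JA4 hash, which is exactly the stability JA3 lacked.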
JA4H: HTTP Client Fingerprinting
JA4H fingerprints HTTP headers — not just which headers are present, but their exact order. Browsers send headers in a specific order that differs from programmatic HTTP clients:
# Chrome sends headers in this order:
# Host, Connection, sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform,
# Upgrade-Insecure-Requests, User-Agent, Accept, Sec-Fetch-Site, ...
# Python httpx sends headers in this order:
# Host, User-Agent, Accept, Accept-Encoding, Connection, ...
# Even if the header VALUES are identical, the ORDER reveals the client
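The idea reduces to hashing the sequence of header names rather than their values. A simplified, JA4H-inspired sketch:

```python
import hashlib

def header_order_fingerprint(headers: list[str]) -> str:
    """Hash the SEQUENCE of header names (values ignored), similar in
    spirit to JA4H's header-order component."""
    joined = ",".join(h.lower() for h in headers)
    return hashlib.sha256(joined.encode()).hexdigest()[:12]

chrome_order = ["Host", "Connection", "sec-ch-ua", "sec-ch-ua-mobile",
                "User-Agent", "Accept"]
httpx_order = ["Host", "User-Agent", "Accept", "Accept-Encoding", "Connection"]

print(header_order_fingerprint(chrome_order))
print(header_order_fingerprint(httpx_order))
# The same header set sent in a different order also yields a different hash
```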
JA4S, JA4X, JA4SSH
The JA4+ suite also includes:
- JA4S: Server TLS fingerprint (ServerHello analysis)
- JA4X: X.509 certificate fingerprint
- JA4SSH: SSH client/server fingerprint
- JA4T: TCP fingerprint (window size, TTL, options)
Together, these create a multi-dimensional fingerprint that is extremely difficult to spoof comprehensively.
What Actually Works: Practical Solutions
Solution 1: curl_cffi (Best Python Option)
curl_cffi is a Python wrapper around libcurl compiled with BoringSSL. It can impersonate specific browser versions by reproducing their exact TLS ClientHello, including cipher suite order, extension order, GREASE values, and ALPN settings.
from curl_cffi import requests as cffi_requests
import json
class TLSStealthClient:
    """HTTP client with browser-grade TLS fingerprinting."""

    # Supported browser impersonation targets
    BROWSERS = {
        "chrome131": "chrome131",
        "chrome130": "chrome130",
        "chrome124": "chrome124",
        "chrome120": "chrome120",
        "edge131": "edge131",
        "safari18": "safari18_0",
        "firefox132": "firefox132",
    }

    def __init__(
        self,
        browser: str = "chrome131",
        proxy: str | None = None,
        timeout: int = 15,
    ):
        if browser not in self.BROWSERS:
            raise ValueError(
                f"Unknown browser: {browser}. Use one of: {list(self.BROWSERS.keys())}"
            )
        proxy_dict = {"https": proxy, "http": proxy} if proxy else None
        self.session = cffi_requests.Session(
            impersonate=self.BROWSERS[browser],
            proxies=proxy_dict,
            timeout=timeout,
        )
        self.browser = browser

    def get(self, url: str, **kwargs) -> cffi_requests.Response:
        """Send GET request with browser TLS fingerprint."""
        return self.session.get(url, **kwargs)

    def post(self, url: str, **kwargs) -> cffi_requests.Response:
        """Send POST request with browser TLS fingerprint."""
        return self.session.post(url, **kwargs)

    def verify_fingerprint(self) -> dict:
        """Check your TLS fingerprint against a public checker."""
        resp = self.session.get("https://tls.browserleaks.com/json")
        data = resp.json()
        return {
            "ja3_hash": data.get("ja3_hash"),
            "ja3_text": data.get("ja3_text"),
            "user_agent": data.get("user_agent"),
            "akamai_hash": data.get("akamai_hash"),
            "impersonating": self.browser,
        }

    def close(self):
        self.session.close()

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

# Usage: verify your fingerprint matches Chrome
with TLSStealthClient(browser="chrome131") as client:
    fp = client.verify_fingerprint()
    print(json.dumps(fp, indent=2))

    # Then scrape with the same client
    resp = client.get("https://target-site.com/api/data")
    print(resp.status_code)
Why curl_cffi Works So Well
When you set impersonate="chrome131", curl_cffi does not just change the User-Agent. It reproduces the exact TLS ClientHello that Chrome 131 sends:
- Same cipher suite list in the same order
- Same TLS extensions in the same order
- Same GREASE values and patterns
- Same supported groups and signature algorithms
- Same ALPN protocols (h2, http/1.1)
- Same BoringSSL-specific behaviors
The result is a JA3 hash that is identical to a real Chrome 131 browser. From the server's perspective, there is no TLS-layer difference between your Python script and a real Chrome user.
Combining curl_cffi with Rotating Proxies
For maximum effectiveness, pair curl_cffi with residential proxy rotation:
from curl_cffi import requests as cffi_requests
import random
import time
class StealthScraper:
    """Production scraper with TLS impersonation and proxy rotation."""

    def __init__(self, proxy_url: str):
        self.proxy_url = proxy_url
        self.browsers = ["chrome131", "chrome130", "chrome124"]
        self._new_session()

    def _new_session(self):
        """Create a new session with random browser impersonation."""
        browser = random.choice(self.browsers)
        self.session = cffi_requests.Session(
            impersonate=browser,
            proxies={"https": self.proxy_url, "http": self.proxy_url},
            timeout=15,
        )
        self.request_count = 0

    def get(self, url: str, **kwargs) -> cffi_requests.Response:
        """GET request with automatic session rotation."""
        if self.request_count >= 15:  # Rotate every 15 requests
            self.session.close()
            self._new_session()
        resp = self.session.get(url, **kwargs)
        self.request_count += 1
        return resp

    def close(self):
        self.session.close()

# With ThorData rotating residential proxies
scraper = StealthScraper(
    proxy_url="http://user:[email protected]:9000"
)

# Each request gets a real browser TLS fingerprint + residential IP
resp = scraper.get("https://protected-site.com/data")
print(resp.status_code)
scraper.close()
Using ThorData residential proxies with curl_cffi is particularly effective because you are combining two critical anti-detection layers: genuine browser TLS fingerprints from BoringSSL impersonation plus real residential IP addresses that have high trust scores with CDN providers. This combination defeats both the network-layer (IP reputation) and transport-layer (TLS fingerprint) detection that catches most scrapers.
Solution 2: Use a Real Browser via CDP
If you need full browser rendering anyway, connect to a real (headed) Chrome instance via Chrome DevTools Protocol for a 100% authentic fingerprint:
import subprocess
import websocket
import json
import time
import tempfile
import os
class RealBrowserClient:
    """Control a real Chrome browser for authentic TLS fingerprints."""

    def __init__(self, chrome_path: str | None = None, port: int = 9222):
        self.port = port
        self.chrome_path = chrome_path or self._find_chrome()
        self.process = None
        self.user_data_dir = tempfile.mkdtemp()

    def _find_chrome(self) -> str:
        """Find Chrome binary on the system."""
        candidates = [
            "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
            "/usr/bin/google-chrome",
            "/usr/bin/chromium-browser",
            "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
        ]
        for path in candidates:
            if os.path.exists(path):
                return path
        raise FileNotFoundError("Chrome not found. Specify chrome_path.")

    def start(self):
        """Launch Chrome with remote debugging enabled."""
        self.process = subprocess.Popen([
            self.chrome_path,
            f"--remote-debugging-port={self.port}",
            f"--user-data-dir={self.user_data_dir}",
            "--no-first-run",
            "--no-default-browser-check",
        ])
        time.sleep(3)  # Wait for Chrome to start

    def navigate(self, url: str) -> str:
        """Navigate to a URL and return the page HTML."""
        import httpx

        # Get the WebSocket debugger URL
        resp = httpx.get(f"http://localhost:{self.port}/json")
        pages = resp.json()
        ws_url = pages[0]["webSocketDebuggerUrl"]

        # Connect and navigate
        ws = websocket.create_connection(ws_url)
        ws.send(json.dumps({
            "id": 1,
            "method": "Page.navigate",
            "params": {"url": url},
        }))
        ws.recv()  # Navigation response
        time.sleep(3)  # Wait for page to load

        # Get the page HTML
        ws.send(json.dumps({
            "id": 2,
            "method": "Runtime.evaluate",
            "params": {"expression": "document.documentElement.outerHTML"},
        }))
        result = json.loads(ws.recv())
        html = result["result"]["result"]["value"]
        ws.close()
        return html

    def stop(self):
        """Shut down Chrome."""
        if self.process:
            self.process.terminate()
            self.process.wait()
# Usage
browser = RealBrowserClient()
browser.start()
html = browser.navigate("https://protected-site.com/data")
print(f"Got {len(html)} bytes of HTML")
browser.stop()
The TLS fingerprint is perfectly authentic because it IS a real Chrome browser. The downside is the resource overhead of running a full browser process.
Solution 3: tls-client (Go-based Python Library)
Another option is tls-client, a Python library that uses Go's crypto/tls under the hood to impersonate browsers:
import tls_client
session = tls_client.Session(
    client_identifier="chrome_131",
    random_tls_extension_order=True,
)

# Set headers to match the impersonated browser
session.headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}
resp = session.get("https://protected-site.com")
print(resp.status_code, len(resp.text))
Building a Complete Anti-Detection Stack
TLS fingerprinting is one layer of a multi-layer detection system. Here is how to build a comprehensive anti-detection stack:
from curl_cffi import requests as cffi_requests
import random
import time
import json
from dataclasses import dataclass
@dataclass
class AntiDetectionConfig:
    """Configuration for multi-layer anti-detection."""

    # TLS layer
    browser_impersonation: str = "chrome131"
    rotate_browser_version: bool = True

    # Network layer
    proxy_url: str | None = None
    rotate_proxy_per_request: bool = False

    # HTTP layer
    randomize_header_order: bool = True
    include_sec_ch_headers: bool = True

    # Behavioral layer
    min_delay: float = 2.0
    max_delay: float = 8.0
    max_requests_per_session: int = 20
class AntiDetectionScraper:
    """Multi-layer anti-detection scraper."""

    CHROME_VERSIONS = ["chrome131", "chrome130", "chrome124"]
    USER_AGENTS = {
        "chrome131": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
        "chrome130": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36",
        "chrome124": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    }

    def __init__(self, config: AntiDetectionConfig):
        self.config = config
        self.request_count = 0
        self._create_session()

    def _create_session(self):
        """Create a new session with fresh fingerprint."""
        browser = (
            random.choice(self.CHROME_VERSIONS)
            if self.config.rotate_browser_version
            else self.config.browser_impersonation
        )
        self.current_browser = browser
        proxy_dict = None
        if self.config.proxy_url:
            proxy_dict = {
                "https": self.config.proxy_url,
                "http": self.config.proxy_url,
            }
        self.session = cffi_requests.Session(
            impersonate=browser,
            proxies=proxy_dict,
            timeout=15,
        )

    def _build_headers(self, url: str) -> dict:
        """Build browser-consistent headers for the current impersonation."""
        ua = self.USER_AGENTS.get(self.current_browser, self.USER_AGENTS["chrome131"])
        headers = {
            "User-Agent": ua,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "DNT": "1",
            "Upgrade-Insecure-Requests": "1",
            "Connection": "keep-alive",
        }
        if self.config.include_sec_ch_headers and "chrome" in self.current_browser:
            version = self.current_browser.replace("chrome", "")
            headers.update({
                "Sec-Ch-Ua": f'"Google Chrome";v="{version}", "Chromium";v="{version}", "Not_A Brand";v="24"',
                "Sec-Ch-Ua-Mobile": "?0",
                "Sec-Ch-Ua-Platform": '"Windows"',
                "Sec-Fetch-Dest": "document",
                "Sec-Fetch-Mode": "navigate",
                "Sec-Fetch-Site": "none",
                "Sec-Fetch-User": "?1",
            })
        return headers

    def get(self, url: str, **kwargs) -> cffi_requests.Response:
        """Make a GET request with full anti-detection measures."""
        # Rotate session if needed
        if self.request_count >= self.config.max_requests_per_session:
            self.session.close()
            self._create_session()
            self.request_count = 0

        # Add delay between requests
        if self.request_count > 0:
            delay = random.triangular(
                self.config.min_delay,
                self.config.max_delay,
                self.config.min_delay + 1.0,
            )
            time.sleep(delay)

        headers = self._build_headers(url)
        if "headers" in kwargs:
            headers.update(kwargs.pop("headers"))
        resp = self.session.get(url, headers=headers, **kwargs)
        self.request_count += 1
        return resp

    def close(self):
        self.session.close()
# Production usage with ThorData proxies
config = AntiDetectionConfig(
    proxy_url="http://user:[email protected]:9000",
    rotate_browser_version=True,
    min_delay=3.0,
    max_delay=10.0,
    max_requests_per_session=15,
)

scraper = AntiDetectionScraper(config)
urls = [
    "https://target-site.com/page/1",
    "https://target-site.com/page/2",
    "https://target-site.com/page/3",
]
for url in urls:
    resp = scraper.get(url)
    print(f"{url}: {resp.status_code} ({len(resp.text)} bytes)")
scraper.close()
How to Verify Your Fingerprint
Before deploying any scraper, verify what you actually look like to the target server. Do not assume — test.
Check Your JA3 Hash
from curl_cffi import requests as cffi_requests
import json
def check_fingerprint(impersonate: str = "chrome131", proxy: str | None = None):
    """Check your TLS fingerprint against multiple services."""
    proxy_dict = {"https": proxy, "http": proxy} if proxy else None
    session = cffi_requests.Session(
        impersonate=impersonate,
        proxies=proxy_dict,
    )

    # Service 1: BrowserLeaks TLS check
    try:
        resp = session.get("https://tls.browserleaks.com/json")
        tls_data = resp.json()
        print("=== BrowserLeaks TLS ===")
        print(f"JA3 Hash: {tls_data.get('ja3_hash')}")
        print(f"JA3 Text: {tls_data.get('ja3_text', '')[:80]}...")
        print(f"Protocol: {tls_data.get('tls_version')}")
        print()
    except Exception as e:
        print(f"BrowserLeaks failed: {e}")

    # Service 2: Scrapfly fingerprint check
    try:
        resp = session.get("https://tools.scrapfly.io/api/fp/ja3")
        fp_data = resp.json()
        print("=== Scrapfly JA3 ===")
        print(f"JA3 Hash: {fp_data.get('ja3_digest')}")
        print(f"JA3N Hash: {fp_data.get('ja3n_digest')}")
        print()
    except Exception as e:
        print(f"Scrapfly failed: {e}")

    session.close()

# Compare Python default vs browser impersonation
print("--- Default Python (httpx) ---")
import httpx
try:
    r = httpx.get("https://tls.browserleaks.com/json")
    d = r.json()
    print(f"JA3 Hash: {d.get('ja3_hash')}")
except Exception as e:
    print(f"Error: {e}")
print()

print("--- curl_cffi Chrome 131 ---")
check_fingerprint("chrome131")
Monitor for Fingerprint Changes
Browser TLS fingerprints change between versions. Set up monitoring to ensure your impersonation stays current:
from datetime import datetime, timezone

KNOWN_FINGERPRINTS = {
    "chrome131": {
        "ja3_hash": None,  # Will be populated on first run
        "last_checked": None,
    },
    "chrome130": {
        "ja3_hash": None,
        "last_checked": None,
    },
}

def update_fingerprint_database(browser: str):
    """Check and record the current JA3 for a browser impersonation."""
    from curl_cffi import requests as cffi_requests

    session = cffi_requests.Session(impersonate=browser)
    resp = session.get("https://tls.browserleaks.com/json")
    data = resp.json()
    session.close()

    current_hash = data.get("ja3_hash")
    stored = KNOWN_FINGERPRINTS.get(browser, {})
    if stored.get("ja3_hash") and stored["ja3_hash"] != current_hash:
        print(f"WARNING: {browser} fingerprint changed!")
        print(f"  Old: {stored['ja3_hash']}")
        print(f"  New: {current_hash}")
        print("  -> Update your curl_cffi library")

    KNOWN_FINGERPRINTS[browser] = {
        "ja3_hash": current_hash,
        "last_checked": datetime.now(timezone.utc).isoformat(),
    }
    return current_hash
Error Handling for TLS-Related Blocks
When your TLS fingerprint triggers a block, you need to detect and handle it properly:
from enum import Enum
from dataclasses import dataclass
class TLSBlockType(Enum):
    FINGERPRINT_MISMATCH = "fingerprint_mismatch"
    CDN_CHALLENGE = "cdn_challenge"
    WAF_BLOCK = "waf_block"
    RATE_LIMIT = "rate_limit"
    CLEAN = "clean"

@dataclass
class TLSBlockDetection:
    block_type: TLSBlockType
    confidence: float  # 0-1
    detail: str

def detect_tls_block(response) -> TLSBlockDetection:
    """Detect if a response indicates TLS-level blocking."""
    # Cloudflare challenge page
    if response.status_code == 403 and "cf-chl-bypass" in response.text:
        return TLSBlockDetection(
            TLSBlockType.CDN_CHALLENGE,
            0.95,
            "Cloudflare challenge page - likely TLS fingerprint mismatch",
        )

    # Cloudflare 1020 error
    if "error code: 1020" in response.text:
        return TLSBlockDetection(
            TLSBlockType.WAF_BLOCK,
            0.9,
            "Cloudflare 1020 Access Denied - WAF rule triggered",
        )

    # Akamai bot detection
    if response.status_code == 403 and "akamai" in response.headers.get("server", "").lower():
        return TLSBlockDetection(
            TLSBlockType.FINGERPRINT_MISMATCH,
            0.85,
            "Akamai 403 - likely bot detection via TLS/HTTP fingerprint",
        )

    # Generic 403 with empty or minimal body
    if response.status_code == 403 and len(response.text) < 500:
        return TLSBlockDetection(
            TLSBlockType.WAF_BLOCK,
            0.7,
            "Generic 403 with minimal body - possible fingerprint block",
        )

    # 429 rate limit
    if response.status_code == 429:
        return TLSBlockDetection(
            TLSBlockType.RATE_LIMIT,
            0.9,
            "Rate limited - may be IP or fingerprint based",
        )

    return TLSBlockDetection(TLSBlockType.CLEAN, 1.0, "No block detected")
def handle_tls_block(detection: TLSBlockDetection) -> str:
    """Return recommended action for a detected block."""
    actions = {
        TLSBlockType.FINGERPRINT_MISMATCH: (
            "Switch to curl_cffi with browser impersonation. "
            "Your current TLS fingerprint is being detected."
        ),
        TLSBlockType.CDN_CHALLENGE: (
            "The CDN is serving a JS challenge. Options: "
            "1) Use curl_cffi to pass TLS check, "
            "2) Use Playwright for JS execution, "
            "3) Use residential proxies to reduce challenge frequency."
        ),
        TLSBlockType.WAF_BLOCK: (
            "WAF is blocking this request. Check: "
            "1) TLS fingerprint matches a real browser, "
            "2) Headers are consistent with impersonated browser, "
            "3) Request rate is within human norms."
        ),
        TLSBlockType.RATE_LIMIT: (
            "Rate limited. Increase delay between requests and "
            "rotate to a fresh residential IP."
        ),
    }
    return actions.get(detection.block_type, "No action needed.")
Real-World Use Cases for TLS Fingerprint Awareness
Price Monitoring on Protected E-commerce Sites
from curl_cffi import requests as cffi_requests
from bs4 import BeautifulSoup
import random
import time

def scrape_protected_prices(
    product_urls: list[str],
    proxy_url: str,
) -> list[dict]:
    """Scrape prices from Cloudflare-protected e-commerce sites."""
    session = cffi_requests.Session(
        impersonate="chrome131",
        proxies={"https": proxy_url, "http": proxy_url},
    )
    prices = []
    for url in product_urls:
        resp = session.get(url)

        # Check for blocks (detect_tls_block from the section above)
        detection = detect_tls_block(resp)
        if detection.block_type != TLSBlockType.CLEAN:
            prices.append({"url": url, "error": detection.detail})
            continue

        soup = BeautifulSoup(resp.text, "lxml")

        # Extract price (adapt selectors to target site)
        price_el = soup.select_one("[data-price], .price, .product-price")
        title_el = soup.select_one("h1, .product-title")
        prices.append({
            "url": url,
            "title": title_el.get_text(strip=True) if title_el else "",
            "price": price_el.get_text(strip=True) if price_el else "N/A",
        })
        time.sleep(random.uniform(2, 5))

    session.close()
    return prices
API Scraping Behind Cloudflare
def scrape_api_behind_cloudflare(
    api_url: str,
    params: dict,
    proxy_url: str | None = None,
) -> dict:
    """Access APIs protected by Cloudflare bot management."""
    import time

    proxy_dict = {"https": proxy_url, "http": proxy_url} if proxy_url else None
    session = cffi_requests.Session(
        impersonate="chrome131",
        proxies=proxy_dict,
    )

    # Some APIs require you to first visit the main page to get cookies
    base_url = api_url.split("/api/")[0] if "/api/" in api_url else api_url.rsplit("/", 1)[0]
    session.get(base_url)  # Get cf_clearance cookie
    time.sleep(1)

    # Now make the API call with the Cloudflare cookies
    resp = session.get(api_url, params=params)
    session.close()
    return resp.json() if resp.status_code == 200 else {"error": resp.status_code}
The Practical Takeaway
If you are building scrapers or data collection tools in 2025, here is what you need to know:
- Stop using `requests`/`httpx`/`aiohttp` for protected sites. Their TLS fingerprint is a neon sign saying "I am a script." No amount of header spoofing will fix this.
- Use `curl_cffi` with browser impersonation for most use cases. It is the best effort-to-results ratio. Install it with `pip install curl-cffi` and add `impersonate="chrome131"` to your requests.
- Keep impersonation versions current. A Chrome 120 fingerprint when everyone is on Chrome 131 is suspicious. Update `curl_cffi` regularly and use the latest browser identifier.
- Test your fingerprint before deploying. Use the verification code above. Do not assume — verify that your JA3 hash matches a real browser.
- Layer your approach: TLS fingerprint is necessary but not sufficient. You still need proper headers, realistic timing, and residential proxies from ThorData or similar providers for serious targets. The combination of genuine browser TLS + residential IP + human-like behavior is what gets you past modern anti-bot systems.
- Understand that this is an arms race. JA3 was just the beginning. JA4+, HTTP/2 fingerprinting, and behavioral analysis are all advancing. The developers who understand the full detection stack have a significant advantage over those who only think about HTTP headers.
The anti-bot industry is getting better at this faster than most scrapers adapt. But with the right tools and understanding, you can build scrapers that reliably access even well-protected sites. The key is working at every layer of the stack — not just the HTTP layer that most tutorials focus on.
Further Reading
- How to Scrape Google Search Results Without Getting Blocked — Practical SERP scraping guide
- httpx vs Playwright: When to Use Each — Decision framework for choosing tools
- Residential vs Datacenter Proxies — Why proxy type matters for detection