Scrape Apartments.com & Rent.com: Rental Listings, Prices & Neighborhood Data (2026)
Apartments.com and Rent.com (both owned by CoStar Group) list millions of rental properties across the US. The data they aggregate — rent prices, amenities, neighborhood scores, availability, floor plans — is invaluable for market analysis, rental comparison tools, price trend tracking in specific markets, and investment decision support.
CoStar spends heavily on protecting this data. But the listings are publicly accessible, and with the right technical approach you can extract clean, structured rental data at scale.
Available Data Points
Each listing on Apartments.com includes:
Property-Level Data
- Property name and address with geocoordinates (lat/lng)
- Property type — apartment complex, condo, townhouse, house
- Year built and last renovation date
- Total unit count
- Management company and contact info
Pricing Data
- Rent ranges by unit type (studio, 1BR, 2BR, 3BR, 4BR+)
- Low and high prices per unit type
- Price per square foot
- Specials and promotions — first month free, reduced deposit, etc.
Unit Details
- Floor plans — layout names, bed/bath counts
- Square footage ranges per floor plan
- Availability status and move-in dates
- Virtual tour links
Amenities
- In-unit: washer/dryer, dishwasher, A/C, balcony, fireplace
- Community: pool, gym, parking (type and cost), dog park, business center
- Building: doorman, elevator, package lockers, EV charging
Scores and Context
- Walk Score, Transit Score, Bike Score
- Neighborhood type and nearby transit lines
- School ratings (elementary, middle, high)
- Pet policy — allowed animals, breed restrictions, deposits and monthly fees
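Before writing any scraping code, it helps to settle on a normalized record shape for these fields. A minimal sketch (an illustrative schema of our own choosing, not an official Apartments.com data model):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RentalListing:
    """Illustrative normalized record; field names are our own choice."""
    name: str
    address: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    property_type: Optional[str] = None
    price_min: Optional[int] = None
    price_max: Optional[int] = None
    amenities: List[str] = field(default_factory=list)
    walk_score: Optional[int] = None
    transit_score: Optional[int] = None
    pet_policy: Optional[str] = None

listing = RentalListing(name="Example Flats", address="123 Pine St, Seattle, WA")
```

Downstream parsers can then target one shape regardless of whether a field came from JSON-LD, the DOM, or a detail page.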
Anti-Bot Protections
CoStar protects their listings aggressively:
Datadome
Apartments.com uses Datadome, one of the more sophisticated bot detection systems. It:
- Runs a JavaScript challenge on first visit
- Builds a behavioral fingerprint (mouse movements, keystrokes, scroll patterns)
- Maintains a device graph across sessions
- Specifically targets datacenter IP ASNs
Datadome is why simple requests or plain httpx scraping fails immediately. Residential proxies are not optional — they're baseline.
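In practice you also want to recognize a Datadome interstitial before parsing garbage. The exact markers vary by deployment, so treat these heuristics as assumptions (captcha-delivery.com is a commonly observed Datadome challenge host; the helper name is ours):

```python
def looks_like_datadome_block(status_code: int, body: str) -> bool:
    """Heuristic check for a Datadome interstitial.
    Markers vary by deployment; these are common signals, not a contract."""
    if status_code in (403, 405):
        return True
    lowered = body.lower()
    # Challenge pages typically reference the captcha delivery host
    if "captcha-delivery.com" in lowered:
        return True
    return "datadome" in lowered and "captcha" in lowered
```

Check every response with something like this and refresh cookies on a hit, rather than discovering a block three parse steps later.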
Map-Based Results
Listings load based on your map viewport. The URL can encode lat, lng, and zoom, but the primary mechanism is viewport-based: when you pan the map, new listings load. Traditional path-based pagination (e.g. /seattle-wa/2/) works for simple city searches, but comprehensive coverage of a metro area requires viewport manipulation.
Dynamic Element IDs
React-generated class names change on every deployment. Selectors based on class names like styles__PropertyCard__3x9Qz break constantly. Use data-* attributes, semantic HTML, and JSON-LD structured data instead.
Rate Limiting
CoStar's internal API throttles at roughly 80–100 requests per IP per hour. Proxy rotation is essential for any serious data collection.
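A simple way to stay under that ceiling is a per-proxy minimum interval with round-robin rotation. A hedged sketch (the class name and the 45-second default, which works out to roughly 80 requests per hour per IP, are our own choices); the clock and sleep functions are injectable so the logic is testable:

```python
import itertools
import time

class ProxyRotator:
    """Round-robin proxies, enforcing a minimum interval per proxy."""

    def __init__(self, proxy_urls, min_interval=45.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._cycle = itertools.cycle(proxy_urls)
        self._min_interval = min_interval
        self._last_used = {}
        self._clock = clock
        self._sleep = sleep

    def next_proxy(self):
        proxy = next(self._cycle)
        now = self._clock()
        # Wait out any remaining cooldown for this proxy
        wait = self._last_used.get(proxy, -self._min_interval) + self._min_interval - now
        if wait > 0:
            self._sleep(wait)
            now += wait
        self._last_used[proxy] = now
        return proxy
```

With N proxies the effective throughput is N requests per interval, so sizing the pool is just a capacity calculation.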
Setup
pip install httpx selectolax playwright lxml
playwright install chromium
Approach 1: JSON-LD Structured Data
The most reliable data source — Apartments.com embeds rich schema.org structured data in every page. This doesn't change with React deployments:
import httpx
import json
from selectolax.parser import HTMLParser
import time
import random
HEADERS = {
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
}
def extract_json_ld(html):
"""Extract all JSON-LD structured data from a page."""
tree = HTMLParser(html)
results = []
for script in tree.css('script[type="application/ld+json"]'):
try:
data = json.loads(script.text())
if isinstance(data, list):
results.extend(data)
else:
results.append(data)
except json.JSONDecodeError:
continue
return results
def parse_apartment_json_ld(json_ld_items):
"""Extract apartment data from JSON-LD structured data."""
listings = []
for item in json_ld_items:
schema_type = item.get("@type", "")
if schema_type in ("ApartmentComplex", "Apartment", "LodgingBusiness"):
listing = {
"name": item.get("name"),
"description": item.get("description"),
"url": item.get("url"),
"image": item.get("image"),
"telephone": item.get("telephone"),
"latitude": None,
"longitude": None,
"address": None,
"amenities": [],
"price_range": None,
}
# Geo coordinates
geo = item.get("geo", {})
listing["latitude"] = geo.get("latitude")
listing["longitude"] = geo.get("longitude")
# Address
addr = item.get("address", {})
if isinstance(addr, dict):
listing["address"] = {
"street": addr.get("streetAddress"),
"city": addr.get("addressLocality"),
"state": addr.get("addressRegion"),
"zip": addr.get("postalCode"),
}
elif isinstance(addr, str):
listing["address"] = {"full": addr}
# Amenities
for amenity in item.get("amenityFeature", []):
if isinstance(amenity, dict):
listing["amenities"].append(amenity.get("name", ""))
elif isinstance(amenity, str):
listing["amenities"].append(amenity)
# Price range
listing["price_range"] = item.get("priceRange")
listings.append(listing)
return listings
def fetch_page(url, proxy_url=None, cookies=None):
"""Fetch a page with optional proxy and cookies."""
client_kwargs = {
"headers": HEADERS,
"follow_redirects": True,
"timeout": 30,
}
if proxy_url:
        client_kwargs["proxy"] = proxy_url  # httpx >= 0.26; older httpx used proxies={"all://": proxy_url}
if cookies:
client_kwargs["cookies"] = cookies
try:
with httpx.Client(**client_kwargs) as client:
resp = client.get(url)
return resp if resp.status_code == 200 else None
except Exception as e:
print(f"Fetch error: {e}")
return None
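To sanity-check the JSON-LD logic offline, run it against a canned page. The snippet below is a stdlib-only stand-in for extract_json_ld (no selectolax required) applied to a hypothetical sample listing; a regex is fine for a fixed test string, though the selectolax version above is what you want against real, attribute-reordered HTML:

```python
import json
import re

SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "ApartmentComplex", "name": "Example Flats",
 "geo": {"latitude": 47.61, "longitude": -122.33},
 "priceRange": "$1,500 - $2,900"}
</script>
</head><body></body></html>
"""

def extract_json_ld_stdlib(html):
    """Minimal stdlib stand-in for extract_json_ld(), for offline testing."""
    pattern = re.compile(
        r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
    )
    items = []
    for blob in pattern.findall(html):
        try:
            data = json.loads(blob)
            items.extend(data if isinstance(data, list) else [data])
        except json.JSONDecodeError:
            continue
    return items

items = extract_json_ld_stdlib(SAMPLE_HTML)
```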
Approach 2: Session Cookie Method
Use Playwright to get valid Datadome session cookies, then use httpx for subsequent requests:
from playwright.sync_api import sync_playwright
import time
def get_datadome_cookies(city, state, proxy_config=None):
"""
Use Playwright to pass Datadome JS challenge and collect session cookies.
Returns cookies dict for use with httpx.
"""
with sync_playwright() as p:
launch_kwargs = {
"headless": True,
"args": [
"--disable-blink-features=AutomationControlled",
"--disable-dev-shm-usage",
"--no-sandbox",
"--disable-web-security",
],
}
if proxy_config:
launch_kwargs["proxy"] = proxy_config
browser = p.chromium.launch(**launch_kwargs)
context = browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent=HEADERS["User-Agent"],
locale="en-US",
)
page = context.new_page()
# Visit a landing page first, not the target directly
page.goto("https://www.apartments.com/", wait_until="domcontentloaded")
time.sleep(random.uniform(2, 4))
# Then navigate to target city
page.goto(
f"https://www.apartments.com/{city}-{state}/",
wait_until="networkidle",
timeout=30000,
)
time.sleep(random.uniform(2, 4))
# Simulate human behavior
page.mouse.move(
random.randint(200, 800),
random.randint(200, 600),
)
page.mouse.wheel(0, random.randint(200, 500))
time.sleep(random.uniform(1, 2))
cookies = context.cookies()
browser.close()
return {c["name"]: c["value"] for c in cookies}
def scrape_city_listings(city, state, proxy_url=None, max_pages=15):
"""
Scrape all listings for a city using session cookies from Playwright.
"""
# Build proxy config for Playwright
proxy_config = None
if proxy_url:
# Parse proxy URL: http://user:pass@host:port
import re
m = re.match(r'http://([^:]+):([^@]+)@([^:]+):(\d+)', proxy_url)
if m:
proxy_config = {
"server": f"http://{m.group(3)}:{m.group(4)}",
"username": m.group(1),
"password": m.group(2),
}
print(f"Getting Datadome cookies for {city}-{state}...")
cookies = get_datadome_cookies(city, state, proxy_config)
print(f"Got {len(cookies)} cookies")
all_listings = []
for page_num in range(1, max_pages + 1):
url = f"https://www.apartments.com/{city}-{state}/{page_num}/"
resp = fetch_page(url, proxy_url, cookies)
if not resp:
print(f"Page {page_num}: failed to fetch")
break
# Check for Datadome block
if "datadome" in resp.text.lower() and "captcha" in resp.text.lower():
print(f"Page {page_num}: Datadome challenge — refreshing cookies")
cookies = get_datadome_cookies(city, state, proxy_config)
continue
# Parse JSON-LD first (most reliable)
json_ld = extract_json_ld(resp.text)
listings = parse_apartment_json_ld(json_ld)
# Fall back to DOM parsing if JSON-LD is empty
if not listings:
listings = parse_listings_dom(resp.text)
if not listings:
print(f"Page {page_num}: no listings, done.")
break
all_listings.extend(listings)
print(f"Page {page_num}: {len(listings)} listings (total: {len(all_listings)})")
time.sleep(random.uniform(2.5, 5.0))
return all_listings
Approach 3: Full Playwright Scraping
For detail pages with lazy-loaded content:
import asyncio
from playwright.async_api import async_playwright
async def scrape_listing_detail(url, proxy_config=None):
"""Extract full details from a single listing page."""
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
proxy=proxy_config,
args=["--disable-blink-features=AutomationControlled"],
)
context = await browser.new_context(
user_agent=HEADERS["User-Agent"],
viewport={"width": 1440, "height": 900},
)
page = await context.new_page()
try:
await page.goto(url, wait_until="networkidle", timeout=30000)
await page.wait_for_timeout(2000)
# Scroll to trigger lazy loading
await page.evaluate("window.scrollTo(0, document.body.scrollHeight / 2)")
await page.wait_for_timeout(1000)
await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
await page.wait_for_timeout(1500)
details = await page.evaluate("""
() => {
const getText = (sel, root = document) => root.querySelector(sel)?.textContent?.trim();
const getAll = (sel, root = document) => Array.from(
root.querySelectorAll(sel)
).map(e => e.textContent.trim()).filter(Boolean);
// Floor plans
const floorPlans = [];
document.querySelectorAll('[class*="floorPlan"], [data-testid*="floor-plan"], .pricingGridItem').forEach(fp => {
floorPlans.push({
name: fp.querySelector('[class*="name"], .modelName')?.textContent?.trim(),
beds: fp.querySelector('[class*="bed"], .detailsTextWrapper')?.textContent?.trim(),
baths: fp.querySelector('[class*="bath"]')?.textContent?.trim(),
sqft: fp.querySelector('[class*="sqft"], [class*="squareFeet"]')?.textContent?.trim(),
price: fp.querySelector('[class*="price"], .rentLabel')?.textContent?.trim(),
available: fp.querySelector('[class*="available"], [class*="availability"]')?.textContent?.trim(),
});
});
// Amenities
const amenities = getAll('.amenityItems li, [class*="amenity"] li, .featureItem');
// Neighborhood scores
const walkScore = document.querySelector('[class*="walkScore"] .score, [id*="walk-score"]')?.textContent?.trim();
const transitScore = document.querySelector('[class*="transitScore"] .score')?.textContent?.trim();
const bikeScore = document.querySelector('[class*="bikeScore"] .score')?.textContent?.trim();
// Pet policy
const petSection = document.querySelector('[class*="petPolicy"], [data-testid="pet-policy"]');
const petPolicy = petSection?.textContent?.trim();
// Office hours / contact
const phone = getText('[class*="phoneNumber"], [data-testid="phone"]');
// Parking
const parking = getAll('[class*="parking"] li, [data-testid*="parking"]');
return {
name: getText('h1, [class*="propertyName"]'),
address: getText('[class*="propertyAddress"], [itemprop="address"]'),
price_range: getText('[class*="priceRange"], [class*="rentRange"]'),
floor_plans: floorPlans,
amenities: amenities,
neighborhood: {
walk_score: walkScore,
transit_score: transitScore,
bike_score: bikeScore,
},
pet_policy: petPolicy,
phone: phone,
parking: parking,
};
}
""")
finally:
await browser.close()
return details
async def scrape_listings_batch(urls, proxy_config=None, concurrency=3):
"""Scrape multiple listing detail pages concurrently."""
semaphore = asyncio.Semaphore(concurrency)
async def scrape_one(url):
async with semaphore:
result = await scrape_listing_detail(url, proxy_config)
await asyncio.sleep(random.uniform(2, 4))
return result
tasks = [scrape_one(url) for url in urls]
return await asyncio.gather(*tasks, return_exceptions=True)
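The semaphore is what caps how many browser pages are in flight at once. A self-contained illustration of the same pattern with dummy coroutines (no browser needed) that also records peak concurrency:

```python
import asyncio

async def bounded_gather(coro_fns, concurrency=3):
    """Run zero-arg coroutine factories with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    active = 0
    peak = 0

    async def run(fn):
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            try:
                return await fn()
            finally:
                active -= 1

    # gather preserves input order in its results
    results = await asyncio.gather(*(run(fn) for fn in coro_fns))
    return results, peak

async def main():
    async def job(i):
        await asyncio.sleep(0.01)  # stands in for a page scrape
        return i

    fns = [lambda i=i: job(i) for i in range(10)]
    return await bounded_gather(fns, concurrency=3)

results, peak = asyncio.run(main())
```

Ten jobs run, but never more than three at once, which is the behavior scrape_listings_batch relies on to avoid tripping rate limits.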
DOM Fallback Parser
When JSON-LD is absent or incomplete:
from selectolax.parser import HTMLParser
import re
def parse_listings_dom(html):
"""Parse listings from DOM when JSON-LD is insufficient."""
tree = HTMLParser(html)
listings = []
# Multiple selector strategies for resilience
card_selectors = [
'[data-listingid]',
'[class*="placard"]',
'.placardContainer article',
'[data-id]',
]
cards = []
for selector in card_selectors:
cards = tree.css(selector)
if cards:
break
for card in cards:
listing_id = (
card.attributes.get("data-listingid")
or card.attributes.get("data-id")
or ""
)
# Name — try multiple selectors
name = None
for sel in ['[class*="title"]', '[class*="propertyName"]', "h3", "h2"]:
el = card.css_first(sel)
if el:
name = el.text(strip=True)
break
# Price — look for $ patterns
price_range = None
for sel in ['[class*="price"]', '[class*="rent"]', '[class*="pricing"]']:
el = card.css_first(sel)
if el:
text = el.text(strip=True)
if "$" in text:
price_range = text
break
# Beds
beds = None
for sel in ['[class*="bed"]', '[class*="unit"]']:
el = card.css_first(sel)
if el:
beds = el.text(strip=True)
break
# Address
address = None
for sel in ['[class*="address"]', 'address', '[itemprop="streetAddress"]']:
el = card.css_first(sel)
if el:
address = el.text(strip=True)
break
# Rating/reviews
rating = None
for sel in ['[class*="rating"]', '[aria-label*="rating"]']:
el = card.css_first(sel)
if el:
aria = el.attributes.get("aria-label", "")
m = re.search(r'([\d.]+) out of', aria)
if m:
rating = float(m.group(1))
break
if name or listing_id:
listings.append({
"id": listing_id,
"name": name,
"price_range": price_range,
"beds": beds,
"address": address,
"rating": rating,
})
return listings
def parse_price_range(price_str):
"""Extract min/max from price strings like '$1,200 - $2,400/mo'."""
if not price_str:
return {"min": None, "max": None}
nums = re.findall(r"\$([\d,]+)", price_str)
cleaned = [int(n.replace(",", "")) for n in nums]
if len(cleaned) >= 2:
return {"min": min(cleaned), "max": max(cleaned)}
elif len(cleaned) == 1:
return {"min": cleaned[0], "max": cleaned[0]}
return {"min": None, "max": None}
ThorData Proxy Integration
ThorData residential proxies are essential for Apartments.com. Datadome specifically blocks datacenter IP ranges. US residential IPs from ThorData pass Datadome's bot scoring and carry the geographic consistency that CoStar expects (an IP from the same metro area as the listings being searched is ideal).
THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9000
def get_proxy_url(session_id=None):
    """
    Build ThorData proxy URL with US targeting.
    session_id: sticky session across multiple requests
    """
user_parts = [THORDATA_USER]
if session_id:
user_parts.append(f"session-{session_id}")
# Country targeting (always US for Apartments.com)
user_parts.append("country-us")
user = "-".join(user_parts)
return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"
def get_playwright_proxy(session_id=None):
"""Get proxy config dict for Playwright."""
user_parts = [THORDATA_USER, "country-us"]
if session_id:
user_parts = [THORDATA_USER, f"session-{session_id}", "country-us"]
user = "-".join(user_parts)
return {
"server": f"http://{THORDATA_HOST}:{THORDATA_PORT}",
"username": user,
"password": THORDATA_PASS,
}
# Example: scrape Seattle rentals with rotating residential IPs
session_id = random.randint(10000, 99999)
proxy_url = get_proxy_url(session_id=session_id)
listings = scrape_city_listings("seattle", "wa", proxy_url=proxy_url, max_pages=10)
Price Trend Database
Track rental prices over time for market analysis:
import sqlite3
import json
from datetime import datetime
def init_db(db_path="rental_tracker.db"):
conn = sqlite3.connect(db_path)
conn.executescript("""
CREATE TABLE IF NOT EXISTS properties (
id TEXT PRIMARY KEY,
name TEXT,
address_street TEXT,
address_city TEXT,
address_state TEXT,
address_zip TEXT,
latitude REAL,
longitude REAL,
property_type TEXT,
amenities TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS price_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
property_id TEXT,
name TEXT,
city TEXT,
state TEXT,
min_price REAL,
max_price REAL,
unit_types TEXT,
beds_range TEXT,
walk_score INTEGER,
transit_score INTEGER,
        is_available INTEGER DEFAULT 1,
captured_at TEXT DEFAULT (datetime('now')),
FOREIGN KEY (property_id) REFERENCES properties(id)
);
CREATE TABLE IF NOT EXISTS floor_plans (
id INTEGER PRIMARY KEY AUTOINCREMENT,
property_id TEXT,
plan_name TEXT,
beds TEXT,
baths TEXT,
sqft_min INTEGER,
sqft_max INTEGER,
price_min REAL,
price_max REAL,
available_units INTEGER,
captured_at TEXT DEFAULT (datetime('now')),
FOREIGN KEY (property_id) REFERENCES properties(id)
);
CREATE INDEX IF NOT EXISTS idx_city_state ON price_snapshots(city, state);
CREATE INDEX IF NOT EXISTS idx_captured ON price_snapshots(captured_at);
CREATE INDEX IF NOT EXISTS idx_price ON price_snapshots(min_price, max_price);
""")
conn.commit()
return conn
def save_listing(conn, listing, city, state):
"""Save a listing snapshot to the database."""
prop_id = listing.get("id") or listing.get("name", "")[:50]
# Upsert property
addr = listing.get("address", {})
conn.execute("""
INSERT OR REPLACE INTO properties
(id, name, address_street, address_city, address_state, address_zip,
latitude, longitude, amenities)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
""", (
prop_id,
listing.get("name"),
addr.get("street") if isinstance(addr, dict) else addr,
addr.get("city", city) if isinstance(addr, dict) else city,
addr.get("state", state) if isinstance(addr, dict) else state,
addr.get("zip") if isinstance(addr, dict) else None,
listing.get("latitude"),
listing.get("longitude"),
json.dumps(listing.get("amenities", [])),
))
# Price snapshot
price = parse_price_range(listing.get("price_range"))
conn.execute("""
INSERT INTO price_snapshots
(property_id, name, city, state, min_price, max_price,
walk_score, transit_score)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
prop_id, listing.get("name"), city, state,
price["min"], price["max"],
listing.get("neighborhood", {}).get("walk_score"),
listing.get("neighborhood", {}).get("transit_score"),
))
# Floor plans if available
for plan in listing.get("floor_plans", []):
price_plan = parse_price_range(plan.get("price", ""))
sqft_match = re.search(r"([\d,]+)", plan.get("sqft", "") or "")
sqft = int(sqft_match.group(1).replace(",", "")) if sqft_match else None
conn.execute("""
INSERT INTO floor_plans
(property_id, plan_name, beds, baths, sqft_min, price_min, price_max)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (
prop_id, plan.get("name"), plan.get("beds"),
plan.get("baths"), sqft,
price_plan["min"], price_plan["max"],
))
conn.commit()
def get_price_trends(conn, city, state, weeks_back=12):
"""Query price trends over time for a market."""
cursor = conn.execute("""
SELECT
strftime('%Y-W%W', captured_at) as week,
COUNT(*) as listings,
AVG(min_price) as avg_min_price,
AVG(max_price) as avg_max_price,
MIN(min_price) as absolute_min,
MAX(max_price) as absolute_max
FROM price_snapshots
WHERE city = ? AND state = ?
          AND captured_at > datetime('now', '-' || (? * 7) || ' days')  -- SQLite has no 'weeks' modifier
GROUP BY week
ORDER BY week
""", (city, state, weeks_back))
return [
{
"week": row[0], "listings": row[1],
"avg_min": row[2], "avg_max": row[3],
"absolute_min": row[4], "absolute_max": row[5],
}
for row in cursor.fetchall()
]
def find_price_drops(conn, city, state, min_drop_pct=10):
"""Find listings with significant price drops since last snapshot."""
cursor = conn.execute("""
SELECT a.name, b.min_price as old_price, a.min_price as new_price,
ROUND((b.min_price - a.min_price) * 100.0 / b.min_price, 1) as drop_pct
FROM price_snapshots a
JOIN price_snapshots b ON a.property_id = b.property_id
WHERE a.city = ? AND a.state = ?
AND a.captured_at > datetime('now', '-1 day')
AND b.captured_at < a.captured_at
AND b.captured_at > datetime('now', '-8 days')
AND (b.min_price - a.min_price) * 100.0 / b.min_price >= ?
GROUP BY a.property_id
ORDER BY drop_pct DESC
""", (city, state, min_drop_pct))
return cursor.fetchall()
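The drop-percentage SQL is easy to get subtly wrong, so it is worth verifying against an in-memory database with synthetic snapshots before pointing it at real data. A compact check using the same self-join shape as find_price_drops, trimmed to the relevant columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price_snapshots (
        property_id TEXT, name TEXT, city TEXT, state TEXT,
        min_price REAL, captured_at TEXT
    )
""")
# Synthetic history: $2,000 a week ago, $1,700 now -> 15% drop
rows = [
    ("p1", "Example Flats", "seattle", "wa", 2000, "2026-01-01 00:00:00"),
    ("p1", "Example Flats", "seattle", "wa", 1700, "2026-01-07 00:00:00"),
]
conn.executemany("INSERT INTO price_snapshots VALUES (?, ?, ?, ?, ?, ?)", rows)

cur = conn.execute("""
    SELECT a.name,
           ROUND((b.min_price - a.min_price) * 100.0 / b.min_price, 1) AS drop_pct
    FROM price_snapshots a
    JOIN price_snapshots b ON a.property_id = b.property_id
    WHERE b.captured_at < a.captured_at
      AND (b.min_price - a.min_price) * 100.0 / b.min_price >= 10
""")
drops = cur.fetchall()
```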
Map-Based Loading
For complete metro area coverage, simulate map viewport panning:
from playwright.sync_api import sync_playwright
import json
# Define grid of coordinates covering a metro area
def generate_metro_grid(center_lat, center_lng, radius_km=15, grid_steps=5):
"""Generate a grid of lat/lng points covering a metro area."""
# Approximate degrees per km
lat_per_km = 1 / 110.574
lng_per_km = 1 / (111.320 * abs(center_lat) * 3.14159 / 180)
points = []
step = (radius_km * 2) / grid_steps
for i in range(grid_steps):
for j in range(grid_steps):
lat = center_lat - radius_km * lat_per_km + i * step * lat_per_km
lng = center_lng - radius_km * lng_per_km + j * step * lng_per_km
points.append((round(lat, 6), round(lng, 6)))
return points
def scrape_by_map_viewport(center_lat, center_lng, proxy_config=None):
"""Scrape by panning a Playwright browser over a map grid."""
grid = generate_metro_grid(center_lat, center_lng)
all_listings = []
seen_ids = set()
with sync_playwright() as p:
browser = p.chromium.launch(headless=True, proxy=proxy_config)
page = browser.new_page(viewport={"width": 1920, "height": 1080})
# Initial load
page.goto("https://www.apartments.com/", wait_until="domcontentloaded")
time.sleep(2)
for lat, lng in grid:
# Navigate to coordinates via URL
url = f"https://www.apartments.com/apartments/?bb={lat+0.1},{lng-0.1}_{lat-0.1},{lng+0.1}"
page.goto(url, wait_until="networkidle", timeout=30000)
time.sleep(random.uniform(2, 4))
html = page.content()
json_ld = extract_json_ld(html)
listings = parse_apartment_json_ld(json_ld)
new_count = 0
for listing in listings:
lid = listing.get("name", "") + str(listing.get("latitude", ""))
if lid not in seen_ids:
seen_ids.add(lid)
all_listings.append(listing)
new_count += 1
print(f"({lat}, {lng}): {new_count} new listings (total: {len(all_listings)})")
browser.close()
return all_listings
# Cover Seattle metro area (47.6062, -122.3321)
listings = scrape_by_map_viewport(47.6062, -122.3321)
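The bb= value built inside scrape_by_map_viewport can be factored into a helper. Note that the corner order ("north,west_south,east") is inferred from observed URLs, not documented by CoStar, so treat the format as an assumption:

```python
def bbox_param(lat, lng, half_deg=0.1):
    """Build the bb= query value: '<north>,<west>_<south>,<east>'.
    Format mirrors the URL constructed in scrape_by_map_viewport (observed
    convention, not a documented API)."""
    north, south = lat + half_deg, lat - half_deg
    west, east = lng - half_deg, lng + half_deg
    return f"{north:.6f},{west:.6f}_{south:.6f},{east:.6f}"
```

A half-width of 0.1 degrees is roughly 11 km of latitude; shrink it for denser grids with more overlap.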
Real-World Use Cases
1. Rental Affordability Dashboard
Track affordability metrics across neighborhoods:
def affordability_report(conn, city, state, median_income=70000):
"""Calculate rent-to-income ratios by neighborhood."""
cursor = conn.execute("""
        SELECT ps.name, p.address_street, ps.min_price, ps.max_price,
               ps.walk_score, ps.transit_score
FROM price_snapshots ps
JOIN properties p ON ps.property_id = p.id
WHERE ps.city = ? AND ps.state = ?
AND ps.min_price IS NOT NULL
AND ps.captured_at > datetime('now', '-7 days')
ORDER BY ps.min_price ASC
""", (city, state))
monthly_take_home = median_income * 0.67 / 12 # ~33% tax estimate
affordable = []
for row in cursor.fetchall():
name, addr, min_price, max_price, walk_score, transit_score = row
rent_to_income = min_price / monthly_take_home if min_price else None
affordable.append({
"name": name,
"min_rent": min_price,
"rent_pct_income": round(rent_to_income * 100, 1) if rent_to_income else None,
"walk_score": walk_score,
"transit_score": transit_score,
})
return affordable
2. Investment Property Finder
def find_value_properties(conn, city, state, max_rent_per_sqft=2.5):
"""Find listings with below-market rent per square foot."""
cursor = conn.execute("""
SELECT p.name, p.address_street, ps.min_price,
fp.sqft_min, fp.beds,
ROUND(CAST(ps.min_price AS REAL) / NULLIF(fp.sqft_min, 0), 2) as rent_per_sqft
FROM price_snapshots ps
JOIN properties p ON ps.property_id = p.id
JOIN floor_plans fp ON fp.property_id = p.id
WHERE ps.city = ? AND ps.state = ?
AND fp.sqft_min > 500
AND ps.min_price IS NOT NULL
AND CAST(ps.min_price AS REAL) / fp.sqft_min < ?
AND ps.captured_at > datetime('now', '-7 days')
ORDER BY rent_per_sqft ASC
LIMIT 20
""", (city, state, max_rent_per_sqft))
return cursor.fetchall()
3. Price Alert System
def check_price_alerts(conn, city, state, target_max_rent=2000):
"""Find newly listed properties under a price threshold."""
cursor = conn.execute("""
SELECT name, min_price, max_price, captured_at
FROM price_snapshots
WHERE city = ? AND state = ?
AND min_price <= ?
AND min_price IS NOT NULL
AND captured_at > datetime('now', '-24 hours')
ORDER BY min_price ASC
""", (city, state, target_max_rent))
new_listings = cursor.fetchall()
if new_listings:
print(f"\n{len(new_listings)} new listings under ${target_max_rent}/mo in {city}:")
for name, min_p, max_p, ts in new_listings:
print(f" {name}: ${min_p} - ${max_p} (found {ts})")
return new_listings
Complete Scraping Pipeline
def run_market_scrape(
markets,
db_path="rental_tracker.db",
max_pages=15,
):
"""
Full pipeline: scrape multiple markets, save to DB.
markets: list of (city, state) tuples
"""
conn = init_db(db_path)
total_saved = 0
for city, state in markets:
print(f"\n=== Market: {city}, {state} ===")
session_id = random.randint(10000, 99999)
proxy_url = get_proxy_url(session_id=session_id)
listings = scrape_city_listings(city, state, proxy_url, max_pages)
for listing in listings:
try:
save_listing(conn, listing, city, state)
total_saved += 1
except Exception as e:
print(f" Error saving {listing.get('name')}: {e}")
print(f" Saved {len(listings)} listings for {city}")
time.sleep(random.uniform(5, 10))
print(f"\nTotal saved: {total_saved} listings")
return total_saved
if __name__ == "__main__":
markets = [
("seattle", "wa"),
("portland", "or"),
("denver", "co"),
("austin", "tx"),
]
run_market_scrape(markets, max_pages=20)
Practical Tips
Scrape search pages first, detail pages second. Search result pages give you name, price range, and address for 25 listings per page with one request. Only hit individual listing pages when you need floor plans, amenities, and walk scores.
JSON-LD is gold. Apartments.com embeds rich structured data that remains stable across React deployments. Parse <script type="application/ld+json"> before touching the DOM.
Deduplicate by property name + coordinates. The same property appears in overlapping search results. Use (name, lat, lng) as a composite unique key.
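A sketch of that composite key (rounding coordinates to four decimal places, about 11 meters, so float jitter between scrapes doesn't defeat the dedupe; helper names are ours):

```python
def dedupe_key(listing):
    """Composite key from normalized name + rounded coordinates."""
    lat = listing.get("latitude")
    lng = listing.get("longitude")
    return (
        (listing.get("name") or "").strip().lower(),
        round(lat, 4) if lat is not None else None,
        round(lng, 4) if lng is not None else None,
    )

def dedupe(listings):
    """Keep the first occurrence of each (name, lat, lng) key."""
    seen = set()
    unique = []
    for listing in listings:
        key = dedupe_key(listing)
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```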
Datadome freshness matters. The session cookies from Playwright are valid for 30–60 minutes. Refresh them proactively rather than waiting for a 403.
Run at off-peak hours. 2–5 AM local time has lighter traffic and less aggressive rate limiting.
ThorData residential proxies pass Datadome reliably for US real estate sites. City-level geo-targeting matches your IP location to the market you're scraping, which also affects which listings CoStar serves you.
Summary
Apartments.com rental data is accessible with the right technical approach. The main obstacles are Datadome bot protection and React-generated class names. Solutions:
- Session cookie method — Playwright handles Datadome once, httpx handles subsequent requests
- JSON-LD parsing — stable, deployment-proof structured data
- DOM fallbacks — data-listingid, semantic HTML, and text patterns when JSON-LD is sparse
- Residential proxies — ThorData for Datadome bypass
- SQLite with time-series snapshots — enables price trend analysis and drop detection
With weekly scrapes across target markets, you can build a rental price trend database, affordability dashboard, or deal-finder tool within a few weeks of data collection.