
Scrape Apartments.com & Rent.com: Rental Listings, Prices & Neighborhood Data (2026)


Apartments.com and Rent.com (both owned by CoStar Group) list millions of rental properties across the US. The data they aggregate — rent prices, amenities, neighborhood scores, availability, floor plans — is invaluable for market analysis, rental comparison tools, price-trend tracking in specific markets, and investment decision support.

CoStar invests heavily in protecting this data. But the listings are publicly accessible, and with the right technical approach you can extract clean, structured rental data at scale.


Available Data Points

Each listing on Apartments.com includes:

  - Property-level data (name, address, coordinates, contact info)
  - Pricing data (rent ranges by unit type)
  - Unit details (floor plans, beds/baths, square footage, availability)
  - Amenities (in-unit and community features)
  - Scores and context (walk, transit, and bike scores; neighborhood info)


Anti-Bot Protections

CoStar protects their listings aggressively:

Datadome

Apartments.com uses Datadome, one of the more sophisticated bot detection systems. It:

  - Runs a JavaScript challenge on first visit
  - Builds a behavioral fingerprint (mouse movements, keystrokes, scroll patterns)
  - Maintains a device graph across sessions
  - Specifically targets datacenter IP ASNs

Datadome is why simple requests or plain httpx scraping fails immediately. Residential proxies are not optional — they're baseline.

Map-Based Results

Listings load based on your map viewport. The URL can encode lat, lng, and zoom, but the primary mechanism is viewport-driven: when you pan the map, new listings load. Traditional pagination works via the URL for simple city searches (a trailing page number, as in /seattle-wa/2/), but comprehensive coverage of a metro area requires viewport manipulation.

Dynamic Element IDs

React-generated class names change on every deployment. Selectors based on class names like styles__PropertyCard__3x9Qz break constantly. Use data-* attributes, semantic HTML, and JSON-LD structured data instead.
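As a minimal illustration (the markup and hashed class below are made up), a stdlib-only parser keyed on data-listingid keeps working across redeploys even as class names churn:

```python
from html.parser import HTMLParser as StdHTMLParser

class ListingIdExtractor(StdHTMLParser):
    """Collect data-listingid values -- the attribute survives React
    redeploys, while hashed class names do not."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-listingid" in attrs:
            self.ids.append(attrs["data-listingid"])

parser = ListingIdExtractor()
# Hypothetical placard markup; the class hash changes every deploy
parser.feed('<article class="styles__PropertyCard__9kZw1" data-listingid="abc123"></article>')
print(parser.ids)  # ['abc123']
```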

Rate Limiting

CoStar's internal API throttles at roughly 80–100 requests per IP per hour. Proxy rotation is essential for any serious data collection.
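A client-side throttle keeps a scraper under that ceiling instead of waiting for 429s. A sliding-window sketch (the 70-per-hour budget is an illustrative safety margin, not a documented limit):

```python
import time
from collections import deque

class HourlyThrottle:
    """Sliding-window request cap: block before sending a request that
    would exceed `limit` requests in the past `window_s` seconds."""

    def __init__(self, limit=70, window_s=3600):
        self.limit = limit
        self.window_s = window_s
        self.stamps = deque()

    def wait(self):
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.stamps and now - self.stamps[0] > self.window_s:
            self.stamps.popleft()
        if len(self.stamps) >= self.limit:
            # Sleep until the oldest request leaves the window
            time.sleep(self.window_s - (now - self.stamps[0]))
        self.stamps.append(time.monotonic())

throttle = HourlyThrottle(limit=70)
# call throttle.wait() before each request sent through a given IP
```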


Setup

pip install httpx selectolax playwright lxml
playwright install chromium

Approach 1: JSON-LD Structured Data

The most reliable data source — Apartments.com embeds rich schema.org structured data in every page. This doesn't change with React deployments:

import httpx
import json
from selectolax.parser import HTMLParser
import time
import random

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
}

def extract_json_ld(html):
    """Extract all JSON-LD structured data from a page."""
    tree = HTMLParser(html)
    results = []

    for script in tree.css('script[type="application/ld+json"]'):
        try:
            data = json.loads(script.text())
            if isinstance(data, list):
                results.extend(data)
            else:
                results.append(data)
        except json.JSONDecodeError:
            continue

    return results

def parse_apartment_json_ld(json_ld_items):
    """Extract apartment data from JSON-LD structured data."""
    listings = []

    for item in json_ld_items:
        schema_type = item.get("@type", "")

        if schema_type in ("ApartmentComplex", "Apartment", "LodgingBusiness"):
            listing = {
                "name": item.get("name"),
                "description": item.get("description"),
                "url": item.get("url"),
                "image": item.get("image"),
                "telephone": item.get("telephone"),
                "latitude": None,
                "longitude": None,
                "address": None,
                "amenities": [],
                "price_range": None,
            }

            # Geo coordinates
            geo = item.get("geo", {})
            listing["latitude"] = geo.get("latitude")
            listing["longitude"] = geo.get("longitude")

            # Address
            addr = item.get("address", {})
            if isinstance(addr, dict):
                listing["address"] = {
                    "street": addr.get("streetAddress"),
                    "city": addr.get("addressLocality"),
                    "state": addr.get("addressRegion"),
                    "zip": addr.get("postalCode"),
                }
            elif isinstance(addr, str):
                listing["address"] = {"full": addr}

            # Amenities
            for amenity in item.get("amenityFeature", []):
                if isinstance(amenity, dict):
                    listing["amenities"].append(amenity.get("name", ""))
                elif isinstance(amenity, str):
                    listing["amenities"].append(amenity)

            # Price range
            listing["price_range"] = item.get("priceRange")

            listings.append(listing)

    return listings

def fetch_page(url, proxy_url=None, cookies=None):
    """Fetch a page with optional proxy and cookies."""
    client_kwargs = {
        "headers": HEADERS,
        "follow_redirects": True,
        "timeout": 30,
    }
    if proxy_url:
        # httpx < 0.26 takes `proxies`; newer releases use `proxy=proxy_url`
        client_kwargs["proxies"] = {"all://": proxy_url}
    if cookies:
        client_kwargs["cookies"] = cookies

    try:
        with httpx.Client(**client_kwargs) as client:
            resp = client.get(url)
            return resp if resp.status_code == 200 else None
    except Exception as e:
        print(f"Fetch error: {e}")
        return None
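fetch_page returns None on failure, so it pairs naturally with a backoff wrapper. A generic sketch (with_retries and its parameters are illustrative, not from any library):

```python
import random
import time
from functools import partial

def with_retries(fn, attempts=4, base_delay=2.0):
    """Call fn() until it returns a truthy result, backing off
    exponentially with jitter between failures. Returns None if
    every attempt fails."""
    for attempt in range(attempts):
        result = fn()
        if result:
            return result
        # 2s, 4s, 8s... plus jitter so parallel workers don't sync up
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return None

# Usage with the fetch_page defined above:
# resp = with_retries(partial(fetch_page, url, proxy_url, cookies))
```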

Approach 2: Session Cookie Method

Use Playwright to pass the Datadome challenge and collect valid session cookies, then hand them to httpx for subsequent requests:

from playwright.sync_api import sync_playwright
import random
import time

def get_datadome_cookies(city, state, proxy_config=None):
    """
    Use Playwright to pass Datadome JS challenge and collect session cookies.
    Returns cookies dict for use with httpx.
    """
    with sync_playwright() as p:
        launch_kwargs = {
            "headless": True,
            "args": [
                "--disable-blink-features=AutomationControlled",
                "--disable-dev-shm-usage",
                "--no-sandbox",
                "--disable-web-security",
            ],
        }
        if proxy_config:
            launch_kwargs["proxy"] = proxy_config

        browser = p.chromium.launch(**launch_kwargs)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent=HEADERS["User-Agent"],
            locale="en-US",
        )

        page = context.new_page()

        # Visit a landing page first, not the target directly
        page.goto("https://www.apartments.com/", wait_until="domcontentloaded")
        time.sleep(random.uniform(2, 4))

        # Then navigate to target city
        page.goto(
            f"https://www.apartments.com/{city}-{state}/",
            wait_until="networkidle",
            timeout=30000,
        )
        time.sleep(random.uniform(2, 4))

        # Simulate human behavior
        page.mouse.move(
            random.randint(200, 800),
            random.randint(200, 600),
        )
        page.mouse.wheel(0, random.randint(200, 500))
        time.sleep(random.uniform(1, 2))

        cookies = context.cookies()
        browser.close()

        return {c["name"]: c["value"] for c in cookies}

def scrape_city_listings(city, state, proxy_url=None, max_pages=15):
    """
    Scrape all listings for a city using session cookies from Playwright.
    """
    # Build proxy config for Playwright
    proxy_config = None
    if proxy_url:
        # Parse proxy URL: http://user:pass@host:port
        import re
        m = re.match(r'http://([^:]+):([^@]+)@([^:]+):(\d+)', proxy_url)
        if m:
            proxy_config = {
                "server": f"http://{m.group(3)}:{m.group(4)}",
                "username": m.group(1),
                "password": m.group(2),
            }

    print(f"Getting Datadome cookies for {city}-{state}...")
    cookies = get_datadome_cookies(city, state, proxy_config)
    print(f"Got {len(cookies)} cookies")

    all_listings = []

    for page_num in range(1, max_pages + 1):
        url = f"https://www.apartments.com/{city}-{state}/{page_num}/"
        resp = fetch_page(url, proxy_url, cookies)

        if not resp:
            print(f"Page {page_num}: failed to fetch")
            break

        # Check for Datadome block
        if "datadome" in resp.text.lower() and "captcha" in resp.text.lower():
            print(f"Page {page_num}: Datadome challenge — refreshing cookies")
            cookies = get_datadome_cookies(city, state, proxy_config)
            resp = fetch_page(url, proxy_url, cookies)  # retry the blocked page
            if not resp:
                continue

        # Parse JSON-LD first (most reliable)
        json_ld = extract_json_ld(resp.text)
        listings = parse_apartment_json_ld(json_ld)

        # Fall back to DOM parsing if JSON-LD is empty
        if not listings:
            listings = parse_listings_dom(resp.text)

        if not listings:
            print(f"Page {page_num}: no listings, done.")
            break

        all_listings.extend(listings)
        print(f"Page {page_num}: {len(listings)} listings (total: {len(all_listings)})")
        time.sleep(random.uniform(2.5, 5.0))

    return all_listings
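Launching a browser for every run is the slow part, so it can pay to cache the Datadome cookies on disk and reuse them while fresh. A sketch assuming a simple JSON cache file (load_or_refresh_cookies is a hypothetical helper; pass a wrapped get_datadome_cookies as refresh_fn):

```python
import json
import time
from pathlib import Path

COOKIE_TTL_S = 25 * 60  # stay inside the ~30-60 minute validity window

def load_or_refresh_cookies(cache_path, refresh_fn, ttl_s=COOKIE_TTL_S):
    """Return cached cookies if younger than ttl_s; otherwise call
    refresh_fn() (e.g. a wrapped get_datadome_cookies) and re-cache."""
    path = Path(cache_path)
    if path.exists():
        cached = json.loads(path.read_text())
        if time.time() - cached["saved_at"] < ttl_s:
            return cached["cookies"]
    cookies = refresh_fn()
    path.write_text(json.dumps({"saved_at": time.time(), "cookies": cookies}))
    return cookies

# cookies = load_or_refresh_cookies(
#     "dd_cookies.json",
#     lambda: get_datadome_cookies("seattle", "wa", proxy_config),
# )
```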

Approach 3: Full Playwright Scraping

For detail pages with lazy-loaded content:

import asyncio
import random
from playwright.async_api import async_playwright

async def scrape_listing_detail(url, proxy_config=None):
    """Extract full details from a single listing page."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy_config,
            args=["--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(
            user_agent=HEADERS["User-Agent"],
            viewport={"width": 1440, "height": 900},
        )
        page = await context.new_page()

        try:
            await page.goto(url, wait_until="networkidle", timeout=30000)
            await page.wait_for_timeout(2000)

            # Scroll to trigger lazy loading
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight / 2)")
            await page.wait_for_timeout(1000)
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(1500)

            details = await page.evaluate("""
                () => {
                    const getText = (sel, root = document) => root.querySelector(sel)?.textContent?.trim();
                    const getAll = (sel, root = document) => Array.from(
                        root.querySelectorAll(sel)
                    ).map(e => e.textContent.trim()).filter(Boolean);

                    // Floor plans
                    const floorPlans = [];
                    document.querySelectorAll('[class*="floorPlan"], [data-testid*="floor-plan"], .pricingGridItem').forEach(fp => {
                        floorPlans.push({
                            name: fp.querySelector('[class*="name"], .modelName')?.textContent?.trim(),
                            beds: fp.querySelector('[class*="bed"], .detailsTextWrapper')?.textContent?.trim(),
                            baths: fp.querySelector('[class*="bath"]')?.textContent?.trim(),
                            sqft: fp.querySelector('[class*="sqft"], [class*="squareFeet"]')?.textContent?.trim(),
                            price: fp.querySelector('[class*="price"], .rentLabel')?.textContent?.trim(),
                            available: fp.querySelector('[class*="available"], [class*="availability"]')?.textContent?.trim(),
                        });
                    });

                    // Amenities
                    const amenities = getAll('.amenityItems li, [class*="amenity"] li, .featureItem');

                    // Neighborhood scores
                    const walkScore = document.querySelector('[class*="walkScore"] .score, [id*="walk-score"]')?.textContent?.trim();
                    const transitScore = document.querySelector('[class*="transitScore"] .score')?.textContent?.trim();
                    const bikeScore = document.querySelector('[class*="bikeScore"] .score')?.textContent?.trim();

                    // Pet policy
                    const petSection = document.querySelector('[class*="petPolicy"], [data-testid="pet-policy"]');
                    const petPolicy = petSection?.textContent?.trim();

                    // Office hours / contact
                    const phone = getText('[class*="phoneNumber"], [data-testid="phone"]');

                    // Parking
                    const parking = getAll('[class*="parking"] li, [data-testid*="parking"]');

                    return {
                        name: getText('h1, [class*="propertyName"]'),
                        address: getText('[class*="propertyAddress"], [itemprop="address"]'),
                        price_range: getText('[class*="priceRange"], [class*="rentRange"]'),
                        floor_plans: floorPlans,
                        amenities: amenities,
                        neighborhood: {
                            walk_score: walkScore,
                            transit_score: transitScore,
                            bike_score: bikeScore,
                        },
                        pet_policy: petPolicy,
                        phone: phone,
                        parking: parking,
                    };
                }
            """)

        finally:
            await browser.close()

    return details

async def scrape_listings_batch(urls, proxy_config=None, concurrency=3):
    """Scrape multiple listing detail pages concurrently."""
    semaphore = asyncio.Semaphore(concurrency)

    async def scrape_one(url):
        async with semaphore:
            result = await scrape_listing_detail(url, proxy_config)
            await asyncio.sleep(random.uniform(2, 4))
            return result

    tasks = [scrape_one(url) for url in urls]
    return await asyncio.gather(*tasks, return_exceptions=True)

DOM Fallback Parser

When JSON-LD is absent or incomplete:

from selectolax.parser import HTMLParser
import re

def parse_listings_dom(html):
    """Parse listings from DOM when JSON-LD is insufficient."""
    tree = HTMLParser(html)
    listings = []

    # Multiple selector strategies for resilience
    card_selectors = [
        '[data-listingid]',
        '[class*="placard"]',
        '.placardContainer article',
        '[data-id]',
    ]

    cards = []
    for selector in card_selectors:
        cards = tree.css(selector)
        if cards:
            break

    for card in cards:
        listing_id = (
            card.attributes.get("data-listingid")
            or card.attributes.get("data-id")
            or ""
        )

        # Name — try multiple selectors
        name = None
        for sel in ['[class*="title"]', '[class*="propertyName"]', "h3", "h2"]:
            el = card.css_first(sel)
            if el:
                name = el.text(strip=True)
                break

        # Price — look for $ patterns
        price_range = None
        for sel in ['[class*="price"]', '[class*="rent"]', '[class*="pricing"]']:
            el = card.css_first(sel)
            if el:
                text = el.text(strip=True)
                if "$" in text:
                    price_range = text
                    break

        # Beds
        beds = None
        for sel in ['[class*="bed"]', '[class*="unit"]']:
            el = card.css_first(sel)
            if el:
                beds = el.text(strip=True)
                break

        # Address
        address = None
        for sel in ['[class*="address"]', 'address', '[itemprop="streetAddress"]']:
            el = card.css_first(sel)
            if el:
                address = el.text(strip=True)
                break

        # Rating/reviews
        rating = None
        for sel in ['[class*="rating"]', '[aria-label*="rating"]']:
            el = card.css_first(sel)
            if el:
                aria = el.attributes.get("aria-label", "")
                m = re.search(r'([\d.]+) out of', aria)
                if m:
                    rating = float(m.group(1))
                    break

        if name or listing_id:
            listings.append({
                "id": listing_id,
                "name": name,
                "price_range": price_range,
                "beds": beds,
                "address": address,
                "rating": rating,
            })

    return listings

def parse_price_range(price_str):
    """Extract min/max from price strings like '$1,200 - $2,400/mo'."""
    if not price_str:
        return {"min": None, "max": None}

    nums = re.findall(r"\$([\d,]+)", price_str)
    cleaned = [int(n.replace(",", "")) for n in nums]

    if len(cleaned) >= 2:
        return {"min": min(cleaned), "max": max(cleaned)}
    elif len(cleaned) == 1:
        return {"min": cleaned[0], "max": cleaned[0]}

    return {"min": None, "max": None}

ThorData Proxy Integration

ThorData residential proxies are essential for Apartments.com. Datadome specifically blocks datacenter IP ranges. US residential IPs from ThorData pass Datadome's bot scoring and carry the geographic consistency that CoStar expects (an IP from the same metro area as the listings being searched is ideal).

THORDATA_USER = "your_username"
THORDATA_PASS = "your_password"
THORDATA_HOST = "proxy.thordata.com"
THORDATA_PORT = 9000

def get_proxy_url(session_id=None):
    """
    Build ThorData proxy URL with optional sticky session.
    session_id: keeps the same exit IP across multiple requests.
    (See ThorData's docs for city-level targeting syntax.)
    """
    user_parts = [THORDATA_USER]

    if session_id:
        user_parts.append(f"session-{session_id}")

    # Country targeting (always US for Apartments.com)
    user_parts.append("country-us")

    user = "-".join(user_parts)
    return f"http://{user}:{THORDATA_PASS}@{THORDATA_HOST}:{THORDATA_PORT}"

def get_playwright_proxy(session_id=None):
    """Get proxy config dict for Playwright."""
    user_parts = [THORDATA_USER, "country-us"]
    if session_id:
        user_parts = [THORDATA_USER, f"session-{session_id}", "country-us"]
    user = "-".join(user_parts)
    return {
        "server": f"http://{THORDATA_HOST}:{THORDATA_PORT}",
        "username": user,
        "password": THORDATA_PASS,
    }

# Example: scrape Seattle rentals with rotating residential IPs
session_id = random.randint(10000, 99999)
proxy_url = get_proxy_url(session_id=session_id)
listings = scrape_city_listings("seattle", "wa", proxy_url=proxy_url, max_pages=10)

Price Trend Database

Track rental prices over time for market analysis:

import sqlite3
import json
import re

def init_db(db_path="rental_tracker.db"):
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS properties (
            id TEXT PRIMARY KEY,
            name TEXT,
            address_street TEXT,
            address_city TEXT,
            address_state TEXT,
            address_zip TEXT,
            latitude REAL,
            longitude REAL,
            property_type TEXT,
            amenities TEXT,
            created_at TEXT DEFAULT (datetime('now'))
        );

        CREATE TABLE IF NOT EXISTS price_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            property_id TEXT,
            name TEXT,
            city TEXT,
            state TEXT,
            min_price REAL,
            max_price REAL,
            unit_types TEXT,
            beds_range TEXT,
            walk_score INTEGER,
            transit_score INTEGER,
            in_stock INTEGER DEFAULT 1,
            captured_at TEXT DEFAULT (datetime('now')),
            FOREIGN KEY (property_id) REFERENCES properties(id)
        );

        CREATE TABLE IF NOT EXISTS floor_plans (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            property_id TEXT,
            plan_name TEXT,
            beds TEXT,
            baths TEXT,
            sqft_min INTEGER,
            sqft_max INTEGER,
            price_min REAL,
            price_max REAL,
            available_units INTEGER,
            captured_at TEXT DEFAULT (datetime('now')),
            FOREIGN KEY (property_id) REFERENCES properties(id)
        );

        CREATE INDEX IF NOT EXISTS idx_city_state ON price_snapshots(city, state);
        CREATE INDEX IF NOT EXISTS idx_captured ON price_snapshots(captured_at);
        CREATE INDEX IF NOT EXISTS idx_price ON price_snapshots(min_price, max_price);
    """)
    conn.commit()
    return conn

def save_listing(conn, listing, city, state):
    """Save a listing snapshot to the database."""
    prop_id = listing.get("id") or listing.get("name", "")[:50]

    # Upsert property
    addr = listing.get("address", {})
    conn.execute("""
        INSERT OR REPLACE INTO properties
        (id, name, address_street, address_city, address_state, address_zip,
         latitude, longitude, amenities)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        prop_id,
        listing.get("name"),
        addr.get("street") if isinstance(addr, dict) else addr,
        addr.get("city", city) if isinstance(addr, dict) else city,
        addr.get("state", state) if isinstance(addr, dict) else state,
        addr.get("zip") if isinstance(addr, dict) else None,
        listing.get("latitude"),
        listing.get("longitude"),
        json.dumps(listing.get("amenities", [])),
    ))

    # Price snapshot
    price = parse_price_range(listing.get("price_range"))
    conn.execute("""
        INSERT INTO price_snapshots
        (property_id, name, city, state, min_price, max_price,
         walk_score, transit_score)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
    """, (
        prop_id, listing.get("name"), city, state,
        price["min"], price["max"],
        listing.get("neighborhood", {}).get("walk_score"),
        listing.get("neighborhood", {}).get("transit_score"),
    ))

    # Floor plans if available
    for plan in listing.get("floor_plans", []):
        price_plan = parse_price_range(plan.get("price", ""))
        sqft_match = re.search(r"([\d,]+)", plan.get("sqft", "") or "")
        sqft = int(sqft_match.group(1).replace(",", "")) if sqft_match else None

        conn.execute("""
            INSERT INTO floor_plans
            (property_id, plan_name, beds, baths, sqft_min, price_min, price_max)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        """, (
            prop_id, plan.get("name"), plan.get("beds"),
            plan.get("baths"), sqft,
            price_plan["min"], price_plan["max"],
        ))

    conn.commit()

def get_price_trends(conn, city, state, weeks_back=12):
    """Query price trends over time for a market."""
    cursor = conn.execute("""
        SELECT
            strftime('%Y-W%W', captured_at) as week,
            COUNT(*) as listings,
            AVG(min_price) as avg_min_price,
            AVG(max_price) as avg_max_price,
            MIN(min_price) as absolute_min,
            MAX(max_price) as absolute_max
        FROM price_snapshots
        WHERE city = ? AND state = ?
          AND captured_at > datetime('now', '-' || ? || ' weeks')
        GROUP BY week
        ORDER BY week
    """, (city, state, weeks_back))

    return [
        {
            "week": row[0], "listings": row[1],
            "avg_min": row[2], "avg_max": row[3],
            "absolute_min": row[4], "absolute_max": row[5],
        }
        for row in cursor.fetchall()
    ]

def find_price_drops(conn, city, state, min_drop_pct=10):
    """Find listings with significant price drops since last snapshot."""
    cursor = conn.execute("""
        SELECT a.name, b.min_price as old_price, a.min_price as new_price,
               ROUND((b.min_price - a.min_price) * 100.0 / b.min_price, 1) as drop_pct
        FROM price_snapshots a
        JOIN price_snapshots b ON a.property_id = b.property_id
        WHERE a.city = ? AND a.state = ?
          AND a.captured_at > datetime('now', '-1 day')
          AND b.captured_at < a.captured_at
          AND b.captured_at > datetime('now', '-8 days')
          AND (b.min_price - a.min_price) * 100.0 / b.min_price >= ?
        GROUP BY a.property_id
        ORDER BY drop_pct DESC
    """, (city, state, min_drop_pct))

    return cursor.fetchall()
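The weekly rows that get_price_trends returns convert to week-over-week changes in a few lines. A sketch over the same dict shape:

```python
def week_over_week(trend_rows):
    """Percent change in avg_min between consecutive weekly rows
    (as returned by get_price_trends, ordered by week)."""
    changes = []
    for prev, cur in zip(trend_rows, trend_rows[1:]):
        if prev["avg_min"]:
            pct = (cur["avg_min"] - prev["avg_min"]) * 100.0 / prev["avg_min"]
            changes.append({"week": cur["week"], "pct_change": round(pct, 1)})
    return changes

rows = [
    {"week": "2026-W01", "avg_min": 2000.0},
    {"week": "2026-W02", "avg_min": 2100.0},
]
print(week_over_week(rows))  # [{'week': '2026-W02', 'pct_change': 5.0}]
```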

Map-Based Loading

For complete metro area coverage, simulate map viewport panning:

from playwright.sync_api import sync_playwright
import random
import time

import math

# Define grid of coordinates covering a metro area
def generate_metro_grid(center_lat, center_lng, radius_km=15, grid_steps=5):
    """Generate a grid of lat/lng points covering a metro area."""
    # Approximate degrees per km; longitude degrees shrink with latitude
    lat_per_km = 1 / 110.574
    lng_per_km = 1 / (111.320 * math.cos(math.radians(center_lat)))

    points = []
    step = (radius_km * 2) / grid_steps

    for i in range(grid_steps):
        for j in range(grid_steps):
            lat = center_lat - radius_km * lat_per_km + i * step * lat_per_km
            lng = center_lng - radius_km * lng_per_km + j * step * lng_per_km
            points.append((round(lat, 6), round(lng, 6)))

    return points

def scrape_by_map_viewport(center_lat, center_lng, proxy_config=None):
    """Scrape by panning a Playwright browser over a map grid."""
    grid = generate_metro_grid(center_lat, center_lng)
    all_listings = []
    seen_ids = set()

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=proxy_config)
        page = browser.new_page(viewport={"width": 1920, "height": 1080})

        # Initial load
        page.goto("https://www.apartments.com/", wait_until="domcontentloaded")
        time.sleep(2)

        for lat, lng in grid:
            # Navigate to coordinates via URL
            url = f"https://www.apartments.com/apartments/?bb={lat+0.1},{lng-0.1}_{lat-0.1},{lng+0.1}"
            page.goto(url, wait_until="networkidle", timeout=30000)
            time.sleep(random.uniform(2, 4))

            html = page.content()
            json_ld = extract_json_ld(html)
            listings = parse_apartment_json_ld(json_ld)

            new_count = 0
            for listing in listings:
                lid = listing.get("name", "") + str(listing.get("latitude", ""))
                if lid not in seen_ids:
                    seen_ids.add(lid)
                    all_listings.append(listing)
                    new_count += 1

            print(f"({lat}, {lng}): {new_count} new listings (total: {len(all_listings)})")

        browser.close()

    return all_listings

# Cover Seattle metro area (47.6062, -122.3321)
listings = scrape_by_map_viewport(47.6062, -122.3321)

Real-World Use Cases

1. Rental Affordability Dashboard

Track affordability metrics across neighborhoods:

def affordability_report(conn, city, state, median_income=70000):
    """Calculate rent-to-income ratios by neighborhood."""
    cursor = conn.execute("""
        SELECT ps.name, p.address_street, ps.min_price, ps.max_price,
               ps.walk_score, ps.transit_score
        FROM price_snapshots ps
        JOIN properties p ON ps.property_id = p.id
        WHERE ps.city = ? AND ps.state = ?
          AND ps.min_price IS NOT NULL
          AND ps.captured_at > datetime('now', '-7 days')
        ORDER BY ps.min_price ASC
    """, (city, state))

    monthly_take_home = median_income * 0.67 / 12  # ~33% tax estimate

    affordable = []
    for row in cursor.fetchall():
        name, addr, min_price, max_price, walk_score, transit_score = row
        rent_to_income = min_price / monthly_take_home if min_price else None

        affordable.append({
            "name": name,
            "min_rent": min_price,
            "rent_pct_income": round(rent_to_income * 100, 1) if rent_to_income else None,
            "walk_score": walk_score,
            "transit_score": transit_score,
        })

    return affordable

2. Investment Property Finder

def find_value_properties(conn, city, state, max_rent_per_sqft=2.5):
    """Find listings with below-market rent per square foot."""
    cursor = conn.execute("""
        SELECT p.name, p.address_street, ps.min_price,
               fp.sqft_min, fp.beds,
               ROUND(CAST(ps.min_price AS REAL) / NULLIF(fp.sqft_min, 0), 2) as rent_per_sqft
        FROM price_snapshots ps
        JOIN properties p ON ps.property_id = p.id
        JOIN floor_plans fp ON fp.property_id = p.id
        WHERE ps.city = ? AND ps.state = ?
          AND fp.sqft_min > 500
          AND ps.min_price IS NOT NULL
          AND CAST(ps.min_price AS REAL) / fp.sqft_min < ?
          AND ps.captured_at > datetime('now', '-7 days')
        ORDER BY rent_per_sqft ASC
        LIMIT 20
    """, (city, state, max_rent_per_sqft))

    return cursor.fetchall()

3. Price Alert System

def check_price_alerts(conn, city, state, target_max_rent=2000):
    """Find newly listed properties under a price threshold."""
    cursor = conn.execute("""
        SELECT name, min_price, max_price, captured_at
        FROM price_snapshots
        WHERE city = ? AND state = ?
          AND min_price <= ?
          AND min_price IS NOT NULL
          AND captured_at > datetime('now', '-24 hours')
        ORDER BY min_price ASC
    """, (city, state, target_max_rent))

    new_listings = cursor.fetchall()
    if new_listings:
        print(f"\n{len(new_listings)} new listings under ${target_max_rent}/mo in {city}:")
        for name, min_p, max_p, ts in new_listings:
            print(f"  {name}: ${min_p} - ${max_p} (found {ts})")

    return new_listings

Complete Scraping Pipeline

def run_market_scrape(
    markets,
    db_path="rental_tracker.db",
    max_pages=15,
):
    """
    Full pipeline: scrape multiple markets, save to DB.
    markets: list of (city, state) tuples
    """
    conn = init_db(db_path)
    total_saved = 0

    for city, state in markets:
        print(f"\n=== Market: {city}, {state} ===")
        session_id = random.randint(10000, 99999)
        proxy_url = get_proxy_url(session_id=session_id)

        listings = scrape_city_listings(city, state, proxy_url, max_pages)

        for listing in listings:
            try:
                save_listing(conn, listing, city, state)
                total_saved += 1
            except Exception as e:
                print(f"  Error saving {listing.get('name')}: {e}")

        print(f"  Saved {len(listings)} listings for {city}")
        time.sleep(random.uniform(5, 10))

    print(f"\nTotal saved: {total_saved} listings")
    return total_saved

if __name__ == "__main__":
    markets = [
        ("seattle", "wa"),
        ("portland", "or"),
        ("denver", "co"),
        ("austin", "tx"),
    ]
    run_market_scrape(markets, max_pages=20)

Practical Tips

Scrape search pages first, detail pages second. Search result pages give you name, price range, and address for 25 listings per page with one request. Only hit individual listing pages when you need floor plans, amenities, and walk scores.

JSON-LD is gold. Apartments.com embeds rich structured data that remains stable across React deployments. Parse <script type="application/ld+json"> before touching the DOM.

Deduplicate by property name + coordinates. The same property appears in overlapping search results. Use (name, lat, lng) as a composite unique key.
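That composite key can be sketched as a small helper. Rounding coordinates to 4 decimal places (roughly 11 meters) absorbs jitter between result pages; the exact precision is a judgment call:

```python
def dedupe_listings(listings, precision=4):
    """Deduplicate listing dicts on (normalized name, rounded lat/lng)."""
    seen = set()
    unique = []
    for listing in listings:
        lat, lng = listing.get("latitude"), listing.get("longitude")
        key = (
            (listing.get("name") or "").strip().lower(),
            round(lat, precision) if lat is not None else None,
            round(lng, precision) if lng is not None else None,
        )
        if key not in seen:
            seen.add(key)
            unique.append(listing)
    return unique
```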

Datadome freshness matters. The session cookies from Playwright are valid for 30–60 minutes. Refresh them proactively rather than waiting for a 403.

Run at off-peak hours. 2–5 AM local time has lighter traffic and less aggressive rate limiting.

ThorData residential proxies pass Datadome reliably for US real estate sites. City-level geo-targeting matches your IP location to the market you're scraping, which also affects which listings CoStar serves you.


Summary

Apartments.com rental data is accessible with the right technical approach. The main obstacles are Datadome bot protection and React-generated class names. Solutions:

  1. Session cookie method — Playwright handles Datadome once, httpx handles subsequent requests
  2. JSON-LD parsing — stable, deployment-proof structured data
  3. DOM fallbacks — data-listingid, semantic HTML, and text patterns when JSON-LD is sparse
  4. Residential proxies — ThorData for Datadome bypass
  5. SQLite with time-series snapshots — enables price trend analysis and drop detection

With weekly scrapes across target markets, you can build a rental price trend database, affordability dashboard, or deal-finder tool within a few weeks of data collection.