
Scraping LoopNet Commercial Real Estate Listings with Python (2026)

LoopNet is the dominant commercial real estate listing platform in the US. It carries hundreds of thousands of active listings across office space, retail, industrial, multifamily, and land — each with asking prices, lease rates per square foot, cap rates, building details, and broker contact information. For anyone building CRE analytics, deal sourcing pipelines, or market comparison tools, it's the primary data source. The problem is that LoopNet sits behind Imperva's WAF and does not expose a public API, so getting that data requires careful browser automation.

This guide covers the full extraction pipeline: async Playwright with stealth configuration, parsing listing detail pages, routing through residential proxies, handling paginated search results, and storing the data for analysis.

What Data Is Available

A LoopNet listing page exposes:

Asking price (for-sale listings) or lease rate per square foot (for-lease listings)
Cap rate, where disclosed
Property type (office, retail, industrial, multifamily, land)
Building size, available space, year built, lot size, and zoning
Full street address plus latitude/longitude from JSON-LD structured data
Days on market
Broker name, company, and phone number
Listing photo URLs

Anti-Bot Measures

LoopNet is protected by Imperva (formerly Incapsula), one of the more aggressive commercial WAF products. Several layers stack on top of each other.

Imperva JavaScript challenge. On first contact, Imperva serves a JavaScript fingerprinting page before delivering real content. It checks browser consistency — whether navigator.webdriver is present, whether canvas and WebGL respond like a real browser, whether timing signatures match human interaction. A plain httpx or requests call gets a 403 or an empty challenge page immediately.
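
It helps to recognize that challenge page programmatically so a scraper can back off instead of parsing garbage. A small heuristic sketch (the marker strings below are commonly observed in Incapsula challenge and block responses, but treat them as a starting point rather than an exhaustive list):

```python
# Heuristic detection of an Imperva/Incapsula challenge or block page.
INCAPSULA_MARKERS = (
    "_incapsula_resource",    # challenge script / iframe resource path
    "incapsula incident id",  # block page body text
    "request unsuccessful",   # block page heading
)


def looks_like_challenge(html: str) -> bool:
    """Return True when the HTML resembles an Imperva challenge/block page."""
    lowered = html.lower()
    return any(marker in lowered for marker in INCAPSULA_MARKERS)
```

Run this on the raw page content before extraction; a positive result means the session is burned and should be rotated rather than retried.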

TLS fingerprinting. Imperva compares the TLS ClientHello fingerprint (JA3 hash) against known browser fingerprints. Python's default TLS stack produces a JA3 hash that doesn't match any real browser. Playwright with Chromium resolves this since it uses the actual Chrome TLS stack.

Behavioral analysis. Imperva tracks mouse movement patterns, scroll velocity, and click timing. A session that jumps directly to listing pages without any organic browsing behavior gets flagged quickly. The mitigation is to introduce realistic delays and occasionally interact with page elements before extracting data.

CAPTCHA on suspicious traffic. Accounts or IP addresses that trigger anomaly scores above a threshold get served a CAPTCHA interstitial (typically hCaptcha). This happens more readily on search result pages with many requests than on individual listing detail pages.

Rate limiting. LoopNet enforces per-IP request limits on search endpoints. Hitting the same search URL repeatedly from one IP within a short window results in soft blocks — the page loads but returns empty result sets or redirects to the homepage.
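
Those per-IP limits argue for client-side pacing on top of random delays. A minimal asyncio throttle sketch (the safe interval for LoopNet is unknown, so treat the value as tunable):

```python
import asyncio
import time


class MinIntervalThrottle:
    """Enforce a minimum gap between successive requests from one session."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()

    async def acquire(self):
        async with self._lock:
            wait = self.min_interval - (time.monotonic() - self._last)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last = time.monotonic()


async def demo():
    # Three acquisitions: the first passes immediately, the next two
    # each wait out the configured interval.
    throttle = MinIntervalThrottle(0.05)
    start = time.monotonic()
    for _ in range(3):
        await throttle.acquire()
    return time.monotonic() - start
```

Call `await throttle.acquire()` before each navigation; combined with the randomized human_delay used later, this keeps request spacing under the per-IP thresholds.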

Referer and header chain validation. Requests to listing detail URLs without a valid Referer pointing to a LoopNet search or property page are treated as direct API calls and blocked.
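
Playwright can send that header directly via the referer argument to page.goto(). A minimal sketch (search_referer and the specific search path are illustrative choices, not LoopNet requirements):

```python
# Satisfy the Referer check by mimicking arrival from a LoopNet search page.
def search_referer(listing_url: str) -> str:
    """Pick a plausible Referer for a listing detail navigation."""
    if "/Listing/" in listing_url:
        return "https://www.loopnet.com/search/commercial-real-estate/for-sale/"
    return "https://www.loopnet.com/"


# Usage inside a scraper (sketch):
#     await page.goto(url, referer=search_referer(url),
#                     wait_until="domcontentloaded", timeout=30000)
```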

Dependencies and Setup

pip install playwright playwright-stealth
playwright install chromium

Base Playwright Setup with Stealth

import asyncio
import random
from playwright.async_api import async_playwright, Page, BrowserContext
from playwright_stealth import stealth_async

# ThorData residential proxy — required for Imperva bypass
PROXY_URL = "http://USER:[email protected]:9000"


async def create_context(playwright) -> tuple:
    """Create a stealth browser context with residential proxy."""
    browser = await playwright.chromium.launch(
        headless=True,
        args=[
            "--no-sandbox",
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--disable-web-security",
        ],
    )

    context = await browser.new_context(
        proxy={"server": PROXY_URL},
        viewport={"width": 1440, "height": 900},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/126.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="America/New_York",
        extra_http_headers={
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Sec-Fetch-Mode": "navigate",
        },
    )

    return browser, context


async def new_stealth_page(context: BrowserContext) -> Page:
    """Create a new page with full stealth patches applied."""
    page = await context.new_page()
    await stealth_async(page)

    await page.add_init_script("""
        // Remove webdriver marker
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined,
        });

        // Fake realistic plugins
        Object.defineProperty(navigator, 'plugins', {
            get: () => {
                return [
                    { name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
                    { name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
                    { name: 'Native Client', filename: 'internal-nacl-plugin' },
                ];
            },
        });

        // Fake screen dimensions consistent with viewport
        Object.defineProperty(screen, 'width', { get: () => 1440 });
        Object.defineProperty(screen, 'height', { get: () => 900 });

        // Fake chrome runtime
        window.chrome = { runtime: {}, loadTimes: () => {}, csi: () => {} };
    """)

    return page


async def human_delay(min_ms: int = 800, max_ms: int = 2400):
    """Wait a human-realistic random duration."""
    await asyncio.sleep(random.uniform(min_ms / 1000, max_ms / 1000))


async def warm_up_session(page: Page):
    """
    Navigate to LoopNet homepage and perform a realistic interaction
    before hitting listing or search pages. This builds a browsing history
    that Imperva's behavioral analysis expects.
    """
    await page.goto("https://www.loopnet.com", wait_until="domcontentloaded", timeout=30000)
    await human_delay(1500, 3000)

    # Hover over the navigation — simulates realistic browsing
    try:
        nav = await page.query_selector("nav, .navigation")
        if nav:
            await nav.hover()
            await human_delay(500, 1000)
    except Exception:
        pass

    # Scroll slightly
    await page.evaluate("window.scrollBy(0, 200)")
    await human_delay(800, 1500)

Extracting Listing Data

LoopNet listing detail pages use a mix of data-testid attributes and semantic class names. The structure has been consistent across 2025-2026 for the key fields.

async def extract_listing(page: Page, url: str) -> dict:
    """Extract all available data from a LoopNet listing detail page."""
    await page.goto(url, wait_until="domcontentloaded", timeout=30000)
    await human_delay(1200, 2800)

    # Wait for the property summary section to render
    try:
        await page.wait_for_selector(
            "[data-testid='property-summary-section'], .propertyDataSection",
            timeout=15000,
        )
    except Exception:
        # Check for CAPTCHA
        content = await page.content()
        if "hcaptcha" in content.lower() or "incapsula" in content.lower() or "imperva" in content.lower():
            raise RuntimeError(f"CAPTCHA/block detected on {url}")

    data = {"url": url}

    # Listing ID from URL
    import re
    lid_match = re.search(r"/Listing/(\d+)/", url)
    if lid_match:
        data["listing_id"] = lid_match.group(1)

    # Extract JSON-LD structured data first (most reliable)
    ld_data = await page.evaluate("""() => {
        const scripts = document.querySelectorAll('script[type="application/ld+json"]');
        for (const s of scripts) {
            try {
                const d = JSON.parse(s.textContent);
                if (d['@type'] === 'RealEstateListing' || d.address) return d;
            } catch(e) {}
        }
        return null;
    }""")

    if ld_data:
        data["address"] = ld_data.get("address", {})
        data["name"] = ld_data.get("name", "")
        geo = ld_data.get("geo", {})
        if geo:
            data["lat"] = geo.get("latitude")
            data["lng"] = geo.get("longitude")

    # Asking price or lease rate
    for selector in ["[data-testid='listing-price']", ".listingPrice", ".propertyPrice"]:
        price_el = await page.query_selector(selector)
        if price_el:
            data["price"] = (await price_el.inner_text()).strip()
            break

    # Price per square foot (lease rate)
    for selector in ["[data-testid='price-per-sqft']", ".pricePerSqFt", "[class*='perSqFt']"]:
        rate_el = await page.query_selector(selector)
        if rate_el:
            data["lease_rate_per_sqft"] = (await rate_el.inner_text()).strip()
            break

    # Cap rate
    for selector in ["[data-testid='cap-rate-value']", ".capRate", "[class*='capRate']"]:
        cap_el = await page.query_selector(selector)
        if cap_el:
            data["cap_rate"] = (await cap_el.inner_text()).strip()
            break

    # Property type
    for selector in [
        ".propertyTypeSection span.property-type-label",
        "[data-testid='property-type']",
        "[class*='propertyType']",
    ]:
        type_el = await page.query_selector(selector)
        if type_el:
            data["property_type"] = (await type_el.inner_text()).strip()
            break

    # Building size, available space, year built, lot size, zoning
    field_map = {
        "building_size": ["[data-testid='building-size']", ".buildingSize", "[class*='buildingSize']"],
        "available_sqft": ["[data-testid='available-space']", ".availableSpace"],
        "year_built": ["[data-testid='year-built']", ".yearBuilt"],
        "lot_size": ["[data-testid='lot-size']", ".lotSize"],
        "zoning": ["[data-testid='zoning-value']", ".zoning"],
    }

    for field, selectors in field_map.items():
        for selector in selectors:
            el = await page.query_selector(selector)
            if el:
                data[field] = (await el.inner_text()).strip()
                break

    # Address
    for selector in [
        "[data-testid='property-address'] .address-line",
        ".propertyAddress",
        "[class*='property-address']",
    ]:
        addr_el = await page.query_selector(selector)
        if addr_el:
            data["address"] = (await addr_el.inner_text()).strip()
            break

    # Days on market
    for selector in [".listingDaysOnMarket span.value", "[data-testid='days-on-market']"]:
        dom_el = await page.query_selector(selector)
        if dom_el:
            data["days_on_market"] = (await dom_el.inner_text()).strip()
            break

    # Broker info
    broker_selectors = {
        "broker_name": ["[data-testid='broker-card'] .broker-name", ".brokerName", "[class*='brokerName']"],
        "broker_company": ["[data-testid='broker-card'] .broker-company-name", ".brokerCompany"],
        "broker_phone": ["[data-testid='broker-contact-phone'] a", ".brokerPhone a"],
    }

    for field, selectors in broker_selectors.items():
        for selector in selectors:
            el = await page.query_selector(selector)
            if el:
                if field == "broker_phone":
                    data[field] = await el.get_attribute("href")
                else:
                    data[field] = (await el.inner_text()).strip()
                break

    # Photos — collect first 5 image URLs
    photo_els = await page.query_selector_all(
        "[data-testid='listing-photo'] img, .propertyPhoto img, .listingGallery img"
    )
    data["photos"] = []
    for img in photo_els[:5]:
        src = await img.get_attribute("src") or await img.get_attribute("data-src")
        if src and src.startswith("http"):
            data["photos"].append(src)

    return data

Parsing Numeric Values

Clean up the raw strings extracted from the DOM:

import re

def parse_price(raw: str) -> float | None:
    """Extract numeric price from strings like '$1,250,000' or '$2.5M'."""
    if not raw:
        return None
    raw = raw.strip().upper()
    # Handle millions shorthand
    if raw.endswith("M"):
        try:
            return float(raw[:-1].replace(",", "").replace("$", "")) * 1_000_000
        except ValueError:
            return None
    # Handle regular format
    cleaned = re.sub(r"[^\d.]", "", raw.replace(",", ""))
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_cap_rate(raw: str) -> float | None:
    """Extract cap rate percentage from strings like '5.5%' or '5.5 Cap Rate'."""
    if not raw:
        return None
    match = re.search(r"([\d.]+)\s*%?", raw)
    try:
        return float(match.group(1)) if match else None
    except (ValueError, AttributeError):
        return None


def parse_sqft(raw: str) -> int | None:
    """Extract square footage from strings like '12,500 SF' or '12,500 Sq Ft'."""
    if not raw:
        return None
    cleaned = re.sub(r"[^\d]", "", raw.replace(",", ""))
    try:
        return int(cleaned)
    except ValueError:
        return None


def normalize_listing(listing: dict) -> dict:
    """Apply all parsers to produce clean numeric fields."""
    result = dict(listing)
    result["price_numeric"] = parse_price(listing.get("price", ""))
    result["cap_rate_numeric"] = parse_cap_rate(listing.get("cap_rate", ""))
    result["building_size_sqft"] = parse_sqft(listing.get("building_size", ""))
    result["available_sqft_numeric"] = parse_sqft(listing.get("available_sqft", ""))
    return result

Proxy Configuration with City Targeting

LoopNet flags datacenter IPs almost immediately — Imperva maintains blocklists of known datacenter CIDR ranges and auto-blocks them at the WAF level before any JavaScript challenge even runs. Residential proxies are required from the start, not just as a fallback.

ThorData's residential proxies support city-level targeting, which is useful for CRE work because LoopNet may serve different search result sets or pricing data based on the apparent geographic origin of your requests. Targeting a proxy exit node in the same metro as the market you're researching avoids location-based result filtering.

def thordata_proxy(city: str | None = None, state: str | None = None) -> str:
    """
    Build a ThorData proxy URL with optional city/state targeting.
    City targeting ensures your requests look like local traffic to LoopNet,
    which avoids geo-based result filtering.
    """
    user = "YOUR_THORDATA_USER"
    password = "YOUR_THORDATA_PASS"

    if city and state:
        # City targeting via username parameter encoding
        city_clean = city.lower().replace(" ", "_").replace("-", "_")
        targeted_user = f"{user}-city-{city_clean}-state-{state.upper()}"
        return f"http://{targeted_user}:{password}@proxy.thordata.com:9000"

    return f"http://{user}:{password}@proxy.thordata.com:9000"


# Research Chicago office market with Chicago-exit proxy
chicago_proxy = thordata_proxy(city="chicago", state="IL")

# Research Dallas industrial market with Dallas-exit proxy
dallas_proxy = thordata_proxy(city="dallas", state="TX")

Handling Pagination

LoopNet search result pages use a dynamic pagination token. Navigate using the "Next" button:

async def scrape_search_results(
    page: Page,
    search_url: str,
    max_pages: int = 5,
) -> list:
    """Return listing URLs from paginated LoopNet search results."""
    await page.goto(search_url, wait_until="domcontentloaded", timeout=30000)
    await human_delay(1500, 3000)

    listing_urls = []

    for page_num in range(max_pages):
        # Wait for property cards
        try:
            await page.wait_for_selector(
                "[data-testid='property-card'], .placardSection .placard",
                timeout=15000,
            )
        except Exception:
            content = await page.content()
            if "hcaptcha" in content.lower():
                print(f"CAPTCHA on page {page_num+1}, stopping")
                break
            print(f"No results on page {page_num+1}")
            break

        # Extract URLs from all cards on this page
        cards = await page.query_selector_all(
            "[data-testid='property-card'] a.property-card-link, "
            ".placardSection .placard a.placardTitle"
        )

        page_urls = []
        for card in cards:
            href = await card.get_attribute("href")
            if href and "/Listing/" in href:
                full_url = (
                    f"https://www.loopnet.com{href}"
                    if href.startswith("/")
                    else href
                )
                if full_url not in listing_urls:
                    page_urls.append(full_url)
                    listing_urls.append(full_url)

        print(f"Page {page_num + 1}: found {len(page_urls)} listings ({len(listing_urls)} total)")

        # Check for and click the next page button
        next_btn = await page.query_selector(
            "[data-testid='pagination-next-button']:not([disabled]), "
            ".pagination-next:not(.disabled) a"
        )
        if not next_btn:
            print("No next page button, pagination complete")
            break

        await next_btn.click()
        await human_delay(2000, 4500)

    return listing_urls

Full Async Pipeline

import asyncio
import json
import sqlite3
from datetime import datetime

async def run_loopnet_pipeline(
    search_url: str,
    proxy: str | None = None,
    max_search_pages: int = 5,
    max_detail_pages: int = 50,
    db_path: str = "loopnet_listings.db",
):
    """
    Full pipeline: search pagination -> listing URL collection -> detail extraction -> storage.
    """
    conn = init_loopnet_db(db_path)

    async with async_playwright() as pw:
        proxy_config = {"server": proxy} if proxy else None
        browser = await pw.chromium.launch(
            headless=True,
            args=["--no-sandbox", "--disable-blink-features=AutomationControlled"],
        )
        context = await browser.new_context(
            proxy=proxy_config,
            viewport={"width": 1440, "height": 900},
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/126.0.0.0 Safari/537.36"
            ),
            locale="en-US",
            timezone_id="America/New_York",
        )

        # Warm up session on homepage first
        page = await new_stealth_page(context)
        await warm_up_session(page)

        # Collect listing URLs from search results
        print(f"\nScraping search results: {search_url}")
        listing_urls = await scrape_search_results(page, search_url, max_pages=max_search_pages)
        print(f"Collected {len(listing_urls)} listing URLs")

        # Check which URLs we haven't scraped yet
        cursor = conn.execute("SELECT url FROM listings")
        known_urls = {row[0] for row in cursor.fetchall()}
        new_urls = [u for u in listing_urls if u not in known_urls]
        print(f"{len(new_urls)} new listings to scrape")

        # Scrape detail pages
        results = []
        context_request_count = 0

        for i, url in enumerate(new_urls[:max_detail_pages]):
            print(f"[{i+1}/{len(new_urls[:max_detail_pages])}] {url[:80]}...")

            # Rotate context every 15-20 listings to reset Imperva session scoring
            if context_request_count >= 15:
                await context.close()
                context = await browser.new_context(
                    proxy=proxy_config,
                    viewport={"width": 1440, "height": 900},
                    user_agent=(
                        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/126.0.0.0 Safari/537.36"
                    ),
                    locale="en-US",
                    timezone_id="America/New_York",
                )
                page = await new_stealth_page(context)
                await warm_up_session(page)
                context_request_count = 0

            try:
                listing = await extract_listing(page, url)
                normalized = normalize_listing(listing)
                results.append(normalized)
                save_listing(conn, normalized)
                context_request_count += 1
                await human_delay(2500, 5500)
            except RuntimeError as e:
                if "CAPTCHA" in str(e):
                    print("  CAPTCHA hit — rotating context")
                    await context.close()
                    context = await browser.new_context(
                        proxy=proxy_config,
                        viewport={"width": 1440, "height": 900},
                        user_agent=(
                            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                            "AppleWebKit/537.36 (KHTML, like Gecko) "
                            "Chrome/126.0.0.0 Safari/537.36"
                        ),
                        locale="en-US",
                    )
                    page = await new_stealth_page(context)
                    await warm_up_session(page)
                    context_request_count = 0
                    await asyncio.sleep(30)
                else:
                    print(f"  Error: {e}")
            except Exception as e:
                print(f"  Unexpected error: {e}")

        await browser.close()

    conn.close()
    print(f"\nPipeline complete. Scraped {len(results)} listings.")
    return results

SQLite Storage

def init_loopnet_db(db_path: str = "loopnet_listings.db") -> sqlite3.Connection:
    """Initialize the LoopNet listings database."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS listings (
            listing_id TEXT,
            url TEXT PRIMARY KEY,
            property_type TEXT,
            address TEXT,
            price TEXT,
            price_numeric REAL,
            lease_rate_per_sqft TEXT,
            cap_rate TEXT,
            cap_rate_numeric REAL,
            building_size TEXT,
            building_size_sqft INTEGER,
            available_sqft TEXT,
            available_sqft_numeric INTEGER,
            year_built TEXT,
            lot_size TEXT,
            zoning TEXT,
            days_on_market TEXT,
            broker_name TEXT,
            broker_company TEXT,
            broker_phone TEXT,
            lat REAL,
            lng REAL,
            photos TEXT,
            scraped_at TEXT
        );

        CREATE TABLE IF NOT EXISTS market_snapshots (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            market TEXT,
            property_type TEXT,
            snapshot_date TEXT,
            total_listings INTEGER,
            avg_price REAL,
            avg_cap_rate REAL,
            avg_lease_rate REAL,
            avg_days_on_market REAL
        );

        CREATE INDEX IF NOT EXISTS idx_listings_type ON listings(property_type);
        CREATE INDEX IF NOT EXISTS idx_listings_price ON listings(price_numeric);
        CREATE INDEX IF NOT EXISTS idx_listings_caprate ON listings(cap_rate_numeric);
    """)
    conn.commit()
    return conn


def save_listing(conn: sqlite3.Connection, listing: dict):
    """Insert or replace a listing record."""
    now = datetime.utcnow().isoformat()
    conn.execute(
        """INSERT OR REPLACE INTO listings VALUES
           (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (
            listing.get("listing_id"),
            listing.get("url"),
            listing.get("property_type"),
            listing.get("address") if isinstance(listing.get("address"), str)
                else json.dumps(listing.get("address", {})),
            listing.get("price"),
            listing.get("price_numeric"),
            listing.get("lease_rate_per_sqft"),
            listing.get("cap_rate"),
            listing.get("cap_rate_numeric"),
            listing.get("building_size"),
            listing.get("building_size_sqft"),
            listing.get("available_sqft"),
            listing.get("available_sqft_numeric"),
            listing.get("year_built"),
            listing.get("lot_size"),
            listing.get("zoning"),
            listing.get("days_on_market"),
            listing.get("broker_name"),
            listing.get("broker_company"),
            listing.get("broker_phone"),
            listing.get("lat"),
            listing.get("lng"),
            json.dumps(listing.get("photos", [])),
            now,
        )
    )
    conn.commit()

Market Analytics

Once you have a database of listings, compute market-level analytics:

def compute_market_stats(
    conn: sqlite3.Connection,
    market_query: str | None = None,
    property_type: str | None = None,
) -> dict:
    """
    Compute aggregate market statistics from stored listings.

    market_query: substring matched against the address field (via LIKE).
    property_type: filter to a specific property type.
    """
    conditions = []
    params = []

    if market_query:
        conditions.append("address LIKE ?")
        params.append(f"%{market_query}%")

    if property_type:
        conditions.append("property_type LIKE ?")
        params.append(f"%{property_type}%")

    where = "WHERE " + " AND ".join(conditions) if conditions else ""

    stats = conn.execute(f"""
        SELECT
            COUNT(*) as total_listings,
            AVG(price_numeric) as avg_price,
            MIN(price_numeric) as min_price,
            MAX(price_numeric) as max_price,
            AVG(cap_rate_numeric) as avg_cap_rate,
            AVG(building_size_sqft) as avg_building_sqft,
            AVG(CAST(REPLACE(days_on_market, ' days', '') AS INTEGER)) as avg_dom
        FROM listings
        {where}
    """, params).fetchone()

    by_type = conn.execute(f"""
        SELECT property_type, COUNT(*) as count,
               AVG(price_numeric) as avg_price,
               AVG(cap_rate_numeric) as avg_cap_rate
        FROM listings
        {where}
        GROUP BY property_type
        ORDER BY count DESC
    """, params).fetchall()

    return {
        "total_listings": stats[0],
        "avg_price": round(stats[1], 0) if stats[1] else None,
        "price_range": f"${stats[2]:,.0f} - ${stats[3]:,.0f}" if stats[2] and stats[3] else None,
        "avg_cap_rate_pct": round(stats[4], 2) if stats[4] else None,
        "avg_building_sqft": round(stats[5], 0) if stats[5] else None,
        "avg_days_on_market": round(stats[6], 1) if stats[6] else None,
        "by_property_type": [
            {"type": r[0], "count": r[1], "avg_price": r[2], "avg_cap_rate": r[3]}
            for r in by_type
        ],
    }


def find_high_cap_rate_deals(
    conn: sqlite3.Connection,
    min_cap_rate: float = 6.0,
    max_price: float | None = None,
) -> list:
    """Find listings with above-threshold cap rates — core deal sourcing query."""
    conditions = ["cap_rate_numeric >= ?"]
    params = [min_cap_rate]

    if max_price:
        conditions.append("price_numeric <= ?")
        params.append(max_price)

    return conn.execute(f"""
        SELECT listing_id, address, property_type, price, cap_rate,
               building_size, broker_name, broker_phone, url
        FROM listings
        WHERE {' AND '.join(conditions)}
        ORDER BY cap_rate_numeric DESC
    """, params).fetchall()
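
The schema above also creates a market_snapshots table that nothing populates yet. A writer sketch (save_market_snapshot is a hypothetical helper assuming the dict shape returned by compute_market_stats; avg_lease_rate is left NULL because compute_market_stats doesn't produce it):

```python
import sqlite3
from datetime import date

def save_market_snapshot(conn: sqlite3.Connection, market: str,
                         property_type: str, stats: dict) -> None:
    """Persist one aggregate-stats row into market_snapshots."""
    conn.execute(
        """INSERT INTO market_snapshots
           (market, property_type, snapshot_date, total_listings,
            avg_price, avg_cap_rate, avg_lease_rate, avg_days_on_market)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
        (
            market,
            property_type,
            date.today().isoformat(),
            stats.get("total_listings"),
            stats.get("avg_price"),
            stats.get("avg_cap_rate_pct"),
            stats.get("avg_lease_rate"),  # not computed above; stays NULL
            stats.get("avg_days_on_market"),
        ),
    )
    conn.commit()
```

Run it once per scrape after compute_market_stats to build a time series of market conditions, which is what makes the snapshots table useful for trend charts.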

Common Search URL Patterns

SEARCH_URL_TEMPLATES = {
    "chicago_office_lease": (
        "https://www.loopnet.com/search/office-space/chicago-il/for-lease/"
    ),
    "nyc_retail_sale": (
        "https://www.loopnet.com/search/retail-space/new-york-ny/for-sale/"
    ),
    "la_industrial_lease": (
        "https://www.loopnet.com/search/industrial-properties/los-angeles-ca/for-lease/"
    ),
    "nationwide_multifamily_sale": (
        "https://www.loopnet.com/search/apartment-buildings/usa/for-sale/"
    ),
    "dallas_flex_space": (
        "https://www.loopnet.com/search/flex-space/dallas-tx/for-lease/"
    ),
}

# Example: run pipeline on Chicago office market
if __name__ == "__main__":
    PROXY = thordata_proxy(city="chicago", state="IL")
    asyncio.run(
        run_loopnet_pipeline(
            search_url=SEARCH_URL_TEMPLATES["chicago_office_lease"],
            proxy=PROXY,
            max_search_pages=3,
            max_detail_pages=30,
        )
    )

Production Tips

Rotate contexts, not just pages. Imperva tracks session-level signals across requests. Reusing the same browser context for dozens of listings builds up a behavioral profile that eventually triggers challenges. Create a new context (and therefore a new browser fingerprint and session cookie set) every 15-20 listings.

Warm up the session. Before hitting listing pages, navigate to the LoopNet homepage and perform one or two search interactions naturally. Sessions that arrive cold directly on a listing URL are more likely to hit challenges than sessions with a browsing history.

Cache the search phase separately from the extraction phase. Search result pages change daily. Listing detail pages for a given listing ID are mostly static once scraped. Run the search pagination daily to find new listing IDs, but only re-scrape detail pages when you have a fresh ID you haven't seen before.

Handle CAPTCHA gracefully. When a page returns an hCaptcha element, don't retry immediately from the same context. Close the context, wait 30-60 seconds, create a fresh context with a new proxy exit node, and resume from the listing that triggered it.

Use city-targeted proxies for market-specific research. If you're building a dashboard for Dallas industrial properties, route through Dallas-area residential IPs via ThorData's residential proxies. This avoids any geo-based result filtering and gives you the same view a local broker would see when browsing the platform.

Store results in SQLite keyed on the listing URL or the LoopNet listing ID. The ID appears in the listing URL path (e.g., /Listing/123456789/) and is stable across listing updates, so either works as a primary key; the schema above uses the URL.

LoopNet's Terms of Service restrict automated access. CoStar Group (LoopNet's parent company) has actively litigated against competitors who scraped their platform — there is documented legal risk here beyond the usual ToS-vs-CFAA analysis. For commercial applications, evaluate their data licensing options. For research and personal market analysis at modest volumes, the risk profile is much lower, but understand what you're doing before running at scale.