How to Scrape Quora Questions and Answers in 2026 (Complete Python Guide)
Quora is a goldmine for researchers, content marketers, and NLP practitioners. The upvote system means community-validated signal is baked in — popular answers aren't just opinions, they're opinions that thousands of real humans endorsed with a click. The question volume across topics is enormous: millions of questions spanning every industry, profession, and human curiosity. For competitive research, NLP training data, content gap analysis, or understanding what your market is confused about, Quora is one of the richest publicly available Q&A datasets.
The problem is that scraping it is legitimately non-trivial. Quora in 2026 runs Cloudflare, aggressive bot fingerprinting, login walls that appear after a few page views, and a fully React-rendered frontend that naive HTTP scrapers can't parse at all. This guide covers the complete technical stack for pulling Quora data: questions, answers, upvote counts, user profiles, topic feeds, and related questions — with working code and real anti-detection strategies.
Why Playwright Is Non-Negotiable
The fundamental challenge with Quora is that it's a React SPA. When your browser first requests a Quora URL, the server returns an HTML shell that contains almost no content — just the app skeleton. The actual questions, answers, and vote counts are fetched asynchronously via GraphQL API calls that happen after the JavaScript executes.
If you try requests, httpx, urllib3, or even aiohttp, you get either an empty page or an immediate 403 before you see a single answer. Tools like mechanize or scrapy have the same problem — they don't execute JavaScript.
You need a real browser. Playwright is the right choice in 2026:
- Faster startup and more scriptable than Selenium
- Native async support with asyncio
- Direct integration with Chromium, Firefox, and WebKit
- Built-in network interception for capturing API responses
- Better maintained than Puppeteer for Python use cases
pip install playwright playwright-stealth
playwright install chromium
Use the async API. You'll almost certainly want concurrent workers to scrape at any useful rate, and async lets you run multiple browser contexts without threads.
Installation and Project Setup
# requirements.txt
# playwright>=1.44.0
# playwright-stealth>=1.0.6
import asyncio
import json
import random
import time
import re
from typing import Optional
from playwright.async_api import async_playwright, Page, BrowserContext
Verify your setup:
async def verify_setup():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://httpbin.org/user-agent")
        ua = await page.evaluate("() => navigator.userAgent")
        print(f"User agent: {ua}")
        await browser.close()

asyncio.run(verify_setup())
Stealth Configuration
Quora's bot detection checks for browser automation signals before serving content. Apply stealth patches to every new page before navigation:
from playwright_stealth import stealth_async
REALISTIC_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)
async def create_stealth_context(
    browser,
    proxy_config: Optional[dict] = None,
    locale: str = "en-US",
) -> BrowserContext:
    """Create a browser context with stealth configuration."""
    context_opts = {
        "user_agent": REALISTIC_UA,
        "viewport": {"width": 1366, "height": 768},
        "locale": locale,
        "timezone_id": "America/New_York",
        "color_scheme": "light",
        "accept_downloads": False,
    }
    if proxy_config:
        context_opts["proxy"] = proxy_config
    context = await browser.new_context(**context_opts)
    # Override automation fingerprints
    await context.add_init_script("""
        // Webdriver flag
        Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
        // Plugin array (headless has none)
        const pluginData = [
            {name: 'PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai'},
            {name: 'Chrome PDF Viewer', filename: 'internal-pdf-viewer'},
            {name: 'Chromium PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai'},
            {name: 'Microsoft Edge PDF Viewer', filename: 'msedgepdf'},
            {name: 'WebKit built-in PDF', filename: 'webkit-fake-pdf-plugin'},
        ];
        Object.defineProperty(navigator, 'plugins', {
            get: () => {
                const arr = Array.from(pluginData);
                arr.length = pluginData.length;
                return arr;
            }
        });
        // Languages
        Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
        // Chrome runtime
        window.chrome = window.chrome || {
            runtime: {}, app: {isInstalled: false}
        };
        // Permissions API (headless returns different results)
        const originalQuery = window.navigator.permissions.query;
        window.navigator.permissions.query = (parameters) => (
            parameters.name === 'notifications' ?
                Promise.resolve({state: Notification.permission}) :
                originalQuery(parameters)
        );
    """)
    return context

async def new_stealth_page(context: BrowserContext) -> Page:
    """Create a new page with stealth patches applied."""
    page = await context.new_page()
    await stealth_async(page)
    return page
Scraping Q&A Pages
A Quora question URL looks like https://www.quora.com/What-is-the-best-way-to-learn-Python. Answers load dynamically as the page hydrates, so you must wait for content to appear before extracting:
async def scrape_question(
    url: str,
    max_answers: int = 20,
    proxy_config: Optional[dict] = None,
) -> dict:
    """
    Scrape a single Quora question page.
    Returns question text and a list of answers with upvotes.
    """
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await create_stealth_context(browser, proxy_config)
        page = await new_stealth_page(context)

        # Capture GraphQL responses for vote data
        graphql_data = []

        async def capture_graphql(response):
            if "graphql" in response.url.lower() or "api" in response.url.lower():
                try:
                    data = await response.json()
                    graphql_data.append(data)
                except Exception:
                    pass

        page.on("response", capture_graphql)

        # Navigate to the question
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await asyncio.sleep(random.uniform(2.5, 4.5))

        # Dismiss login modal (appears after ~3 page views in a session)
        await dismiss_login_modal(page)

        # Wait for answer container to appear
        try:
            await page.wait_for_selector(".q-box", timeout=8000)
        except Exception:
            pass  # Continue even if selector times out

        # Scroll to load more answers
        for _ in range(3):
            await page.evaluate("window.scrollBy(0, window.innerHeight * 2)")
            await asyncio.sleep(random.uniform(1.5, 2.5))

        # Extract question title
        question_text = ""
        for selector in ["h1.q-text", "h1", ".q-title", "[data-testid='question-title']"]:
            el = await page.query_selector(selector)
            if el:
                question_text = (await el.text_content() or "").strip()
                if question_text:
                    break

        # Extract answers
        answers = await extract_answers(page, max_answers)

        # Extract related questions
        related = await extract_related_questions(page)

        # Extract topic tags
        topics = await page.evaluate("""
            () => {
                const els = document.querySelectorAll('a[href*="/topic/"]');
                return [...new Set(Array.from(els).map(e => e.textContent.trim()))].filter(Boolean);
            }
        """)

        await browser.close()

        return {
            "url": url,
            "question": question_text,
            "answers": answers[:max_answers],
            "topics": topics[:10],
            "related_questions": related[:5],
            "graphql_responses_captured": len(graphql_data),
        }
async def dismiss_login_modal(page: Page):
    """Try to dismiss Quora's login modal if present."""
    dismiss_selectors = [
        "[aria-label='Close']",
        "button[data-functional-selector='close-button']",
        ".q-modal__close",
        "[class*='modal'] button[aria-label*='lose']",
    ]
    for selector in dismiss_selectors:
        try:
            btn = await page.query_selector(selector)
            if btn:
                await btn.click()
                await asyncio.sleep(random.uniform(0.5, 1.2))
                return True
        except Exception:
            continue
    return False
async def extract_answers(page: Page, max_answers: int = 20) -> list[dict]:
    """Extract answer cards from a loaded Quora page."""
    # Quora's class names partially rotate, so try multiple selector strategies
    answer_selectors = [
        ".Answer",
        "[class*='Answer_answer']",
        ".dom_annotate_question_answer_item",
        "[data-aid]",  # Quora uses data-aid on answer containers
    ]
    answer_elements = []
    for selector in answer_selectors:
        elements = await page.query_selector_all(selector)
        if elements:
            answer_elements = elements
            break
    if not answer_elements:
        # Fallback: extract all substantial text blocks
        return await extract_answers_fallback(page)
    answers = []
    for el in answer_elements[:max_answers + 5]:  # Grab extra, filter below
        try:
            answer = await extract_single_answer(el)
            if answer and len(answer.get("content", "")) >= 50:
                answers.append(answer)
        except Exception:
            continue
    return answers[:max_answers]
async def extract_single_answer(el) -> Optional[dict]:
    """Extract data from a single answer element."""
    # Author name
    author = ""
    author_selectors = [
        ".q-box .ui_profile_header",
        "[class*='author']",
        "a[href*='/profile/']",
        ".UserCredential",
    ]
    for sel in author_selectors:
        author_el = await el.query_selector(sel)
        if author_el:
            author = (await author_el.text_content() or "").strip()
            if author:
                break
    # Author credential/bio line
    credential = ""
    cred_el = await el.query_selector(".CredentialListItem, [class*='credential']")
    if cred_el:
        credential = (await cred_el.text_content() or "").strip()
    # Answer content
    content = ""
    content_selectors = [
        ".q-relative .q-text",
        "[class*='answer_content']",
        ".q-box.spacing_log_answer_content",
    ]
    for sel in content_selectors:
        content_el = await el.query_selector(sel)
        if content_el:
            content = (await content_el.inner_text() or "").strip()
            if len(content) >= 50:
                break
    if not content:
        # Get all text from the element, filter out nav/meta text
        raw_text = (await el.inner_text() or "").strip()
        # Remove short lines that are likely UI chrome
        lines = [line.strip() for line in raw_text.split("\n") if len(line.strip()) > 30]
        content = "\n".join(lines)
    # Upvote count
    upvotes = "0"
    vote_selectors = [
        "[class*='VoterCount']",
        "[class*='upvote'] span",
        "button[aria-label*='pvote'] span",
        ".q-text[class*='upvote']",
    ]
    for sel in vote_selectors:
        vote_el = await el.query_selector(sel)
        if vote_el:
            vote_text = (await vote_el.text_content() or "0").strip()
            if re.search(r'\d', vote_text):
                upvotes = vote_text
                break
    # Share count / views (sometimes available)
    views = ""
    views_el = await el.query_selector("[class*='views'], [class*='Views']")
    if views_el:
        views = (await views_el.text_content() or "").strip()
    # Timestamp
    timestamp = ""
    time_el = await el.query_selector("time, [class*='timestamp'], [datetime]")
    if time_el:
        timestamp = (
            await time_el.get_attribute("datetime") or
            await time_el.text_content() or ""
        ).strip()
    return {
        "author": author[:100] if author else "Anonymous",
        "credential": credential[:200],
        "content": content,
        "upvotes": upvotes,
        "views": views,
        "timestamp": timestamp,
    }
async def extract_answers_fallback(page: Page) -> list[dict]:
    """Fallback extraction when main selectors fail."""
    return await page.evaluate("""
        () => {
            // Find all substantial paragraph blocks
            const paras = document.querySelectorAll('p, .q-relative');
            const answers = [];
            let current = {};
            for (const el of paras) {
                const text = el.textContent.trim();
                if (text.length > 100) {
                    current.content = (current.content || '') + ' ' + text;
                }
                if (Object.keys(current).length > 0 && text.length > 80) {
                    if (!current.upvotes) current.upvotes = '0';
                    if (!current.author) current.author = 'Anonymous';
                    if (current.content && current.content.length > 150) {
                        answers.push({...current});
                        current = {};
                    }
                }
            }
            return answers.slice(0, 20);
        }
    """)
async def extract_related_questions(page: Page) -> list[dict]:
    """Extract related/similar questions from the sidebar."""
    return await page.evaluate("""
        () => {
            const links = document.querySelectorAll('a[href*="/"]');
            const related = [];
            for (const link of links) {
                const href = link.getAttribute('href') || '';
                const text = link.textContent.trim();
                // Quora question titles end with a question mark (the URL slug does not)
                if (href.startsWith('/') && text.endsWith('?') && text.length > 15) {
                    related.push({
                        title: text,
                        url: 'https://www.quora.com' + href,
                    });
                }
            }
            return related.slice(0, 10);
        }
    """)
Scraping User Profiles
Profile pages at quora.com/profile/Username expose bio text, credential lines, follower/following counts, answer counts, and sometimes educational/professional history. Useful for building author authority signals:
async def scrape_profile(
    username: str,
    include_recent_answers: bool = False,
    proxy_config: Optional[dict] = None,
) -> dict:
    """Scrape a Quora user profile."""
    url = f"https://www.quora.com/profile/{username}"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await create_stealth_context(browser, proxy_config)
        page = await new_stealth_page(context)
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await asyncio.sleep(random.uniform(2, 3.5))
        await dismiss_login_modal(page)

        profile = {"username": username, "url": url}

        # Display name
        name_el = await page.query_selector("h1, .ProfileNameAndSig, [class*='profile_name']")
        profile["display_name"] = (await name_el.text_content() or username).strip() if name_el else username

        # Bio/description
        bio_el = await page.query_selector(".ProfileAboutMe, .q-text.qu-dynamicFontSize--regular")
        profile["bio"] = (await bio_el.text_content() or "").strip() if bio_el else ""

        # Credentials (job title, education, etc.)
        cred_els = await page.query_selector_all(".CredentialListItem, [class*='credential']")
        profile["credentials"] = [
            (await el.text_content() or "").strip()
            for el in cred_els
            if (await el.text_content() or "").strip()
        ][:5]

        # Stats: followers, following, answers, questions
        stats = {}
        stat_links = await page.query_selector_all(
            "a[href*='followers'], a[href*='following'], a[href*='answers']"
        )
        for link in stat_links:
            href = await link.get_attribute("href") or ""
            text = (await link.text_content() or "").strip()
            if "followers" in href:
                stats["followers"] = text
            elif "following" in href:
                stats["following"] = text
            elif "answers" in href:
                stats["answers"] = text
        profile.update(stats)

        # Knows-about topics
        topic_els = await page.query_selector_all("a[href*='/topic/']")
        profile["known_for_topics"] = list({
            (await el.text_content() or "").strip()
            for el in topic_els
            if (await el.text_content() or "").strip()
        })[:10]

        # Recent answers (optional)
        if include_recent_answers:
            answer_links = await page.query_selector_all("a[href*='answer']")
            recent = []
            for link in answer_links[:5]:
                href = await link.get_attribute("href") or ""
                text = (await link.text_content() or "").strip()
                if href and text and len(text) > 20:
                    recent.append({"text_preview": text[:100], "url": href})
            profile["recent_answers"] = recent

        await browser.close()
        return profile
Topic Feeds and Infinite Scroll
Topic pages at quora.com/topic/Machine-Learning list questions tagged with that topic. The feed uses infinite scroll — scroll to the bottom and new questions appear:
async def scrape_topic_feed(
    topic: str,
    max_questions: int = 50,
    proxy_config: Optional[dict] = None,
) -> list[dict]:
    """
    Scrape questions from a Quora topic feed.
    Handles infinite scroll pagination.
    """
    url = f"https://www.quora.com/topic/{topic}"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await create_stealth_context(browser, proxy_config)
        page = await new_stealth_page(context)
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await asyncio.sleep(random.uniform(2, 3))
        await dismiss_login_modal(page)

        questions = set()
        last_count = 0

        # Scroll until we have enough or stop getting new content
        for scroll_attempt in range(20):
            # Extract current visible questions
            new_questions = await page.evaluate("""
                () => {
                    const links = document.querySelectorAll('a[href^="/"]');
                    const questions = [];
                    for (const link of links) {
                        const text = link.textContent.trim();
                        const href = link.getAttribute('href');
                        // Quora question slugs are long and descriptive
                        if (text.endsWith('?') && text.length > 20 && href && href.length > 10) {
                            questions.push({
                                title: text,
                                url: 'https://www.quora.com' + href,
                            });
                        }
                    }
                    return questions;
                }
            """)
            for q in new_questions:
                # Serialize with stable key order so set deduplication is reliable
                questions.add(json.dumps(q, sort_keys=True))
            if len(questions) >= max_questions:
                break
            if len(questions) == last_count and scroll_attempt > 3:
                # No new content loading
                break
            last_count = len(questions)
            # Scroll down
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await asyncio.sleep(random.uniform(2.5, 4.5))

        await browser.close()

    # Parse back from JSON strings
    result = [json.loads(q) for q in questions]
    return result[:max_questions]
async def scrape_topic_with_answers(
    topic: str,
    max_questions: int = 10,
    proxy_config: Optional[dict] = None,
) -> list[dict]:
    """Scrape a topic feed, then scrape each question's top answers."""
    questions = await scrape_topic_feed(
        topic, max_questions=max_questions * 2, proxy_config=proxy_config
    )
    enriched = []
    for i, q in enumerate(questions[:max_questions]):
        print(f"  [{i+1}/{max_questions}] {q['title'][:60]}...")
        try:
            full = await scrape_question(q["url"], max_answers=3, proxy_config=proxy_config)
            enriched.append({
                **q,
                "top_answer": full["answers"][0] if full["answers"] else None,
                "answer_count": len(full["answers"]),
                "topics": full.get("topics", []),
            })
        except Exception as e:
            print(f"    Error: {e}")
            enriched.append({**q, "error": str(e)})
        # Delay between question scrapes
        await asyncio.sleep(random.uniform(4, 8))
    return enriched
Searching Quora (Search Page)
Quora's search at quora.com/search?q=... works similarly to topic pages:
from urllib.parse import quote_plus

async def search_quora(
    query: str,
    max_results: int = 30,
    content_type: str = "question",  # question | answer | profile | post
    proxy_config: Optional[dict] = None,
) -> list[dict]:
    """Search Quora and return matching questions."""
    # quote_plus handles spaces and special characters safely
    url = f"https://www.quora.com/search?q={quote_plus(query)}&type={content_type}"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await create_stealth_context(browser, proxy_config)
        page = await new_stealth_page(context)
        await page.goto(url, wait_until="networkidle", timeout=30000)
        await asyncio.sleep(random.uniform(2, 3.5))
        await dismiss_login_modal(page)

        # Scroll to load more results
        for _ in range(4):
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await asyncio.sleep(random.uniform(2, 3))

        results = await page.evaluate("""
            () => {
                const links = document.querySelectorAll('a[href^="/"]');
                const seen = new Set();
                const items = [];
                for (const link of links) {
                    const href = link.getAttribute('href');
                    const text = link.textContent.trim();
                    if (!seen.has(href) && text.length > 20 && href.length > 5) {
                        seen.add(href);
                        items.push({
                            title: text,
                            url: 'https://www.quora.com' + href,
                        });
                    }
                }
                return items;
            }
        """)
        await browser.close()
    return [r for r in results if "?" in r["title"] or len(r["title"]) > 30][:max_results]
Handling Login Walls
Quora shows login/signup modals aggressively, particularly after 3-5 page views in a session. Three effective strategies:
Strategy 1: Dismiss the modal. The modal has a close button, but the aria-label and class names vary. Try multiple selectors:
async def aggressive_dismiss(page: Page, max_attempts: int = 3):
    """Try multiple approaches to dismiss the login modal."""
    for _ in range(max_attempts):
        dismissed = await dismiss_login_modal(page)
        if dismissed:
            return True
        # Press Escape key as fallback
        await page.keyboard.press("Escape")
        await asyncio.sleep(0.8)
        # Look for any overlay and click outside it
        try:
            overlay = await page.query_selector("[class*='overlay'], [class*='modal']")
            if overlay:
                bbox = await overlay.bounding_box()
                if bbox:
                    # Click outside the modal bounds
                    await page.mouse.click(10, 10)
        except Exception:
            pass
        await asyncio.sleep(1)
    return False
Strategy 2: Inject cookies from a logged-in session. Export cookies from a real browser session and load them into Playwright:
async def load_quora_session(context: BrowserContext, cookies_path: str):
    """Load saved Quora session cookies into a browser context."""
    with open(cookies_path) as f:
        cookies = json.load(f)
    # Ensure cookies have the fields Playwright requires
    cleaned = []
    for c in cookies:
        if c.get("name") and c.get("value") and "quora.com" in c.get("domain", ""):
            cleaned.append({
                "name": c["name"],
                "value": c["value"],
                "domain": c.get("domain", ".quora.com"),
                "path": c.get("path", "/"),
                "httpOnly": c.get("httpOnly", False),
                "secure": c.get("secure", True),
            })
    await context.add_cookies(cleaned)
    print(f"Loaded {len(cleaned)} Quora cookies")
Strategy 3: Use fresh contexts with short sessions. Keep each browser context under 3 question views before creating a new one. This prevents the modal trigger that fires after multiple page views.
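The rotation bookkeeping is simple enough to isolate. Below is a minimal sketch of that logic — `SessionBudget` is an illustrative name of mine, not a Playwright or Quora API; the threshold of 3 matches the modal trigger described above:

```python
class SessionBudget:
    """Track page views per browser context and signal when to rotate."""

    def __init__(self, max_views: int = 3):
        self.max_views = max_views
        self.views = 0

    def record_view(self) -> bool:
        """Record one page view; return True when the context should be replaced."""
        self.views += 1
        return self.views >= self.max_views

    def reset(self):
        """Call after swapping in a fresh context."""
        self.views = 0
```

In a scraping loop, call `record_view()` after each `page.goto()`; when it returns True, close the context, create a new stealth context (optionally with a new proxy), and call `reset()`.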
Proxy Configuration and Anti-Detection
The technical layers Quora deploys in 2026:
Cloudflare handles initial IP reputation checks. Datacenter IPs (AWS, GCP, Azure, DigitalOcean ranges) fail this check immediately and get the JS challenge or a soft block. Residential IPs pass.
Browser fingerprinting — covered by our create_stealth_context setup above. The key signals checked: navigator.webdriver, plugin array, Chrome runtime presence, Canvas/WebGL rendering characteristics.
Behavioral analysis — request timing, navigation patterns, mouse movement. Our random delays and session length limits address this.
TLS fingerprinting — Playwright using real Chromium passes this automatically since it has a real browser TLS stack.
For IP rotation, ThorData's residential proxy network works well with Playwright's built-in proxy configuration:
THORDATA_CONFIG = {
    "server": "http://proxy.thordata.com:9000",
    "username": "YOUR_THORDATA_USER",
    "password": "YOUR_THORDATA_PASS",
}

# For country-targeted access (e.g., US IP)
THORDATA_US = {
    "server": "http://proxy.thordata.com:9000",
    "username": "YOUR_THORDATA_USER-country-us",
    "password": "YOUR_THORDATA_PASS",
}

async def scrape_with_proxy(question_url: str) -> list[dict]:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=THORDATA_US,
        )
        context = await create_stealth_context(browser, THORDATA_US)
        page = await new_stealth_page(context)
        await page.goto(question_url, wait_until="networkidle")
        await asyncio.sleep(random.uniform(2.5, 4))
        await dismiss_login_modal(page)
        answers = await extract_answers(page)
        await browser.close()
        return answers
Rotate the proxy on each new browser context rather than per-request. Creating a new context with a new proxy IP gives you a fresh IP address, fresh cookies, and a clean behavioral fingerprint — far more convincing than switching IPs mid-session.
Rate limit: one page load per 3-8 seconds per IP. If running parallel workers, your proxy pool must be large enough to distribute the load. Ten workers at 1 request each per 3 seconds = ~200 req/min — you need at least 10 separate IPs cycling.
Session length: under 20 page views per context. Quora's modal trigger scales with session depth. Shorter sessions mean the modal appears less often and behavioral scoring has less data to work with.
Concurrent Scraping with Worker Pool
import asyncio
from asyncio import Queue

async def question_worker(
    worker_id: int,
    queue: Queue,
    results: list,
    proxy_config: Optional[dict] = None,
    max_answers: int = 10,
):
    """Worker coroutine that processes questions from a queue."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await create_stealth_context(browser, proxy_config)
        page_views = 0
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                break
            if page_views > 0 and page_views % 15 == 0:
                # Rotate context to reset session state
                await context.close()
                context = await create_stealth_context(browser, proxy_config)
                print(f"  Worker {worker_id}: rotated context at {page_views} views")
            try:
                page = await new_stealth_page(context)
                await page.goto(url, wait_until="networkidle", timeout=30000)
                await asyncio.sleep(random.uniform(2, 4))
                await dismiss_login_modal(page)
                question_text = ""
                h1 = await page.query_selector("h1")
                if h1:
                    question_text = (await h1.text_content() or "").strip()
                answers = await extract_answers(page, max_answers)
                await page.close()
                page_views += 1
                results.append({
                    "url": url,
                    "question": question_text,
                    "answers": answers,
                    "worker": worker_id,
                })
                queue.task_done()
                await asyncio.sleep(random.uniform(3, 6))
            except Exception as e:
                print(f"  Worker {worker_id} error on {url}: {e}")
                queue.task_done()
                await asyncio.sleep(5)
        await browser.close()

async def scrape_questions_parallel(
    urls: list[str],
    num_workers: int = 3,
    proxy_config: Optional[dict] = None,
) -> list[dict]:
    """Scrape multiple Quora questions concurrently."""
    queue = Queue()
    for url in urls:
        await queue.put(url)
    results = []
    workers = [
        question_worker(i, queue, results, proxy_config)
        for i in range(num_workers)
    ]
    await asyncio.gather(*workers)
    return results
Storing and Indexing Scraped Data
import sqlite3
from datetime import datetime, timezone

def init_quora_db(db_path: str = "quora_data.db") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS questions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT UNIQUE NOT NULL,
            question_text TEXT,
            topic TEXT,
            scraped_at TEXT,
            answer_count INTEGER DEFAULT 0
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS answers (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            question_url TEXT NOT NULL,
            author TEXT,
            credential TEXT,
            content TEXT,
            upvotes TEXT,
            timestamp TEXT,
            scraped_at TEXT,
            FOREIGN KEY (question_url) REFERENCES questions(url)
        )
    """)
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS answers_fts
        USING fts5(content, question_url, author, tokenize='porter unicode61')
    """)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_q_url ON questions(url)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_a_question ON answers(question_url)")
    conn.commit()
    return conn

def save_question_with_answers(
    conn: sqlite3.Connection,
    data: dict,
    topic: str = "",
):
    """Save a scraped Q&A to the database."""
    now = datetime.now(timezone.utc).isoformat()
    url = data["url"]
    conn.execute("""
        INSERT OR REPLACE INTO questions (url, question_text, topic, scraped_at, answer_count)
        VALUES (?, ?, ?, ?, ?)
    """, (url, data.get("question", ""), topic, now, len(data.get("answers", []))))
    for answer in data.get("answers", []):
        content = answer.get("content", "")
        if len(content) < 50:
            continue
        conn.execute("""
            INSERT INTO answers (question_url, author, credential, content, upvotes, timestamp, scraped_at)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        """, (
            url, answer.get("author", ""),
            answer.get("credential", ""), content,
            answer.get("upvotes", "0"), answer.get("timestamp", ""), now,
        ))
        # Mirror into the FTS index
        conn.execute(
            "INSERT INTO answers_fts (content, question_url, author) VALUES (?, ?, ?)",
            (content, url, answer.get("author", ""))
        )
    conn.commit()

def search_answers(conn: sqlite3.Connection, query: str, limit: int = 20) -> list[dict]:
    """Full-text search across all scraped answers."""
    rows = conn.execute("""
        SELECT a.question_url, a.author, a.content, a.upvotes,
               q.question_text
        FROM answers_fts f
        JOIN answers a ON a.question_url = f.question_url AND a.content = f.content
        JOIN questions q ON q.url = a.question_url
        WHERE answers_fts MATCH ?
        ORDER BY rank
        LIMIT ?
    """, (query, limit)).fetchall()
    return [
        {
            "question": r[4], "url": r[0], "author": r[1],
            "content": r[2][:300], "upvotes": r[3],
        }
        for r in rows
    ]
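If you want to sanity-check the FTS5 query shape without scraping anything, here is a self-contained toy — in-memory database, made-up rows — exercising the same `MATCH` / `ORDER BY rank` pattern used above:

```python
import sqlite3

# In-memory DB with the same virtual-table shape as answers_fts above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE answers_fts USING fts5(content, question_url, author)")
conn.execute("INSERT INTO answers_fts VALUES (?, ?, ?)",
             ("Python decorators wrap functions cleanly", "/q1", "Alice"))
conn.execute("INSERT INTO answers_fts VALUES (?, ?, ?)",
             ("Use virtual environments for isolation", "/q2", "Bob"))

# MATCH restricts to rows containing the term; rank orders by relevance
rows = conn.execute(
    "SELECT question_url FROM answers_fts WHERE answers_fts MATCH ? ORDER BY rank",
    ("decorators",)
).fetchall()
# rows now holds only the matching answer's URL
```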
Complete Working Pipeline
import asyncio
import json
import random

async def run_quora_pipeline(
    topics: list[str],
    questions_per_topic: int = 20,
    answers_per_question: int = 5,
    db_path: str = "quora_data.db",
    proxy_config: Optional[dict] = None,
):
    """
    Full pipeline: topic feeds -> question scraping -> database storage.
    """
    db = init_quora_db(db_path)
    total_saved = 0
    for topic in topics:
        print(f"\nTopic: {topic}")
        print("  Fetching question list...")
        try:
            questions = await scrape_topic_feed(
                topic,
                max_questions=questions_per_topic * 2,
                proxy_config=proxy_config,
            )
        except Exception as e:
            print(f"  Feed failed: {e}")
            continue
        print(f"  Found {len(questions)} questions, scraping top {questions_per_topic}...")
        for i, q in enumerate(questions[:questions_per_topic]):
            print(f"  [{i+1}/{questions_per_topic}] {q['title'][:55]}...")
            try:
                data = await scrape_question(
                    q["url"],
                    max_answers=answers_per_question,
                    proxy_config=proxy_config,
                )
                save_question_with_answers(db, data, topic=topic)
                total_saved += 1
                print(f"    Saved {len(data['answers'])} answers")
            except Exception as e:
                print(f"    Error: {e}")
            # Rate limit: 4-10 seconds between questions
            await asyncio.sleep(random.uniform(4, 10))
        print(f"  Topic '{topic}' complete. Total saved: {total_saved}")

    # Summary
    row = db.execute("SELECT COUNT(*) FROM questions").fetchone()
    ans_row = db.execute("SELECT COUNT(*) FROM answers").fetchone()
    print(f"\nDatabase: {row[0]} questions, {ans_row[0]} answers")
    db.close()

# Run it
if __name__ == "__main__":
    asyncio.run(run_quora_pipeline(
        topics=["Machine-Learning", "Python-programming-language", "Startups"],
        questions_per_topic=15,
        answers_per_question=5,
        proxy_config=THORDATA_US,
    ))
Common Gotchas
Quora changes class names frequently. They don't use human-readable class names — they're compiled and rotated on each deploy. The selectors in this guide use multiple fallbacks for exactly this reason. When things break, inspect the live DOM and look for structural patterns (data-* attributes, element hierarchy) rather than specific class names.
Login walls are session-scoped. A fresh browser context resets the session counter. If you're seeing modals constantly, your context is too old. Rotate more aggressively.
Upvote numbers are display strings, not integers. Quora displays "2.3K upvotes" not "2300". Parse these with: int(float(s.replace('K','')) * 1000) if 'K' in s else int(s.replace(',', '')).
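That one-liner assumes a clean string like "2.3K"; raw scrapes often carry trailing text ("2.3K upvotes"), commas, or an "M" suffix. A more forgiving parser (a sketch — the function name is mine):

```python
import re

def parse_count(s: str) -> int:
    """Convert display counts like '2.3K upvotes', '1,204', or '1.1M' to integers.

    Ignores surrounding text; returns 0 when no number is found.
    """
    m = re.search(r'([\d.,]+)\s*([KkMm]?)', s)
    if not m:
        return 0
    num = float(m.group(1).replace(',', ''))
    suffix = m.group(2).upper()
    if suffix == 'K':
        num *= 1_000
    elif suffix == 'M':
        num *= 1_000_000
    return int(num)

parse_count("2.3K upvotes")  # → 2300
parse_count("1,204")         # → 1204
```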
Some questions are behind a paywall ("Quora+"). These render a blurred preview, so your content extraction will return very short or empty strings. Filter by minimum content length (>100 chars).
Answers with collapsed "more" sections. Long answers have a "Continue Reading" button. If you need full answer text, click it before extracting content:
async def expand_answers(page: Page):
    """Click all 'Continue Reading' / 'more' buttons to expand answers."""
    expand_selectors = [
        "button[class*='expand']",
        "span[class*='more']",
        "a.continue_reading",
        "[data-functional-selector='expand-answer']",
    ]
    for sel in expand_selectors:
        buttons = await page.query_selector_all(sel)
        for btn in buttons:
            try:
                await btn.click()
                await asyncio.sleep(0.3)
            except Exception:
                pass
Ethics and Legal Considerations
Quora's Terms of Service prohibit automated scraping. The 2022 hiQ v. LinkedIn ruling established that scraping publicly accessible data doesn't automatically violate the Computer Fraud and Abuse Act — but that's a US legal standard, and it applies to the criminal statute, not to Quora's right to ban your IP.
Practical guidelines for responsible use:
- Rate limit aggressively. 1 request per 3-8 seconds per IP is respectful.
- Don't scrape behind login walls — stick to public questions and answers.
- For commercial products built on Quora data, the risk-to-benefit math probably doesn't favor scraping. Contact Quora about data licensing.
- For research, NLP training data, and personal analysis, small-scale scraping is standard practice. Keep session lengths short, don't hammer their servers, and you'll be fine operationally.
- Don't republish scraped answers verbatim in public-facing content. Attribution and transformation are your friends legally and ethically.
The techniques here work reliably in 2026. Use them responsibly.