
Introducing Lightfeed Extract

· 3 min read
Lightfeed Team

We're thrilled to launch Lightfeed Extract — a powerful, business-grade web data extraction tool that turns any website into clean, structured, and up-to-date data — all from a simple prompt.


Say goodbye to custom scrapers, brittle workflows, and writing code. Lightfeed handles the heavy lifting, and even better — we keep your data fresh in a continuously maintained, queryable database.

The Web Data Challenge

If you need clean structured data from websites - whether tracking competitors, monitoring pricing trends, extracting business intelligence, training AI models, or powering applications - you're probably familiar with the limitations of existing tools:

Common Extraction Pain Points

  • Manual Scraping and Maintenance: Traditional scrapers require custom code for each website and break when layouts change - forcing teams to constantly rewrite and fix code instead of focusing on business goals.

  • Limited Extraction Depth: Most tools only extract data from specified URLs, missing critical information buried in subpages and linked content.

  • No Integrated Database: Most scrapers don't provide a persistent database — forcing slow, repeated website crawling for each data request instead of fast queries, and making it impossible to track changes, search historic data, or quickly find relevant information.

  • Data Quality Issues: Raw extracted data requires significant post-processing to clean, normalize, and deduplicate - creating additional engineering complexity and introducing potential errors.

  • Anti-Scraping Measures: Modern websites implement various protection mechanisms - including CAPTCHAs, request throttling, and automated bot detection - making reliable data collection increasingly challenging.

The Lightfeed Solution

Lightfeed transforms how organizations extract and maintain clean, structured and up-to-date web data at scale. Our platform leverages Large Language Models (LLMs) and AI agents that can read, understand and interact with website content, making data extraction reliable and fully automated.

Key Benefits

Adaptive AI Extraction

Extract data from any website using simple natural language instructions without writing code. Automatically adapt to website changes.

Deep Content Discovery and Enrichment

Automatically extract data from linked pages and subpages, while enriching information from multiple sources and third-party websites to create comprehensive datasets.

Fast Database Access

Access consistently up-to-date structured data through instant queries instead of slow crawling, with built-in AI search capabilities to track changes and find the most relevant information.

Automated Data Processing

Get clean, normalized data with automatic deduplication and formatting.
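
To make the idea of normalization and deduplication concrete, here is a minimal sketch in plain Python. It is only an illustration, not Lightfeed's actual pipeline: the field names (`name`, `url`), the helper functions, and the choice of the URL as the duplicate key are all assumptions made for this example.

```python
import re


def normalize(record: dict) -> dict:
    """Collapse whitespace in string fields and canonicalize the URL."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = re.sub(r"\s+", " ", value).strip()
        cleaned[key] = value
    if isinstance(cleaned.get("url"), str):
        # Simplification for this sketch: lowercase the whole URL and drop a
        # trailing slash. Real URL paths can be case-sensitive.
        cleaned["url"] = cleaned["url"].lower().rstrip("/")
    return cleaned


def deduplicate(records: list[dict], key: str = "url") -> list[dict]:
    """Normalize every record and keep the first one seen for each key value."""
    seen = set()
    unique = []
    for record in map(normalize, records):
        identity = record.get(key)
        if identity in seen:
            continue
        seen.add(identity)
        unique.append(record)
    return unique


# Hypothetical raw extraction output with one near-duplicate.
raw = [
    {"name": "  Acme   Widget ", "url": "https://example.com/widget/"},
    {"name": "Acme Widget", "url": "https://EXAMPLE.com/widget"},
    {"name": "Acme Gadget", "url": "https://example.com/gadget"},
]
clean = deduplicate(raw)
```

The first two records differ only in whitespace, letter case, and a trailing slash, so after normalization they collapse into a single entry.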

Reliable Scraping

Extract data consistently even from the hardest websites—solving CAPTCHAs automatically and using premium proxies to bypass anti-bot measures.

Getting Started with Lightfeed Extract

Ready to transform how you extract structured data from the web?

Introducing AI Workflows

· 2 min read
Lightfeed Team
Notice

The Knowledge Base described in this post has been replaced by our new Extract functionality. Lightfeed Extract offers improved performance, better reliability, and a more intuitive user experience.

To learn more, please see our dedicated blog post on Lightfeed Extract.

Hi, Lyla from Lightfeed here with some big features to wrap up the year! We are excited to bring you AI workflows, which automate search and analysis on websites at scale. Adding websites is now 10x easier and more customizable, and you can directly ask AI what to extract. We have also added support for LinkedIn and Google search. Let's get into it.

Introducing AI workflows

This is where you can automate AI to extract, search and analyze your websites. Here is a 3-minute walkthrough.


Extract website with your own prompt

Directly ask AI what to extract. A schema is no longer required.


You can now track user or company activity on LinkedIn and in Google search results.

Other improvements

  1. AI agents can read and analyze PDF pages.
  2. We migrated to state-of-the-art proxy servers and CAPTCHA solvers to prevent scrapers from getting blocked. Crawling is now 3x faster.
  3. We improved the styling of workflow emails, so they are now much more readable.

Extract Websites to Your Knowledge Base

· 2 min read
Lightfeed Team
Notice

The Knowledge Base described in this post has been replaced by our new Extract functionality. Lightfeed Extract offers improved performance, better reliability, and a more intuitive user experience.

To learn more, please see our dedicated blog post on Lightfeed Extract.

Welcome! Lyla from Lightfeed here. We are super excited to release the knowledge base and dashboard, which help you gather and maintain web data easily from hundreds to thousands of websites.

Introducing knowledge base

The knowledge base is your custom database for extracting website data at scale. We've been improving it for the last two months, and it can now extract data from any public website at the frequency you choose.

Just define the data format you need from each website. Lightfeed will extract, deduplicate and index data into your knowledge base continuously. It can be configured to run every hour or every day.
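
As a rough picture of what "define the data format" and "deduplicate and index" could look like, here is a small sketch. Everything in it is hypothetical: the `Listing` format, the `to_listing` and `upsert` helpers, and the `RUN_EVERY_HOURS` setting are invented for illustration and are not Lightfeed's API or configuration.

```python
from dataclasses import dataclass, fields


@dataclass
class Listing:
    """Hypothetical data format a user might ask for from each website."""
    title: str
    price: float
    url: str


# Hypothetical schedule setting: 1 for hourly runs, 24 for daily runs.
RUN_EVERY_HOURS = 24


def to_listing(raw: dict) -> Listing:
    """Coerce a raw extracted dict into the declared format, failing on gaps."""
    missing = [f.name for f in fields(Listing) if f.name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Listing(
        title=str(raw["title"]).strip(),
        price=float(raw["price"]),
        url=str(raw["url"]),
    )


def upsert(index: dict, listing: Listing) -> bool:
    """Store the listing keyed by URL; return True only if it was new."""
    is_new = listing.url not in index
    index[listing.url] = listing
    return is_new


index = {}
listing = to_listing({"title": " Loft ", "price": "1200", "url": "https://example.com/a"})
first_time = upsert(index, listing)   # new entry
second_time = upsert(index, listing)  # same URL, so it overwrites instead of duplicating
```

Keying the index by URL means that repeated runs refresh existing entries rather than piling up copies, which is the general idea behind continuous extraction into one database.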


Lightfeed works for any public website. Unlike traditional web scraping, which depends on brittle selectors, Lightfeed uses AI (large language models) to understand and reason about the entire page in order to extract and search. It can extract relevant content from anywhere on the page and is robust to site design changes.
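
To show why selector-based scraping is brittle, here is a small sketch using only Python's standard library. The HTML snippets, the class names, and the `select_text` helper are made up for this example; it demonstrates the failure mode of selectors and does not show how Lightfeed's extraction works.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Capture the text of the first element carrying a given CSS class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self._capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes and self.text is None:
            self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self.text = data.strip()
            self._capturing = False


def select_text(html: str, css_class: str):
    """Return the text for css_class, or None when the selector matches nothing."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.text


# The same price before and after a hypothetical site redesign.
old_layout = '<div><span class="price">$19.99</span></div>'
new_layout = '<div><span class="amount-v2">$19.99</span></div>'

before = select_text(old_layout, "price")  # the selector matches
after = select_text(new_layout, "price")   # the class was renamed, so nothing matches
```

The price is still on the page after the redesign, but the selector no longer finds it. An approach that reads the page content itself does not depend on the class name staying the same.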


Better data visibility in the dashboard

At Lightfeed, we want to make sure you can easily view your website data and workflow results all in one place. So we redesigned the Dashboard and made it accessible from every data menu in Lightfeed.

You can now filter by source or by time. For example, you can narrow results to the last month, the last year, or all time.


Table view is here. You can now apply filters on individual columns and export results to CSV.
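
If you want to do something similar with exported data in your own scripts, a column filter plus CSV export can be sketched with Python's standard `csv` module. The rows, column names, and helper functions below are invented for illustration and do not reflect the dashboard's actual export format.

```python
import csv
import io


def filter_rows(rows: list[dict], **criteria) -> list[dict]:
    """Keep rows whose columns equal every given column=value pair."""
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical dashboard results.
rows = [
    {"source": "example.com", "title": "Widget", "price": "10"},
    {"source": "example.org", "title": "Gadget", "price": "25"},
    {"source": "example.com", "title": "Gizmo", "price": "40"},
]
filtered = filter_rows(rows, source="example.com")
csv_text = to_csv(filtered)
```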


Other improvements

  1. We released a new API-based pricing plan. There is no longer a hard limit on the number of websites you can index.
  2. We launched an onboarding UI that guides new users through creating their first knowledge base and workflow.
  3. We now support infinite scrolling during scraping, giving more comprehensive results on Zillow/Redfin.
  4. We expanded our extraction context to 128K tokens (~100k words) per web page, providing up to four times more results at the same cost.
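
For a sense of where the "~100k words" figure comes from, here is the back-of-the-envelope arithmetic. The ratio of 0.75 words per token is a common rule of thumb for English text and is an assumption here; the real ratio depends on the tokenizer and the language of the page.

```python
CONTEXT_TOKENS = 128_000   # "128K tokens" taken as 128,000 for this estimate
WORDS_PER_TOKEN = 0.75     # assumed rule of thumb, not a measured value

approx_words = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)  # 96000, roughly 100k words
```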