Source

Sources are the websites from which you extract data into a single database. They define where your data comes from while the database itself maintains consistency with a unified schema, prompt, and extraction schedule across all sources.

Source Modes

Lightfeed supports two source modes that determine how data is extracted. Your choice should be based on both the page structure and your extraction goal:

List Mode

List mode optimizes extraction for pages containing multiple similar items (like product listings, search results, or directories). It:

  • Maintains consistent structure across all entries
  • Prevents the LLM from getting distracted by unrelated page elements
  • Extracts key information from each item efficiently

Ideal for:

  • Product listing pages
  • Directory listings
  • Forum threads
  • News article lists
  • Job board listings

Detail Mode

Detail mode is designed for pages focused on a single item or when you need comprehensive information about a specific entity. It:

  • Captures more comprehensive information including nested details
  • Identifies relationships between different data points
  • Extracts deeper context that might otherwise be missed

Ideal for:

  • Individual product pages
  • Company homepages
  • Single article pages
  • User profile pages
  • Property listing details

Tip: The choice between modes depends on both the page structure and your extraction goal. For example, you can use detail mode on a list page if you want comprehensive information about just one company mentioned in a list of companies.
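
The guidance above can be written as a small decision rule. This is only an illustrative sketch: `choose_mode` and its two parameters are hypothetical names invented here, not part of any Lightfeed API. It assumes the extraction goal takes priority over page structure, as the tip suggests.

```python
def choose_mode(page_has_many_items: bool, want_single_entity: bool) -> str:
    """Return "list" or "detail" following the mode guidance above.

    Hypothetical helper for illustration only; not a Lightfeed function.
    """
    if want_single_entity:
        # The goal wins over page structure: in-depth data about one entity
        # calls for detail mode, even when the page itself is a list.
        return "detail"
    # Otherwise fall back to page structure.
    return "list" if page_has_many_items else "detail"


# A product listing page where every item is wanted.
listing_choice = choose_mode(page_has_many_items=True, want_single_entity=False)

# A list of companies where only one company matters.
single_company_choice = choose_mode(page_has_many_items=True, want_single_entity=True)
```

Here `listing_choice` evaluates to "list" and `single_company_choice` to "detail", matching the example in the tip.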

Destinations of Sources

Lightfeed supports various destinations for your sources:

  • Website URLs: Direct links to web pages containing your target data
  • Google Search results: Dynamic listings generated from Google search queries
  • Reddit communities: Content from subreddits
  • LinkedIn pages: User and company information, posts and updates
  • RSS feeds: Regularly updated content streams

Adding Sources to a Database

To include data from multiple websites in the same database, add each website as a source. A database can have one or more sources, so you can collect data from different websites while keeping a single, unified structure.
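
One way to picture the relationship is as a database that owns one schema and a list of sources, each with its own mode. The following is a minimal conceptual sketch, not Lightfeed's actual API or data model: the class names, the `add_source` method, the schema format, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class SourceMode(Enum):
    LIST = "list"      # many similar items per page
    DETAIL = "detail"  # one item per page, extracted in depth


@dataclass(frozen=True)
class Source:
    url: str
    mode: SourceMode


@dataclass
class Database:
    """Hypothetical model: one schema shared by every source."""
    name: str
    schema: dict[str, str]  # field name -> type, applied to all sources
    sources: list[Source] = field(default_factory=list)

    def add_source(self, url: str, mode: SourceMode) -> Source:
        source = Source(url=url, mode=mode)
        if source in self.sources:
            raise ValueError(f"source already added: {url}")
        self.sources.append(source)
        return source


# Two stores feed the same database and share its schema.
db = Database(name="laptops", schema={"title": "string", "price": "number"})
db.add_source("https://store-a.example/laptops", SourceMode.LIST)
db.add_source("https://store-b.example/laptops", SourceMode.LIST)
```

Keeping the schema on the database rather than on each source mirrors the idea that sources only define where data comes from, while structure stays consistent across all of them.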

Best Practices

  1. Choose the Right Mode

    • Use list mode when you need to collect many items of the same kind from a page
    • Use detail mode when you need in-depth information about a specific item
    • Let your extraction goal, not only the page layout, decide between the two modes
  2. Select Appropriate Destinations

    • Use website URLs for direct data extraction
    • Use Google Search for tracking search results
    • Use Reddit for finding new content and leads in communities
    • Use LinkedIn for professional and company information
    • Use RSS feeds for curated and regularly updated content
  3. Combine Sources Strategically

    • Mix different destinations to get comprehensive data
    • Use both list and detail modes when appropriate
    • Consider the update frequency of each source

Why Are Sources Important?

By defining sources strategically, you can extract comprehensive, structured, and up-to-date data across multiple websites into a single, unified database.

  • Flexible data collection – Allows combining data from multiple sites in a structured way
  • Scalability – Add or remove sources as needed without affecting the database structure
  • Automated updates – Sources are reprocessed based on your extraction schedule

Example Use Cases and Their Sources

Use case | Mode | Example sources
Market research | List | Google search results for competitors, LinkedIn company pages
E-commerce product tracking | List | Product listing pages from different online stores
Company analysis | Detail | Company homepage, LinkedIn company page
Job listings | List | Company career pages and job boards
News aggregation | List | Articles from various news sites
Regulation tracking | Detail | Individual regulation pages from government sites