Source
Sources are the websites from which you extract data into a single database. Sources define where your data comes from; the database applies one schema, prompt, and extraction schedule to all of them, which keeps the extracted data consistent.
Source Modes
Lightfeed supports two source modes that determine how data is extracted. Base your choice on both the page structure and your extraction goal.
List Mode
List mode optimizes extraction for pages containing multiple similar items (like product listings, search results, or directories). It:
- Maintains consistent structure across all entries
- Prevents the LLM from getting distracted by unrelated page elements
- Extracts key information from each item efficiently
Ideal for:
- Product listing pages
- Directory listings
- Forum threads
- News article lists
- Job board listings
Detail Mode
Detail mode is designed for pages focused on a single item or when you need comprehensive information about a specific entity. It:
- Captures more comprehensive information including nested details
- Identifies relationships between different data points
- Extracts deeper context that might otherwise be missed
Ideal for:
- Individual product pages
- Company homepages
- Single article pages
- User profile pages
- Property listing details
Page structure alone does not decide the mode; your extraction goal matters too. For example, you could use detail mode on a list page if you want comprehensive information about just one company in a list of companies.
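The decision rule above can be sketched in a few lines of Python. This is only an illustration of the reasoning: `SourceMode` and `choose_mode` are hypothetical names invented for this example and are not part of Lightfeed's API.

```python
from enum import Enum


class SourceMode(Enum):
    """Hypothetical stand-in for the two source modes described above."""
    LIST = "list"
    DETAIL = "detail"


def choose_mode(page_has_many_items: bool, want_single_entity: bool) -> SourceMode:
    """Pick a mode from the page structure and the extraction goal."""
    # The goal takes priority: in-depth data about one entity calls for
    # detail mode, even when the page itself is a list.
    if want_single_entity:
        return SourceMode.DETAIL
    # Otherwise follow the page structure.
    return SourceMode.LIST if page_has_many_items else SourceMode.DETAIL
```

For instance, `choose_mode(page_has_many_items=True, want_single_entity=True)` selects detail mode, which matches the example of extracting one company from a list of companies.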
Destinations of Sources
Lightfeed supports various destinations for your sources:
- Website URLs: Direct links to web pages containing your target data
- Google Search results: Dynamic listings generated from Google search queries
- Reddit communities: Content from subreddits
- LinkedIn pages: User and company information, posts and updates
- RSS feeds: Regularly updated content streams
Adding Sources to a Database
To include data from multiple websites in the same database, add each website as a source. A database can have one or more sources, so you can collect data from different websites while keeping a unified structure.
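One way to picture this relationship is as a small data model: the schema, prompt, and schedule live on the database, while each source contributes only a location and a mode. The sketch below is a conceptual illustration, not Lightfeed's actual API; the `Database` and `Source` classes, their fields, and the example URLs are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical source: where the data comes from and how to read it."""
    url: str
    mode: str  # "list" or "detail"


@dataclass
class Database:
    """Hypothetical database: one schema, prompt, and schedule for all sources."""
    name: str
    schema: dict[str, str]
    prompt: str
    schedule: str
    sources: list[Source] = field(default_factory=list)

    def add_source(self, url: str, mode: str) -> Source:
        """Attach a source; the database-level schema stays unchanged."""
        if mode not in ("list", "detail"):
            raise ValueError(f"unknown mode: {mode!r}")
        source = Source(url=url, mode=mode)
        self.sources.append(source)
        return source


# Example: two stores feeding one product database (placeholder URLs).
db = Database(
    name="products",
    schema={"title": "string", "price": "number"},
    prompt="Extract each product's title and price.",
    schedule="daily",
)
db.add_source("https://example.com/store-a/products", "list")
db.add_source("https://example.org/store-b/catalog", "list")
```

Because the schema belongs to the database rather than to any individual source, adding or removing a source in this model never changes the shape of the stored data.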
Best Practices
- Choose the Right Mode
  - Use list mode when you need to collect many similar items from a page
  - Use detail mode when you need in-depth information about a specific item
  - Let your extraction goal, not just the page layout, decide the mode
- Select Appropriate Destinations
  - Use website URLs for direct data extraction
  - Use Google Search for tracking search results
  - Use Reddit for finding new content and leads in communities
  - Use LinkedIn for professional and company information
  - Use RSS feeds for curated and regularly updated content
- Combine Sources Strategically
  - Mix different destinations to get comprehensive data
  - Use both list and detail modes when appropriate
  - Consider the update frequency of each source
Why Are Sources Important?
By defining sources strategically, you can extract comprehensive, structured, and up-to-date data across multiple websites into a single, unified database.
- Flexible data collection – Allows combining data from multiple sites in a structured way
- Scalability – Add or remove sources as needed without affecting the database structure
- Automated updates – Sources are reprocessed based on your extraction schedule
Example Use Cases and Their Sources
Use case | Mode | Example sources |
---|---|---|
Market research | List | Google search results for competitors, LinkedIn company pages |
E-commerce product tracking | List | Product listing pages from different online stores |
Company analysis | Detail | Company homepage, LinkedIn company page |
Job listings | List | Company career pages and job boards |
News aggregation | List | Articles from various news sites |
Regulation tracking | Detail | Individual regulation pages from government sites |