Step 4: Data Sources
Connect external knowledge bases to give your agent domain-specific context beyond its training data.
Supported Source Types
| Source type | Description | Refresh |
| --- | --- | --- |
| Website | Crawl and index web pages with configurable depth | Auto |
| PDF / Document | Upload PDFs, Word docs, or text files | Manual |
| Git Repository | Index code, README files, and documentation | Auto |
| CSV / JSON | Structured data files with column mapping | Auto |
| REST API | Fetch data from any HTTP endpoint | Auto |
| SQL Database | Query PostgreSQL, MySQL, SQLite, or MSSQL | Auto |
| Vector Database | Connect existing Pinecone, Weaviate, Qdrant, Supabase, or Chroma stores | Live |
| Cloud Storage | Index files from S3, GCS, Azure Blob, or Cloudflare R2 | Auto |
Indexing & Chunking
Each data source supports the following configurable settings:
Chunking strategy — Fixed size, sentence, paragraph, semantic, recursive, or custom regex
Chunk size — Number of characters per chunk (default varies by type)
Chunk overlap — Character overlap between chunks for context continuity
Embedding model — Choose from OpenAI, Cohere, Voyage AI, or local models (BGE, E5)
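To make the interaction between chunk size and chunk overlap concrete, here is a minimal sketch of the fixed-size strategy. The function name `chunk_text` and its default values are illustrative assumptions, not the product's actual implementation or defaults.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters.

    Hypothetical helper for illustration only; the real indexer may differ.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made up purely of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

With a chunk size of 4 and an overlap of 1, each chunk starts with the last character of the previous one. That repetition is what preserves context across chunk boundaries.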
Refresh Intervals
Data sources marked "Auto" support scheduled re-indexing. Set the refresh interval in minutes, or leave at 0 for manual-only refresh.
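The refresh rule can be sketched as a small check that a scheduler might run for each source. The function `is_refresh_due` and its parameters are hypothetical, and treating a never-indexed source as due is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_refresh_due(
    last_indexed: Optional[datetime],
    interval_minutes: int,
    now: datetime,
) -> bool:
    """Return True when a scheduled re-index should run.

    Hypothetical helper: an interval of 0 means manual-only refresh,
    so the scheduler never triggers it.
    """
    if interval_minutes <= 0:
        return False  # manual-only refresh
    if last_indexed is None:
        return True  # assumption: a source that was never indexed is due
    return now - last_indexed >= timedelta(minutes=interval_minutes)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    print(is_refresh_due(last, 30, datetime(2024, 1, 1, 12, 45)))
    print(is_refresh_due(last, 0, datetime(2024, 1, 1, 12, 45)))
```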
How It's Used
Indexed data is available to the agent via the RAG Pipeline tool. When the agent receives a query, it searches the connected data sources for relevant context and includes it in its reasoning.
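The retrieval step can be illustrated with a small, self-contained sketch. It is not the RAG Pipeline tool itself. The word-count `embed` function is a toy stand-in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Prepend the retrieved chunks to the query as context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    indexed = [
        "refunds are processed within five days",
        "the office is closed on sunday",
        "shipping takes two weeks",
    ]
    print(build_prompt("how are refunds processed", indexed, top_k=1))
```

A production pipeline would replace `embed` with the embedding model configured for the data source and would search a vector index instead of sorting every chunk. The overall flow stays the same: embed the query, rank the chunks, and pass the best matches to the agent as context.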