Step 4: Data Sources

Connect external knowledge bases to give your agent domain-specific context beyond its training data.

Supported Source Types

| Type | Description | Refresh |
| --- | --- | --- |
| Website | Crawl and index web pages with configurable depth | Auto |
| PDF / Document | Upload PDFs, Word docs, or text files | Manual |
| Git Repository | Index code, README files, and documentation | Auto |
| CSV / JSON | Structured data files with column mapping | Auto |
| REST API | Fetch data from any HTTP endpoint | Auto |
| SQL Database | Query PostgreSQL, MySQL, SQLite, or MSSQL | Auto |
| Vector Database | Connect existing Pinecone, Weaviate, Qdrant, Supabase, or Chroma stores | Live |
| Cloud Storage | Index files from S3, GCS, Azure Blob, or Cloudflare R2 | Auto |

Indexing & Chunking

Each data source supports the following configurable settings:

Chunking strategy — Fixed size, sentence, paragraph, semantic, recursive, or custom regex
Chunk size — Number of characters per chunk (default varies by type)
Chunk overlap — Character overlap between chunks for context continuity
Embedding model — Choose from OpenAI, Cohere, Voyage AI, or local models (BGE, E5)
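
To make the interaction between chunk size and chunk overlap concrete, here is a minimal sketch of the fixed-size strategy. The function name `chunk_fixed` and its parameters are hypothetical and chosen for illustration; this is not the product's implementation, only one plausible way character-based chunking with overlap can work.

```python
def chunk_fixed(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters, so a sentence that
    straddles a boundary still appears whole in at least one chunk
    (as long as it is shorter than the overlap).
    Hypothetical helper, not part of any real API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consists purely of already-covered overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_fixed("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

With a chunk size of 4 and an overlap of 1, each chunk begins with the last character of the previous one. A larger overlap preserves more context across boundaries at the cost of indexing more total characters.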

Refresh Intervals

Data sources marked "Auto" support scheduled re-indexing. Set the refresh interval in minutes, or leave at 0 for manual-only refresh.
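
The rule above can be sketched as a small check that a scheduler might run for each source. The function name `is_refresh_due` and its arguments are assumptions made for this example; the actual scheduling logic is not documented here.

```python
from datetime import datetime, timedelta


def is_refresh_due(last_indexed: datetime, interval_minutes: int, now: datetime) -> bool:
    """Return True when a scheduled re-index should run.

    An interval of 0 means manual-only refresh, so the source is never
    considered due automatically. Hypothetical helper for illustration.
    """
    if interval_minutes <= 0:
        return False
    return now - last_indexed >= timedelta(minutes=interval_minutes)


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    print(is_refresh_due(last, 30, datetime(2024, 1, 1, 12, 45)))  # interval elapsed
    print(is_refresh_due(last, 0, datetime(2024, 1, 2, 12, 0)))    # manual-only
```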

How It's Used

Indexed data is available to the agent via the RAG Pipeline tool. When the agent receives a query, it searches the connected data sources for relevant context and includes it in its reasoning.
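
As a rough illustration of that retrieval step, the sketch below ranks indexed chunks by cosine similarity to a query vector and prepends the best matches to the question. Everything here is assumed for the example: the names `retrieve` and `build_prompt`, the in-memory index, and the toy two-dimensional vectors stand in for a real embedding model and vector store, and the RAG Pipeline tool may work differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: list[float],
             index: list[tuple[str, list[float]]],
             top_k: int = 2) -> list[str]:
    """Return the text of the top_k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Place retrieved chunks ahead of the question so the model can use them."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy index: (chunk text, embedding). Real embeddings have far more dimensions.
    index = [
        ("Refunds are issued within 14 days.", [1.0, 0.0]),
        ("Standard shipping takes 3 to 5 days.", [0.0, 1.0]),
        ("Returns must be unopened.", [0.9, 0.1]),
    ]
    chunks = retrieve([1.0, 0.0], index, top_k=2)
    print(build_prompt("How do refunds work?", chunks))
```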
