Data Sources

Data sources connect external knowledge to your agents. Index websites, documents, APIs, databases, and more, then query them with semantic search at runtime.
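At query time, semantic search ranks indexed chunks by how close their embeddings are to the embedding of the query. The following is a minimal sketch of that ranking step. It assumes cosine similarity and a plain in-memory list of (chunk, vector) pairs. ClawEngine's actual retrieval backend and scoring function are not described on this page, `top_k` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk_text, score) pairs whose embeddings are closest to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for real embedding-model output.
index = [
    ("Refund policy: 30 days", [0.9, 0.1, 0.0]),
    ("Shipping takes 5 days", [0.1, 0.9, 0.0]),
    ("Contact support by email", [0.0, 0.2, 0.9]),
]
results = top_k([0.8, 0.2, 0.0], index, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

A production store would use an approximate nearest-neighbour index instead of scoring every chunk, but the ranking idea is the same.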

Supported Types

| Type | Description | Auto-Refresh |
|---|---|---|
| Website | Crawl and index web pages with configurable depth and patterns | ✅ |
| PDF / Document | Upload and index PDFs, Word docs, and text files | Manual |
| Git Repository | Index code, READMEs, and documentation from any Git repo | ✅ |
| CSV / JSON | Structured data with column mapping for embedding | ✅ |
| REST API | Fetch and index data from any HTTP endpoint | ✅ |
| SQL Database | Query and index rows from PostgreSQL, MySQL, SQLite, or MSSQL | ✅ |
| Vector Database | Connect existing Pinecone, Weaviate, Qdrant, Supabase, or Chroma stores | Live |
| Cloud Storage | Index files from S3, GCS, Azure Blob, or Cloudflare R2 buckets | ✅ |
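The CSV / JSON type mentions column mapping. A plausible reading is that you choose which columns are concatenated into the text that gets embedded and which are kept as metadata beside the vector. The sketch below assumes exactly that. `rows_to_documents`, `text_columns`, and `metadata_columns` are hypothetical names for illustration, not ClawEngine settings.

```python
import csv
import io


def rows_to_documents(csv_text: str, text_columns: list[str], metadata_columns: list[str]) -> list[tuple[str, dict]]:
    """Turn CSV rows into (text, metadata) pairs ready for embedding.

    text_columns: columns joined into the string that gets embedded (hypothetical mapping).
    metadata_columns: columns stored alongside the vector but not embedded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row in reader:
        # Label each value with its column name and skip empty cells.
        parts = [f"{col}: {row[col]}" for col in text_columns if row.get(col)]
        text = "\n".join(parts)
        metadata = {col: row[col] for col in metadata_columns if col in row}
        documents.append((text, metadata))
    return documents


sample = """id,title,description,price
1,Desk Lamp,Adjustable LED lamp,29.99
2,Office Chair,Ergonomic mesh chair,149.00
"""
docs = rows_to_documents(sample, text_columns=["title", "description"], metadata_columns=["id", "price"])
for text, meta in docs:
    print(repr(text), meta)
```

Keeping identifiers and prices out of the embedded text avoids diluting the semantic signal while still letting results be filtered or displayed by those fields.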

Chunking Strategies

When indexing text, ClawEngine splits content into chunks before generating embeddings:

| Strategy | Description |
|---|---|
| Fixed Size | Split into chunks of N characters |
| Sentence | Split on sentence boundaries |
| Paragraph | Split on paragraph breaks |
| Semantic | AI-powered splitting based on topic shifts |
| Recursive | Hierarchical splitting; best for code |
| Custom | Define your own regex pattern |
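Most of these strategies can be approximated with plain string and regex operations. This is a simplified sketch, not ClawEngine's implementation: the function names are hypothetical, the sentence splitter is a naive regex, chunk overlap is ignored, and Semantic splitting is omitted because it needs an embedding or language model.

```python
import re


def fixed_size(text: str, size: int) -> list[str]:
    """Fixed Size: consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def by_regex(text: str, pattern: str) -> list[str]:
    """Custom: split wherever `pattern` matches and drop empty pieces."""
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def by_sentence(text: str) -> list[str]:
    # Naive boundary: whitespace that follows ".", "!" or "?".
    return by_regex(text, r"(?<=[.!?])\s+")


def by_paragraph(text: str) -> list[str]:
    # A paragraph break is a blank line between two newlines.
    return by_regex(text, r"\n\s*\n")


def recursive_split(text: str, max_chars: int, separators: tuple[str, ...] = ("\n\n", "\n", " ")) -> list[str]:
    """Recursive: split on the coarsest separator first, recursing into pieces still too long."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to hard fixed-size cuts.
        return fixed_size(text, max_chars)
    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


doc = "Alpha beta gamma.\n\nDelta epsilon.\n\nZeta eta theta iota kappa lambda."
print(by_paragraph(doc))
print(recursive_split(doc, max_chars=20))
```

With `max_chars=20`, `recursive_split` keeps the two short paragraphs whole and falls back to word boundaries only for the long third paragraph. For source code, the same routine works with code-oriented separators such as blank lines and newlines.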

Embedding Models

| Model | Provider | Dimensions | Cost per 1K tokens |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1,536 | $0.00002 |
| text-embedding-3-large | OpenAI | 3,072 | $0.00013 |
| embed-english-v3 | Cohere | 1,024 | $0.0001 |
| embed-multilingual-v3 | Cohere | 1,024 | $0.0001 |
| voyage-2 | Voyage AI | 1,024 | $0.0001 |
| BGE Small / Large | Local | 384 / 1,024 | Free |
| E5 Small / Large | Local | 384 / 1,024 | Free |
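The prices and dimensions in this table are enough to estimate one-off indexing cost and raw vector storage before choosing a model. A back-of-the-envelope sketch, assuming float32 vectors (4 bytes per dimension) and ignoring index overhead and metadata. The corpus size and chunk count are illustrative, and `estimate` is a hypothetical helper.

```python
# (USD per 1K tokens, vector dimensions), copied from the table above.
MODELS = {
    "text-embedding-3-small": (0.00002, 1536),
    "text-embedding-3-large": (0.00013, 3072),
    "embed-english-v3": (0.0001, 1024),
}


def estimate(model: str, total_tokens: int, num_chunks: int, bytes_per_value: int = 4) -> tuple[float, int]:
    """Return (embedding cost in USD, raw vector storage in bytes)."""
    price_per_1k, dims = MODELS[model]
    cost = total_tokens / 1000 * price_per_1k
    storage = num_chunks * dims * bytes_per_value  # float32 = 4 bytes per value
    return cost, storage


# Illustrative corpus: 10M tokens split into 20,000 chunks of ~500 tokens each.
for name in MODELS:
    cost, storage = estimate(name, total_tokens=10_000_000, num_chunks=20_000)
    print(f"{name}: ${cost:.2f}, {storage / 1e6:.1f} MB of vectors")
```

For this corpus, text-embedding-3-small costs about $0.20 and produces roughly 123 MB of vectors, versus about $1.30 and 246 MB for text-embedding-3-large. The local BGE and E5 models have no per-token fee, but they run on your own hardware.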

Index Status

Each data source tracks its indexing state, which normally moves pending → indexing → indexed. If indexing fails, the status becomes failed and includes an error message. A source whose underlying data has changed since it was last indexed is marked stale.
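The status lifecycle can be modeled as a small state machine. In this sketch, only the pending → indexing → indexed flow, the failed outcome with its error message, and the stale state come from the text above. The retry and re-index transitions (failed → indexing, stale → indexing) and the class names are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class IndexStatus(Enum):
    PENDING = "pending"
    INDEXING = "indexing"
    INDEXED = "indexed"
    FAILED = "failed"
    STALE = "stale"


# Assumed transition table; re-indexing from FAILED or STALE is a guess, not documented behavior.
ALLOWED = {
    IndexStatus.PENDING: {IndexStatus.INDEXING},
    IndexStatus.INDEXING: {IndexStatus.INDEXED, IndexStatus.FAILED},
    IndexStatus.INDEXED: {IndexStatus.STALE},
    IndexStatus.STALE: {IndexStatus.INDEXING},
    IndexStatus.FAILED: {IndexStatus.INDEXING},
}


class DataSourceState:
    """Hypothetical tracker for one data source's indexing status."""

    def __init__(self) -> None:
        self.status = IndexStatus.PENDING
        self.error: Optional[str] = None

    def transition(self, new_status: IndexStatus, error: Optional[str] = None) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot move from {self.status.value} to {new_status.value}")
        if new_status is IndexStatus.FAILED and not error:
            raise ValueError("failed status requires an error message")
        self.status = new_status
        # Only a failed source carries an error message.
        self.error = error if new_status is IndexStatus.FAILED else None


source = DataSourceState()
source.transition(IndexStatus.INDEXING)
source.transition(IndexStatus.FAILED, error="connection timed out")
print(source.status.value, source.error)
```

Rejecting undeclared transitions makes illegal jumps, such as pending straight to indexed, fail loudly instead of silently corrupting the status.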
