Core RAG Pipeline Architecture
- Upload: Parses files into raw text.
- Embed & Chunk: Splits text into chunks, converts chunks to vectors via the embedding model, and saves vectors to the vector database.
- Query: Converts the user query to a vector using the same embedding model.
- Retrieve: Scores each stored chunk against the query vector using cosine similarity (1 minus cosine distance). Filters out chunks that score below the similarity threshold. Retrieves the top 4-6 raw text chunks.
- Generation: Appends the retrieved text chunks to the system prompt, then passes it to the LLM's context window along with the chat history and user query.
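The Retrieve step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual implementation: the `retrieve` function, the in-memory `store` list, the toy 2-dimensional vectors, and the default `threshold` of 0.25 are all assumptions made for the example. A real deployment would delegate scoring to the vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, threshold=0.25, top_k=4):
    """Score every stored chunk, drop low scores, return the best top_k.

    store is a list of (chunk_text, chunk_vector) pairs (hypothetical layout).
    Returns (score, chunk_text) pairs, highest score first.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]

# Toy data: three chunks with hand-picked 2-dimensional vectors.
store = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.6, 0.8]),
    ("office address", [0.0, 1.0]),
]
results = retrieve([1.0, 0.0], store, threshold=0.25, top_k=4)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

With this toy data the third chunk is orthogonal to the query, scores 0, and is removed by the threshold before the top-k cut is applied.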
Limitations & Constraints
- Context Windows: The LLM does not read the entire document set. It receives only the 4-6 retrieved text chunks that score closest to the query vector.
- Global Queries: Queries requiring scanning the entire document set (e.g., "List the titles of all documents" or "Find the top 3 items across all files") will fail under standard RAG.
- Semantic Limitations: Retrieval ranks chunks by mathematical vector distance (Cosine Distance); it does not reason about the intent behind the query.
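A small sketch of why global queries fall short, assuming a hypothetical workspace of 10 documents with one chunk each and a retrieval limit of 6. The `sources_in_context` helper and the `source` field are illustrative names, not part of any specific product.

```python
def sources_in_context(ranked_chunks, top_k):
    """Return the set of source documents represented in the top_k chunks."""
    return {chunk["source"] for chunk in ranked_chunks[:top_k]}

# Hypothetical ranked retrieval result: one chunk from each of 10 documents.
ranked_chunks = [
    {"source": f"doc_{i}.txt", "text": f"chunk from document {i}"}
    for i in range(10)
]

seen = sources_in_context(ranked_chunks, top_k=6)
all_sources = {chunk["source"] for chunk in ranked_chunks}
missing = len(all_sources) - len(seen)
print(f"LLM sees {len(seen)} of {len(all_sources)} documents; {missing} never reach the prompt")
```

Whatever the ranking, a request such as "List the titles of all documents" can draw on at most the documents behind the retrieved chunks, so the remaining ones are invisible to the model.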
Configuration Variables & Tuning Parameters
Adjust these settings within the workspace configuration to tune retrieval performance:
| Parameter | Function | Impact |
|---|---|---|
| System Prompt | Directs LLM behavior | Instructs how to weight retrieved context versus base knowledge. |
| Embedding Model | Generates vectors from text | Determines vector alignment accuracy and semantic representation. |
| Similarity Threshold | Filters low-score chunks | A higher threshold restricts context to highly relevant matches; a lower threshold allows more context but increases noise. |
| Chunk Size / Overlap | Defines text segment limits | Affects whether related information remains contiguous when text is split during the Embed & Chunk step. |
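
To make the Chunk Size / Overlap setting concrete, here is a minimal sliding-window splitter. It is a sketch under stated assumptions: sizes are measured in characters (real systems often count tokens and prefer to split on sentence or paragraph boundaries), and `chunk_text` is a hypothetical helper rather than the workspace's actual splitter.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into windows of chunk_size characters.

    Each window starts (chunk_size - overlap) characters after the previous
    one, so consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The shared characters ("hij", "opq") show how overlap keeps text that straddles a boundary available in both neighbouring chunks, at the cost of storing and embedding some text twice.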