Core RAG Pipeline Architecture

  1. Upload: Parses files into raw text.
  2. Chunk & Embed: Splits text into chunks, converts each chunk to a vector via the embedding model, and saves the vectors to the vector database.
  3. Query: Converts the user query to a vector using the same embedding model.
  4. Retrieve: Calculates the similarity between the query vector and each stored chunk vector using Cosine Distance. Filters out chunks that score below the similarity threshold. Retrieves the top 4-6 remaining chunks as raw text.
  5. Generation: Adds the retrieved text chunks to the LLM system prompt, which fills the context window together with the chat history and the user query.
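
The Retrieve and Generation steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function names (cosine_similarity, retrieve, build_prompt), the three-dimensional toy vectors, the 0.25 threshold, and the prompt layout are all assumptions made for the example. The score used here is cosine similarity, which equals 1 minus cosine distance, so a higher score means a closer match.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1 - cosine distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, threshold=0.25, top_k=4):
    """Score every (text, vector) pair, drop low scores, return the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]

def build_prompt(system_prompt, chunks, history, query):
    """Assemble a single prompt string; the layout is hypothetical."""
    context = "\n\n".join(text for _, text in chunks)
    past = "\n".join(history)
    return (
        f"{system_prompt}\n\nContext:\n{context}\n\n"
        f"Chat history:\n{past}\n\nUser: {query}"
    )

# Toy vector store: hand-written vectors stand in for real embeddings.
store = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3 to 5 business days.", [0.1, 0.9, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "How do refunds work?"

results = retrieve(query_vec, store)
prompt = build_prompt(
    "Answer using the context below.", results, [], "How do refunds work?"
)
```

With these toy vectors the third chunk scores below the threshold, so it never reaches the prompt, even though fewer than top_k chunks remain.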

Limitations & Constraints

  • Context Windows: The LLM does not read the entire document set. It receives only the 4-6 retrieved text chunks closest to the query vector.
  • Global Queries: Queries requiring scanning the entire document set (e.g., "List the titles of all documents" or "Find the top 3 items across all files") will fail under standard RAG.
  • Semantic Limitations: Retrieval relies on mathematical vector distance (Cosine Distance), not deep semantic understanding of intent.
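
The global-query limitation follows directly from the top-k cap. The sketch below illustrates it with made-up numbers: ten hypothetical documents contribute one title chunk each, the scores are invented, and chunks_seen_by_llm is an assumed helper name rather than part of any real API.

```python
def chunks_seen_by_llm(scored_chunks, top_k=4):
    """Keep only the top_k highest-scoring chunks, as the Retrieve step does."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]

# Ten hypothetical documents, one title chunk each, with invented scores.
scored = [(0.50 + 0.01 * i, f"Title of document {i}") for i in range(10)]

visible = chunks_seen_by_llm(scored, top_k=4)
coverage = len(visible) / len(scored)  # fraction of titles the LLM can see
```

Even if every title chunk passes the similarity threshold, the LLM sees only four of the ten titles here, so a request to list all documents cannot be answered completely.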

Configuration Variables & Tuning Parameters

Adjust these settings within the workspace configuration to tune retrieval performance:

Parameter            | Function                   | Impact
---------------------|----------------------------|-------
System Prompt        | Directs LLM behavior       | Instructs the LLM how to weight retrieved context versus base knowledge.
Embedding Model      | Generates vectors from text | Determines vector alignment accuracy and semantic representation.
Similarity Threshold | Filters low-score chunks   | A higher threshold restricts context to highly relevant matches; a lower threshold allows more context but increases noise.
Chunk Size / Overlap | Defines text segment limits | Affects whether related information stays contiguous when text is split into chunks in step 2.
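
To make the Chunk Size / Overlap setting concrete, here is a small sketch of fixed-size chunking with overlap. It assumes character-based splitting for simplicity; a real pipeline may split by tokens, sentences, or paragraphs instead, and the chunk_text name and its defaults are hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into chunks of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, overlap=3)
```

Each chunk repeats the last three characters of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk as long as it is no longer than the overlap.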