rag-anythingllm

Core RAG Pipeline Architecture

Upload: Parses files into raw text.
Chunk & Embed: Splits text into chunks, converts each chunk to a vector via the embedding model, and saves the vectors to the vector database.
Query: Converts user query to a vector.
Retrieve: Scores each stored chunk against the query vector using cosine similarity, filters out chunks that score below the similarity threshold, and returns the top 4-6 chunks as raw text.
Generation: Appends the retrieved chunks to the system prompt, which is sent to the LLM together with the chat history and the user query.
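
The retrieve and generation steps above can be sketched as follows. This is a minimal, pure-Python illustration, not AnythingLLM's actual code: the function names (`cosine_similarity`, `retrieve`, `build_system_prompt`), the hand-written 2-D vectors, and the threshold and top-k values are all assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, threshold=0.25, top_k=4):
    """Score every (text, vector) pair, drop low scores, keep the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


def build_system_prompt(system_prompt, chunks):
    """Append retrieved chunks to the system prompt as numbered context.

    Chat history and the user query would be sent as separate messages.
    """
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return f"{system_prompt}\n\nContext:\n{context}"


# Hypothetical store: (chunk text, embedding) pairs with toy 2-D vectors.
store = [
    ("Invoices are due within 30 days.", [0.9, 0.1]),
    ("The office is closed on public holidays.", [0.1, 0.9]),
    ("Late invoices incur a 2% fee.", [0.8, 0.3]),
]
query_vec = [1.0, 0.0]  # stands in for the embedded user query

hits = retrieve(query_vec, store, threshold=0.5, top_k=2)
prompt = build_system_prompt(
    "Answer using the context only.", [text for _, text in hits]
)
print(prompt)
```

Only the chunks that survive both the threshold and the top-k cut reach the prompt; everything else in the store is invisible to the LLM for that query.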

Context Windows: The LLM does not read the entire document set. It receives only the 4-6 retrieved chunks that score closest to the query vector.
Global Queries: Queries requiring scanning the entire document set (e.g., “List the titles of all documents” or “Find the top 3 items across all files”) will fail under standard RAG.
Semantic Limitations: Retrieval ranks chunks by mathematical vector proximity (cosine similarity), not by a deep semantic understanding of the user's intent.
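
For reference, the standard definitions behind that score are given below. The symbols are chosen here for illustration: q is the query vector and c is a chunk vector.

```latex
\[
\operatorname{sim}(\mathbf{q}, \mathbf{c})
  = \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert},
\qquad
d_{\cos}(\mathbf{q}, \mathbf{c}) = 1 - \operatorname{sim}(\mathbf{q}, \mathbf{c})
\]
```

The score depends only on the angle between the two vectors, so a chunk can rank highly because its embedding points in a similar direction even when it does not answer what the user meant to ask.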

Adjust these settings within the workspace configuration to tune retrieval performance:

Parameter	Function	Impact
System Prompt	Directs LLM behavior	Instructs how to weight retrieved context versus base knowledge.
Embedding Model	Generates vectors from text	Determines vector alignment accuracy and semantic representation.
Similarity Threshold	Filters low-score chunks	Higher threshold restricts context to highly relevant matches; lower allows more context but increases noise.
Chunk Size / Overlap	Defines text segment limits	Affects whether related information stays together in one chunk and is therefore retrieved as a unit.
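
To illustrate the Chunk Size / Overlap row, here is a minimal character-based splitter. It is a sketch under stated assumptions: `chunk_text` and its defaults are hypothetical, and real splitters often work on tokens or sentence boundaries rather than raw characters.

```python
def chunk_text(text, chunk_size=1000, overlap=20):
    """Split text into fixed-size character windows that share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Tiny values so the overlap is easy to see.
pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(pieces)
```

With overlap, text near a chunk boundary appears in two neighbouring chunks, which lowers the chance that a sentence is cut in half and retrievable from neither side.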