Overview

Retrieval-Augmented Generation (RAG) is a powerful paradigm that enhances large language models by providing them with relevant information from external knowledge sources. This approach has become essential for enterprise AI applications that need to work with specific, up-to-date, or domain-specific information that wasn’t part of the model’s training data. RAG addresses key limitations of traditional LLMs:
  • Knowledge cutoffs - Access the most current information
  • Domain expertise - Integrate specialized knowledge bases
  • Factual accuracy - Reduce hallucinations with grounded responses
  • Scalability - Work with vast document collections efficiently
Enterprises rely on RAG for applications like customer support, document analysis, knowledge management, and intelligent search systems.
Location within the framework: beeai_framework/rag.
RAG is most effective when document chunking and retrieval strategies are tailored to your specific problem domain. It’s recommended to experiment with different configurations such as chunk sizes, overlap settings, and retrieval parameters. Future releases of BeeAI will provide enhanced capabilities to streamline this optimization process.
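
For example, a quick way to compare chunking configurations is to split the same document with several settings and inspect the results. The sketch below uses LangChain’s RecursiveCharacterTextSplitter (installed with the RAG extras); the sizes are illustrative starting points, not recommendations:

from langchain_text_splitters import RecursiveCharacterTextSplitter

with open("docs/modules/agents.mdx") as f:
    text = f.read()

# Compare how different settings chunk the same document
for chunk_size, chunk_overlap in [(500, 50), (1000, 200), (2000, 1000)]:
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    chunks = splitter.split_text(text)
    print(f"chunk_size={chunk_size}, chunk_overlap={chunk_overlap} -> {len(chunks)} chunks")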

Philosophy

BeeAI Framework’s approach to RAG emphasizes integration over invention. Rather than building RAG components from scratch, we provide seamless adapters for proven, production-ready solutions from leading platforms like LangChain and Llama-Index. This philosophy offers several advantages:
  • Leverage existing expertise - Use battle-tested implementations
  • Faster time-to-market - Focus on your application logic, not infrastructure
  • Community support - Benefit from extensive documentation and community
  • Flexibility - Switch between providers as needs evolve

Installation

To use RAG components, install the framework with the RAG extras:
pip install "beeai-framework[rag]"

RAG Components

The following table outlines the key RAG components available in the BeeAI Framework:
Component | Description | Compatibility | Future Compatibility
Document Loaders | Responsible for loading content from different formats and sources such as PDFs, web pages, and structured text files | WIP | BeeAI, LangChain
Text Splitters | Splits long documents into workable chunks using various strategies, e.g. fixed length or preserving context | WIP | BeeAI, LangChain
Document | The basic data structure to house text content, metadata, and relevance scores for retrieval operations | BeeAI | -
Vector Store | Stores document embeddings and retrieves them based on semantic similarity using embedding distance | LangChain | BeeAI, Llama-Index
Document Processors | Processes and refines documents during the retrieval-generation lifecycle, including reranking and filtering | Llama-Index | -
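
As a quick illustration of the Document structure, documents produced by LangChain loaders can be converted into native BeeAI documents with the bundled mapper (also used in the full example below). A minimal sketch with made-up content:

from langchain_core.documents import Document as LCDocument

from beeai_framework.adapters.langchain.mappers.documents import lc_document_to_document

# A LangChain document carrying text content and metadata
lc_doc = LCDocument(page_content="BeeAI integrates proven RAG components.", metadata={"source": "notes.md"})

# Convert it into the BeeAI Document used throughout the framework
doc = lc_document_to_document(lc_doc)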

Dynamic Module Loading

BeeAI Framework provides a dynamic module loading system that allows you to instantiate RAG components using string identifiers. This approach enables configuration-driven architectures and easy provider switching. The from_name method uses the format provider:ClassName where:
  • provider identifies the integration module (e.g., “beeai”, “langchain”)
  • ClassName specifies the exact class to instantiate
Dynamic loading enables you to switch between different vector store implementations without changing your application code - just update the configuration string.
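
For example, the provider string can come from configuration instead of code. A minimal sketch - the VECTOR_STORE environment variable is a made-up name for illustration:

import os

from beeai_framework.backend.embedding import EmbeddingModel
from beeai_framework.backend.vector_store import VectorStore

embedding_model = EmbeddingModel.from_name("watsonx:ibm/slate-125m-english-rtrvr-v2")

# Swap providers by changing the string, e.g. "langchain:InMemoryVectorStore"
vector_store = VectorStore.from_name(
    name=os.getenv("VECTOR_STORE", "beeai:TemporalVectorStore"),
    embedding_model=embedding_model,
)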

BeeAI Vector Store

Python
import asyncio
import sys
import traceback

from beeai_framework.adapters.beeai.backend.vector_store import TemporalVectorStore
from beeai_framework.adapters.langchain.mappers.documents import lc_document_to_document
from beeai_framework.backend.embedding import EmbeddingModel
from beeai_framework.backend.vector_store import VectorStore
from beeai_framework.errors import FrameworkError

# LangChain dependencies - to be swapped with native BeeAI components in a future release
try:
    from langchain_community.document_loaders import UnstructuredMarkdownLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
except ModuleNotFoundError as e:
    raise ModuleNotFoundError(
        "Optional modules are not found.\nRun 'pip install \"beeai-framework[rag]\"' to install."
    ) from e


async def main() -> None:
    embedding_model = EmbeddingModel.from_name("watsonx:ibm/slate-125m-english-rtrvr-v2", truncate_input_tokens=500)

    # Document loading
    loader = UnstructuredMarkdownLoader(file_path="docs/modules/agents.mdx")
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=1000)
    all_splits = text_splitter.split_documents(docs)
    documents = [lc_document_to_document(document) for document in all_splits]
    print(f"Loaded {len(documents)} documents")

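    # Vector store creation via dynamic loading, followed by indexing the chunks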
    vector_store: TemporalVectorStore = VectorStore.from_name(
        name="beeai:TemporalVectorStore", embedding_model=embedding_model
    )  # type: ignore[assignment]
    _ = await vector_store.add_documents(documents=documents)


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())

Native BeeAI modules can be loaded directly by importing and instantiating the module, e.g. from beeai_framework.adapters.beeai.backend.vector_store import TemporalVectorStore.
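
A minimal sketch of direct instantiation, assuming the constructor takes the embedding model just as the from_name call above does:

from beeai_framework.adapters.beeai.backend.vector_store import TemporalVectorStore
from beeai_framework.backend.embedding import EmbeddingModel

embedding_model = EmbeddingModel.from_name("watsonx:ibm/slate-125m-english-rtrvr-v2")

# Equivalent to VectorStore.from_name("beeai:TemporalVectorStore", embedding_model=...)
vector_store = TemporalVectorStore(embedding_model=embedding_model)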

Supported Providers’ Vector Stores

# LangChain integration
from beeai_framework.backend.embedding import EmbeddingModel
from beeai_framework.backend.vector_store import VectorStore

embedding_model = EmbeddingModel.from_name("watsonx:ibm/slate-125m-english-rtrvr-v2")

vector_store = VectorStore.from_name(
    name="langchain:InMemoryVectorStore",
    embedding_model=embedding_model
)
For production deployments, consider implementing document caching and index optimization to improve response times.
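
Once documents are indexed, retrieval is a semantic similarity search over the stored embeddings. A minimal sketch, continuing from the indexing example above - the search method and its k parameter are assumptions based on common vector store interfaces, so consult the API reference for the exact signature:

# Inside an async function, after add_documents has been awaited
results = await vector_store.search("How do agents use tools?", k=4)
for result in results:
    print(result)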

Future Enhancements

The RAG Agent is designed for extensibility. The roadmap focuses on two main areas of improvement:

Fallback Strategies

Handle edge cases and improve robustness when standard retrieval approaches fall short; a rough sketch follows the list:
  • Query rephrasing - Automatically rephrase queries for better retrieval when initial attempts yield poor results
  • Query decomposition - Break complex queries into simpler sub-queries for more targeted retrieval
  • Alternative retrieval methods - Implement backup strategies when semantic search doesn’t find relevant documents
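
Purely as an illustration of the first two ideas (not current framework API), a fallback loop might rephrase the query with a chat model whenever retrieval comes back empty - the model name and the vector store search call are assumptions:

from beeai_framework.backend.chat import ChatModel
from beeai_framework.backend.message import UserMessage

async def retrieve_with_fallback(vector_store, query: str, max_attempts: int = 3):
    llm = ChatModel.from_name("ollama:granite3.3")  # illustrative model choice
    for _ in range(max_attempts):
        results = await vector_store.search(query, k=4)  # assumed retrieval API, as above
        if results:
            return results
        # Query rephrasing: ask the LLM for a better-phrased query and retry
        response = await llm.create(messages=[UserMessage(f"Rephrase this search query: {query}")])
        query = response.get_text_content()
    return []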

Agentic Capabilities

Add autonomous reasoning and self-improvement capabilities for higher quality responses; a sketch of a reflection loop follows the list:
  • Reflection loops - Enable the agent to evaluate and improve its own responses
  • Iterative refinement - Allow multiple rounds of retrieval and generation for complex queries
  • Self-assessment - Implement confidence scoring and quality evaluation mechanisms
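
As a rough sketch of a reflection loop with self-assessment (again illustrative only, reusing the chat model API assumed above):

from beeai_framework.backend.chat import ChatModel
from beeai_framework.backend.message import UserMessage

async def answer_with_reflection(llm: ChatModel, context: str, question: str, max_rounds: int = 2) -> str:
    draft = ""
    for _ in range(max_rounds):
        answer = await llm.create(messages=[UserMessage(f"Context:\n{context}\n\nQuestion: {question}")])
        draft = answer.get_text_content()
        # Self-assessment: ask the model to grade its own draft
        review = await llm.create(
            messages=[UserMessage(f"Does the following fully answer the question? Reply YES or NO.\n{draft}")]
        )
        if review.get_text_content().strip().upper().startswith("YES"):
            break  # good enough; otherwise loop and regenerate
    return draft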

Examples