Backend is an umbrella module that provides a unified way to work with the following functionalities:
Chat Models (via the ChatModel class)
Embedding Models (coming soon)
Audio Models (coming soon)
Image Models (coming soon)
BeeAI framework’s backend is designed with a provider-based architecture, allowing you to switch between different AI service providers while maintaining a consistent API.
The following table lists the supported providers. Each provider requires specific configuration through environment variables; make sure all required variables are set before initializing a provider.
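For example, assuming Ollama is used as a local provider, its connection settings can be supplied through environment variables before the model is created. This is a minimal sketch; the variable name below is illustrative, so check the provider table for the exact names your provider requires:

import os

from beeai_framework.backend.chat import ChatModel

# Illustrative variable name; each provider documents its own required settings.
os.environ.setdefault("OLLAMA_BASE_URL", "http://localhost:11434")

# The "<provider>:<model>" identifier selects the adapter; the calling API stays
# the same if you later switch this string to a different provider.
model = ChatModel.from_name("ollama:llama3.1")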
The ChatModel class represents a Chat Large Language Model and provides methods for text generation, streaming responses, and more. You can initialize a chat model in multiple ways:
Method 1: Using the generic factory method
from beeai_framework.backend.chat import ChatModel

model = ChatModel.from_name("ollama:llama3.1")
Method 2: Creating a specific provider model directly
from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel

model = OllamaChatModel("llama3.1")
You can configure various parameters for your chat model:
import asyncio
import sys
import traceback

from beeai_framework.adapters.ollama import OllamaChatModel
from beeai_framework.backend import UserMessage
from beeai_framework.errors import FrameworkError
from examples.helpers.io import ConsoleReader


async def main() -> None:
    llm = OllamaChatModel("llama3.1")

    # Optionally one may set llm parameters
    llm.parameters.max_tokens = 10000  # high number yields longer potential output
    llm.parameters.top_p = 0.1  # higher number yields more complex vocabulary, recommend only changing p or k
    llm.parameters.frequency_penalty = 0  # higher number yields reduction in word repetition
    llm.parameters.temperature = 0  # higher number yields greater randomness and variation
    llm.parameters.top_k = 0  # higher number yields more variance, recommend only changing p or k
    llm.parameters.n = 1  # higher number yields more choices
    llm.parameters.presence_penalty = 0  # higher number yields reduction in repetition of words
    llm.parameters.seed = 10  # can help produce similar responses if prompt and seed are always the same
    llm.parameters.stop_sequences = ["q", "quit", "ahhhhhhhhh"]  # stops the model on input of any of these strings
    llm.parameters.stream = False  # determines whether or not to use streaming to receive incremental data

    reader = ConsoleReader()

    for prompt in reader:
        response = await llm.create(messages=[UserMessage(prompt)])
        reader.write("LLM 🤖 (txt) : ", response.get_text_content())
        reader.write("LLM 🤖 (raw) : ", "\n".join([str(msg.to_plain()) for msg in response.messages]))


if __name__ == "__main__":
    try:
        asyncio.run(main())
    except FrameworkError as e:
        traceback.print_exc()
        sys.exit(e.explain())
The most basic usage is to generate text responses:
from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel
from beeai_framework.backend.message import UserMessage

model = OllamaChatModel("llama3.1")

response = await model.create(
    messages=[UserMessage("what states are part of New England?")]
)

print(response.get_text_content())
Execution parameters (those passed to model.create(...)) take precedence over those defined via config.
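As a minimal sketch of this precedence (assuming create() accepts the same parameter names as llm.parameters, which is an assumption made for this illustration), a value passed to create() wins over the configured one for that single call:

from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel
from beeai_framework.backend.message import UserMessage

llm = OllamaChatModel("llama3.1")
llm.parameters.temperature = 0  # configured default for this model instance

# Assumption: create() forwards the same parameter names as llm.parameters.
# The execution-time value (0.8) takes precedence over the configured 0 for
# this call only; later calls still use the configured default.
response = await llm.create(
    messages=[UserMessage("Name three New England states.")],
    temperature=0.8,
)
print(response.get_text_content())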
You can also stream responses as they are generated, handling each new token as it arrives:

from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel
from beeai_framework.backend.message import UserMessage

llm = OllamaChatModel("llama3.1")

user_message = UserMessage("How many islands make up the country of Cape Verde?")

response = await llm.create(messages=[user_message], stream=True).observe(
    lambda emitter: emitter.on(
        "new_token", lambda data, event: print(data.value.get_text_content())
    )
)

print("Full response", response.get_text_content())