# Backend

## Overview

`Backend` is an umbrella module that provides a unified way to work with the following model types:

* Chat Models (`ChatModel` class)
* Embedding Models (`EmbeddingModel` class)
* Audio Models (coming soon)
* Image Models (coming soon)

The BeeAI Framework backend uses a provider-based architecture, so you can switch between AI service providers while keeping a consistent API.

<Note>
  Supported in Python and TypeScript.
</Note>

***

## Supported providers

The following table lists the supported providers. Each provider is configured through environment variables. Ensure all required variables are set before initializing a provider.

| Name           | Chat | Embedding | Environment Variables                                                                                                                                                                                                                       |
| :------------- | :--- | :-------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Ollama         | ✅    | ✅         | OLLAMA\_CHAT\_MODEL<br />OLLAMA\_BASE\_URL                                                                                                                                                                                                  |
| OpenAI         | ✅    | ✅         | OPENAI\_CHAT\_MODEL<br />OPENAI\_EMBEDDING\_MODEL<br />OPENAI\_API\_BASE<br />**OPENAI\_API\_KEY**<br />OPENAI\_ORGANIZATION<br />OPENAI\_API\_HEADERS                                                                                      |
| IBM watsonx.ai | ✅    | ✅         | WATSONX\_CHAT\_MODEL<br />WATSONX\_API\_KEY<br />**WATSONX\_PROJECT\_ID**<br />WATSONX\_SPACE\_ID<br />WATSONX\_TOKEN<br />WATSONX\_ZENAPIKEY<br />WATSONX\_URL<br />WATSONX\_REGION                                                        |
| Anthropic      | ✅    | ✅         | ANTHROPIC\_CHAT\_MODEL<br />**ANTHROPIC\_API\_KEY**<br />ANTHROPIC\_API\_HEADERS                                                                                                                                                            |
| Groq           | ✅    | ✅         | GROQ\_CHAT\_MODEL<br />GROQ\_EMBEDDING\_MODEL<br />**GROQ\_API\_KEY**                                                                                                                                                                       |
| Amazon Bedrock | ✅    | ✅         | AWS\_CHAT\_MODEL<br />**AWS\_BEDROCK\_API\_KEY**<br />**AWS\_ACCESS\_KEY\_ID**<br />**AWS\_SECRET\_ACCESS\_KEY**<br />**AWS\_REGION**<br />AWS\_API\_HEADERS                                                                                |
| Google Vertex  | ✅    | ✅         | GOOGLE\_VERTEX\_CHAT\_MODEL<br />**GOOGLE\_VERTEX\_PROJECT**<br />**GOOGLE\_VERTEX\_LOCATION**<br />GOOGLE\_APPLICATION\_CREDENTIALS<br />GOOGLE\_APPLICATION\_CREDENTIALS\_JSON<br />GOOGLE\_CREDENTIALS<br />GOOGLE\_VERTEX\_API\_HEADERS |
| Azure OpenAI   | ✅    | ✅         | AZURE\_OPENAI\_CHAT\_MODEL<br />**AZURE\_OPENAI\_API\_KEY**<br />**AZURE\_OPENAI\_API\_BASE**<br />**AZURE\_OPENAI\_API\_VERSION**<br />AZURE\_AD\_TOKEN<br />AZURE\_API\_TYPE<br />AZURE\_API\_HEADERS                                     |
| xAI            | ✅    | ✅         | XAI\_CHAT\_MODEL<br />**XAI\_API\_KEY**                                                                                                                                                                                                     |
| Google Gemini  | ✅    | ✅         | GEMINI\_CHAT\_MODEL<br />**GEMINI\_API\_KEY**<br />GEMINI\_API\_HEADERS                                                                                                                                                                     |
| MistralAI      | ✅    | ✅         | MISTRALAI\_CHAT\_MODEL<br />MISTRALAI\_EMBEDDING\_MODEL<br />**MISTRALAI\_API\_KEY**<br />MISTRALAI\_API\_BASE                                                                                                                              |
| Transformers   | ✅    | ✅         | TRANSFORMERS\_CHAT\_MODEL<br />HF\_TOKEN                                                                                                                                                                                                    |
| MiniMax        | ✅    | ❌         | MINIMAX\_CHAT\_MODEL<br />**MINIMAX\_API\_KEY**<br />MINIMAX\_API\_BASE<br />MINIMAX\_API\_HEADERS                                                                                                                                          |

<Tip>
  If you don't see your provider, raise an issue [here](https://github.com/i-am-bee/beeai-framework/issues).
  Meanwhile, you can use Ollama for local models in [Python](https://github.com/i-am-bee/beeai-framework/blob/main/python/examples/backend/providers/ollama.py)
  or [TypeScript](https://github.com/i-am-bee/beeai-framework/blob/main/typescript/examples/backend/providers/ollama.ts), or the [LangChain adapter](#adding-a-provider-using-the-langchain-adapter) for hosted providers.
</Tip>

<Note>
  Google Gemini, MistralAI, and Transformers are supported in Python only. The Transformers chat model does not support tool calling.
</Note>
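
One way to follow the advice above is to check the environment before creating a model. The sketch below is not part of the framework: `missing_env_vars` is a hypothetical helper, and the list of required names rests on the assumption that the bold entries in the table (here `OPENAI_API_KEY`) are the mandatory ones.

```python
from __future__ import annotations

import os
from collections.abc import Iterable, Mapping


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] | None = None) -> list[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Assumption: bold entries in the table (e.g. OPENAI_API_KEY) are the required ones.
REQUIRED_OPENAI = ["OPENAI_API_KEY"]

if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_OPENAI)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```

Passing an explicit mapping instead of relying on `os.environ` keeps the check easy to unit test.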

***

### Backend initialization

The `Backend` class serves as a central entry point to access models from your chosen provider. The example below illustrates how to use the framework's unified interface with a provider's models, showcasing interaction patterns including:

* Basic chat completion
* Streaming responses with abort functionality
* Structured output generation
* Real-time response parsing
* Tool calling with external APIs
* Text embedding generation

<Tip>
  Explore more provider examples in [Python](https://github.com/i-am-bee/beeai-framework/tree/main/python/examples/backend/providers) or [TypeScript](https://github.com/i-am-bee/beeai-framework/tree/main/typescript/examples/backend/providers).
</Tip>

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio
  import datetime
  import sys
  import traceback

  from pydantic import BaseModel, Field

  from beeai_framework.adapters.openai import OpenAIChatModel, OpenAIEmbeddingModel
  from beeai_framework.backend import (
      ChatModel,
      ChatModelNewTokenEvent,
      ChatModelParameters,
      MessageToolResultContent,
      ToolMessage,
      UserMessage,
  )
  from beeai_framework.emitter import EventMeta
  from beeai_framework.errors import AbortError, FrameworkError
  from beeai_framework.parsers.field import ParserField
  from beeai_framework.parsers.line_prefix import LinePrefixParser, LinePrefixParserNode
  from beeai_framework.tools.weather import OpenMeteoTool, OpenMeteoToolInput
  from beeai_framework.utils import AbortSignal


  async def openai_from_name() -> None:
      llm = ChatModel.from_name("openai:gpt-4.1-mini")
      user_message = UserMessage("what states are part of New England?")
      response = await llm.run([user_message])
      print(response.get_text_content())


  async def openai_granite_from_name() -> None:
      llm = ChatModel.from_name("openai:gpt-4.1-mini")
      user_message = UserMessage("what states are part of New England?")
      response = await llm.run([user_message])
      print(response.get_text_content())


  async def openai_sync() -> None:
      llm = OpenAIChatModel("gpt-4.1-mini")
      user_message = UserMessage("what is the capital of Massachusetts?")
      response = await llm.run([user_message])
      print(response.get_text_content())


  async def openai_stream() -> None:
      llm = OpenAIChatModel("gpt-4.1-mini")
      user_message = UserMessage("How many islands make up the country of Cape Verde?")
      response = await llm.run([user_message], stream=True)
      print(response.get_text_content())


  async def openai_stream_abort() -> None:
      llm = OpenAIChatModel("gpt-4.1-mini")
      user_message = UserMessage("What is the smallest of the Cape Verde islands?")

      try:
          response = await llm.run([user_message], stream=True, signal=AbortSignal.timeout(0.5))

          if response is not None:
              print(response.get_text_content())
          else:
              print("No response returned.")
      except AbortError as err:
          print(f"Aborted: {err}")


  async def openai_structure() -> None:
      class TestSchema(BaseModel):
          answer: str = Field(description="your final answer")

      llm = OpenAIChatModel("gpt-4.1-mini")
      user_message = UserMessage("How many islands make up the country of Cape Verde?")
      response = await llm.run([user_message], response_format=TestSchema, stream=True)
      print(response.output_structured)


  async def openai_stream_parser() -> None:
      llm = OpenAIChatModel("gpt-4.1-mini")

      parser = LinePrefixParser(
          nodes={
              "test": LinePrefixParserNode(
                  prefix="Prefix: ", field=ParserField.from_type(str), is_start=True, is_end=True
              )
          }
      )

      async def on_new_token(data: ChatModelNewTokenEvent, event: EventMeta) -> None:
          await parser.add(data.value.get_text_content())

      user_message = UserMessage("Produce 3 lines each starting with 'Prefix: ' followed by a sentence and a new line.")
      await llm.run([user_message], stream=True).observe(lambda emitter: emitter.on("new_token", on_new_token))
      result = await parser.end()
      print(result)


  async def openai_tool_calling() -> None:
      llm = ChatModel.from_name("openai:gpt-4.1-mini", ChatModelParameters(stream=True, temperature=0))
      user_message = UserMessage(f"What is the current weather in Boston? Current date is {datetime.datetime.today()}.")
      weather_tool = OpenMeteoTool()
      response = await llm.run([user_message], tools=[weather_tool])
      tool_call_msg = response.get_tool_calls()[0]
      print(tool_call_msg.model_dump())
      tool_response = await weather_tool.run(OpenMeteoToolInput(location_name="Boston"))
      tool_response_msg = ToolMessage(
          MessageToolResultContent(
              result=tool_response.get_text_content(),
              tool_name=weather_tool.name,
              tool_call_id=response.get_tool_calls()[0].id,
          )
      )
      print(tool_response_msg.to_plain())
      final_response = await llm.run([user_message, *response.output, tool_response_msg], tools=[])
      print(final_response.get_text_content())


  async def openai_embedding() -> None:
      embedding_llm = OpenAIEmbeddingModel()

      response = await embedding_llm.create(["Text", "to", "embed"])

      for row in response.embeddings:
          print(*row)


  async def openai_cloning() -> None:
      llm = OpenAIChatModel("gpt-4.1-mini")
      await llm.clone()

      embedding_llm = OpenAIEmbeddingModel()
      await embedding_llm.clone()


  async def openai_file_example() -> None:
      llm = ChatModel.from_name("openai:gpt-4.1-mini")
      data_uri = "data:application/pdf;base64,JVBERi0xLjQKMSAwIG9iago8PC9UeXBlIC9DYXRhbG9nCi9QYWdlcyAyIDAgUgo+PgplbmRvYmoKMiAwIG9iago8PC9UeXBlIC9QYWdlcwovS2lkcyBbMyAwIFJdCi9Db3VudCAxCj4+CmVuZG9iagozIDAgb2JqCjw8L1R5cGUgL1BhZ2UKL1BhcmVudCAyIDAgUgovTWVkaWFCb3ggWzAgMCA1OTUgODQyXQovQ29udGVudHMgNSAwIFIKL1Jlc291cmNlcyA8PC9Qcm9jU2V0IFsvUERGIC9UZXh0XQovRm9udCA8PC9GMSA0IDAgUj4+Cj4+Cj4+CmVuZG9iago0IDAgb2JqCjw8L1R5cGUgL0ZvbnQKL1N1YnR5cGUgL1R5cGUxCi9OYW1lIC9GMQovQmFzZUZvbnQgL0hlbHZldGljYQovRW5jb2RpbmcgL01hY1JvbWFuRW5jb2RpbmcKPj4KZW5kb2JqCjUgMCBvYmoKPDwvTGVuZ3RoIDUzCj4+CnN0cmVhbQpCVAovRjEgMjAgVGYKMjIwIDQwMCBUZAooRHVtbXkgUERGKSBUagpFVAplbmRzdHJlYW0KZW5kb2JqCnhyZWYKMCA2CjAwMDAwMDAwMDAgNjU1MzUgZgowMDAwMDAwMDA5IDAwMDAwIG4KMDAwMDAwMDA2MyAwMDAwMCBuCjAwMDAwMDAxMjQgMDAwMDAgbgowMDAwMDAwMjc3IDAwMDAwIG4KMDAwMDAwMDM5MiAwMDAwMCBuCnRyYWlsZXIKPDwvU2l6ZSA2Ci9Sb290IDEgMCBSCj4+CnN0YXJ0eHJlZgo0OTUKJSVFT0YK"

      file_message = UserMessage.from_file(file_data=data_uri, format="text")
      print(file_message.to_plain())
      response = await llm.run([UserMessage("Read content of the file."), file_message])
      print(response.get_text_content())


  async def main() -> None:
      print("*" * 10, "openai_from_name")
      await openai_from_name()
      print("*" * 10, "openai_granite_from_name")
      await openai_granite_from_name()
      print("*" * 10, "openai_sync")
      await openai_sync()
      print("*" * 10, "openai_stream")
      await openai_stream()
      print("*" * 10, "openai_stream_abort")
      await openai_stream_abort()
      print("*" * 10, "openai_structure")
      await openai_structure()
      print("*" * 10, "openai_stream_parser")
      await openai_stream_parser()
      print("*" * 10, "openai_tool_calling")
      await openai_tool_calling()
      print("*" * 10, "openai_embedding")
      await openai_embedding()
      print("*" * 10, "openai_cloning")
      await openai_cloning()


  if __name__ == "__main__":
      try:
          asyncio.run(main())
      except FrameworkError as e:
          traceback.print_exc()
          sys.exit(e.explain())

  ```

  ```ts TypeScript [expandable] theme={null}
  import "dotenv/config.js";
  import { OpenAIChatModel } from "beeai-framework/adapters/openai/backend/chat";
  import { ToolMessage, UserMessage } from "beeai-framework/backend/message";
  import { ChatModel } from "beeai-framework/backend/chat";
  import { z } from "zod";
  import { ChatModelError } from "beeai-framework/backend/errors";
  import { OpenMeteoTool } from "beeai-framework/tools/weather/openMeteo";

  const llm = new OpenAIChatModel(
    "gpt-5-nano",
    {},
    // {
    //   baseURL: "OPENAI_BASE_URL",
    //   apiKey: "OPENAI_API_KEY",
    //   organization: "OPENAI_ORGANIZATION",
    //   project: "OPENAI_PROJECT",
    // },
  );

  llm.config({
    parameters: {
      maxTokens: 2048,
    },
  });

  async function openaiFromName() {
    const openaiLLM = await ChatModel.fromName("openai:gpt-5-nano");
    const response = await openaiLLM.create({
      messages: [new UserMessage("what states are part of New England?")],
    });
    console.info(response.getTextContent());
  }

  async function openaiSync() {
    const response = await llm.create({
      messages: [new UserMessage("what is the capital of Massachusetts?")],
    });
    console.info(response.getTextContent());
  }

  async function openaiStream() {
    const response = await llm.create({
      messages: [new UserMessage("How many islands make up the country of Cape Verde?")],
      stream: true,
    });
    console.info(response.getTextContent());
  }

  async function openaiAbort() {
    try {
      const response = await llm.create({
        messages: [new UserMessage("What is the smallest of the Cape Verde islands?")],
        stream: true,
        abortSignal: AbortSignal.timeout(1 * 500),
      });
      console.info(response.getTextContent());
    } catch (err) {
      if (err instanceof ChatModelError) {
        console.log("Aborted", { err });
      }
    }
  }

  async function openaiStructure() {
    const response = await llm.createStructure({
      schema: z.object({
        answer: z.string({ description: "your final answer" }),
      }),
      messages: [new UserMessage("How many islands make up the country of Cape Verde?")],
    });
    console.info(response.object);
  }

  async function openaiToolCalling() {
    const userMessage = new UserMessage(
      `What is the current weather in Boston? Current date is ${new Date().toISOString().split("T")[0]}.`,
    );
    const weatherTool = new OpenMeteoTool({ retryOptions: { maxRetries: 3 } });
    const response = await llm.create({
      messages: [userMessage],
      tools: [weatherTool],
      toolChoice: weatherTool,
    });
    const toolCallMsg = response.getToolCalls()[0];
    console.debug(JSON.stringify(toolCallMsg));
    const toolResponse = await weatherTool.run(toolCallMsg.input as any);
    const toolResponseMsg = new ToolMessage({
      type: "tool-result",
      output: { type: "text", value: toolResponse.getTextContent() },
      toolName: toolCallMsg.toolName,
      toolCallId: toolCallMsg.toolCallId,
    });
    console.info(toolResponseMsg.toPlain());
    const finalResponse = await llm.create({
      messages: [userMessage, ...response.messages, toolResponseMsg],
      tools: [],
    });
    console.info(finalResponse.getTextContent());
  }

  async function openaiDebug() {
    // Log every request
    llm.emitter.match("*", (value, event) =>
      console.debug(
        `Time: ${event.createdAt.toISOString()}`,
        `Event: ${event.name}`,
        `Data: ${JSON.stringify(value)}`,
      ),
    );

    const response = await llm.create({
      messages: [new UserMessage("Hello world!")],
    });
    console.info(response.messages[0].toPlain());
  }

  console.info(" openaiFromName".padStart(25, "*"));
  await openaiFromName();
  console.info(" openaiSync".padStart(25, "*"));
  await openaiSync();
  console.info(" openaiStream".padStart(25, "*"));
  await openaiStream();
  console.info(" openaiAbort".padStart(25, "*"));
  await openaiAbort();
  console.info(" openaiStructure".padStart(25, "*"));
  await openaiStructure();
  console.info(" openaiToolCalling".padStart(25, "*"));
  await openaiToolCalling();
  console.info(" openaiDebug".padStart(25, "*"));
  await openaiDebug();

  ```
</CodeGroup>

<Note>
  See the [events documentation](/modules/events) for more information on standard emitter events.
</Note>

***

## Chat model

The `ChatModel` class represents a Chat Language Model and provides methods for text generation, streaming responses, and more. You can initialize a chat model in multiple ways:

**Method 1: Using the `from_name` method**

<CodeGroup>
  ```py Python [expandable] theme={null}
  from beeai_framework.backend.chat import ChatModel

  model = ChatModel.from_name("ollama:llama3.1")
  ```

  ```ts TypeScript [expandable] theme={null}
  import { ChatModel } from "beeai-framework/backend/chat";

  const model = await ChatModel.fromName("ollama:granite3.3:8b");
  ```
</CodeGroup>
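
The string passed to `from_name` combines a provider and a model identifier. As the TypeScript example (`ollama:granite3.3:8b`) suggests, the model identifier may itself contain colons, so only the first colon can act as the separator. The helper below is a hypothetical illustration of that naming convention, not the framework's own parser, and it only handles names that contain both parts.

```python
def split_model_name(name: str) -> tuple[str, str]:
    """Split a "provider:model" string on the first colon only."""
    provider, sep, model = name.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {name!r}")
    return provider, model


print(split_model_name("ollama:llama3.1"))       # ('ollama', 'llama3.1')
print(split_model_name("ollama:granite3.3:8b"))  # ('ollama', 'granite3.3:8b')
```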

**Method 2: Directly specifying the provider class**

<CodeGroup>
  ```py Python [expandable] theme={null}
  from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel

  model = OllamaChatModel("llama3.1")
  ```

  ```ts TypeScript [expandable] theme={null}
  import { OllamaChatModel } from "beeai-framework/adapters/ollama/backend/chat";

  const model = new OllamaChatModel("llama3.1");
  ```
</CodeGroup>

### File / Document Inputs (PDF etc.)

You can attach files (e.g. PDFs) to a `UserMessage` using the `MessageFileContent` part or the convenience factory `UserMessage.from_file`. Provide either a remote `file_id`/URL or an inline base64 data URI (`file_data`). Optionally specify a MIME `format`.

```py Python [expandable] theme={null}
from beeai_framework.backend import UserMessage, MessageFileContent

# Using a remote / previously uploaded file URL or id (flattened API)
msg_with_file_id = UserMessage([
  MessageFileContent(
    file_id="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
    format="application/pdf",
  ),
  "What's this file about?",
])

# Same using the factory helper
msg_with_file_id_factory = UserMessage.from_file(
  file_id="https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  format="application/pdf",
)

# Using inline base64 data (shortened example)
msg_with_file_data = UserMessage([
  MessageFileContent(
    file_data="data:application/pdf;base64,AAA...",
    format="application/pdf",
  ),
  "Summarize the document",
])

# Inline base64 with factory
msg_with_file_data_factory = UserMessage.from_file(
  file_data="data:application/pdf;base64,AAA...",
  format="application/pdf",
)
```

These content parts serialize to the flattened schema (legacy nested `{ "file": {...} }` removed):

```json theme={null}
{
  "type": "file",
  "file_id": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "format": "application/pdf"
}
```

If neither `file_id` nor `file_data` is supplied, a validation error is raised.
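
The inline `file_data` value is a base64 data URI. A minimal sketch of building one from raw bytes with the standard library follows. `to_data_uri` is a hypothetical helper name, not a framework function, and the file name in the comment is illustrative only.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a `data:<mime>;base64,<payload>` URI."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# In practice the bytes would come from a file, e.g. Path("report.pdf").read_bytes()
uri = to_data_uri(b"%PDF-1.4", "application/pdf")
print(uri)  # data:application/pdf;base64,JVBERi0xLjQ=
```

The resulting string is what the examples above pass as `file_data`.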

### Chat model configuration

You can configure various parameters for your chat model.

<CodeGroup>
  ```python Python theme={null}
  import asyncio
  import sys
  import traceback

  from beeai_framework.adapters.ollama import OllamaChatModel
  from beeai_framework.backend import UserMessage
  from beeai_framework.errors import FrameworkError
  from examples.helpers.io import ConsoleReader


  async def main() -> None:
      llm = OllamaChatModel("llama3.1")

      #  Optionally one may set llm parameters
      llm.parameters.max_tokens = 10000  # high number yields longer potential output
      llm.parameters.top_p = 0.1  # higher number yields more complex vocabulary, recommend only changing p or k
      llm.parameters.frequency_penalty = 0  # higher number yields reduction in word repetition
      llm.parameters.temperature = 0  # higher number yields greater randomness and variation
      llm.parameters.top_k = 0  # higher number yields more variance, recommend only changing p or k
      llm.parameters.n = 1  # higher number yields more choices
      llm.parameters.presence_penalty = 0  # higher number yields reduction in repetition of words
      llm.parameters.seed = 10  # can help produce similar responses if prompt and seed are always the same
      llm.parameters.stop_sequences = ["q", "quit", "ahhhhhhhhh"]  # stops generation when the model outputs any of these strings
      llm.parameters.stream = False  # determines whether or not to use streaming to receive incremental data

      reader = ConsoleReader()

      for prompt in reader:
          response = await llm.run([UserMessage(prompt)])
          reader.write("LLM 🤖 (txt) : ", response.get_text_content())
          reader.write("LLM 🤖 (raw) : ", "\n".join([str(msg.to_plain()) for msg in response.output]))


  if __name__ == "__main__":
      try:
          asyncio.run(main())
      except FrameworkError as e:
          traceback.print_exc()
          sys.exit(e.explain())

  ```

  ```ts TypeScript [expandable] theme={null}
  import "dotenv/config.js";
  import { createConsoleReader } from "examples/helpers/io.js";
  import { UserMessage } from "beeai-framework/backend/message";
  import { OllamaChatModel } from "beeai-framework/adapters/ollama/backend/chat";

  const llm = new OllamaChatModel("granite4:micro");

  //  Optionally one may set llm parameters
  llm.parameters.maxTokens = 10000; // high number yields longer potential output
  llm.parameters.topP = 0; // higher number yields more complex vocabulary, recommend only changing p or k
  llm.parameters.frequencyPenalty = 0; // higher number yields reduction in word repetition
  llm.parameters.temperature = 0; // higher number yields greater randomness and variation
  llm.parameters.topK = 0; // higher number yields more variance, recommend only changing p or k
  llm.parameters.n = 1; // higher number yields more choices
  llm.parameters.presencePenalty = 0; // higher number yields reduction in repetition of words
  llm.parameters.seed = 10; // can help produce similar responses if prompt and seed are always the same
  llm.parameters.stopSequences = ["q", "quit", "ahhhhhhhhh"]; // stops generation when the model outputs any of these strings

  // alternatively
  llm.config({
    parameters: {
      maxTokens: 10000,
      // other parameters
    },
  });

  const reader = createConsoleReader();

  for await (const { prompt } of reader) {
    const response = await llm.create({
      messages: [new UserMessage(prompt)],
    });
    reader.write(`LLM 🤖 (txt) : `, response.getTextContent());
    reader.write(`LLM 🤖 (raw) : `, JSON.stringify(response.messages));
  }

  ```
</CodeGroup>

### Text generation

The most basic usage is to generate text responses:

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio

  from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel
  from beeai_framework.backend.message import UserMessage


  async def main() -> None:
      model = OllamaChatModel("llama3.1")
      # `run` takes the list of input messages, as in the other Python examples on this page
      response = await model.run([UserMessage("what states are part of New England?")])
      print(response.get_text_content())


  asyncio.run(main())
  ```

  ```ts TypeScript [expandable] theme={null}
  import { UserMessage } from "beeai-framework/backend/message";
  import { OllamaChatModel } from "beeai-framework/adapters/ollama/backend/chat";

  const llm = new OllamaChatModel("llama3.1");

  const response = await llm.create({
    messages: [new UserMessage("what states are part of New England?")],
  });

  console.log(response.getTextContent());
  ```
</CodeGroup>

<Note>
  Execution parameters (those passed to `model.create({...})`) take precedence over ones defined via `config`.
</Note>
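
The precedence rule can be pictured as a dictionary merge in which per-call values win. This is only an illustration of the rule with hypothetical names (`resolve_parameters`, plain dictionaries), not how the framework stores or merges parameters internally.

```python
from typing import Any


def resolve_parameters(configured: dict[str, Any], per_call: dict[str, Any]) -> dict[str, Any]:
    """Merge parameters so that explicitly passed per-call values override configured ones."""
    # Treat None as "not provided" so it does not erase a configured value.
    explicit = {key: value for key, value in per_call.items() if value is not None}
    return {**configured, **explicit}


configured = {"temperature": 0, "max_tokens": 10000}
print(resolve_parameters(configured, {"max_tokens": 256}))
# {'temperature': 0, 'max_tokens': 256}
```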

### Streaming responses

For applications requiring real-time responses:

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio

  from beeai_framework.adapters.ollama.backend.chat import OllamaChatModel
  from beeai_framework.backend.message import UserMessage


  async def main() -> None:
      llm = OllamaChatModel("llama3.1")
      user_message = UserMessage("How many islands make up the country of Cape Verde?")
      # Print each chunk as it arrives; the awaited result holds the full response
      response = await llm.run([user_message], stream=True).on(
          "new_token",
          lambda data, event: print(data.value.get_text_content()),
      )
      print("Full response", response.get_text_content())


  asyncio.run(main())
  ```

  ```ts TypeScript [expandable] theme={null}
  import "dotenv/config.js";
  import { createConsoleReader } from "examples/helpers/io.js";
  import { UserMessage } from "beeai-framework/backend/message";
  import { OllamaChatModel } from "beeai-framework/adapters/ollama/backend/chat";

  const llm = new OllamaChatModel("granite4:micro");

  const reader = createConsoleReader();

  for await (const { prompt } of reader) {
    const response = await llm
      .create({
        messages: [new UserMessage(prompt)],
      })
      .observe((emitter) =>
        emitter.match("*", (data, event) => {
          reader.write(`LLM 🤖 (event: ${event.name})`, JSON.stringify(data));

          // if you want to close the stream prematurely, just uncomment the following line
          // callbacks.abort()
        }),
      );

    reader.write(`LLM 🤖 (txt) : `, response.getTextContent());
    reader.write(`LLM 🤖 (raw) : `, JSON.stringify(response.messages));
  }

  ```
</CodeGroup>

### Structured generation

Generate structured data according to a schema:

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio
  import json
  import sys
  import traceback

  from pydantic import BaseModel, Field

  from beeai_framework.backend import ChatModel, UserMessage
  from beeai_framework.errors import FrameworkError


  async def main() -> None:
      model = ChatModel.from_name("ollama:granite4:micro")

      class ProfileSchema(BaseModel):
          first_name: str = Field(..., min_length=1)
          last_name: str = Field(..., min_length=1)
          address: str
          age: int
          hobby: str

      response = await model.run(
          [UserMessage("Generate a profile of a citizen of Europe.")], response_format=ProfileSchema
      )
      assert isinstance(response.output_structured, ProfileSchema)
      print(json.dumps(response.output_structured.model_dump(), indent=4))


  if __name__ == "__main__":
      try:
          asyncio.run(main())
      except FrameworkError as e:
          traceback.print_exc()
          sys.exit(e.explain())

  ```

  ```ts TypeScript [expandable] theme={null}
  import { ChatModel, UserMessage } from "beeai-framework/backend/core";
  import { z } from "zod";

  const model = await ChatModel.fromName("ollama:granite4:micro");
  const response = await model.createStructure({
    schema: z.union([
      z.object({
        firstName: z.string().min(1),
        lastName: z.string().min(1),
        address: z.string(),
        age: z.number().int().min(1),
        hobby: z.string(),
      }),
      z.object({
        error: z.string(),
      }),
    ]),
    messages: [new UserMessage("Generate a profile of a citizen of Europe.")],
  });
  console.log(response.object);

  ```
</CodeGroup>

### Tool calling

Integrate external tools with your AI model:

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio
  import json
  import re
  import sys
  import traceback

  from beeai_framework.backend import (
      AnyMessage,
      ChatModel,
      ChatModelParameters,
      MessageToolResultContent,
      SystemMessage,
      ToolMessage,
      UserMessage,
  )
  from beeai_framework.errors import FrameworkError
  from beeai_framework.tools import AnyTool, ToolOutput
  from beeai_framework.tools.search.duckduckgo import DuckDuckGoSearchTool
  from beeai_framework.tools.weather.openmeteo import OpenMeteoTool


  async def main() -> None:
      model = ChatModel.from_name("ollama:llama3.1", ChatModelParameters(temperature=0))
      tools: list[AnyTool] = [DuckDuckGoSearchTool(), OpenMeteoTool()]
      messages: list[AnyMessage] = [
          SystemMessage("You are a helpful assistant. Use tools to provide a correct answer."),
          UserMessage("What's the fastest marathon time?"),
      ]

      while True:
          response = await model.run(
              messages,
              tools=tools,
          )

          tool_calls = response.get_tool_calls()
          messages.extend(response.output)

          tool_results: list[ToolMessage] = []

          for tool_call in tool_calls:
              print(f"-> running '{tool_call.tool_name}' tool with {tool_call.args}")
              # Default to None so the assertion below is what fails for an unknown tool name
              tool: AnyTool | None = next((tool for tool in tools if tool.name == tool_call.tool_name), None)
              assert tool is not None
              res: ToolOutput = await tool.run(json.loads(tool_call.args))
              result = res.get_text_content()
              print(f"<- got response from '{tool_call.tool_name}'", re.sub(r"\s+", " ", result)[:256] + " (truncated)")
              tool_results.append(
                  ToolMessage(
                      MessageToolResultContent(
                          result=result,
                          tool_name=tool_call.tool_name,
                          tool_call_id=tool_call.id,
                      )
                  )
              )

          messages.extend(tool_results)

          answer = response.get_text_content()

          if answer:
              print(f"Agent: {answer}")
              break


  if __name__ == "__main__":
      try:
          asyncio.run(main())
      except FrameworkError as e:
          traceback.print_exc()
          sys.exit(e.explain())

  ```

  ```ts TypeScript [expandable] theme={null}
  import "dotenv/config";
  import {
    ChatModel,
    Message,
    SystemMessage,
    ToolMessage,
    UserMessage,
  } from "beeai-framework/backend/core";
  import { DuckDuckGoSearchTool } from "beeai-framework/tools/search/duckDuckGoSearch";
  import { OpenMeteoTool } from "beeai-framework/tools/weather/openMeteo";
  import { AnyTool, ToolOutput } from "beeai-framework/tools/base";

  const model = await ChatModel.fromName("ollama:granite4:micro");
  const tools: AnyTool[] = [new DuckDuckGoSearchTool(), new OpenMeteoTool()];
  const messages: Message[] = [
    new SystemMessage("You are a helpful assistant. Use tools to provide a correct answer."),
    new UserMessage("What's the fastest marathon time?"),
  ];

  while (true) {
    const response = await model.create({
      messages,
      tools,
    });
    messages.push(...response.messages);

    const toolCalls = response.getToolCalls();
    const toolResults = await Promise.all(
      toolCalls.map(async ({ input, toolName, toolCallId }) => {
        console.log(`-> running '${toolName}' tool with ${JSON.stringify(input)}`);
        const tool = tools.find((tool) => tool.name === toolName)!;
        const response: ToolOutput = await tool.run(input as any);
        const result = response.getTextContent();
        console.log(
          `<- got response from '${toolName}'`,
          result.replaceAll(/\s+/g, " ").substring(0, 90).concat(" (truncated)"),
        );
        return new ToolMessage({
          type: "tool-result",
          output: { type: "text", value: result },
          toolName,
          toolCallId,
        });
      }),
    );
    messages.push(...toolResults);

    const answer = response.getTextContent();
    if (answer) {
      console.info(`Agent: ${answer}`);
      break;
    }
  }

  ```
</CodeGroup>

***

## Embedding model

The `EmbeddingModel` class provides functionality for generating vector embeddings from text.

### Embedding model initialization

You can initialize an embedding model in multiple ways:

**Method 1: Using the `from_name` method**

<CodeGroup>
  ```py Python [expandable] theme={null}
  from beeai_framework.backend.embedding import EmbeddingModel

  model = EmbeddingModel.from_name("ollama:nomic-embed-text")
  ```

  ```ts TypeScript [expandable] theme={null}
  import { EmbeddingModel } from "beeai-framework/backend/embedding";

  const model = await EmbeddingModel.fromName("ollama:nomic-embed-text");
  ```
</CodeGroup>

**Method 2: Directly specifying the provider class**

<CodeGroup>
  ```py Python [expandable] theme={null}
  from beeai_framework.adapters.ollama.backend import OllamaEmbeddingModel

  model = OllamaEmbeddingModel("nomic-embed-text")
  ```

  ```ts TypeScript [expandable] theme={null}
  import { OpenAIEmbeddingModel } from "beeai-framework/adapters/openai/embedding";

  const model = new OpenAIEmbeddingModel(
    "text-embedding-3-large",
    {
      dimensions: 512,
      maxEmbeddingsPerCall: 5,
    },
    {
      baseURL: "your_custom_endpoint",
      compatibility: "compatible",
      headers: {
        CUSTOM_HEADER: "...",
      },
    },
  );
  ```
</CodeGroup>

### Embedding model usage

Generate embeddings for one or more text strings:

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio

  from beeai_framework.backend.embedding import EmbeddingModel


  async def main() -> None:
      model = EmbeddingModel.from_name("ollama:nomic-embed-text")

      response = await model.create(["Hello world!", "Hello Bee!"])
      print(response.values)  # the input strings
      print(response.embeddings)  # one vector per input string


  asyncio.run(main())
  ```

  ```ts TypeScript [expandable] theme={null}
  import { EmbeddingModel } from "beeai-framework/backend/embedding";

  const model = await EmbeddingModel.fromName("ollama:nomic-embed-text");

  const response = await model.create({
    values: ["Hello world!", "Hello Bee!"],
  });
  console.log(response.values);
  console.log(response.embeddings);
  ```
</CodeGroup>
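
Embedding vectors are typically compared with a similarity measure. The following is a minimal sketch of cosine similarity in plain Python. It assumes that each entry of `response.embeddings` is a list of floats; `cosine_similarity` is a hypothetical helper written for this illustration and is not part of the framework, and the vectors below are stand-ins rather than real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would come from response.embeddings.
embeddings = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
print(cosine_similarity(embeddings[0], embeddings[1]))
```

Values close to 1 indicate that two texts point in nearly the same direction in the embedding space, while values near 0 indicate little similarity.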

***

## Adding a Provider Using the LangChain Adapter

If your preferred provider isn't directly supported, you can use the LangChain adapter as a bridge, provided that a LangChain integration exists for that provider.

<CodeGroup>
  ```py Python [expandable] theme={null}
  import asyncio
  import json
  import sys
  import traceback
  from datetime import UTC, datetime

  from pydantic import BaseModel, Field

  from beeai_framework.adapters.langchain.backend.chat import LangChainChatModel
  from beeai_framework.backend import (
      AnyMessage,
      ChatModelNewTokenEvent,
      MessageToolResultContent,
      SystemMessage,
      ToolMessage,
      UserMessage,
  )
  from beeai_framework.emitter import EventMeta
  from beeai_framework.errors import AbortError, FrameworkError
  from beeai_framework.parsers.field import ParserField
  from beeai_framework.parsers.line_prefix import LinePrefixParser, LinePrefixParserNode
  from beeai_framework.tools.weather import OpenMeteoTool
  from beeai_framework.utils import AbortSignal

  # prevent import error for langchain_ollama (only needed in this context)
  cur_dir = sys.path.pop(0)
  while cur_dir in sys.path:
      sys.path.remove(cur_dir)

  from langchain_ollama.chat_models import ChatOllama as LangChainOllamaChat  # noqa: E402


  async def langchain_ollama_from_name() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      user_message = UserMessage("what states are part of New England?")
      response = await llm.run([user_message])
      print(response.get_text_content())


  async def langchain_ollama_granite_from_name() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      user_message = UserMessage("what states are part of New England?")
      response = await llm.run([user_message])
      print(response.get_text_content())


  async def langchain_ollama_sync() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      user_message = UserMessage("what is the capital of Massachusetts?")
      response = await llm.run([user_message])
      print(response.get_text_content())


  async def langchain_ollama_stream() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      user_message = UserMessage("How many islands make up the country of Cape Verde?")
      response = await llm.run([user_message], stream=True)
      print(response.get_text_content())


  async def langchain_ollama_stream_abort() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      user_message = UserMessage("What is the smallest of the Cape Verde islands?")

      try:
          response = await llm.run([user_message], stream=True, signal=AbortSignal.timeout(0.5))

          if response is not None:
              print(response.get_text_content())
          else:
              print("No response returned.")
      except AbortError as err:
          print(f"Aborted: {err}")


  async def langchain_ollama_structure() -> None:
      class TestSchema(BaseModel):
          answer: str = Field(description="your final answer")

      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      user_message = UserMessage("How many islands make up the country of Cape Verde?")
      response = await llm.run([user_message], response_format=TestSchema)
      print(response.output_structured)


  async def langchain_ollama_stream_parser() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)

      parser = LinePrefixParser(
          nodes={
              "test": LinePrefixParserNode(
                  prefix="Prefix: ", field=ParserField.from_type(str), is_start=True, is_end=True
              )
          }
      )

      async def on_new_token(data: ChatModelNewTokenEvent, event: EventMeta) -> None:
          await parser.add(data.value.get_text_content())

      user_message = UserMessage("Produce 3 lines each starting with 'Prefix: ' followed by a sentence and a new line.")
      await llm.run([user_message], stream=True).observe(lambda emitter: emitter.on("new_token", on_new_token))
      result = await parser.end()
      print(result)


  async def langchain_ollama_tool_calling() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      llm.parameters.stream = True
      weather_tool = OpenMeteoTool()
      messages: list[AnyMessage] = [
          SystemMessage(
              f"""You are a helpful assistant that uses tools to provide answers.
  Current date is {datetime.now(tz=UTC).date()!s}
  """
          ),
          UserMessage("What is the current weather in Berlin?"),
      ]
      response = await llm.run(messages, tools=[weather_tool], tool_choice="required")
      messages.extend(response.output)
      tool_call_msg = response.get_tool_calls()[0]
      print(tool_call_msg.model_dump())
      tool_response = await weather_tool.run(json.loads(tool_call_msg.args))
      tool_response_msg = ToolMessage(
          MessageToolResultContent(
              result=tool_response.get_text_content(), tool_name=tool_call_msg.tool_name, tool_call_id=tool_call_msg.id
          )
      )
      print(tool_response_msg.to_plain())
      final_response = await llm.run([*messages, tool_response_msg], tools=[])
      print(final_response.get_text_content())


  async def langchain_ollama_cloning() -> None:
      langchain_llm = LangChainOllamaChat(model="granite4:micro")
      llm = LangChainChatModel(langchain_llm)
      await llm.clone()


  async def main() -> None:
      print("*" * 10, "langchain_ollama_from_name")
      await langchain_ollama_from_name()
      print("*" * 10, "langchain_ollama_granite_from_name")
      await langchain_ollama_granite_from_name()
      print("*" * 10, "langchain_ollama_sync")
      await langchain_ollama_sync()
      print("*" * 10, "langchain_ollama_stream")
      await langchain_ollama_stream()
      print("*" * 10, "langchain_ollama_stream_abort")
      await langchain_ollama_stream_abort()
      print("*" * 10, "langchain_ollama_structure")
      await langchain_ollama_structure()
      print("*" * 10, "langchain_ollama_stream_parser")
      await langchain_ollama_stream_parser()
      print("*" * 10, "langchain_ollama_tool_calling")
      await langchain_ollama_tool_calling()
      print("*" * 10, "langchain_ollama_cloning")
      await langchain_ollama_cloning()


  if __name__ == "__main__":
      try:
          asyncio.run(main())
      except FrameworkError as e:
          traceback.print_exc()
          sys.exit(e.explain())

  ```

  ```ts TypeScript [expandable] theme={null}
  // NOTE: ensure you have installed the following packages
  // - @langchain/core
  // - @langchain/cohere (or any other provider-related package that you would like to use)
  // List of available providers: https://js.langchain.com/v0.2/docs/integrations/chat/

  import { LangChainChatModel } from "beeai-framework/adapters/langchain/backend/chat";
  // @ts-expect-error package not installed
  import { ChatCohere } from "@langchain/cohere";
  import "dotenv/config.js";
  import { ToolMessage, UserMessage } from "beeai-framework/backend/message";
  import { z } from "zod";
  import { ChatModelError } from "beeai-framework/backend/errors";
  import { OpenMeteoTool } from "beeai-framework/tools/weather/openMeteo";

  const llm = new LangChainChatModel(
    new ChatCohere({
      model: "command-r-plus",
      temperature: 0,
    }),
  );

  async function langchainSync() {
    const response = await llm.create({
      messages: [new UserMessage("what is the capital of Massachusetts?")],
    });
    console.info(response.getTextContent());
  }

  async function langchainStream() {
    const response = await llm.create({
      messages: [new UserMessage("How many islands make up the country of Cape Verde?")],
      stream: true,
    });
    console.info(response.getTextContent());
  }

  async function langchainAbort() {
    try {
      const response = await llm.create({
        messages: [new UserMessage("What is the smallest of the Cape Verde islands?")],
        stream: true,
        abortSignal: AbortSignal.timeout(1 * 1000),
      });
      console.info(response.getTextContent());
    } catch (err) {
      if (err instanceof ChatModelError) {
        console.log("Aborted", { err });
      }
    }
  }

  async function langchainStructure() {
    const response = await llm.createStructure({
      schema: z.object({
        answer: z.string({ description: "your final answer" }),
      }),
      messages: [new UserMessage("How many islands make up the country of Cape Verde?")],
    });
    console.info(response.object);
  }

  async function langchainToolCalling() {
    const userMessage = new UserMessage(
      `What is the current weather in Boston? Current date is ${new Date().toISOString().split("T")[0]}.`,
    );
    const weatherTool = new OpenMeteoTool({ retryOptions: { maxRetries: 3 } });
    const response = await llm.create({ messages: [userMessage], tools: [weatherTool] });
    const toolCallMsg = response.getToolCalls()[0];
    console.debug(JSON.stringify(toolCallMsg));
    const toolResponse = await weatherTool.run(toolCallMsg.input as any);
    const toolResponseMsg = new ToolMessage({
      type: "tool-result",
      output: { type: "text", value: toolResponse.getTextContent() },
      toolName: toolCallMsg.toolName,
      toolCallId: toolCallMsg.toolCallId,
    });
    console.info(toolResponseMsg.toPlain());
    const finalResponse = await llm.create({
      messages: [userMessage, ...response.messages, toolResponseMsg],
      tools: [],
    });
    console.info(finalResponse.getTextContent());
  }

  async function langchainDebug() {
    // Log every request
    llm.emitter.match("*", (value, event) =>
      console.debug(
        `Time: ${event.createdAt.toISOString()}`,
        `Event: ${event.name}`,
        `Data: ${JSON.stringify(value)}`,
      ),
    );

    const response = await llm.create({
      messages: [new UserMessage("Hello world!")],
    });
    console.info(response.messages[0].toPlain());
  }

  console.info(" langchainSync".padStart(25, "*"));
  await langchainSync();
  console.info(" langchainStream".padStart(25, "*"));
  await langchainStream();
  console.info(" langchainAbort".padStart(25, "*"));
  await langchainAbort();
  console.info(" langchainStructure".padStart(25, "*"));
  await langchainStructure();
  console.info(" langchainToolCalling".padStart(25, "*"));
  await langchainToolCalling();
  console.info(" langchainDebug".padStart(25, "*"));
  await langchainDebug();

  ```
</CodeGroup>

***

## Troubleshooting

Common issues and their solutions:

1. Authentication errors: Ensure all required environment variables are set correctly
2. Model not found: Verify that the model ID is correct and available for the selected provider
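
For the first issue, a quick check before initializing a provider can rule out missing configuration. The sketch below uses only the standard library. `missing_env_vars` is a hypothetical helper, not a framework function, and the variable list is an assumption that you should adapt to the provider you use.

```python
import os


def missing_env_vars(required: list[str]) -> list[str]:
    """Return the names in `required` that are unset or empty in the environment."""
    return [name for name in required if not os.environ.get(name)]


# OPENAI_API_KEY is used as an example; substitute the variables your provider requires.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

If you load variables from a `.env` file, run the check after the file has been loaded.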

***

## Examples

<CardGroup cols={2}>
  <Card title="Python" icon="python" href="https://github.com/i-am-bee/beeai-framework/tree/main/python/examples/backend">
    Explore reference backend implementations in Python
  </Card>

  <Card title="TypeScript" icon="js" href="https://github.com/i-am-bee/beeai-framework/tree/main/typescript/examples/backend">
    Explore reference backend implementations in TypeScript
  </Card>
</CardGroup>