Embedding Clients

ForzaEmbed supports multiple embedding backends through a unified client interface.

FastEmbed Client

FastEmbed client for local embedding generation.

This module provides a client for generating embeddings using the FastEmbed library with GPU acceleration support and automatic fallback to CPU.

Example

Generate embeddings using FastEmbed:

from src.clients.fastembed_client import FastEmbedClient

embeddings = FastEmbedClient.get_embeddings(
    texts=["Hello world"],
    model_name="BAAI/bge-small-en-v1.5"
)
class src.clients.fastembed_client.FastEmbedClient[source]

Bases: object

Client for managing FastEmbed embedding models.

Implements singleton pattern for model instances to avoid reloading. Supports GPU acceleration with automatic CPU fallback.

_instances

Class-level cache of loaded model instances.

Type:

dict[str, fastembed.text.text_embedding.TextEmbedding]

classmethod get_instance(model_name)[source]

Get or create a FastEmbed model instance.

Attempts GPU acceleration first, falls back to CPU if unavailable.

Parameters:

model_name (str) – Name of the FastEmbed model.

Returns:

Loaded TextEmbedding model instance.

Return type:

TextEmbedding
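The caching and fallback behaviour described above can be sketched as follows. This is a simplified illustration, not the actual implementation: load_model is a placeholder for the real fastembed TextEmbedding constructor, and its device argument stands in for whatever GPU-selection mechanism the library exposes.

```python
def load_model(model_name: str, device: str):
    """Placeholder for the real model constructor (e.g. fastembed's TextEmbedding)."""
    if device == "cuda":
        # Simulate a machine without a usable GPU.
        raise RuntimeError("GPU unavailable")
    return {"model": model_name, "device": device}


class FastEmbedClientSketch:
    """Singleton cache of model instances with GPU-to-CPU fallback."""

    _instances: dict = {}

    @classmethod
    def get_instance(cls, model_name: str):
        # Reuse an already-loaded model instead of reloading it.
        if model_name in cls._instances:
            return cls._instances[model_name]
        try:
            # Prefer GPU acceleration when available.
            model = load_model(model_name, device="cuda")
        except RuntimeError:
            # Fall back to CPU if GPU initialisation fails.
            model = load_model(model_name, device="cpu")
        cls._instances[model_name] = model
        return model
```

Because the cache lives on the class, repeated calls with the same model name return the same instance.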

static get_embeddings(texts, model_name, expected_dimension=None)[source]

Generate embeddings for a list of texts.

Parameters:
  • texts (list[str]) – List of texts to embed.

  • model_name (str) – Name of the FastEmbed model to use.

  • expected_dimension (int | None) – Expected embedding dimension for validation.

Returns:

List of embedding vectors as lists of floats.

Raises:

ValueError – If embedding dimension doesn’t match expected.

Return type:

list[list[float]]
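The dimension check that backs the ValueError above might look like this. validate_dimension is a hypothetical helper name, not part of the documented API.

```python
def validate_dimension(embeddings, expected_dimension=None):
    """Raise ValueError when a vector's length differs from the expected dimension."""
    if expected_dimension is None:
        # No expectation given: skip validation entirely.
        return embeddings
    for vector in embeddings:
        if len(vector) != expected_dimension:
            raise ValueError(
                f"Embedding dimension {len(vector)} does not match "
                f"expected {expected_dimension}"
            )
    return embeddings
```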

Sentence Transformers Client

Sentence Transformers client for local embedding generation.

This module provides a client for generating embeddings using the sentence-transformers library with singleton pattern for model caching.

Example

Generate embeddings using Sentence Transformers:

from src.clients.sentencetransformers_client import SentenceTransformersClient

embeddings = SentenceTransformersClient.get_embeddings(
    texts=["Hello world"],
    model_name="all-MiniLM-L6-v2"
)
class src.clients.sentencetransformers_client.SentenceTransformersClient[source]

Bases: object

Client for managing local sentence-transformer models.

Implements singleton pattern for model instances to avoid reloading.

_instances

Class-level cache of loaded model instances.

Type:

Dict[str, sentence_transformers.SentenceTransformer.SentenceTransformer]

classmethod get_instance(model_name)[source]

Get or create a SentenceTransformer model instance.

Parameters:

model_name (str) – Name of the sentence-transformer model.

Returns:

Loaded SentenceTransformer model instance.

Return type:

SentenceTransformer

classmethod get_embeddings(texts, model_name, expected_dimension=None)[source]

Generate embeddings for a list of texts using a local model.

Automatically adds the required text prefix for Jina models.

Parameters:
  • texts (list[str]) – List of texts to embed.

  • model_name (str) – Name of the sentence-transformer model.

  • expected_dimension (int | None) – Expected embedding dimension for validation.

Returns:

List of embedding vectors as lists of floats.

Raises:

ValueError – If embedding dimension doesn’t match expected.

Return type:

list[list[float]]
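The Jina prefix handling mentioned above could be sketched roughly like this. Both the detection rule and the "passage: " prefix string are placeholders; the actual prefix depends on the specific Jina model and task.

```python
def maybe_add_prefix(texts, model_name):
    """Prepend a retrieval prefix for Jina-style models; pass others through unchanged."""
    if "jina" in model_name.lower():
        # "passage: " is illustrative only; real Jina models define their own prefixes.
        return [f"passage: {text}" for text in texts]
    return list(texts)
```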

Transformers Client

Transformers client for local embedding generation.

This module provides a client for generating embeddings using the Hugging Face transformers library directly, with special handling for Jina models.

Example

Generate embeddings using Transformers:

from src.clients.transformers_client import TransformersClient

embeddings = TransformersClient.get_embeddings(
    texts=["Hello world"],
    model_name="BAAI/bge-small-en-v1.5"
)
src.clients.transformers_client.mean_pooling(token_embeddings, attention_mask)[source]

Perform mean pooling on token embeddings.

Parameters:
  • token_embeddings (Tensor) – Tensor of token-level embeddings.

  • attention_mask (Tensor) – Attention mask for the input tokens.

Returns:

Mean-pooled sentence embeddings tensor.

Return type:

Tensor
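Masked mean pooling as performed by this function can be written in plain Python (a list-based analogue of the tensor operation; the real implementation operates on torch Tensors):

```python
def mean_pooling_plain(token_embeddings, attention_mask):
    """Average token vectors per sentence, counting only positions where mask == 1.

    token_embeddings: per sentence, a list of token vectors.
    attention_mask: per sentence, a list of 0/1 ints.
    """
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        dim = len(tokens[0])
        sums = [0.0] * dim
        count = 0
        for vector, m in zip(tokens, mask):
            if m:
                count += 1
                for i, value in enumerate(vector):
                    sums[i] += value
        # Clamp the divisor to at least 1, mirroring torch.clamp(..., min=1e-9).
        count = max(count, 1)
        pooled.append([s / count for s in sums])
    return pooled
```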

class src.clients.transformers_client.TransformersClient[source]

Bases: object

Client for managing local transformers embedding models.

Implements singleton pattern for model instances with special handling for Jina models and their task labels.

_instances

Class-level cache of loaded model and tokenizer instances.

Type:

Dict[str, Tuple[transformers.modeling_utils.PreTrainedModel, transformers.tokenization_utils.PreTrainedTokenizer]]

classmethod get_instance(model_name)[source]

Get or create a transformers model and tokenizer instance.

Parameters:

model_name (str) – Name of the transformers model.

Returns:

Tuple of (model, tokenizer) instances.

Return type:

Tuple[PreTrainedModel, PreTrainedTokenizer]

classmethod get_embeddings(texts, model_name, expected_dimension=None)[source]

Generate embeddings using a local transformers model.

Handles special cases for Jina models including task labels and different output formats.

Parameters:
  • texts (list[str]) – List of texts to embed.

  • model_name (str) – Name of the transformers model.

  • expected_dimension (int | None) – Expected embedding dimension for validation.

Returns:

List of normalized embedding vectors as lists of floats.

Raises:

ValueError – If embedding dimension doesn’t match expected or embeddings cannot be extracted.

Return type:

list[list[float]]

Hugging Face Client

Hugging Face embedding client using transformers library.

This module provides functions for generating embeddings using generic Hugging Face models with mean pooling and normalization.

Example

Generate embeddings using a Hugging Face model:

from src.clients.huggingface_client import get_huggingface_embeddings

embeddings = get_huggingface_embeddings(
    texts=["Hello world"],
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
src.clients.huggingface_client.mean_pooling(model_output, attention_mask)[source]

Perform mean pooling on token embeddings to get sentence embedding.

Parameters:
  • model_output (Tensor) – Model output containing token embeddings.

  • attention_mask (Tensor) – Attention mask for the input tokens.

Returns:

Mean-pooled sentence embeddings tensor.

Return type:

Tensor

src.clients.huggingface_client.get_huggingface_embeddings(texts, model_name, expected_dimension=None)[source]

Generate embeddings using a generic Hugging Face model.

Loads the model and tokenizer, processes texts, and applies mean pooling with L2 normalization.

Parameters:
  • texts (List[str]) – List of texts to embed.

  • model_name (str) – Name of the Hugging Face model to use.

  • expected_dimension (int | None) – Expected embedding dimension for validation.

Returns:

List of normalized embedding vectors as lists of floats.

Raises:

ValueError – If embedding dimension doesn’t match expected.

Return type:

List[List[float]]
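The L2 normalization step mentioned above scales each pooled vector to unit Euclidean length, which can be sketched as:

```python
import math


def l2_normalize(vectors):
    """Scale each vector to unit Euclidean (L2) length."""
    normalized = []
    for vector in vectors:
        norm = math.sqrt(sum(v * v for v in vector))
        # Guard against division by zero for all-zero vectors.
        norm = norm or 1.0
        normalized.append([v / norm for v in vector])
    return normalized
```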

API Client

API client for production embedding services.

This module provides a client for obtaining embeddings from production APIs including OpenAI, Mistral, and VoyageAI. It handles authentication, batching, and automatic retry with batch size reduction on errors.

Example

Get embeddings from an API:

from src.clients.api_client import ProductionEmbeddingClient

client = ProductionEmbeddingClient(
    base_url="https://api.openai.com/v1",
    model="text-embedding-ada-002",
    expected_dimension=1536
)
embeddings = client.get_embeddings(["Hello", "World"])
class src.clients.api_client.ProductionEmbeddingClient(base_url, model, expected_dimension=None, timeout=30, initial_batch_size=None)[source]

Bases: object

Client for obtaining embeddings from production APIs.

Supports OpenAI-compatible APIs with automatic API key selection based on the model name. Implements automatic batch splitting and retries.

base_url

Base URL of the API.

model

Name of the embedding model.

expected_dimension

Expected embedding dimension for validation.

timeout

Request timeout in seconds.

session

Requests session with authentication headers.
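The automatic API key selection based on the model name might follow a mapping like the one below. The environment variable names and prefix rules are assumptions for illustration, not ForzaEmbed's actual configuration.

```python
import os


def select_api_key(model: str):
    """Map a model name to an API key environment variable (illustrative rules)."""
    if model.startswith("voyage"):
        return os.environ.get("VOYAGE_API_KEY")
    if model.startswith("mistral"):
        return os.environ.get("MISTRAL_API_KEY")
    # Default to OpenAI-compatible credentials.
    return os.environ.get("OPENAI_API_KEY")
```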

__init__(base_url, model, expected_dimension=None, timeout=30, initial_batch_size=None)[source]

Initialize the ProductionEmbeddingClient.

Parameters:
  • base_url (str) – Base URL of the API.

  • model (str) – Name of the embedding model to use.

  • expected_dimension (int | None) – Expected dimension of embeddings for validation.

  • timeout (int) – Timeout for requests in seconds.

  • initial_batch_size (int | None) – Initial batch size for requests.

get_embeddings(texts)[source]

Retrieve embeddings for a list of texts via the API.

Implements automatic batch splitting for large requests.

Parameters:

texts (List[str]) – List of texts to embed.

Returns:

List of embedding vectors as lists of floats.

Return type:

List[List[float]]
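The automatic batch splitting with batch size reduction on errors can be sketched as follows. embed_batch is a placeholder for a single API call; the halving strategy and the minimum batch size are assumptions about the retry policy.

```python
def embed_with_batch_splitting(texts, embed_batch, batch_size=64, min_batch_size=1):
    """Embed texts in batches, halving the batch size whenever a batch fails."""
    results = []
    i = 0
    while i < len(texts):
        batch = texts[i : i + batch_size]
        try:
            results.extend(embed_batch(batch))
            i += len(batch)
        except Exception:
            if batch_size <= min_batch_size:
                # Already at the smallest batch size: give up and propagate.
                raise
            # Retry the same texts with a smaller batch.
            batch_size = max(batch_size // 2, min_batch_size)
    return results
```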