Embedding Clients
ForzaEmbed supports multiple embedding backends through a unified client interface.
FastEmbed Client
FastEmbed client for local embedding generation.
This module provides a client for generating embeddings using the FastEmbed library with GPU acceleration support and automatic fallback to CPU.
Example
Generate embeddings using FastEmbed:
from src.clients.fastembed_client import FastEmbedClient
embeddings = FastEmbedClient.get_embeddings(
texts=["Hello world"],
model_name="BAAI/bge-small-en-v1.5"
)
- class src.clients.fastembed_client.FastEmbedClient[source]
Bases: object
Client for managing FastEmbed embedding models.
Implements singleton pattern for model instances to avoid reloading. Supports GPU acceleration with automatic CPU fallback.
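The singleton cache plus GPU-first loading can be sketched generically. This is an illustrative sketch, not the actual implementation: the `loader` callable stands in for the real `TextEmbedding` constructor, and catching `RuntimeError` for the fallback is an assumption.

```python
# Sketch of a class-level singleton cache with GPU-first loading and
# CPU fallback. "loader" is a stand-in for the real model constructor.
class ModelCache:
    _instances: dict = {}

    @classmethod
    def get_instance(cls, model_name, loader):
        if model_name not in cls._instances:
            try:
                # Try GPU acceleration first.
                cls._instances[model_name] = loader(model_name, device="cuda")
            except RuntimeError:
                # Fall back to CPU if GPU loading fails.
                cls._instances[model_name] = loader(model_name, device="cpu")
        return cls._instances[model_name]
```

Subsequent calls with the same `model_name` return the cached instance, so the model is loaded at most once per process.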
- _instances
Class-level cache of loaded model instances.
- classmethod get_instance(model_name)[source]
Get or create a FastEmbed model instance.
Attempts GPU acceleration first, falls back to CPU if unavailable.
- Parameters:
model_name (str) – Name of the FastEmbed model.
- Returns:
Loaded TextEmbedding model instance.
- Return type:
TextEmbedding
- static get_embeddings(texts, model_name, expected_dimension=None)[source]
Generate embeddings for a list of texts.
- Parameters:
texts (List[str]) – List of texts to embed.
model_name (str) – Name of the FastEmbed model.
expected_dimension (Optional[int]) – Expected embedding dimension for validation.
- Returns:
List of embedding vectors as lists of floats.
- Raises:
ValueError – If embedding dimension doesn’t match expected.
- Return type:
List[List[float]]
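The `expected_dimension` check can be sketched as a small validation helper. This is a hypothetical helper for illustration; the real client performs the equivalent check internally.

```python
# Hypothetical helper mirroring the expected_dimension validation
# described above: raise ValueError when a vector's length differs.
def validate_dimension(embeddings, expected_dimension=None):
    if expected_dimension is not None:
        for vec in embeddings:
            if len(vec) != expected_dimension:
                raise ValueError(
                    f"Embedding dimension {len(vec)} does not match "
                    f"expected {expected_dimension}"
                )
    return embeddings
```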
Sentence Transformers Client
Sentence Transformers client for local embedding generation.
This module provides a client for generating embeddings using the sentence-transformers library with singleton pattern for model caching.
Example
Generate embeddings using Sentence Transformers:
from src.clients.sentencetransformers_client import SentenceTransformersClient
embeddings = SentenceTransformersClient.get_embeddings(
texts=["Hello world"],
model_name="all-MiniLM-L6-v2"
)
- class src.clients.sentencetransformers_client.SentenceTransformersClient[source]
Bases: object
Client for managing local sentence-transformer models.
Implements singleton pattern for model instances to avoid reloading.
- _instances
Class-level cache of loaded model instances.
- Type:
Dict[str, sentence_transformers.SentenceTransformer.SentenceTransformer]
- classmethod get_instance(model_name)[source]
Get or create a SentenceTransformer model instance.
- Parameters:
model_name (str) – Name of the sentence-transformer model.
- Returns:
Loaded SentenceTransformer model instance.
- Return type:
SentenceTransformer
- classmethod get_embeddings(texts, model_name, expected_dimension=None)[source]
Generate embeddings for a list of texts using a local model.
Automatically adds prefix for Jina models.
- Parameters:
texts (List[str]) – List of texts to embed.
model_name (str) – Name of the sentence-transformer model.
expected_dimension (Optional[int]) – Expected embedding dimension for validation.
- Returns:
List of embedding vectors as lists of floats.
- Raises:
ValueError – If embedding dimension doesn’t match expected.
- Return type:
List[List[float]]
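The Jina prefixing step can be sketched as a text preprocessing function. Note the exact prefix string the library applies is an assumption here; only the "add a prefix when the model is a Jina model" behavior comes from the description above.

```python
# Sketch of the prefixing step for Jina models. The "passage: "
# prefix string is an assumption for illustration.
def maybe_prefix(texts, model_name, prefix="passage: "):
    if "jina" in model_name.lower():
        return [prefix + t for t in texts]
    return texts
```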
Transformers Client
Transformers client for local embedding generation.
This module provides a client for generating embeddings using the Hugging Face transformers library directly, with special handling for Jina models.
Example
Generate embeddings using Transformers:
from src.clients.transformers_client import TransformersClient
embeddings = TransformersClient.get_embeddings(
texts=["Hello world"],
model_name="BAAI/bge-small-en-v1.5"
)
- src.clients.transformers_client.mean_pooling(token_embeddings, attention_mask)[source]
Perform mean pooling on token embeddings.
- Parameters:
token_embeddings (Tensor) – Tensor of token-level embeddings.
attention_mask (Tensor) – Attention mask for the input tokens.
- Returns:
Mean-pooled sentence embeddings tensor.
- Return type:
Tensor
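The mean-pooling operation above can be written out in pure Python to make the formula explicit: sum the token vectors where the attention mask is set, then divide by the number of unmasked tokens. The real module does this with torch tensors; this list-based rendition is for illustration only.

```python
# Pure-Python rendition of masked mean pooling: average the token
# vectors of each sequence, skipping positions where the mask is 0.
def mean_pool(token_embeddings, attention_mask):
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(tokens[0])
        count = sum(mask)
        for vec, m in zip(tokens, mask):
            if m:
                for i, x in enumerate(vec):
                    total[i] += x
        # Guard against an all-zero mask to avoid division by zero.
        pooled.append([x / max(count, 1) for x in total])
    return pooled
```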
- class src.clients.transformers_client.TransformersClient[source]
Bases: object
Client for managing local transformers embedding models.
Implements singleton pattern for model instances with special handling for Jina models and their task labels.
- _instances
Class-level cache of loaded model and tokenizer instances.
- Type:
Dict[str, Tuple[transformers.modeling_utils.PreTrainedModel, transformers.tokenization_utils.PreTrainedTokenizer]]
- classmethod get_instance(model_name)[source]
Get or create a transformers model and tokenizer instance.
- Parameters:
model_name (str) – Name of the transformers model.
- Returns:
Tuple of loaded model and tokenizer instances.
- Return type:
Tuple[PreTrainedModel, PreTrainedTokenizer]
- classmethod get_embeddings(texts, model_name, expected_dimension=None)[source]
Generate embeddings using a local transformers model.
Handles special cases for Jina models including task labels and different output formats.
- Parameters:
texts (List[str]) – List of texts to embed.
model_name (str) – Name of the transformers model.
expected_dimension (Optional[int]) – Expected embedding dimension for validation.
- Returns:
List of normalized embedding vectors as lists of floats.
- Raises:
ValueError – If embedding dimension doesn’t match expected or embeddings cannot be extracted.
- Return type:
List[List[float]]
Hugging Face Client
Hugging Face embedding client using transformers library.
This module provides functions for generating embeddings using generic Hugging Face models with mean pooling and normalization.
Example
Generate embeddings using a Hugging Face model:
from src.clients.huggingface_client import get_huggingface_embeddings
embeddings = get_huggingface_embeddings(
texts=["Hello world"],
model_name="sentence-transformers/all-MiniLM-L6-v2"
)
- src.clients.huggingface_client.mean_pooling(model_output, attention_mask)[source]
Perform mean pooling on token embeddings to get sentence embedding.
- Parameters:
model_output (Tensor) – Model output containing token embeddings.
attention_mask (Tensor) – Attention mask for the input tokens.
- Returns:
Mean-pooled sentence embeddings tensor.
- Return type:
Tensor
- src.clients.huggingface_client.get_huggingface_embeddings(texts, model_name, expected_dimension=None)[source]
Generate embeddings using a generic Hugging Face model.
Loads the model and tokenizer, processes texts, and applies mean pooling with L2 normalization.
- Parameters:
texts (List[str]) – List of texts to embed.
model_name (str) – Name of the Hugging Face model.
expected_dimension (Optional[int]) – Expected embedding dimension for validation.
- Returns:
List of normalized embedding vectors as lists of floats.
- Raises:
ValueError – If embedding dimension doesn’t match expected.
- Return type:
List[List[float]]
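The L2 normalization applied after mean pooling can be sketched with plain floats. This is an illustrative version; the real client operates on torch tensors.

```python
import math

# L2 normalization: scale a vector to unit length. A zero vector is
# returned unchanged to avoid division by zero.
def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec
```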
API Client
API client for production embedding services.
This module provides a client for obtaining embeddings from production APIs including OpenAI, Mistral, and VoyageAI. It handles authentication, batching, and automatic retry with batch size reduction on errors.
Example
Get embeddings from an API:
from src.clients.api_client import ProductionEmbeddingClient
client = ProductionEmbeddingClient(
base_url="https://api.openai.com/v1",
model="text-embedding-ada-002",
expected_dimension=1536
)
embeddings = client.get_embeddings(["Hello", "World"])
- class src.clients.api_client.ProductionEmbeddingClient(base_url, model, expected_dimension=None, timeout=30, initial_batch_size=None)[source]
Bases: object
Client for obtaining embeddings from production APIs.
Supports OpenAI-compatible APIs with automatic API key selection based on the model name. Implements automatic batch splitting and retries.
- base_url
Base URL of the API.
- model
Name of the embedding model.
- expected_dimension
Expected embedding dimension for validation.
- timeout
Request timeout in seconds.
- session
Requests session with authentication headers.
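The automatic retry with batch size reduction described above can be sketched as follows. This is a simplified sketch under stated assumptions: `embed_batch` stands in for the real API call, and halving the batch size on any error is an illustrative policy, not necessarily the client's exact strategy.

```python
# Sketch of retry-with-smaller-batches: on an API error, halve the
# batch size and retry, down to a minimum batch of one text.
def embed_with_backoff(texts, embed_batch, batch_size):
    results = []
    i = 0
    while i < len(texts):
        batch = texts[i:i + batch_size]
        try:
            results.extend(embed_batch(batch))
            i += batch_size
        except Exception:
            if batch_size == 1:
                raise  # a single text still fails: give up
            batch_size = max(1, batch_size // 2)
    return results
```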