Services Module
The services module contains business logic services that coordinate between different components.
Embedding Service
Embedding service for ForzaEmbed.
This module provides the EmbeddingService class that handles embedding generation and caching. It abstracts the different embedding clients and provides a unified interface for the processing pipeline.
Example
Generate embeddings using the service:
from src.services.embedding_service import EmbeddingService
service = EmbeddingService(db, config)
embed_func = service.get_embedding_function(model_config)
embeddings, elapsed = service.get_or_create_embeddings(embed_func, "model", texts)
- class src.services.embedding_service.EmbeddingService(db, config)[source]
Bases: object
Handle embedding generation and caching.
Provides a unified interface for generating embeddings using different backends (API, FastEmbed, Sentence Transformers, etc.) with automatic caching.
- db
The embedding database for caching.
- config
The application configuration.
- multiprocessing_config
Multiprocessing settings from config.
- __init__(db, config)[source]
Initialize the EmbeddingService.
- Parameters:
db (EmbeddingDatabase) – The embedding database for caching.
config (AppConfig) – The application configuration.
- get_embedding_function(model_config)[source]
Create the appropriate embedding function based on model type.
- Parameters:
model_config (ModelConfig) – Configuration for the embedding model.
- Returns:
A callable that takes a list of texts and returns embeddings.
- Raises:
ValueError – If the model type is unsupported or API model lacks base_url.
- Return type:
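The dispatch described for get_embedding_function can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: DemoModelConfig, its field names, the type strings ("api", "fastembed", "sentence_transformers"), and the dummy backend are hypothetical and are not taken from the ForzaEmbed source. The only documented behavior it mirrors is that a callable is returned and that a ValueError is raised for an unsupported type or an API model without base_url.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]
EmbedFn = Callable[[List[str]], List[Vector]]


@dataclass
class DemoModelConfig:
    """Hypothetical stand-in for ModelConfig; the field names are assumptions."""
    name: str
    type: str
    base_url: Optional[str] = None


def _fake_backend(dim: int) -> EmbedFn:
    # Placeholder backend: returns deterministic dummy vectors, not real embeddings.
    def embed(texts: List[str]) -> List[Vector]:
        return [[float(len(t))] * dim for t in texts]
    return embed


def demo_get_embedding_function(cfg: DemoModelConfig) -> EmbedFn:
    """Pick a callable by model type and raise ValueError in the documented cases."""
    if cfg.type == "api":
        if not cfg.base_url:
            raise ValueError(f"API model {cfg.name!r} requires base_url")
        return _fake_backend(dim=4)
    if cfg.type in ("fastembed", "sentence_transformers"):
        return _fake_backend(dim=3)
    raise ValueError(f"Unsupported model type: {cfg.type!r}")
```

Callers only see the returned callable, so the rest of the pipeline does not need to know which backend produced the vectors.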
- get_or_create_embeddings(embedding_function, base_model_name, phrases)[source]
Retrieve embeddings from cache or generate and cache them.
Checks the database cache for existing embeddings. For phrases not in cache, generates new embeddings using the provided function and stores them.
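- Parameters:
embedding_function – The callable used to generate embeddings for phrases not found in the cache.
base_model_name – Name of the model the embeddings belong to.
phrases – The texts to embed.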
- Returns:
A tuple containing:
Dictionary mapping text hashes to embedding arrays.
Computation time in seconds for new embeddings.
- Return type:
tuple
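The cache-then-compute flow of get_or_create_embeddings can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the in-memory dict stands in for EmbeddingDatabase, the SHA-256 hashing scheme is a guess, and the names text_hash and demo_get_or_create are hypothetical.

```python
import hashlib
import time
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def text_hash(text: str) -> str:
    # Assumption: SHA-256 of the UTF-8 text; the real hashing scheme is not documented here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def demo_get_or_create(
    cache: Dict[Tuple[str, str], Vector],
    embed: Callable[[List[str]], List[Vector]],
    model_name: str,
    phrases: List[str],
) -> Tuple[Dict[str, Vector], float]:
    """Return (hash -> vector, seconds spent computing uncached vectors)."""
    hashes = {text_hash(p): p for p in phrases}  # also de-duplicates repeated phrases
    missing = [h for h in hashes if (model_name, h) not in cache]
    elapsed = 0.0
    if missing:
        start = time.perf_counter()
        vectors = embed([hashes[h] for h in missing])
        elapsed = time.perf_counter() - start
        # Store the new vectors so later calls can skip the embedding step.
        for h, vec in zip(missing, vectors):
            cache[(model_name, h)] = vec
    return {h: cache[(model_name, h)] for h in hashes}, elapsed
```

Keying the cache on both the model name and the text hash keeps vectors from different models apart, and the reported time covers only the phrases that actually had to be embedded.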
Similarity Service
Similarity calculation service for ForzaEmbed.
This module provides the SimilarityService class that handles various similarity and distance metric calculations between embeddings. It supports cosine, dot product, euclidean, manhattan, and chebyshev metrics.
Example
Calculate similarity between theme and phrase embeddings:
from src.services.similarity_service import SimilarityService
similarities = SimilarityService.calculate_similarity(
embed_themes, embed_phrases, "cosine"
)
validated = SimilarityService.validate_similarities(similarities, "cosine")
- class src.services.similarity_service.SimilarityService[source]
Bases: object
Handle similarity calculations and validation.
Provides static methods for computing various similarity metrics between embedding matrices and validating/normalizing the results.
- static calculate_similarity(embed_themes, embed_phrases, metric)[source]
Calculate similarity between theme embeddings and phrase embeddings.
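- Parameters:
embed_themes – Embeddings of the themes.
embed_phrases – Embeddings of the phrases.
metric – Name of the similarity metric, for example "cosine".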
- Returns:
Similarity matrix of shape (n_themes, n_phrases).
- Raises:
ValueError – If an unknown similarity metric is specified.
- Return type:
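The five metrics listed for SimilarityService can be sketched in pure Python as follows. This is an illustration only: the real service presumably operates on NumPy arrays, only the metric name "cosine" appears in this documentation (the other key strings are assumptions), and the sketch returns raw distances for euclidean, manhattan, and chebyshev, whereas the service may convert or normalize them differently.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cosine(a: Vector, b: Vector) -> float:
    norm = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / norm if norm else 0.0  # zero vectors get similarity 0.0


# Metric names other than "cosine" are hypothetical.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": _cosine,
    "dot_product": _dot,
    "euclidean": lambda a, b: math.dist(a, b),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
}


def demo_calculate_similarity(
    themes: List[Vector], phrases: List[Vector], metric: str
) -> List[List[float]]:
    """Return a matrix of shape (n_themes, n_phrases) for the chosen metric."""
    try:
        fn = METRICS[metric]
    except KeyError:
        raise ValueError(f"Unknown similarity metric: {metric!r}") from None
    return [[fn(t, p) for p in phrases] for t in themes]
```

Each row of the result corresponds to one theme and each column to one phrase, matching the documented (n_themes, n_phrases) shape.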
Visualization Service
Visualization service for ForzaEmbed.
This module provides the VisualizationService class that handles dimensionality reduction (t-SNE, UMAP, PCA) and caching for embedding visualizations.
- class src.services.visualization_service.VisualizationService(db)[source]
Bases: object
Handle visualization tasks such as UMAP, PCA, and t-SNE coordinate generation.
Manages the computation and caching of projection coordinates for embedding visualizations.
- db
The embedding database for caching coordinates.
- __init__(db)[source]
Initialize the VisualizationService.
- Parameters:
db (EmbeddingDatabase) – The embedding database for caching.
- get_or_create_projections(embeddings, base_key, file_id, similarities)[source]
Compute or retrieve projection coordinates (UMAP, t-SNE, PCA).
Checks the database cache for existing coordinates using method-specific keys.
- Parameters:
embeddings (ndarray) – Embedding matrix of shape (n_samples, n_dims).
base_key (str) – Base cache key for the computation.
file_id (str) – Identifier for the file being visualized.
similarities (ndarray) – Similarity matrix for determining labels.
threshold – Similarity threshold for labeling points.
- Returns:
Dictionary containing projection data for umap, tsne, and pca. Returns None if embeddings have <= 1 sample or on error.
- Return type:
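The caching behavior described for get_or_create_projections (method-specific keys, None for one sample or fewer, None on error) can be sketched as follows. This is a simplified illustration under stated assumptions: the dict stands in for the database cache, the key format base_key + "_" + method is hypothetical, and the projection functions are passed in as plain callables instead of real UMAP, t-SNE, or PCA implementations. Labeling from similarities and threshold is left out.

```python
from typing import Callable, Dict, List, Optional

Matrix = List[List[float]]
Projector = Callable[[Matrix], Matrix]


def demo_get_or_create_projections(
    cache: Dict[str, Matrix],
    projectors: Dict[str, Projector],
    embeddings: Matrix,
    base_key: str,
) -> Optional[Dict[str, Matrix]]:
    """Return {method: coordinates}, reusing cached coordinates where present."""
    if len(embeddings) <= 1:
        return None  # a projection needs at least two samples
    result: Dict[str, Matrix] = {}
    try:
        for method, project in projectors.items():
            key = f"{base_key}_{method}"  # hypothetical method-specific cache key
            if key not in cache:
                cache[key] = project(embeddings)
            result[method] = cache[key]
    except Exception:
        return None  # mirrors the documented "None on error" behavior
    return result
```

Because each method has its own key, a cached PCA result can be reused even when the UMAP or t-SNE coordinates still have to be computed.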