Services Module
The services module contains business logic services that coordinate between different components.
Embedding Service
Embedding service for ForzaEmbed.
This module provides the EmbeddingService class that handles embedding generation and caching. It abstracts the different embedding clients and provides a unified interface for the processing pipeline.
Example
Generate embeddings using the service:
from src.services.embedding_service import EmbeddingService
service = EmbeddingService(db, config)
embed_func = service.get_embedding_function(model_config)
embeddings, elapsed = service.get_or_create_embeddings(embed_func, "model", texts)
- class src.services.embedding_service.EmbeddingService(db, config)
Bases: object
Handle embedding generation and caching.
Provides a unified interface for generating embeddings using different backends (API, FastEmbed, Sentence Transformers, etc.) with automatic caching.
- db
The embedding database for caching.
- config
The application configuration.
- multiprocessing_config
Multiprocessing settings from config.
- __init__(db, config)
Initialize the EmbeddingService.
- Parameters:
db (EmbeddingDatabase) – The embedding database for caching.
config (AppConfig) – The application configuration.
- get_embedding_function(model_config)
Create the appropriate embedding function based on model type.
- Parameters:
model_config (ModelConfig) – Configuration for the embedding model.
- Returns:
A callable that takes a list of texts and returns embeddings.
- Raises:
ValueError – If the model type is unsupported, or if an API model lacks a base_url.
- Return type:
- get_or_create_embeddings(embedding_function, base_model_name, phrases)
Retrieve embeddings from cache or generate and cache them.
Checks the database cache for existing embeddings. For phrases not in cache, generates new embeddings using the provided function and stores them.
- Parameters:
- Returns:
A tuple containing:
- Dictionary mapping text hashes to embedding arrays.
- Computation time in seconds for new embeddings.
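The cache-then-generate behaviour described for get_or_create_embeddings can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the in-memory dict stands in for the embedding database, SHA-256 stands in for whatever hash the service uses, and the names text_hash, get_or_create and toy_embed are hypothetical rather than part of ForzaEmbed.

```python
import hashlib
import time
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Cache = Dict[Tuple[str, str], Vector]  # (model name, text hash) -> embedding


def text_hash(text: str) -> str:
    """Hash a phrase for use as a cache key (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def get_or_create(
    cache: Cache,
    embed: Callable[[List[str]], List[Vector]],
    model_name: str,
    phrases: Sequence[str],
) -> Tuple[Dict[str, Vector], float]:
    """Return (text hash -> embedding, seconds spent embedding cache misses)."""
    hashes = [text_hash(p) for p in phrases]

    # Collect phrases that are not cached yet, skipping duplicates.
    missing: Dict[str, str] = {}
    for h, phrase in zip(hashes, phrases):
        if (model_name, h) not in cache:
            missing.setdefault(h, phrase)

    elapsed = 0.0
    if missing:
        start = time.perf_counter()
        vectors = embed(list(missing.values()))
        elapsed = time.perf_counter() - start
        # Store the new embeddings so later calls can reuse them.
        for h, vector in zip(missing, vectors):
            cache[(model_name, h)] = vector

    return {h: cache[(model_name, h)] for h in hashes}, elapsed


# Toy embedder (hypothetical): records each batch it is asked to embed.
calls: List[List[str]] = []


def toy_embed(texts: List[str]) -> List[Vector]:
    calls.append(list(texts))
    return [[float(len(t))] for t in texts]


cache: Cache = {}
first, _ = get_or_create(cache, toy_embed, "model", ["alpha", "beta"])
second, seconds = get_or_create(cache, toy_embed, "model", ["alpha", "beta"])
```

In this sketch the second call finds both phrases in the cache, so the embedder is invoked only once and the reported computation time for the second call is zero.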
Similarity Service
Similarity calculation service for ForzaEmbed.
This module provides the SimilarityService class that handles various similarity and distance metric calculations between embeddings. It supports cosine, dot product, euclidean, manhattan, and chebyshev metrics.
Example
Calculate similarity between theme and phrase embeddings:
from src.services.similarity_service import SimilarityService
similarities = SimilarityService.calculate_similarity(
    embed_themes, embed_phrases, "cosine"
)
validated = SimilarityService.validate_similarities(similarities, "cosine")
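For readers unfamiliar with the five metrics named above, the following NumPy sketch shows one way to compute each of them between two embedding matrices. It is not the SimilarityService implementation: the function name pairwise_scores and every metric string other than "cosine" are assumptions, and the distance metrics are returned as raw distances, whereas the service may convert them into similarity scores.

```python
import numpy as np


def pairwise_scores(themes: np.ndarray, phrases: np.ndarray, metric: str) -> np.ndarray:
    """Return an (n_themes, n_phrases) matrix for the requested metric."""
    if metric == "dot_product":
        return themes @ phrases.T
    if metric == "cosine":
        # Scale rows to unit length; the floor guards against zero vectors.
        t = themes / np.maximum(np.linalg.norm(themes, axis=1, keepdims=True), 1e-12)
        p = phrases / np.maximum(np.linalg.norm(phrases, axis=1, keepdims=True), 1e-12)
        return t @ p.T

    # Broadcast to an (n_themes, n_phrases, n_dims) array of differences.
    diff = themes[:, None, :] - phrases[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=-1)

    # Mirrors the documented behaviour of raising on an unknown metric.
    raise ValueError(f"Unknown similarity metric: {metric}")
```

Note that cosine and dot product grow with similarity, while the three distance metrics shrink, so any thresholding step has to account for the direction of the chosen metric.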
- class src.services.similarity_service.SimilarityService
Bases: object
Handle similarity calculations and validation.
Provides static methods for computing various similarity metrics between embedding matrices and validating/normalizing the results.
- static calculate_similarity(embed_themes, embed_phrases, metric)
Calculate similarity between theme embeddings and phrase embeddings.
- Parameters:
- Returns:
Similarity matrix of shape (n_themes, n_phrases).
- Raises:
ValueError – If an unknown similarity metric is specified.
- Return type:
Visualization Service
Visualization service for ForzaEmbed.
This module provides the VisualizationService class that handles t-SNE coordinate generation and caching for embedding visualizations.
Example
Generate t-SNE visualization data:
from src.services.visualization_service import VisualizationService
service = VisualizationService(db)
tsne_data = service.get_or_create_tsne_data(
    embeddings, "key", "file_id", similarities, 0.5
)
- class src.services.visualization_service.VisualizationService(db)
Bases: object
Handle visualization tasks like t-SNE coordinate generation.
Manages the computation and caching of t-SNE coordinates for embedding visualizations.
- db
The embedding database for caching t-SNE coordinates.
- __init__(db)
Initialize the VisualizationService.
- Parameters:
db (EmbeddingDatabase) – The embedding database for caching.
- get_or_create_tsne_data(embeddings, tsne_key, file_id, similarities, threshold)
Compute or retrieve t-SNE coordinates for a given combination.
Checks the database cache for existing t-SNE coordinates. If not found, computes new coordinates using sklearn’s TSNE implementation.
- Parameters:
embeddings (ndarray) – Embedding matrix of shape (n_samples, n_dims).
tsne_key (str) – Cache key for the t-SNE computation.
file_id (str) – Identifier for the file being visualized.
similarities (ndarray) – Similarity matrix for determining labels.
threshold (float) – Similarity threshold for labeling points.
- Returns:
Dictionary containing t-SNE visualization data with keys:
- 'x': List of x-coordinates.
- 'y': List of y-coordinates.
- 'labels': List of threshold-based labels.
- 'similarities': List of similarity scores.
- 'title': Visualization title.
- 'threshold': The threshold value used.
Returns None if embeddings have <= 1 sample or on error.
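The computation step of get_or_create_tsne_data might look roughly like the sketch below, which leaves out the database caching. Several details are assumptions rather than documented behaviour: the function name build_tsne_payload, the rule that each point takes its best similarity across themes, the "above"/"below" label strings, the default title, and the perplexity cap. Only sklearn's TSNE, the returned keys, and the None cases come from the description above.

```python
from typing import Any, Dict, Optional

import numpy as np


def build_tsne_payload(
    embeddings: np.ndarray,
    similarities: np.ndarray,
    threshold: float,
    title: str = "t-SNE projection",  # hypothetical default title
    seed: int = 0,
) -> Optional[Dict[str, Any]]:
    """Project embeddings to 2-D and package them with threshold labels."""
    n_samples = embeddings.shape[0]
    if n_samples <= 1:
        # t-SNE needs at least two points.
        return None

    try:
        # Imported here so the guard above works without scikit-learn installed.
        from sklearn.manifold import TSNE

        # scikit-learn requires perplexity to be smaller than n_samples.
        perplexity = float(min(30, n_samples - 1))
        coords = TSNE(
            n_components=2,
            perplexity=perplexity,
            init="random",
            random_state=seed,
        ).fit_transform(embeddings)
    except Exception:
        # The documented contract is to return None on error.
        return None

    # Assumption: one score per point, taken as the best match over all themes.
    scores = np.atleast_2d(similarities).max(axis=0)
    labels = np.where(scores >= threshold, "above", "below")

    return {
        "x": coords[:, 0].tolist(),
        "y": coords[:, 1].tolist(),
        "labels": labels.tolist(),
        "similarities": scores.tolist(),
        "title": title,
        "threshold": threshold,
    }
```

Capping the perplexity keeps small inputs from failing outright, although projections of only a handful of points are rarely informative.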