Evaluation Metrics
ForzaEmbed uses clustering-based metrics to evaluate embedding quality.
Main Metrics Module
Evaluation metrics for text embeddings based on similarity scores.
This module provides functions for calculating clustering quality metrics on embedding spaces, particularly silhouette-based metrics that decompose into intra-cluster cohesion and inter-cluster separation.
Example
Calculate metrics for document embeddings:
from src.metrics.evaluation_metrics import calculate_all_metrics
metrics = calculate_all_metrics(ref_embeddings, doc_embeddings, doc_labels)
print(f"Silhouette score: {metrics['silhouette_score']}")
- src.metrics.evaluation_metrics.calculate_silhouette_metrics(embeddings, labels, metric='cosine')[source]
Calculate silhouette-based clustering metrics with normalized components.
This function decomposes the silhouette score into its constituent parts: intra-cluster distance (cohesion) and inter-cluster distance (separation), providing normalized versions for better interpretability.
- Parameters:
- Returns:
Dictionary containing:
intra_cluster_distance_normalized: Normalized intra-cluster quality (0-1, higher is better).
inter_cluster_distance_normalized: Normalized inter-cluster separation (0-1, higher is better).
silhouette_score: Standard silhouette score (-1 to 1, higher is better).
- src.metrics.evaluation_metrics.calculate_all_metrics(ref_embeddings, doc_embeddings, doc_labels)[source]
Calculate minimal essential evaluation metrics.
This function computes only the core metrics needed for clustering evaluation: silhouette score and its decomposition (intra/inter cluster distances).
- Parameters:
- Returns:
Dictionary containing silhouette-based metrics with keys:
silhouette_score
intra_cluster_distance_normalized
inter_cluster_distance_normalized
Silhouette Analysis
Silhouette score decomposition into intra-cluster and inter-cluster components.
This module provides functions for decomposing the silhouette score into its constituent parts: intra-cluster cohesion (a(i)) and inter-cluster separation (b(i)). This decomposition helps understand clustering quality in more detail than the aggregate silhouette score alone.
Example
Perform enhanced silhouette analysis:
from src.metrics.silhouette_decomposition import enhanced_silhouette_analysis
analysis = enhanced_silhouette_analysis(embeddings, labels)
print(f"Global metrics: {analysis['global_metrics']}")
print(f"Per-cluster: {analysis['cluster_analysis']}")
- src.metrics.silhouette_decomposition.decompose_silhouette_score(embeddings, labels)[source]
Decompose the silhouette score into its a(i) and b(i) components.
The silhouette score is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where:
- a(i) = average intra-cluster distance (cohesion); lower is better.
- b(i) = average distance to the nearest other cluster (separation); higher is better.
- Parameters:
- Returns:
Dictionary containing:
mean_intra_cluster_distance: Average a(i) across samples.
mean_inter_cluster_distance: Average b(i) across samples.
silhouette_score: Aggregate silhouette score.
intra_cluster_quality: Normalized cohesion (0-1, higher = better).
inter_cluster_separation: Normalized separation (0-1, higher = better).
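The aggregation step can be illustrated with a minimal sketch, assuming the per-sample a(i) and b(i) values are already available. The helper name `aggregate_decomposition` is hypothetical, and taking the aggregate score as the mean of the per-sample s(i) values is the standard convention rather than something confirmed for this module.

```python
from statistics import fmean


def aggregate_decomposition(a_values, b_values):
    """Average a(i), b(i) and s(i) over all samples (hypothetical helper)."""
    if len(a_values) != len(b_values) or not a_values:
        raise ValueError("need two non-empty sequences of equal length")
    # Per-sample silhouette; samples with a(i) = b(i) = 0 are scored 0.
    s_values = [
        (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        for a, b in zip(a_values, b_values)
    ]
    return {
        "mean_intra_cluster_distance": fmean(a_values),
        "mean_inter_cluster_distance": fmean(b_values),
        "silhouette_score": fmean(s_values),
    }


result = aggregate_decomposition([0.2, 0.4], [0.8, 0.5])
```

Note that the aggregate score is the mean of the per-sample ratios, which generally differs from applying the formula to the two mean distances.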
- src.metrics.silhouette_decomposition.analyze_silhouette_by_cluster(embeddings, labels)[source]
Perform detailed silhouette score analysis per cluster.
- Parameters:
- Returns:
Dictionary mapping cluster label to its silhouette statistics:
mean_silhouette: Average silhouette score for the cluster.
std_silhouette: Standard deviation of silhouette scores.
min_silhouette: Minimum silhouette score in cluster.
max_silhouette: Maximum silhouette score in cluster.
size: Number of samples in the cluster.
proportion_positive: Fraction of samples with positive score.
Returns empty dict if fewer than 2 clusters or insufficient samples.
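As an illustration of the per-cluster statistics listed above, the following sketch groups precomputed per-sample silhouette values by label. The function name `per_cluster_stats` is hypothetical, and the use of the population standard deviation is an assumption; the source does not say which variant is used.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def per_cluster_stats(silhouette_values, labels):
    """Summarize per-sample silhouette values for each cluster label."""
    groups = defaultdict(list)
    for value, label in zip(silhouette_values, labels):
        groups[label].append(value)
    if len(groups) < 2:
        return {}  # mirrors the documented empty-dict case
    return {
        label: {
            "mean_silhouette": fmean(values),
            "std_silhouette": pstdev(values),  # population std: an assumption
            "min_silhouette": min(values),
            "max_silhouette": max(values),
            "size": len(values),
            "proportion_positive": sum(v > 0 for v in values) / len(values),
        }
        for label, values in groups.items()
    }


stats = per_cluster_stats([0.8, 0.6, -0.2, 0.4], ["a", "a", "b", "b"])
```

A low `proportion_positive` for one cluster points at samples that sit closer to a neighbouring cluster than to their own.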
- src.metrics.silhouette_decomposition.enhanced_silhouette_analysis(embeddings, labels)[source]
Perform complete clustering analysis with silhouette decomposition.
Combines global silhouette decomposition with per-cluster analysis to provide a comprehensive view of clustering quality.
Note
Always uses 'cosine' as the distance metric for clustering analysis, regardless of the similarity metric used for embedding evaluation.
- Parameters:
- Returns:
Dictionary containing:
global_metrics: Results from decompose_silhouette_score().
cluster_analysis: Results from analyze_silhouette_by_cluster().
Metric Descriptions
Silhouette Score
The silhouette score measures how well-defined the clusters are. It ranges from -1 to 1:
1.0: Perfect clustering - points are far from other clusters
0.0: Overlapping clusters - points are on cluster boundaries
-1.0: Poor clustering - points may be assigned to wrong clusters
Formula: \(s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}\)
where:
\(a(i)\) = average distance to points in the same cluster (intra-cluster)
\(b(i)\) = average distance to points in nearest different cluster (inter-cluster)
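To make the definitions of \(a(i)\) and \(b(i)\) concrete, here is a dependency-free sketch that computes them, and the resulting \(s(i)\), from raw vectors using cosine distance. It is not ForzaEmbed's implementation: the function names are hypothetical, and assigning \(s(i) = 0\) to a point that is alone in its cluster is a common convention assumed here.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(point, others):
    """Average cosine distance from one point to a non-empty list of points."""
    return sum(cosine_distance(point, o) for o in others) / len(others)


def silhouette_components(points, labels):
    """Return lists (a, b, s) holding a(i), b(i) and s(i) for every point."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    a, b, s = [], [], []
    for i, label in enumerate(labels):
        own = [points[j] for j in clusters[label] if j != i]
        # a(i): mean distance to the other members of the same cluster.
        a_i = mean_distance(points[i], own) if own else 0.0
        # b(i): smallest mean distance to the members of any other cluster.
        b_i = min(
            mean_distance(points[i], [points[j] for j in members])
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a_i, b_i)
        # Singleton clusters get s(i) = 0 (assumed convention).
        s_i = (b_i - a_i) / denom if own and denom > 0 else 0.0
        a.append(a_i)
        b.append(b_i)
        s.append(s_i)
    return a, b, s


points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ["x", "x", "y", "y"]
a, b, s = silhouette_components(points, labels)
```

With two tight, nearly orthogonal groups as in this toy input, every \(a(i)\) is close to 0 and every \(s(i)\) is close to 1. For real workloads, scikit-learn's `silhouette_samples` and `silhouette_score` perform the same computation far more efficiently.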
Intra-Cluster Distance (Normalized)
Measures cohesion within clusters. Normalized to [0, 1]:
Higher values indicate tighter clustering
Formula: \(1 - \frac{\text{avg\_intra\_distance}}{\text{max\_distance}}\)
Inter-Cluster Distance (Normalized)
Measures separation between clusters. Normalized to [0, 1]:
Higher values indicate better separation
Formula: \(\frac{\text{avg\_inter\_distance}}{\text{max\_distance}}\)
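The two normalizations (intra-cluster and inter-cluster) can be sketched together as follows. The function names are hypothetical, and the default `max_distance=2.0` is an assumption: it is the upper bound of cosine distance, whereas the source does not state how `max_distance` is actually chosen.

```python
def normalize_intra(avg_intra_distance, max_distance=2.0):
    """Map average intra-cluster distance to [0, 1], higher = tighter."""
    if not 0.0 <= avg_intra_distance <= max_distance:
        raise ValueError("distance must lie in [0, max_distance]")
    return 1.0 - avg_intra_distance / max_distance


def normalize_inter(avg_inter_distance, max_distance=2.0):
    """Map average inter-cluster distance to [0, 1], higher = better separated."""
    if not 0.0 <= avg_inter_distance <= max_distance:
        raise ValueError("distance must lie in [0, max_distance]")
    return avg_inter_distance / max_distance


intra_quality = normalize_intra(0.4)  # small distance, high cohesion score
inter_quality = normalize_inter(1.2)  # large distance, high separation score
```

Flipping the intra-cluster value with `1 - ...` is what lets both normalized metrics be read in the same direction, with higher meaning better.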
Embedding Computation Time
Measures the total time (in seconds) required to compute embeddings for both:
Theme keywords
Document chunks
This metric helps identify performance bottlenecks across different embedding models and configurations.
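One way such a timing could be taken is sketched below, assuming a single callable embeds a list of texts. `timed_embedding` and `toy_embed` are hypothetical names, and `toy_embed` is only a stand-in for a real embedding model.

```python
import time


def timed_embedding(embed_fn, theme_keywords, document_chunks):
    """Embed both inputs and return the vectors plus total elapsed seconds."""
    start = time.perf_counter()
    keyword_vectors = embed_fn(theme_keywords)
    chunk_vectors = embed_fn(document_chunks)
    elapsed = time.perf_counter() - start
    return keyword_vectors, chunk_vectors, elapsed


def toy_embed(texts):
    """Hypothetical stand-in model: (length, number of spaces) per text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


keyword_vectors, chunk_vectors, seconds = timed_embedding(
    toy_embed, ["energy"], ["solar panel costs", "wind"]
)
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.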
Interpretation Guide
Good Configuration
A good embedding configuration should have:
Silhouette score > 0.5
Normalized intra-cluster distance > 0.7
Normalized inter-cluster distance > 0.6
Embedding time as low as possible for your use case
Poor Configuration
Warning signs of poor configuration:
Silhouette score < 0.3
Normalized intra-cluster distance < 0.5
Normalized inter-cluster distance < 0.4
Large variance in cluster sizes
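The thresholds in this guide can be turned into a quick check, sketched below. The function name and the "intermediate" label for values between the two bands are assumptions, and the sketch ignores embedding time and cluster-size variance, which need case-by-case judgment.

```python
def rate_configuration(silhouette, intra_normalized, inter_normalized):
    """Classify a configuration using the thresholds from the guide."""
    # "Good" requires all three metrics to clear their upper thresholds.
    if silhouette > 0.5 and intra_normalized > 0.7 and inter_normalized > 0.6:
        return "good"
    # Any single warning sign is enough to flag the configuration.
    if silhouette < 0.3 or intra_normalized < 0.5 or inter_normalized < 0.4:
        return "poor"
    return "intermediate"  # assumed label for the band in between


rating = rate_configuration(0.62, 0.81, 0.70)
```

Values are expected on the normalized scales described earlier, where higher is better for all three metrics.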