Evaluation Metrics

ForzaEmbed uses clustering-based metrics to evaluate embedding quality.

Main Metrics Module

Evaluation metrics for text embeddings based on similarity scores.

This module provides functions for calculating clustering quality metrics on embedding spaces, particularly silhouette-based metrics that decompose into intra-cluster cohesion and inter-cluster separation.

Example

Calculate metrics for document embeddings:

from src.metrics.evaluation_metrics import calculate_all_metrics

metrics = calculate_all_metrics(ref_embeddings, doc_embeddings, doc_labels)
print(f"Silhouette score: {metrics['silhouette_score']}")

src.metrics.evaluation_metrics.calculate_silhouette_metrics(embeddings, labels, metric='cosine')

Calculate silhouette-based clustering metrics with normalized components.

This function decomposes the silhouette score into its constituent parts: intra-cluster distance (cohesion) and inter-cluster distance (separation), providing normalized versions for better interpretability.

Parameters:

embeddings (ndarray) – The embeddings of the text chunks, shape (n_samples, n_dims).
labels (ndarray) – The theme label for each chunk, shape (n_samples,).
metric (str) – Distance metric to use for calculations. Defaults to 'cosine'.

Returns:

Dictionary containing:

intra_cluster_distance_normalized: Normalized intra-cluster quality (0-1, higher is better).
inter_cluster_distance_normalized: Normalized inter-cluster separation (0-1, higher is better).
silhouette_score: Standard silhouette score (-1 to 1, higher is better).

Return type:

dict

src.metrics.evaluation_metrics.calculate_all_metrics(ref_embeddings, doc_embeddings, doc_labels)

Calculate the minimal set of essential evaluation metrics.

This function computes only the core metrics needed for clustering evaluation: silhouette score and its decomposition (intra/inter cluster distances).

Parameters:

ref_embeddings (ndarray) – Embeddings for reference themes, shape (n_themes, n_dims).
doc_embeddings (ndarray) – Embeddings for document chunks, shape (n_chunks, n_dims).
doc_labels (ndarray) – Theme labels for each document chunk, shape (n_chunks,).

Returns:

Dictionary containing silhouette-based metrics with keys:

silhouette_score
intra_cluster_distance_normalized
inter_cluster_distance_normalized

Return type:

dict

Silhouette Analysis

Silhouette score decomposition into intra-cluster and inter-cluster components.

This module provides functions for decomposing the silhouette score into its constituent parts: intra-cluster cohesion (a(i)) and inter-cluster separation (b(i)). This decomposition helps understand clustering quality in more detail than the aggregate silhouette score alone.

Example

Perform enhanced silhouette analysis:

from src.metrics.silhouette_decomposition import enhanced_silhouette_analysis

analysis = enhanced_silhouette_analysis(embeddings, labels)
print(f"Global metrics: {analysis['global_metrics']}")
print(f"Per-cluster: {analysis['cluster_analysis']}")

src.metrics.silhouette_decomposition.decompose_silhouette_score(embeddings, labels)

Decompose the silhouette score into its a(i) and b(i) components.

The silhouette score is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where:

a(i) = average intra-cluster distance (cohesion); lower is better.
b(i) = average distance to the nearest other cluster (separation); higher is better.

Parameters:

embeddings (ndarray) – Embedding matrix of shape (n_samples, n_features).
labels (ndarray) – Cluster labels of shape (n_samples,).

Returns:

Dictionary containing:

mean_intra_cluster_distance: Average a(i) across samples.
mean_inter_cluster_distance: Average b(i) across samples.
silhouette_score: Aggregate silhouette score.
intra_cluster_quality: Normalized cohesion (0-1, higher is better).
inter_cluster_separation: Normalized separation (0-1, higher is better).

Return type:

dict

src.metrics.silhouette_decomposition.analyze_silhouette_by_cluster(embeddings, labels)

Perform detailed silhouette score analysis per cluster.

Parameters:

embeddings (ndarray) – Embedding matrix of shape (n_samples, n_features).
labels (ndarray) – Cluster labels of shape (n_samples,).

Returns:

Dictionary mapping each cluster label to its silhouette statistics:

mean_silhouette: Average silhouette score for the cluster.
std_silhouette: Standard deviation of silhouette scores.
min_silhouette: Minimum silhouette score in the cluster.
max_silhouette: Maximum silhouette score in the cluster.
size: Number of samples in the cluster.
proportion_positive: Fraction of samples with a positive score.

Returns an empty dict if there are fewer than 2 clusters or insufficient samples.

Return type:

dict
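The per-cluster statistics listed above can be sketched in plain Python. This is only an illustration, not the module's implementation: the helper name summarize_by_cluster is hypothetical, the per-sample scores are assumed to be already computed, and the use of the population standard deviation is an assumption.

```python
import statistics


def summarize_by_cluster(scores, labels):
    """Group per-sample silhouette scores by label and summarize each group.

    Hypothetical helper for illustration; not part of ForzaEmbed.
    """
    grouped = {}
    for score, label in zip(scores, labels):
        grouped.setdefault(label, []).append(score)

    # Mirror the documented behavior: nothing to report with < 2 clusters.
    if len(grouped) < 2:
        return {}

    summary = {}
    for label, values in grouped.items():
        summary[label] = {
            "mean_silhouette": statistics.fmean(values),
            # Assumption: population standard deviation (ddof = 0).
            "std_silhouette": statistics.pstdev(values),
            "min_silhouette": min(values),
            "max_silhouette": max(values),
            "size": len(values),
            "proportion_positive": sum(v > 0 for v in values) / len(values),
        }
    return summary


# Toy per-sample scores for two clusters.
scores = [0.8, 0.6, -0.2, 0.4]
labels = ["a", "a", "b", "b"]
stats = summarize_by_cluster(scores, labels)
```

A low proportion_positive for one cluster suggests that many of its samples sit closer to a neighboring cluster than to their own.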

src.metrics.silhouette_decomposition.enhanced_silhouette_analysis(embeddings, labels)

Perform complete clustering analysis with silhouette decomposition.

Combines global silhouette decomposition with per-cluster analysis to provide a comprehensive view of clustering quality.

Note

Always uses 'cosine' as the distance metric for clustering analysis, regardless of the similarity metric used for embedding evaluation.

Parameters:

embeddings (ndarray) – Embedding matrix of shape (n_samples, n_features).
labels (ndarray) – Cluster labels of shape (n_samples,).

Returns:

Dictionary containing:

global_metrics: Results from decompose_silhouette_score().
cluster_analysis: Results from analyze_silhouette_by_cluster().

Return type:

dict

Metric Descriptions

Silhouette Score

The silhouette score measures how well-defined the clusters are. It ranges from -1 to 1:

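Near 1.0: Well-separated clusters. Points are much closer to their own cluster than to any other.
Near 0.0: Overlapping clusters. Points lie on or near cluster boundaries.
Near -1.0: Poor clustering. Points are, on average, closer to another cluster than to their own and may be assigned to the wrong cluster.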

Formula:

\[s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}\]

where:

\(a(i)\) = average distance from point i to the other points in the same cluster (intra-cluster)
\(b(i)\) = average distance from point i to the points in the nearest other cluster (inter-cluster)
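The formula above can be worked through by hand on a toy dataset. The following is a minimal sketch in plain Python, assuming cosine distance and non-zero vectors. The function names cosine_distance and silhouette_components are hypothetical and are not the ForzaEmbed implementation. In practice, scikit-learn's silhouette_samples and silhouette_score are the usual way to obtain s(i) and its mean.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_components(points, labels, distance=cosine_distance):
    """Return one (a_i, b_i, s_i) tuple per point.

    Hypothetical helper for illustration only.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least 2 clusters")

    results = []
    for i, label in enumerate(labels):
        # b(i): smallest mean distance to the members of any other cluster.
        b_i = min(
            sum(distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Singleton cluster: s(i) is conventionally set to 0.
            results.append((0.0, b_i, 0.0))
            continue
        # a(i): mean distance to the other members of the same cluster.
        a_i = sum(distance(points[i], points[j]) for j in own) / len(own)
        denom = max(a_i, b_i)
        s_i = (b_i - a_i) / denom if denom > 0 else 0.0
        results.append((a_i, b_i, s_i))
    return results


# Two tight, nearly orthogonal groups of 2-D vectors.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
samples = silhouette_components(points, labels)
mean_s = sum(s for _, _, s in samples) / len(samples)
```

Because each toy group is tight and the two groups point in nearly orthogonal directions, every a(i) is small relative to b(i) and the mean score lands close to 1.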

Intra-Cluster Distance (Normalized)

Measures cohesion within clusters. Normalized to [0, 1]:

Higher values indicate tighter clustering
Formula: \(1 - \frac{\text{avg\_intra\_distance}}{\text{max\_distance}}\)

Inter-Cluster Distance (Normalized)

Measures separation between clusters. Normalized to [0, 1]:

Higher values indicate better separation
Formula: \(\frac{\text{avg\_inter\_distance}}{\text{max\_distance}}\)
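The two normalization formulas can be illustrated with a short sketch. The value of max_distance is not specified here, so the example assumes 2.0, the upper bound of cosine distance. The function name normalize_components and the clamping to [0, 1] are also assumptions made for illustration.

```python
def normalize_components(avg_intra_distance, avg_inter_distance, max_distance=2.0):
    """Map raw average distances to [0, 1] scores where higher is better.

    Hypothetical helper. max_distance=2.0 is an assumption (the upper
    bound of cosine distance), not a value confirmed by the source.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")

    def clamp(value):
        # Keep results inside [0, 1] even if a distance exceeds max_distance.
        return min(1.0, max(0.0, value))

    return {
        # Cohesion: small intra-cluster distance gives a score near 1.
        "intra_cluster_distance_normalized": clamp(
            1.0 - avg_intra_distance / max_distance
        ),
        # Separation: large inter-cluster distance gives a score near 1.
        "inter_cluster_distance_normalized": clamp(
            avg_inter_distance / max_distance
        ),
    }


normalized = normalize_components(avg_intra_distance=0.2, avg_inter_distance=0.9)
```

With these inputs the cohesion score is 1 - 0.2 / 2.0 = 0.9 and the separation score is 0.9 / 2.0 = 0.45.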

Embedding Computation Time

Measures the total time (in seconds) required to compute embeddings for both:

Theme keywords
Document chunks

This metric helps identify performance bottlenecks across different embedding models and configurations.
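A timing measurement of this kind can be sketched as follows. The embed_texts function is a hypothetical stand-in for a real embedding model, and timed_embedding is likewise an illustrative name. How ForzaEmbed actually measures the time is not shown here.

```python
import time


def embed_texts(texts):
    """Hypothetical stand-in for a real embedding model call."""
    return [[float(len(t)), float(t.count(" ") + 1)] for t in texts]


def timed_embedding(theme_keywords, document_chunks, embed=embed_texts):
    """Embed both inputs and return the vectors plus total elapsed seconds."""
    start = time.perf_counter()
    theme_vectors = embed(theme_keywords)
    chunk_vectors = embed(document_chunks)
    elapsed = time.perf_counter() - start
    return theme_vectors, chunk_vectors, elapsed


themes = ["pricing", "customer support"]
chunks = ["The plan costs ten euros.", "Support replied within a day."]
theme_vectors, chunk_vectors, seconds = timed_embedding(themes, chunks)
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.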

Interpretation Guide

Good Configuration

A good embedding configuration should have:

Silhouette score > 0.5
Normalized intra-cluster distance > 0.7
Normalized inter-cluster distance > 0.6
Embedding time as low as possible for your use case

Poor Configuration

Warning signs of poor configuration:

Silhouette score < 0.3
Normalized intra-cluster distance < 0.5
Normalized inter-cluster distance < 0.4
Large variance in cluster sizes
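The thresholds in this guide can be applied programmatically when comparing many configurations. The sketch below is only one possible reading of the guide. The function name rate_configuration, the "borderline" label for in-between values, and the rule that any single warning sign marks a configuration as poor are all assumptions. Cluster-size variance and embedding time are not checked.

```python
# Thresholds taken from the interpretation guide.
GOOD_THRESHOLDS = {
    "silhouette_score": 0.5,
    "intra_cluster_distance_normalized": 0.7,
    "inter_cluster_distance_normalized": 0.6,
}
POOR_THRESHOLDS = {
    "silhouette_score": 0.3,
    "intra_cluster_distance_normalized": 0.5,
    "inter_cluster_distance_normalized": 0.4,
}


def rate_configuration(metrics):
    """Return 'good', 'poor', or 'borderline' for a metrics dictionary.

    Hypothetical helper: 'borderline' and the any/all rules are assumptions.
    """
    if all(metrics[key] > limit for key, limit in GOOD_THRESHOLDS.items()):
        return "good"
    if any(metrics[key] < limit for key, limit in POOR_THRESHOLDS.items()):
        return "poor"
    return "borderline"


rating = rate_configuration(
    {
        "silhouette_score": 0.62,
        "intra_cluster_distance_normalized": 0.81,
        "inter_cluster_distance_normalized": 0.66,
    }
)
```

The dictionary keys match those returned by calculate_all_metrics, so its output could be passed to such a helper directly.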