Evaluation Metrics

ForzaEmbed uses clustering-based metrics to evaluate embedding quality.

Main Metrics Module

Evaluation metrics for text embeddings based on similarity scores.

This module provides functions for calculating clustering quality metrics on embedding spaces, particularly silhouette-based metrics that decompose into intra-cluster cohesion and inter-cluster separation.

Example

Calculate metrics for document embeddings:

from src.metrics.evaluation_metrics import calculate_all_metrics

metrics = calculate_all_metrics(ref_embeddings, doc_embeddings, doc_labels)
print(f"Silhouette score: {metrics['silhouette_score']}")

src.metrics.evaluation_metrics.calculate_silhouette_metrics(embeddings, labels, metric='cosine')

Calculate silhouette-based clustering metrics with normalized components.

This function decomposes the silhouette score into its constituent parts: intra-cluster distance (cohesion) and inter-cluster distance (separation), providing normalized versions for better interpretability.

Parameters:
  • embeddings (ndarray) – The embeddings of the text chunks, shape (n_samples, n_dims).

  • labels (ndarray) – The theme label for each chunk, shape (n_samples,).

  • metric (str) – Distance metric to use for calculations. Defaults to ‘cosine’.

Returns:

Dictionary containing:

  • intra_cluster_distance_normalized: Normalized intra-cluster quality (0-1, higher is better).

  • inter_cluster_distance_normalized: Normalized inter-cluster separation (0-1, higher is better).

  • silhouette_score: Standard silhouette score (-1 to 1, higher is better).

Return type:

dict
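To illustrate what the metric parameter changes, the sketch below computes a standard silhouette score on toy data with scikit-learn under two distance metrics. It assumes the silhouette_score value reported here follows scikit-learn's definition, which this page does not confirm; the toy arrays are invented for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Toy data (illustrative only): two themes pointing in different directions.
embeddings = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],   # chunks labeled theme 0
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],   # chunks labeled theme 1
])
labels = np.array([0, 0, 0, 1, 1, 1])

# Cosine distance ignores vector length; Euclidean distance does not.
score_cosine = silhouette_score(embeddings, labels, metric="cosine")
score_euclid = silhouette_score(embeddings, labels, metric="euclidean")
print(f"cosine: {score_cosine:.3f}  euclidean: {score_euclid:.3f}")
```

Both scores fall in the -1 to 1 range, but they are not directly comparable with each other, so it is safest to keep the metric fixed when comparing embedding configurations.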

src.metrics.evaluation_metrics.calculate_all_metrics(ref_embeddings, doc_embeddings, doc_labels)

Calculate minimal essential evaluation metrics.

This function computes only the core metrics needed for clustering evaluation: silhouette score and its decomposition (intra/inter cluster distances).

Parameters:
  • ref_embeddings (ndarray) – Embeddings for reference themes, shape (n_themes, n_dims).

  • doc_embeddings (ndarray) – Embeddings for document chunks, shape (n_chunks, n_dims).

  • doc_labels (ndarray) – Theme labels for each document chunk, shape (n_chunks,).

Returns:

Dictionary containing silhouette-based metrics with keys:

  • silhouette_score

  • intra_cluster_distance_normalized

  • inter_cluster_distance_normalized

Return type:

dict

Silhouette Analysis

Silhouette score decomposition into intra-cluster and inter-cluster components.

This module provides functions for decomposing the silhouette score into its constituent parts: intra-cluster cohesion (a(i)) and inter-cluster separation (b(i)). This decomposition helps understand clustering quality in more detail than the aggregate silhouette score alone.

Example

Perform enhanced silhouette analysis:

from src.metrics.silhouette_decomposition import enhanced_silhouette_analysis

analysis = enhanced_silhouette_analysis(embeddings, labels)
print(f"Global metrics: {analysis['global_metrics']}")
print(f"Per-cluster: {analysis['cluster_analysis']}")

src.metrics.silhouette_decomposition.decompose_silhouette_score(embeddings, labels)

Decompose the silhouette score into its a(i) and b(i) components.

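The silhouette score of sample i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where:

  • a(i) = average distance from sample i to the other samples in its own cluster (cohesion); lower is better.

  • b(i) = average distance from sample i to the samples of the nearest other cluster (separation); higher is better.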

Parameters:
  • embeddings (ndarray) – Embedding matrix of shape (n_samples, n_features).

  • labels (ndarray) – Cluster labels of shape (n_samples,).

Returns:

Dictionary containing:

  • mean_intra_cluster_distance: Average a(i) across samples.

  • mean_inter_cluster_distance: Average b(i) across samples.

  • silhouette_score: Aggregate silhouette score.

  • intra_cluster_quality: Normalized cohesion (0-1, higher = better).

  • inter_cluster_separation: Normalized separation (0-1, higher = better).

Return type:

dict
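The decomposition can be sketched with NumPy and scikit-learn as follows. This is a minimal illustration, not ForzaEmbed's implementation: silhouette_components is a hypothetical helper, the toy data is invented, and the sketch assumes cosine distance, at least two samples per cluster, and no zero vectors.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_samples


def silhouette_components(embeddings, labels, metric="cosine"):
    """Per-sample a(i), b(i), s(i). Hypothetical helper, not ForzaEmbed's API."""
    labels = np.asarray(labels)
    dist = pairwise_distances(embeddings, metric=metric)
    n = len(labels)
    a = np.zeros(n)
    b = np.full(n, np.inf)
    for i in range(n):
        for c in np.unique(labels):
            members = labels == c
            if c == labels[i]:
                members[i] = False  # a(i) excludes the point itself
                a[i] = dist[i, members].mean()
            else:
                # b(i) is the smallest mean distance to any other cluster
                b[i] = min(b[i], dist[i, members].mean())
    s = (b - a) / np.maximum(a, b)
    return a, b, s


# Toy data, invented for illustration: two themes, three chunks each.
embeddings = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
])
labels = np.array([0, 0, 0, 1, 1, 1])

a, b, s = silhouette_components(embeddings, labels)
reference = silhouette_samples(embeddings, labels, metric="cosine")
print(f"mean a(i)={a.mean():.3f}  mean b(i)={b.mean():.3f}  silhouette={s.mean():.3f}")
print("matches scikit-learn:", np.allclose(s, reference))
```

Averaging a and b over all samples gives values analogous to mean_intra_cluster_distance and mean_inter_cluster_distance, and averaging s gives the aggregate silhouette score.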

src.metrics.silhouette_decomposition.analyze_silhouette_by_cluster(embeddings, labels)

Perform detailed silhouette score analysis per cluster.

Parameters:
  • embeddings (ndarray) – Embedding matrix of shape (n_samples, n_features).

  • labels (ndarray) – Cluster labels of shape (n_samples,).

Returns:

Dictionary mapping each cluster label to its silhouette statistics:

  • mean_silhouette: Average silhouette score for the cluster.

  • std_silhouette: Standard deviation of silhouette scores.

  • min_silhouette: Minimum silhouette score in the cluster.

  • max_silhouette: Maximum silhouette score in the cluster.

  • size: Number of samples in the cluster.

  • proportion_positive: Fraction of samples with a positive score.

Returns an empty dict if there are fewer than 2 clusters or too few samples.

Return type:

dict
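A per-cluster breakdown of this kind can be approximated with scikit-learn's silhouette_samples. The sketch below is an assumption about how such statistics could be computed, not the module's actual code: per_cluster_silhouette is a hypothetical name, and the "too few samples" guard follows scikit-learn's requirement that the number of clusters lies between 2 and n_samples - 1.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def per_cluster_silhouette(embeddings, labels, metric="cosine"):
    """Silhouette statistics per cluster. Hypothetical helper for illustration."""
    labels = np.asarray(labels)
    unique = np.unique(labels)
    # scikit-learn requires 2 <= n_clusters <= n_samples - 1
    if len(unique) < 2 or len(unique) >= len(labels):
        return {}
    scores = silhouette_samples(embeddings, labels, metric=metric)
    stats = {}
    for c in unique:
        s = scores[labels == c]
        stats[c.item()] = {
            "mean_silhouette": float(s.mean()),
            "std_silhouette": float(s.std()),
            "min_silhouette": float(s.min()),
            "max_silhouette": float(s.max()),
            "size": int(s.size),
            "proportion_positive": float((s > 0).mean()),
        }
    return stats


# Toy data, invented for illustration.
embeddings = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95],
])
labels = np.array([0, 0, 0, 1, 1, 1])

stats = per_cluster_silhouette(embeddings, labels)
for label, cluster_stats in stats.items():
    print(label, cluster_stats)
```

A cluster with a low mean_silhouette or a low proportion_positive is a candidate for closer inspection, even when the global score looks acceptable.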

src.metrics.silhouette_decomposition.enhanced_silhouette_analysis(embeddings, labels)

Perform complete clustering analysis with silhouette decomposition.

Combines global silhouette decomposition with per-cluster analysis to provide a comprehensive view of clustering quality.

Note

This function always uses ‘cosine’ as the distance metric for clustering analysis, regardless of the similarity metric used for embedding evaluation.

Parameters:
  • embeddings (ndarray) – Embedding matrix of shape (n_samples, n_features).

  • labels (ndarray) – Cluster labels of shape (n_samples,).

Returns:

Dictionary containing:

  • global_metrics: Results from decompose_silhouette_score().

  • cluster_analysis: Results from analyze_silhouette_by_cluster().

Return type:

dict

Metric Descriptions

Silhouette Score

The silhouette score measures how well-defined the clusters are. It ranges from -1 to 1:

  • Near 1.0: Well-separated clusters - points are close to their own cluster and far from other clusters

  • Near 0.0: Overlapping clusters - points lie on or near cluster boundaries

  • Near -1.0: Poor clustering - points are closer to another cluster than to their own and may be mislabeled

Formula:

\[s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}\]

where:

  • \(a(i)\) = average distance to points in the same cluster (intra-cluster)

  • \(b(i)\) = average distance to points in nearest different cluster (inter-cluster)
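As a quick worked example with made-up values, a point with \(a(i) = 0.2\) and \(b(i) = 0.8\) gets

\[s(i) = \frac{0.8 - 0.2}{\max(0.2, 0.8)} = \frac{0.6}{0.8} = 0.75\]

whereas swapping the two values (\(a(i) = 0.8\), \(b(i) = 0.2\)) gives

\[s(i) = \frac{0.2 - 0.8}{\max(0.8, 0.2)} = \frac{-0.6}{0.8} = -0.75\]

which means the point sits closer to a neighboring cluster than to its own.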

Intra-Cluster Distance (Normalized)

Measures cohesion within clusters. Normalized to [0, 1]:

  • Higher values indicate tighter clustering

  • Formula: \(1 - \frac{\text{avg\_intra\_distance}}{\text{max\_distance}}\)

Inter-Cluster Distance (Normalized)

Measures separation between clusters. Normalized to [0, 1]:

  • Higher values indicate better separation

  • Formula: \(\frac{\text{avg\_inter\_distance}}{\text{max\_distance}}\)
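The two normalization formulas can be sketched in a few lines. This page does not say how max_distance is chosen, so the helper below takes it as an explicit argument; the function name, the clipping to [0, 1], and the value 2.0 used in the example (the theoretical upper bound of cosine distance) are all assumptions made for illustration.

```python
def normalize_components(avg_intra_distance, avg_inter_distance, max_distance):
    """Apply the two normalization formulas. Hypothetical helper.

    max_distance is an assumption: the source does not define it.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")

    def clip01(x):
        # Keep the result inside the documented [0, 1] range.
        return min(1.0, max(0.0, x))

    intra_normalized = clip01(1.0 - avg_intra_distance / max_distance)
    inter_normalized = clip01(avg_inter_distance / max_distance)
    return intra_normalized, inter_normalized


# Example with invented distances and max_distance = 2.0 (assumed).
intra_norm, inter_norm = normalize_components(0.2, 1.0, max_distance=2.0)
print(f"intra (normalized): {intra_norm:.2f}, inter (normalized): {inter_norm:.2f}")
```

Because the intra-cluster formula subtracts from 1, a small raw intra-cluster distance maps to a high normalized value, which is why both normalized metrics read as "higher is better".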

Embedding Computation Time

Measures the total time (in seconds) required to compute embeddings for both:

  • Theme keywords

  • Document chunks

This metric helps identify performance bottlenecks across different embedding models and configurations.
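One way to measure such a total is to wrap both embedding calls in a single timer. The sketch below is not ForzaEmbed's timing code: timed_embedding and fake_embed are hypothetical names, and fake_embed is a stand-in for a real embedding model.

```python
import time


def timed_embedding(embed_fn, theme_keywords, doc_chunks):
    """Embed themes and chunks, returning both results and elapsed seconds."""
    start = time.perf_counter()
    ref_embeddings = embed_fn(theme_keywords)
    doc_embeddings = embed_fn(doc_chunks)
    elapsed = time.perf_counter() - start
    return ref_embeddings, doc_embeddings, elapsed


def fake_embed(texts):
    # Stand-in for a real model: maps each text to a tiny 2-D vector.
    return [[float(len(t)), 1.0] for t in texts]


ref, doc, seconds = timed_embedding(
    fake_embed, ["pricing", "support"], ["chunk one", "chunk two", "chunk three"]
)
print(f"embedded {len(ref)} themes and {len(doc)} chunks in {seconds:.6f} s")
```

time.perf_counter is used rather than time.time because it is monotonic and intended for measuring short durations.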

Interpretation Guide

Good Configuration

A good embedding configuration should have:

  • Silhouette score > 0.5

  • Normalized intra-cluster distance > 0.7

  • Normalized inter-cluster distance > 0.6

  • Embedding time as low as possible for your use case

Poor Configuration

Warning signs of poor configuration:

  • Silhouette score < 0.3

  • Normalized intra-cluster distance < 0.5

  • Normalized inter-cluster distance < 0.4

  • Large variance in cluster sizes
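The thresholds above can be turned into a simple check. The helper below is a hypothetical sketch: the function name and the "borderline" label for values between the two sets of thresholds are assumptions, and it treats any single warning sign as enough to flag a configuration. It does not cover embedding time or cluster-size variance.

```python
def rate_configuration(silhouette, intra_normalized, inter_normalized):
    """Classify a configuration using the thresholds from this guide."""
    if silhouette > 0.5 and intra_normalized > 0.7 and inter_normalized > 0.6:
        return "good"
    if silhouette < 0.3 or intra_normalized < 0.5 or inter_normalized < 0.4:
        return "poor"
    # Neither clearly good nor showing a warning sign (label is an assumption).
    return "borderline"


print(rate_configuration(0.62, 0.81, 0.70))
print(rate_configuration(0.25, 0.81, 0.70))
print(rate_configuration(0.40, 0.60, 0.50))
```

The arguments correspond to the silhouette_score, intra_cluster_distance_normalized, and inter_cluster_distance_normalized keys returned by calculate_all_metrics.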