Reporting Module

The reporting module handles visualization and export of results.

Report Generator

Report generation module for ForzaEmbed.

This module provides the ReportGenerator class that handles the generation of all reports and visualizations, including comparison charts, radar charts, and interactive web pages.

Example

Generate reports from processing results:

from src.reporting.reporting import ReportGenerator

# db is an EmbeddingDatabase, config a dict of report settings, output_dir a Path
generator = ReportGenerator(db, config, output_dir, "config_name")
generator.generate_all(top_n=25, single_file=False)

class src.reporting.reporting.ReportGenerator(db, config, output_dir, config_name)

Bases: object

Handle the generation of all reports and visualizations.

Coordinates the generation of comparison charts, radar charts, filtered markdowns, and interactive web pages from processing results.

db

The embedding database containing results.

config

Configuration dictionary with report settings.

output_dir

Directory path for output files.

config_name

Name of the configuration for file prefixes.

similarity_threshold

Threshold for similarity-based filtering.

data_aggregator

Helper for aggregating data from database.

markdown_filter

Helper for generating filtered markdowns.

__init__(db, config, output_dir, config_name)

Initialize the ReportGenerator.

Parameters:
  • db (EmbeddingDatabase) – The embedding database containing results.

  • config (dict[str, Any]) – Configuration dictionary with report settings.

  • output_dir (Path) – Directory path for output files.

  • config_name (str) – Name of the configuration for file prefixes.

generate_all(top_n=25, single_file=False, data_source='markdowns')

Generate all reports from the data in the database.

Parameters:
  • top_n (int) – Maximum number of top models to include in reports. Use -1 for all models. Defaults to 25.

  • single_file (bool) – If True, creates a single HTML file for all results. If False, creates one HTML per markdown file. Defaults to False.

  • data_source (str) – Source directory name for data files. Defaults to 'markdowns'.
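The top_n rule above (a positive limit, or -1 for every model) can be illustrated with a small ranking helper. This is a minimal sketch and not the module's code. The name select_top_models is hypothetical, and the sketch assumes each model has a single score where higher is better.

```python
from typing import Mapping


def select_top_models(scores: Mapping[str, float], top_n: int = 25) -> list[str]:
    """Return model names ranked by score (descending), truncated to top_n.

    Hypothetical helper: top_n == -1 keeps every model, mirroring the
    documented meaning of -1 for generate_all().
    """
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    if top_n == -1:
        return ranked
    if top_n < 0:
        raise ValueError("top_n must be -1 or a non-negative integer")
    return ranked[:top_n]


scores = {"model_a": 0.71, "model_b": 0.84, "model_c": 0.65}
print(select_top_models(scores, top_n=2))   # ['model_b', 'model_a']
print(select_top_models(scores, top_n=-1))  # ['model_b', 'model_a', 'model_c']
```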

src.reporting.reporting.get_metrics_info()

Return information about metrics, including their names, descriptions, and whether higher values are better.

Returns:

Dictionary mapping metric keys to their metadata. Each entry contains:

  • name: Human-readable metric name.

  • description: Explanation of what the metric measures.

  • higher_is_better: Whether higher values indicate better performance.

  • range: Expected value range as a string.

Return type:

dict
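The higher_is_better flag tells a consumer which direction counts as "better" for each metric. The sketch below shows one way to use it when picking the best model for a metric. The metric keys and values in METRICS_INFO are hypothetical placeholders in the documented shape. The real entries come from get_metrics_info(), and best_model is an illustrative helper.

```python
from typing import Any

# Hypothetical metadata in the documented shape; the real keys and
# values come from get_metrics_info().
METRICS_INFO: dict[str, dict[str, Any]] = {
    "silhouette": {
        "name": "Silhouette Score",
        "description": "Cohesion versus separation of clusters.",
        "higher_is_better": True,
        "range": "[-1, 1]",
    },
    "processing_time": {
        "name": "Processing Time",
        "description": "Wall-clock seconds spent embedding.",
        "higher_is_better": False,
        "range": "[0, inf)",
    },
}


def best_model(metric_key: str, results: dict[str, float],
               metrics_info: dict[str, dict[str, Any]]) -> str:
    """Pick the best model for one metric, respecting higher_is_better."""
    info = metrics_info[metric_key]
    pick = max if info["higher_is_better"] else min
    return pick(results, key=results.__getitem__)


print(best_model("silhouette", {"m1": 0.42, "m2": 0.55}, METRICS_INFO))       # m2
print(best_model("processing_time", {"m1": 12.0, "m2": 30.5}, METRICS_INFO))  # m1
```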

Data Aggregator

Data aggregation module for ForzaEmbed reporting.

This module provides the DataAggregator class that handles aggregation and caching of processed data from the database for report generation.

Example

Aggregate data for reporting:

from src.reporting.aggregator import DataAggregator

# db is an EmbeddingDatabase and output_dir a Path
aggregator = DataAggregator(db, output_dir, "config_name")
data = aggregator.get_aggregated_data()  # None if no processing results exist

class src.reporting.aggregator.DataAggregator(db, output_dir, config_name)

Bases: object

Handle aggregation and caching of processed data from the database.

Aggregates processing results from the database into a format suitable for report generation, with caching to avoid redundant computation.

db

The embedding database containing results.

output_dir

Directory path for cache files.

cache_path

Path to the cache file.

__init__(db, output_dir, config_name)

Initialize the DataAggregator.

Parameters:
  • db (EmbeddingDatabase) – The embedding database containing results.

  • output_dir (Path) – Directory path for cache files.

  • config_name (str) – Name of the configuration for cache file prefix.

get_aggregated_data()

Load aggregated data from the cache if it is valid; otherwise, aggregate it from scratch.

Checks whether the cache file is newer than the database's last modification time. If it is, loads the data from the cache; otherwise, aggregates fresh data.

Returns:

Dictionary containing aggregated data for reporting, or None if no processing results are available. Contains keys:

  • all_results: Raw results from database.

  • processed_data_for_interactive_page: Optimized web data.

  • all_models_metrics: Metrics organized by model.

  • model_embeddings_for_variance: Per-model embeddings used for variance analysis.

  • total_combinations: Count of model combinations.

Return type:

dict[str, Any] | None
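The cache-validity rule described above can be sketched as an mtime comparison. This is a minimal illustration and not DataAggregator's implementation. It assumes the database is a single file at db_path, a pickle cache at cache_path, and a caller-supplied aggregate callable. The name load_or_aggregate is hypothetical.

```python
from __future__ import annotations

import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_aggregate(cache_path: Path, db_path: Path,
                      aggregate: Callable[[], dict[str, Any] | None]) -> dict[str, Any] | None:
    """Return cached data if the cache is newer than the database file.

    Sketch only: assumes a single database file and a pickle cache.
    """
    if cache_path.exists() and os.path.getmtime(cache_path) > os.path.getmtime(db_path):
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    data = aggregate()
    if data is not None:  # nothing to cache when there are no results
        with cache_path.open("wb") as fh:
            pickle.dump(data, fh)
    return data


calls: list[int] = []


def fake_aggregate() -> dict[str, Any]:
    calls.append(1)  # count how often the expensive path runs
    return {"total_combinations": 3}


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "results.db"
    db_file.write_text("placeholder")
    os.utime(db_file, (1_000_000, 1_000_000))  # make the database look old
    cache_file = Path(tmp) / "aggregated.pkl"
    first = load_or_aggregate(cache_file, db_file, fake_aggregate)
    second = load_or_aggregate(cache_file, db_file, fake_aggregate)  # served from cache

print(first == second, len(calls))  # True 1
```

The second call skips aggregation because the cache file was written after the database's (artificially old) modification time.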

touch_cache()

Update the cache file’s modification time to the current time.
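Refreshing a file's modification time is a one-liner with os.utime. The sketch below is illustrative and not the method's code. It assumes the cache lives at a known path and that a missing cache should simply be left alone. Updating the mtime keeps an mtime-based validity check treating the cache as fresh.

```python
import os
from pathlib import Path


def touch_cache(cache_path: Path) -> None:
    """Set an existing cache file's access and modification times to now.

    Sketch only: a missing cache file is not created.
    """
    if cache_path.exists():
        os.utime(cache_path)  # times=None sets atime and mtime to the current time
```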

Web Generator

Web page generation module for ForzaEmbed.

This module provides functions for generating interactive HTML pages for visualising embedding analysis results, including heatmaps and comparison charts.

Templates are maintained as separate files under src/reporting/templates/ for easier editing:

  • template.html — HTML structure with %%PLACEHOLDER%% markers

  • style.css — Professional report stylesheet (minified at build time)

  • main.js — Interactive report logic

  • worker.js — Web Worker for Base64/zlib decompression
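Filling %%PLACEHOLDER%% markers can be done with a single regular-expression substitution. This is a minimal sketch, not the module's renderer. The marker names (TITLE, CSS) and the uppercase-name pattern are assumptions, and render_template is a hypothetical helper.

```python
import re

# Assumed marker syntax: %%NAME%% with NAME in uppercase letters, digits, underscores
PLACEHOLDER = re.compile(r"%%([A-Z0-9_]+)%%")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace each %%NAME%% marker with values[NAME].

    Raises KeyError for a marker with no value, so a missing
    substitution is caught instead of shipping a literal %%NAME%%.
    """
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


html = "<title>%%TITLE%%</title><style>%%CSS%%</style>"
print(render_template(html, {"TITLE": "my_config", "CSS": "body{margin:0}"}))
# <title>my_config</title><style>body{margin:0}</style>
```

Using a function as the replacement means backslashes in the inserted CSS or JavaScript are copied literally rather than interpreted as regex escapes.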

Example

Generate an interactive web page:

from src.reporting.web_generator import generate_main_page

# processed_data is a dict of processed analysis data; total_combinations an int
generate_main_page(
    processed_data, output_dir, total_combinations,
    single_file=True, config_name="my_config"
)

src.reporting.web_generator.safe_numpy_converter(obj)

Recursively convert NumPy types to native Python types for JSON serialisation.

Parameters:

obj (Any) – Object to convert; may be an ndarray, a NumPy scalar, a dict, a list, or any other value.

Returns:

Object with all NumPy types converted to native Python equivalents.

Return type:

Any
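A recursive conversion of this kind can be sketched with ndarray.tolist() and np.generic.item(), which both return built-in Python types. This is an illustrative version under a hypothetical name (to_native), not the module's safe_numpy_converter. It also converts NumPy scalar dictionary keys, which json.dumps would otherwise reject.

```python
import json
from typing import Any

import numpy as np


def to_native(obj: Any) -> Any:
    """Recursively convert NumPy arrays and scalars to built-in Python types."""
    if isinstance(obj, np.ndarray):
        return obj.tolist()  # also converts the elements to Python scalars
    if isinstance(obj, np.generic):
        return obj.item()  # np.float32, np.int64, np.bool_, ...
    if isinstance(obj, dict):
        return {to_native(key): to_native(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(item) for item in obj]
    return obj


payload = {"scores": np.array([0.5, 0.25], dtype=np.float32), "n": np.int64(3)}
print(json.dumps(to_native(payload)))  # {"scores": [0.5, 0.25], "n": 3}
```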

src.reporting.web_generator.generate_main_page(processed_data, output_dir, total_combinations, single_file=False, graph_paths=None, config_name='config', themes_config=None)

Generate the main interactive web page for heatmap visualisation.

Creates HTML files with embedded JavaScript for interactive exploration of embedding similarity results.

Parameters:
  • processed_data (dict[str, Any]) – Dictionary containing processed analysis data.

  • output_dir (str) – Directory path for output HTML files.

  • total_combinations (int) – Total number of model combinations processed.

  • single_file (bool) – If True, creates a single index.html for all files. If False, creates one HTML file per markdown. Defaults to False.

  • graph_paths (dict[str, list[str]] | None) – Dictionary mapping file keys to lists of graph image paths.

  • config_name (str) – Name of the configuration for file prefixes.

  • themes_config (dict[str, Any] | None) – Theme configuration for tooltip display.
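Because worker.js decompresses Base64/zlib data, the Python side has to produce a matching payload before embedding it in the page. The sketch below shows one compatible encoding. It assumes the worker expects the zlib container format (not gzip or raw deflate) wrapping UTF-8 JSON. The names encode_payload and decode_payload are hypothetical.

```python
import base64
import json
import zlib
from typing import Any


def encode_payload(data: Any) -> str:
    """Serialise data to compact JSON, zlib-compress it, and Base64-encode it.

    The result is plain ASCII, safe to embed in an HTML or JavaScript string.
    """
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw, level=9)).decode("ascii")


def decode_payload(encoded: str) -> Any:
    """Inverse of encode_payload: the steps a browser-side worker would mirror."""
    return json.loads(zlib.decompress(base64.b64decode(encoded)).decode("utf-8"))


sample = {"file": "doc1.md", "scores": [0.5, 0.25]}
encoded = encode_payload(sample)
print(decode_payload(encoded) == sample)  # True
```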

Markdown Filter

Markdown filtering module for ForzaEmbed.

This module provides the MarkdownFilter class, which generates filtered markdown files based on a similarity threshold, keeping only the chunks whose similarity exceeds that threshold.

Example

Generate filtered markdown files:

from src.reporting.markdown_filter import MarkdownFilter

# Avoid the name "filter", which shadows the Python built-in
md_filter = MarkdownFilter(db, config, output_dir, "config_name")
md_filter.generate_filtered_markdowns()

class src.reporting.markdown_filter.MarkdownFilter(db, config, output_dir, config_name)

Bases: object

Handle generation of filtered markdown files based on similarity threshold.

Creates filtered versions of input markdown files containing only the text chunks that exceed the similarity threshold for each model.
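The core filtering step can be sketched as a comprehension over scored chunks. In this minimal illustration, the Chunk dataclass and filter_chunks helper are hypothetical, and the blank-line separator between kept chunks is an assumption. Because the description says "exceed", the comparison is strict, so a chunk whose similarity equals the threshold is dropped.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    similarity: float  # similarity score for one model (hypothetical structure)


def filter_chunks(chunks: list[Chunk], threshold: float) -> str:
    """Keep chunks whose similarity strictly exceeds threshold, in original
    order, and join them into markdown separated by blank lines."""
    kept = [chunk.text for chunk in chunks if chunk.similarity > threshold]
    return "\n\n".join(kept)


chunks = [Chunk("## Intro", 0.82), Chunk("Unrelated text", 0.31), Chunk("Key finding", 0.77)]
print(filter_chunks(chunks, 0.6))
```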

db

The embedding database containing results.

config

Configuration dictionary with filter settings.

output_dir

Directory path for output files.

config_name

Name of the configuration for file prefixes.

similarity_threshold

Minimum similarity for including chunks.

__init__(db, config, output_dir, config_name)

Initialize the MarkdownFilter.

Parameters:
  • db (EmbeddingDatabase) – The embedding database containing results.

  • config (dict[str, Any]) – Configuration dictionary with filter settings.

  • output_dir (Path) – Directory path for output files.

  • config_name (str) – Name of the configuration for file prefixes.

generate_filtered_markdowns()

Generate filtered markdown files containing only the chunks above the similarity threshold.

Creates one filtered markdown file per model-document combination, along with a CSV summary of filtering statistics.
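A per-combination summary like the one described can be written with the standard csv module. The column names and the (kept, total) statistics in this sketch are assumptions, not the module's actual CSV schema. The name write_filter_summary is hypothetical.

```python
import csv
import io
from typing import TextIO


def write_filter_summary(stats: dict[tuple[str, str], tuple[int, int]], out: TextIO) -> None:
    """Write one CSV row per (model, document) with kept/total chunk counts.

    stats maps (model, document) -> (kept_chunks, total_chunks).
    """
    writer = csv.writer(out)
    writer.writerow(["model", "document", "kept_chunks", "total_chunks", "kept_ratio"])
    for (model, document), (kept, total) in sorted(stats.items()):
        ratio = kept / total if total else 0.0  # guard against empty documents
        writer.writerow([model, document, kept, total, f"{ratio:.3f}"])


buffer = io.StringIO()
write_filter_summary({("model_a", "doc1.md"): (3, 12), ("model_b", "doc1.md"): (0, 12)}, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with newline="" so the csv module controls line endings.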