Reporting Module
The reporting module handles visualization and export of results.
Report Generator
Report generation module for ForzaEmbed.
This module provides the ReportGenerator class that handles the generation of all reports and visualizations, including comparison charts, radar charts, and interactive web pages.
Example
Generate reports from processing results:
from src.reporting.reporting import ReportGenerator
generator = ReportGenerator(db, config, output_dir, "config_name")
generator.generate_all(top_n=25, single_file=False)
- class src.reporting.reporting.ReportGenerator(db, config, output_dir, config_name)[source]
Bases: object
Handle the generation of all reports and visualizations.
Coordinates the generation of comparison charts, radar charts, filtered markdowns, and interactive web pages from processing results.
- db
The embedding database containing results.
- config
Configuration dictionary with report settings.
- output_dir
Directory path for output files.
- config_name
Name of the configuration for file prefixes.
- similarity_threshold
Threshold for similarity-based filtering.
- data_aggregator
Helper for aggregating data from database.
- markdown_filter
Helper for generating filtered markdowns.
- generate_all(top_n=25, single_file=False, data_source='markdowns')[source]
Generate all reports from the data in the database.
- Parameters:
top_n (int) – Maximum number of top models to include in reports. Use -1 for all models. Defaults to 25.
single_file (bool) – If True, creates a single HTML file for all results. If False, creates one HTML per markdown file. Defaults to False.
data_source (str) – Source directory name for data files. Defaults to 'markdowns'.
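The top_n convention described above (a positive cap, or -1 for every model) could be sketched as follows. The helper name select_top_models and the score dictionary are hypothetical and only illustrate the parameter's semantics; they are not part of ReportGenerator.

```python
def select_top_models(scores: dict[str, float], top_n: int = 25) -> list[str]:
    """Rank model names by score and keep at most top_n of them.

    Hypothetical helper: top_n == -1 keeps every model, mirroring the
    documented convention for generate_all().
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    if top_n == -1:
        return ranked
    return ranked[:top_n]


# Made-up scores, for illustration only.
scores = {"model_a": 0.71, "model_b": 0.93, "model_c": 0.58}
top_two = select_top_models(scores, top_n=2)
everything = select_top_models(scores, top_n=-1)
```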
- src.reporting.reporting.get_metrics_info()[source]
Return information about metrics including names, descriptions, and preferences.
- Returns:
Dictionary mapping metric keys to their metadata. Each entry contains:
name: Human-readable metric name.
description: Explanation of what the metric measures.
higher_is_better: Whether higher values indicate better performance.
range: Expected value range as a string.
- Return type:
dict
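As an illustration of how the higher_is_better flag might be consumed, the sketch below builds a dictionary shaped like the documented return value and uses it to pick the best model for a metric. The metric keys example_score and example_error are placeholders, not metrics defined by ForzaEmbed, and best_model is a hypothetical helper.

```python
# Placeholder entries shaped like the documented metadata (keys are invented).
metrics_info = {
    "example_score": {
        "name": "Example score",
        "description": "Placeholder metric where larger values are better.",
        "higher_is_better": True,
        "range": "[0, 1]",
    },
    "example_error": {
        "name": "Example error",
        "description": "Placeholder metric where smaller values are better.",
        "higher_is_better": False,
        "range": "[0, inf)",
    },
}


def best_model(values: dict[str, float], metric_key: str) -> str:
    """Return the model with the best value, respecting higher_is_better."""
    pick = max if metrics_info[metric_key]["higher_is_better"] else min
    return pick(values, key=values.get)


values = {"model_a": 0.2, "model_b": 0.9}
best_by_score = best_model(values, "example_score")
best_by_error = best_model(values, "example_error")
```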
Data Aggregator
Data aggregation module for ForzaEmbed reporting.
This module provides the DataAggregator class that handles aggregation and caching of processed data from the database for report generation.
Example
Aggregate data for reporting:
from src.reporting.aggregator import DataAggregator
aggregator = DataAggregator(db, output_dir, "config_name")
data = aggregator.get_aggregated_data()
- class src.reporting.aggregator.DataAggregator(db, output_dir, config_name)[source]
Bases: object
Handle aggregation and caching of processed data from the database.
Aggregates processing results from the database into a format suitable for report generation, with caching to avoid redundant computation.
- db
The embedding database containing results.
- output_dir
Directory path for cache files.
- cache_path
Path to the cache file.
- __init__(db, output_dir, config_name)[source]
Initialize the DataAggregator.
- Parameters:
db (EmbeddingDatabase) – The embedding database containing results.
output_dir (Path) – Directory path for cache files.
config_name (str) – Name of the configuration for cache file prefix.
- get_aggregated_data()[source]
Load aggregated data from cache if valid, otherwise aggregate from scratch.
Checks if the cache is newer than the database modification time. If valid, loads from cache; otherwise, aggregates fresh data.
- Returns:
Dictionary containing aggregated data for reporting, or None if no processing results are available. Contains keys:
all_results: Raw results from database.
processed_data_for_interactive_page: Optimized web data.
all_models_metrics: Metrics organized by model.
model_embeddings_for_variance: Embeddings for analysis.
total_combinations: Count of model combinations.
- Return type:
dict[str, Any] | None
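The cache check described for get_aggregated_data (use the cache only if it is newer than the database) can be sketched with file modification times. This is a minimal sketch under assumptions: the JSON cache format, the file names, and the helpers cache_is_valid and load_or_aggregate are all hypothetical, and the real class may store and validate its cache differently.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional


def cache_is_valid(cache_path: Path, db_path: Path) -> bool:
    """True when the cache file exists and is newer than the database file."""
    if not cache_path.exists():
        return False
    return cache_path.stat().st_mtime > db_path.stat().st_mtime


def load_or_aggregate(
    cache_path: Path, db_path: Path, aggregate: Callable[[], Optional[dict]]
) -> Optional[dict[str, Any]]:
    """Load from a valid cache, otherwise aggregate and refresh the cache."""
    if cache_is_valid(cache_path, db_path):
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = aggregate()
    if data is not None:  # nothing to cache when there are no results
        cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Demonstration with temporary files standing in for the database and cache.
calls = []


def fake_aggregate() -> dict:
    calls.append(1)
    return {"total_combinations": 3}


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "results.db"
    cache_path = Path(tmp) / "config_name_cache.json"
    db_path.write_text("placeholder", encoding="utf-8")
    os.utime(db_path, (1000, 1000))  # pretend the database is old

    first = load_or_aggregate(cache_path, db_path, fake_aggregate)   # aggregates
    second = load_or_aggregate(cache_path, db_path, fake_aggregate)  # cache hit
    calls_after_hit = len(calls)

    newer = cache_path.stat().st_mtime + 10
    os.utime(db_path, (newer, newer))  # database changed after the cache
    third = load_or_aggregate(cache_path, db_path, fake_aggregate)   # re-aggregates
```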
Web Generator
Web page generation module for ForzaEmbed.
This module provides functions for generating interactive HTML pages for visualising embedding analysis results, including heatmaps and comparison charts.
Templates are maintained as separate files under src/reporting/templates/ for easier editing:
- template.html: HTML structure with %%PLACEHOLDER%% markers
- style.css: Professional report stylesheet (minified at build time)
- main.js: Interactive report logic
- worker.js: Web Worker for Base64/zlib decompression
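Since worker.js is described as decompressing Base64/zlib data, the Python side presumably compresses and encodes the payload before embedding it in the page. The sketch below shows one way such a round trip could look using only the standard library. The JSON payload layout and the names encode_payload and decode_payload are assumptions, not the module's actual functions.

```python
import base64
import json
import zlib


def encode_payload(data: dict) -> str:
    """Serialise to compact JSON, compress with zlib, then Base64-encode."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_payload(encoded: str) -> dict:
    """Reverse of encode_payload; mirrors what a browser worker would do."""
    raw = zlib.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


# Made-up payload, for illustration only.
payload = {"similarities": [0.12, 0.87, 0.45]}
encoded = encode_payload(payload)
restored = decode_payload(encoded)
```

The encoded string is plain ASCII, so it can be placed in an HTML or JavaScript string literal without further escaping.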
Example
Generate an interactive web page:
from src.reporting.web_generator import generate_main_page
generate_main_page(
    processed_data, output_dir, total_combinations,
    single_file=True, config_name="my_config"
)
- src.reporting.web_generator.safe_numpy_converter(obj)[source]
Recursively convert NumPy types to native Python types for JSON serialisation.
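A recursive conversion of this kind could look like the sketch below. It is not the implementation of safe_numpy_converter: it relies on duck typing (NumPy arrays and NumPy scalars both expose tolist()) so that it runs without NumPy installed, and FakeArray is a hypothetical stand-in used only for the demonstration.

```python
import json
from typing import Any


def to_native(obj: Any) -> Any:
    """Recursively replace array-like values with plain Python containers."""
    if isinstance(obj, dict):
        return {to_native(key): to_native(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(item) for item in obj]
    if hasattr(obj, "tolist"):
        # NumPy arrays and scalars return native Python values from tolist().
        return to_native(obj.tolist())
    return obj


class FakeArray:
    """Stand-in for a NumPy array so the sketch runs without NumPy."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


data = {"scores": FakeArray([0.1, 0.2]), "nested": [FakeArray([1])]}
converted = to_native(data)
as_json = json.dumps(converted)
```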
- src.reporting.web_generator.generate_main_page(processed_data, output_dir, total_combinations, single_file=False, graph_paths=None, config_name='config', themes_config=None)[source]
Generate the main interactive web page for heatmap visualisation.
Creates HTML files with embedded JavaScript for interactive exploration of embedding similarity results.
- Parameters:
processed_data (dict[str, Any]) – Dictionary containing processed analysis data.
output_dir (str) – Directory path for output HTML files.
total_combinations (int) – Total number of model combinations processed.
single_file (bool) – If True, creates a single index.html for all files. If False, creates one HTML file per markdown. Defaults to False.
graph_paths (dict[str, list[str]] | None) – Dictionary mapping file keys to lists of graph image paths.
config_name (str) – Name of the configuration for file prefixes.
themes_config (dict[str, Any] | None) – Theme configuration for tooltip display.
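The effect of single_file on the output layout can be sketched as follows. Only the single index.html case is stated in the parameter description. The per-file naming pattern (config name as prefix plus the markdown file stem) and the helper plan_output_files are assumptions made for illustration.

```python
from pathlib import Path


def plan_output_files(
    file_keys: list[str],
    output_dir: str,
    single_file: bool = False,
    config_name: str = "config",
) -> list[Path]:
    """List the HTML files that would be written for the given inputs."""
    out = Path(output_dir)
    if single_file:
        # One page covering every input file.
        return [out / "index.html"]
    # Assumed naming pattern: <config_name>_<markdown stem>.html
    return [out / f"{config_name}_{Path(key).stem}.html" for key in file_keys]


combined = plan_output_files(["a.md", "b.md"], "reports", single_file=True)
separate = plan_output_files(["a.md", "b.md"], "reports", config_name="my_config")
```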
Markdown Filter
Markdown filtering module for ForzaEmbed.
This module provides the MarkdownFilter class that handles generation of filtered markdown files based on similarity threshold, extracting only the chunks that are above the threshold.
Example
Generate filtered markdown files:
from src.reporting.markdown_filter import MarkdownFilter
markdown_filter = MarkdownFilter(db, config, output_dir, "config_name")
markdown_filter.generate_filtered_markdowns()
- class src.reporting.markdown_filter.MarkdownFilter(db, config, output_dir, config_name)[source]
Bases: object
Handle generation of filtered markdown files based on similarity threshold.
Creates filtered versions of input markdown files containing only the text chunks that exceed the similarity threshold for each model.
- db
The embedding database containing results.
- config
Configuration dictionary with filter settings.
- output_dir
Directory path for output files.
- config_name
Name of the configuration for file prefixes.
- similarity_threshold
Minimum similarity for including chunks.
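The core of the filtering step, keeping only chunks whose similarity exceeds the threshold, can be sketched in a few lines. The function filter_chunks and its inputs are hypothetical. How MarkdownFilter reads chunks and scores from the database is not shown here, and the strict comparison is an assumption based on the "exceed the threshold" wording.

```python
def filter_chunks(
    chunks: list[str], similarities: list[float], threshold: float
) -> list[str]:
    """Keep the chunks whose similarity is strictly above the threshold."""
    return [
        chunk
        for chunk, score in zip(chunks, similarities)
        if score > threshold
    ]


# Made-up chunks and scores, for illustration only.
chunks = ["Opening hours", "Unrelated footer", "Contact details"]
similarities = [0.82, 0.31, 0.60]
kept = filter_chunks(chunks, similarities, threshold=0.6)
filtered_markdown = "\n\n".join(kept)
```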