Reporting Module

The reporting module handles visualization and export of results.

Report Generator

Data Aggregator

Data aggregation module for ForzaEmbed reporting.

This module provides the DataAggregator class that handles aggregation and caching of processed data from the database for report generation.

Example

Aggregate data for reporting:

from src.reporting.aggregator import DataAggregator

aggregator = DataAggregator(db, output_dir, "config_name")
data = aggregator.get_aggregated_data()

class src.reporting.aggregator.DataAggregator(db, output_dir, config_name)

Bases: object

Handle aggregation and caching of processed data from the database.

Aggregates processing results from the database into a format suitable for report generation, with caching to avoid redundant computation.

db: The embedding database containing results.

output_dir: Directory path for cache files.

cache_path: Path to the cache file.

__init__(db, output_dir, config_name)

Initialize the DataAggregator.

Parameters:

db (EmbeddingDatabase) – The embedding database containing results.
output_dir (Path) – Directory path for cache files.
config_name (str) – Name of the configuration for cache file prefix.

get_aggregated_data()

Load aggregated data from cache if valid, otherwise aggregate from scratch.

Checks whether the cache file is newer than the database's last modification time. If it is, the data is loaded from the cache; otherwise, fresh data is aggregated.

Returns:

Dictionary containing aggregated data for reporting, or None if no processing results are available. Contains keys:

all_results: Raw results from database.

processed_data_for_interactive_page: Data optimized for the interactive web page.

all_models_metrics: Metrics organized by model.

model_embeddings_for_variance: Model embeddings used for variance analysis.

total_combinations: Count of model combinations.

Return type:

dict[str, Any] | None
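The cache-validity rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the pickle cache format, the db_path argument (a file whose modification time stands in for the database's) and the aggregate callback are all hypothetical.

```python
from __future__ import annotations

import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_aggregate(
    cache_path: Path,
    db_path: Path,
    aggregate: Callable[[], dict[str, Any] | None],
) -> dict[str, Any] | None:
    """Return cached data if the cache is newer than the database, else rebuild it."""
    # Assumption: the cache is valid only if it exists and was written
    # after the database file was last modified.
    if cache_path.exists() and cache_path.stat().st_mtime > db_path.stat().st_mtime:
        with cache_path.open("rb") as fh:
            return pickle.load(fh)

    data = aggregate()
    if data is not None:
        # Persist fresh results so the next call can skip aggregation.
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with cache_path.open("wb") as fh:
            pickle.dump(data, fh)
    # None (no processing results) is returned as-is and never cached.
    return data
```

Comparing modification times keeps the check cheap: no hashing of database contents is needed, at the cost of treating any write to the database as an invalidation.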

touch_cache()

Update the cache file’s modification time to the current time.
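Presumably, touching the cache keeps it newer than the database so that the modification-time check in get_aggregated_data treats it as valid. A minimal sketch, assuming a cache_path argument and that a missing cache file is silently skipped (both assumptions, not confirmed by the source):

```python
import os
from pathlib import Path


def touch_cache(cache_path: Path) -> None:
    """Set the cache file's modification time to now, if the file exists."""
    # Assumption: a missing cache file is ignored rather than created.
    if cache_path.exists():
        # os.utime with times=None sets both atime and mtime to the current time.
        os.utime(cache_path, None)
```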

Web Generator

Web page generation module for ForzaEmbed.

This module provides functions for generating interactive HTML pages for visualising embedding analysis results, including heatmaps and comparison charts.

Templates are maintained as separate files under src/reporting/templates/ for easier editing:

template.html — HTML structure with %%PLACEHOLDER%% markers
style.css — Professional report stylesheet (minified at build time)
main.js — Interactive report logic
worker.js — Web Worker for Base64/zlib decompression
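The template list above implies two steps on the Python side: filling %%PLACEHOLDER%% markers and producing data that a Web Worker can Base64-decode and zlib-decompress. The sketch below illustrates both under assumptions: the marker names (TITLE, DATA), the helper names and the JSON, then zlib, then Base64 pipeline are hypothetical, not taken from the module.

```python
from __future__ import annotations

import base64
import json
import re
import zlib

# Hypothetical marker syntax: %%NAME%% with upper-case letters, digits or underscores.
_MARKER = re.compile(r"%%([A-Z0-9_]+)%%")


def encode_payload(data: object) -> str:
    """Serialise to compact JSON, zlib-compress, then Base64-encode for embedding."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def decode_payload(encoded: str) -> object:
    """Inverse of encode_payload, mirroring what a decompression worker would do."""
    return json.loads(zlib.decompress(base64.b64decode(encoded)).decode("utf-8"))


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each %%NAME%% marker with values[NAME]; unknown markers are kept."""
    return _MARKER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


html = fill_template(
    "<title>%%TITLE%%</title><script>const DATA = '%%DATA%%';</script>",
    {"TITLE": "Report", "DATA": encode_payload({"heatmap": [[0.1, 0.9]]})},
)
```

A single regex pass means inserted values are never re-scanned for markers, and unknown markers stay visible in the output, which makes a missing value easy to spot.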

Example

Generate an interactive web page:

from src.reporting.web_generator import generate_main_page

generate_main_page(
    processed_data, output_dir, total_combinations,
    single_file=True, config_name="my_config"
)

src.reporting.web_generator.safe_numpy_converter(obj)

Recursively convert NumPy types to native Python types for JSON serialisation.

Parameters:

obj (Any) – Object to convert; can be an ndarray, a NumPy scalar, a dict, a list, or any other value.

Returns:

Object with all NumPy types converted to native Python equivalents.

Return type:

Any
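A recursive converter of this kind might look like the sketch below. It is an illustration, not the module's code: the name to_native is hypothetical, and it relies on duck typing, since both numpy.ndarray and NumPy scalars (numpy.generic) provide a tolist() method that returns built-in Python lists or scalars. As a side effect, any other object exposing tolist() is converted too.

```python
from typing import Any


def to_native(obj: Any) -> Any:
    """Recursively replace NumPy arrays and scalars with built-in Python types."""
    if isinstance(obj, dict):
        # Convert NumPy scalar keys too, since json.dumps rejects them.
        return {
            (k.tolist() if hasattr(k, "tolist") else k): to_native(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        # Tuples become lists, matching how JSON represents arrays.
        return [to_native(v) for v in obj]
    if hasattr(obj, "tolist"):
        # ndarray.tolist() yields nested lists; recurse in case of object arrays.
        return to_native(obj.tolist())
    return obj
```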

src.reporting.web_generator.generate_main_page(processed_data, output_dir, total_combinations, single_file=False, graph_paths=None, config_name='config', themes_config=None)

Generate the main interactive web page for heatmap visualisation.

Creates HTML files with embedded JavaScript for interactive exploration of embedding similarity results.

Parameters:

processed_data (dict[str, Any]) – Dictionary containing processed analysis data.
output_dir (str) – Directory path for output HTML files.
total_combinations (int) – Total number of model combinations processed.
single_file (bool) – If True, creates a single index.html covering all input files. If False, creates one HTML file per markdown file. Defaults to False.
graph_paths (dict[str, list[str]] | None) – Dictionary mapping file keys to lists of graph image paths.
config_name (str) – Name of the configuration for file prefixes.
themes_config (dict[str, Any] | None) – Theme configuration for tooltip display.
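The effect of single_file on the output layout can be sketched as below. Only the single index.html comes from the description above; the plan_output_files helper and the per-file naming scheme (config_name prefix plus the markdown file's stem) are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def plan_output_files(
    file_keys: list[str],
    output_dir: str,
    single_file: bool,
    config_name: str = "config",
) -> list[Path]:
    """Return the HTML paths a generate_main_page-style function might write."""
    out = Path(output_dir)
    if single_file:
        # One page covering every input markdown file.
        return [out / "index.html"]
    # Hypothetical naming: one page per markdown file, prefixed with config_name.
    return [out / f"{config_name}_{Path(key).stem}.html" for key in file_keys]
```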

Markdown Filter

Markdown filtering module for ForzaEmbed.

This module provides the MarkdownFilter class, which generates filtered markdown files based on a similarity threshold, keeping only the chunks whose similarity is above the threshold.

Example

Generate filtered markdown files:

from src.reporting.markdown_filter import MarkdownFilter

md_filter = MarkdownFilter(db, config, output_dir, "config_name")
md_filter.generate_filtered_markdowns()

class src.reporting.markdown_filter.MarkdownFilter(db, config, output_dir, config_name)

Bases: object

Handle generation of filtered markdown files based on a similarity threshold.

Creates filtered versions of input markdown files containing only the text chunks that exceed the similarity threshold for each model.

db: The embedding database containing results.

config: Configuration dictionary with filter settings.

output_dir: Directory path for output files.

config_name: Name of the configuration for file prefixes.

similarity_threshold: Minimum similarity for including chunks.

__init__(db, config, output_dir, config_name)

Initialize the MarkdownFilter.

Parameters:

db (EmbeddingDatabase) – The embedding database containing results.
config (dict[str, Any]) – Configuration dictionary with filter settings.
output_dir (Path) – Directory path for output files.
config_name (str) – Name of the configuration for file prefixes.

generate_filtered_markdowns()

Generate filtered markdown files containing only the chunks above the threshold.

Creates one filtered markdown file per model-document combination and writes the filtered files to the output directory.
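The filtering step might be sketched as follows. The input shape (a mapping from (model, document) pairs to lists of (chunk text, similarity) pairs), the output file names and the helper name are hypothetical. The sketch keeps chunks strictly above the threshold, following "exceed" above; whether the real module uses > or >= is not stated.

```python
from __future__ import annotations

from pathlib import Path


def write_filtered_markdowns(
    results: dict[tuple[str, str], list[tuple[str, float]]],
    output_dir: Path,
    config_name: str,
    similarity_threshold: float,
) -> list[Path]:
    """Write one markdown file per (model, document) with chunks above the threshold."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for (model, document), chunks in results.items():
        # Keep only chunks strictly above the threshold, in their original order.
        kept = [text for text, score in chunks if score > similarity_threshold]
        # Model names such as "org/model" would otherwise create subdirectories.
        safe_model = model.replace("/", "_")
        path = output_dir / f"{config_name}_{safe_model}_{Path(document).stem}_filtered.md"
        path.write_text("\n\n".join(kept), encoding="utf-8")
        written.append(path)
    return written
```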