ForzaEmbed

User Guide

  • Installation
    • Requirements
    • Installation
    • Verifying Installation
    • Troubleshooting
      • spaCy Models
      • NLTK Data
  • Quick Start Guide
    • Basic Workflow
    • Step 1: Prepare Your Data
    • Step 2: Create Configuration
    • Step 3: Run the Analysis
      • Using Python API
      • Using Command Line
    • Step 4: Explore Results
    • Understanding the Heatmap
    • Example Output
    • Next Steps
  • Configuration Guide
    • Configuration File Structure
    • Grid Search Parameters
      • chunk_size
      • chunk_overlap
      • chunking_strategy
      • similarity_metrics
      • themes
    • Models Configuration
      • FastEmbed Models
      • Sentence Transformers
      • Hugging Face Transformers
      • API-based Models
    • General Settings
      • similarity_threshold
      • output_dir
      • generate_filtered_markdowns
    • Database Settings
      • intelligent_quantization
    • Performance Settings
    • Complete Example
  • Examples
    • Example 1: Finding Opening Hours
      • Configuration
      • Python Code
    • Example 2: Comparing Multiple Models
      • Configuration
      • Python Code
    • Example 3: Resume from Interrupted Run
    • Example 4: Generate Reports from Existing Database
    • Command Line Examples
    • Performance Optimization Tips
      • Smart Grid Search Optimization
      • For Large Datasets
      • For API-Based Models
      • For Memory Constraints
  • Grid Search Optimization
    • Problem Statement
    • Chunking Strategy Classification
      • Parameter-Sensitive Strategies
      • Parameter-Insensitive Strategies
    • Optimization Strategy
    • Performance Impact
      • Example Configuration
      • Results
      • Per-Strategy Savings
    • Implementation Details
    • Automatic Detection
    • Transparency and Verification
    • Best Practices
      • Maximize Optimization Benefits
      • When to Use Each Strategy Type
    • Technical References
      • Sentence Tokenization
      • Character-Based Chunking

API Reference

  • Core Module
    • ForzaEmbed Class
      • ForzaEmbed
        • ForzaEmbed.db_path
        • ForzaEmbed.config_path
        • ForzaEmbed.config
        • ForzaEmbed.config_name
        • ForzaEmbed.db
        • ForzaEmbed.output_dir
        • ForzaEmbed.processor
        • ForzaEmbed.report_generator
        • ForzaEmbed.__init__()
        • ForzaEmbed.run_grid_search()
        • ForzaEmbed.generate_reports()
    • Processor Class
      • Processor
        • Processor.db
        • Processor.config
        • Processor.embedding_service
        • Processor.similarity_service
        • Processor.visualization_service
        • Processor.__init__()
        • Processor.run_test()
    • Configuration
      • GridSearchParams
        • GridSearchParams.chunk_size
        • GridSearchParams.chunk_overlap
        • GridSearchParams.chunking_strategy
        • GridSearchParams.similarity_metrics
        • GridSearchParams.themes
        • GridSearchParams.model_config
      • ModelConfig
        • ModelConfig.type
        • ModelConfig.name
        • ModelConfig.dimensions
        • ModelConfig.base_url
        • ModelConfig.timeout
        • ModelConfig.model_config
      • DatabaseSettings
        • DatabaseSettings.intelligent_quantization
        • DatabaseSettings.model_config
      • MultiprocessingSettings
        • MultiprocessingSettings.max_workers_api
        • MultiprocessingSettings.max_workers_local
        • MultiprocessingSettings.maxtasksperchild
        • MultiprocessingSettings.embedding_batch_size_api
        • MultiprocessingSettings.embedding_batch_size_local
        • MultiprocessingSettings.file_batch_size
        • MultiprocessingSettings.api_batch_sizes
        • MultiprocessingSettings.model_config
      • AppConfig
        • AppConfig.grid_search_params
        • AppConfig.models_to_test
        • AppConfig.similarity_threshold
        • AppConfig.output_dir
        • AppConfig.generate_filtered_markdowns
        • AppConfig.database
        • AppConfig.multiprocessing
        • AppConfig.model_config
      • load_config()
  • Embedding Clients
    • FastEmbed Client
      • FastEmbedClient
        • FastEmbedClient._instances
        • FastEmbedClient.get_instance()
        • FastEmbedClient.get_embeddings()
    • Sentence Transformers Client
      • SentenceTransformersClient
        • SentenceTransformersClient._instances
        • SentenceTransformersClient.get_instance()
        • SentenceTransformersClient.get_embeddings()
    • Transformers Client
      • mean_pooling()
      • TransformersClient
        • TransformersClient._instances
        • TransformersClient.get_instance()
        • TransformersClient.get_embeddings()
    • Hugging Face Client
      • mean_pooling()
      • get_huggingface_embeddings()
    • API Client
      • ProductionEmbeddingClient
        • ProductionEmbeddingClient.base_url
        • ProductionEmbeddingClient.model
        • ProductionEmbeddingClient.expected_dimension
        • ProductionEmbeddingClient.timeout
        • ProductionEmbeddingClient.session
        • ProductionEmbeddingClient.__init__()
        • ProductionEmbeddingClient.get_embeddings()
  • Services Module
    • Embedding Service
      • EmbeddingService
        • EmbeddingService.db
        • EmbeddingService.config
        • EmbeddingService.multiprocessing_config
        • EmbeddingService.__init__()
        • EmbeddingService.get_embedding_function()
        • EmbeddingService.get_or_create_embeddings()
        • EmbeddingService.get_text_hash()
    • Similarity Service
      • SimilarityService
        • SimilarityService.calculate_similarity()
        • SimilarityService.validate_similarities()
    • Visualization Service
      • VisualizationService
        • VisualizationService.db
        • VisualizationService.__init__()
        • VisualizationService.get_or_create_tsne_data()
  • Evaluation Metrics
    • Main Metrics Module
      • calculate_silhouette_metrics()
      • calculate_all_metrics()
    • Silhouette Analysis
      • decompose_silhouette_score()
      • analyze_silhouette_by_cluster()
      • enhanced_silhouette_analysis()
    • Metric Descriptions
      • Silhouette Score
      • Intra-Cluster Distance (Normalized)
      • Inter-Cluster Distance (Normalized)
      • Embedding Computation Time
    • Interpretation Guide
      • Good Configuration
      • Poor Configuration
  • Reporting Module
    • Report Generator
      • ReportGenerator
        • ReportGenerator.db
        • ReportGenerator.config
        • ReportGenerator.output_dir
        • ReportGenerator.config_name
        • ReportGenerator.similarity_threshold
        • ReportGenerator.data_aggregator
        • ReportGenerator.markdown_filter
        • ReportGenerator.__init__()
        • ReportGenerator.generate_all()
      • get_metrics_info()
    • Data Aggregator
      • DataAggregator
        • DataAggregator.db
        • DataAggregator.output_dir
        • DataAggregator.cache_path
        • DataAggregator.__init__()
        • DataAggregator.get_aggregated_data()
        • DataAggregator.touch_cache()
    • Web Generator
      • safe_numpy_converter()
      • generate_main_page()
    • Markdown Filter
      • MarkdownFilter
        • MarkdownFilter.db
        • MarkdownFilter.config
        • MarkdownFilter.output_dir
        • MarkdownFilter.config_name
        • MarkdownFilter.similarity_threshold
        • MarkdownFilter.__init__()
        • MarkdownFilter.generate_filtered_markdowns()
  • Utilities
    • Database
      • EmbeddingDatabase
        • EmbeddingDatabase.db_path
        • EmbeddingDatabase.config
        • EmbeddingDatabase.quantization_enabled
        • EmbeddingDatabase.engine
        • EmbeddingDatabase.Session
        • EmbeddingDatabase.__init__()
        • EmbeddingDatabase.add_model()
        • EmbeddingDatabase.add_evaluation_metrics()
        • EmbeddingDatabase.add_generated_file()
        • EmbeddingDatabase.add_global_chart()
        • EmbeddingDatabase.model_exists()
        • EmbeddingDatabase.save_processing_result()
        • EmbeddingDatabase.save_processing_results_batch()
        • EmbeddingDatabase.get_processed_files()
        • EmbeddingDatabase.get_model_info()
        • EmbeddingDatabase.get_all_processing_results()
        • EmbeddingDatabase.get_all_models()
        • EmbeddingDatabase.get_model_files()
        • EmbeddingDatabase.get_global_charts()
        • EmbeddingDatabase.vacuum_database()
        • EmbeddingDatabase.get_all_run_names()
        • EmbeddingDatabase.get_processed_files_with_similarities()
        • EmbeddingDatabase.get_embeddings_by_hashes()
        • EmbeddingDatabase.save_embeddings_batch()
        • EmbeddingDatabase.save_tsne_coordinates()
        • EmbeddingDatabase.get_tsne_coordinates()
        • EmbeddingDatabase.clear_tsne_cache()
        • EmbeddingDatabase.get_run_details()
        • EmbeddingDatabase.get_all_processing_results_for_run()
        • EmbeddingDatabase.update_metrics_for_file()
        • EmbeddingDatabase.get_db_modification_time()
    • Data Loader
      • load_markdown_files()
    • Text Processing
      • get_spacy_model()
      • chunk_text()
      • contains_horaire_pattern()
      • extract_context_around_phrase()
    • Database Models
      • Base
        • Base.__init__()
        • Base.metadata
        • Base.registry
      • Model
        • Model.id
        • Model.name
        • Model.base_model_name
        • Model.type
        • Model.chunk_size
        • Model.chunk_overlap
        • Model.theme_name
        • Model.chunking_strategy
        • Model.similarity_metric
        • Model.created_at
        • Model.metrics
        • Model.generated_files
        • Model.__init__()
      • EvaluationMetric
        • EvaluationMetric.id
        • EvaluationMetric.model_name
        • EvaluationMetric.silhouette_score
        • EvaluationMetric.intra_cluster_distance_normalized
        • EvaluationMetric.inter_cluster_distance_normalized
        • EvaluationMetric.embedding_computation_time
        • EvaluationMetric.created_at
        • EvaluationMetric.model
        • EvaluationMetric.__init__()
      • GeneratedFile
        • GeneratedFile.id
        • GeneratedFile.model_name
        • GeneratedFile.file_type
        • GeneratedFile.file_path
        • GeneratedFile.created_at
        • GeneratedFile.model
        • GeneratedFile.__init__()
      • GlobalChart
        • GlobalChart.id
        • GlobalChart.chart_type
        • GlobalChart.file_path
        • GlobalChart.created_at
        • GlobalChart.__init__()
      • ProcessingResult
        • ProcessingResult.id
        • ProcessingResult.model_name
        • ProcessingResult.file_id
        • ProcessingResult.results_blob
        • ProcessingResult.created_at
        • ProcessingResult.__init__()
      • EmbeddingCache
        • EmbeddingCache.model_name
        • EmbeddingCache.text_hash
        • EmbeddingCache.vector
        • EmbeddingCache.dimension
        • EmbeddingCache.created_at
        • EmbeddingCache.__init__()
      • TSNECoordinate
        • TSNECoordinate.id
        • TSNECoordinate.tsne_key
        • TSNECoordinate.file_id
        • TSNECoordinate.coordinates
        • TSNECoordinate.created_at
        • TSNECoordinate.__init__()

Additional Information

  • License
    • MIT License
    • Attribution

Python Module Index

s
- src
    src.clients.api_client
    src.clients.fastembed_client
    src.clients.huggingface_client
    src.clients.sentencetransformers_client
    src.clients.transformers_client
    src.core.config
    src.metrics.evaluation_metrics
    src.metrics.silhouette_decomposition
    src.reporting.aggregator
    src.reporting.markdown_filter
    src.reporting.reporting
    src.reporting.web_generator
    src.services.embedding_service
    src.services.similarity_service
    src.services.visualization_service
    src.utils.data_loader
    src.utils.database
    src.utils.models
    src.utils.utils

© Copyright 2026, Béranger Thomas.
