Quick Start Guide
=================

This guide will help you get started with ForzaEmbed in minutes.

Basic Workflow
--------------

ForzaEmbed follows a simple three-step workflow:

1. **Configure** your grid search parameters
2. **Run** the analysis
3. **Visualize** the results

Step 1: Prepare Your Data
--------------------------

Place your Markdown files in a directory (e.g., ``markdowns/``)::

    markdowns/
    ├── document1.md
    ├── document2.md
    └── document3.md

Step 2: Create Configuration
-----------------------------

Create a YAML configuration file (e.g., ``configs/config.yml``):

.. code-block:: yaml

    # Grid search parameters
    grid_search_params:
      chunk_size: [100, 250, 500]
      chunk_overlap: [0, 10, 25]
      chunking_strategy: ["langchain", "semchunk"]
      similarity_metrics: ["cosine", "dot_product"]
      themes:
        schedule_keywords: [
          "opening hours",
          "schedule",
          "timetable"
        ]

    # Models to test
    models_to_test:
      - type: "fastembed"
        name: "BAAI/bge-small-en-v1.5"
        dimensions: 384
      - type: "sentence_transformers"
        name: "all-MiniLM-L6-v2"
        dimensions: 384

    # General settings
    similarity_threshold: 0.6
    output_dir: "reports"

    # Database settings
    database:
      intelligent_quantization: true

    # Performance settings
    multiprocessing:
      max_workers_api: 4
      max_workers_local: null
      embedding_batch_size_api: 100
      embedding_batch_size_local: 500

Step 3: Run the Analysis
-------------------------

Using Python API
~~~~~~~~~~~~~~~~

.. code-block:: python

    from src.core.core import ForzaEmbed

    # Initialize
    app = ForzaEmbed(
        db_path="reports/my_analysis.db",
        config_path="configs/config.yml"
    )

    # Run grid search
    app.run_grid_search(data_source="markdowns/")

    # Generate interactive HTML reports
    app.generate_reports(top_n=25)

Using Command Line
~~~~~~~~~~~~~~~~~~

::

    python main.py --config-path configs/config.yml --data-source markdowns/ --run

Generate reports only (after running analysis)::

    python main.py --config-path configs/config.yml --generate-reports --top-n 25

Step 4: Explore Results
-----------------------

After the analysis completes, you'll find:

1. **SQLite Database**: ``reports/my_analysis_ForzaEmbed.db``

   * Stores all embeddings, similarities, and metrics
   * Enables caching for subsequent runs

2. **Interactive HTML Report**: ``reports/my_analysis_index.html``

   * Textual heatmap showing similarity scores
   * Interactive controls to switch between configurations
   * t-SNE visualization of embedding clusters

3. **CSV Reports**: ``reports/similarity_report.csv``

   * Tabular comparison of all configurations
   * Metrics for each combination

Understanding the Heatmap
--------------------------

The textual heatmap is the key visualization in ForzaEmbed:

* **Red text**: High similarity with theme keywords (relevant chunks)
* **Blue text**: Low similarity (less relevant chunks)
* **Interactive sliders**: Change model, chunk size, and other parameters in real time
* **Hover tooltips**: See exact similarity scores

Example Output
--------------

.. image:: assets/example_1.jpg
   :alt: Example of ForzaEmbed interactive report
   :align: center
   :width: 100%

Next Steps
----------

* Learn about :doc:`configuration` options
* Explore :doc:`examples` for common use cases
* Check the :doc:`api/core` for advanced usage
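
Before launching a larger run, it can help to estimate how many configurations the grid from Step 2 expands to. The sketch below is independent of ForzaEmbed: it copies the list-valued parameters from the example configuration and assumes that every model is combined with every parameter value as a plain Cartesian product. ForzaEmbed itself may skip or deduplicate some combinations, so treat the result as an upper-bound estimate.

.. code-block:: python

    from itertools import product

    # Values copied from the example configuration in Step 2.
    grid_search_params = {
        "chunk_size": [100, 250, 500],
        "chunk_overlap": [0, 10, 25],
        "chunking_strategy": ["langchain", "semchunk"],
        "similarity_metrics": ["cosine", "dot_product"],
    }
    models_to_test = ["BAAI/bge-small-en-v1.5", "all-MiniLM-L6-v2"]

    # Assumption: the grid is a full Cartesian product of models and parameters.
    combinations = list(product(models_to_test, *grid_search_params.values()))

    print(f"{len(combinations)} configurations")  # 2 * 3 * 3 * 2 * 2 = 72
    print(combinations[0])

Adding one more value to any list multiplies the total, so trimming the grid to the values you actually want to compare is the simplest way to shorten a run.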