Quick Start Guide
This guide will help you get started with ForzaEmbed in minutes.
Basic Workflow
ForzaEmbed follows a simple three-step workflow:
Configure your grid search parameters
Run the analysis
Visualize the results
Step 1: Prepare Your Data
Place your markdown files in a directory (e.g., markdowns/):
markdowns/
├── document1.md
├── document2.md
└── document3.md
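Before running anything, it can help to confirm which files sit in the data directory. The sketch below is only an illustration: `list_markdown_files` is a hypothetical helper, not part of ForzaEmbed, and it assumes that only `.md` files directly inside the directory matter (this guide does not say whether subdirectories are scanned).

```python
from pathlib import Path


def list_markdown_files(data_dir: str) -> list[str]:
    """Return the sorted names of .md files directly inside data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory not found: {data_dir}")
    # Non-recursive on purpose; use root.rglob("*.md") to include subfolders.
    return sorted(p.name for p in root.glob("*.md") if p.is_file())
```

Calling `list_markdown_files("markdowns")` on the layout above would return the three document names in alphabetical order.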
Step 2: Create Configuration
Create a YAML configuration file (e.g., configs/config.yml):
# Grid search parameters
grid_search_params:
  chunk_size: [100, 250, 500]
  chunk_overlap: [0, 10, 25]
  chunking_strategy: ["langchain", "semchunk"]
  similarity_metrics: ["cosine", "dot_product"]
  themes:
    schedule_keywords: [
      "opening hours",
      "schedule",
      "timetable"
    ]

# Models to test
models_to_test:
  - type: "fastembed"
    name: "BAAI/bge-small-en-v1.5"
    dimensions: 384
  - type: "sentence_transformers"
    name: "all-MiniLM-L6-v2"
    dimensions: 384

# General settings
similarity_threshold: 0.6
output_dir: "reports"

# Database settings
database:
  intelligent_quantization: true

# Performance settings
multiprocessing:
  max_workers_api: 4
  max_workers_local: null
  embedding_batch_size_api: 100
  embedding_batch_size_local: 500
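A grid search grows multiplicatively with each parameter, so it is worth estimating the run size before launching. The sketch below counts combinations for the values in this example configuration. It assumes that every parameter value is crossed with every model (a full Cartesian product) and that `themes` is not a search axis; neither is confirmed by this guide, and `count_combinations` is a hypothetical helper.

```python
from itertools import product

# Values copied from the example configuration in this guide.
grid = {
    "chunk_size": [100, 250, 500],
    "chunk_overlap": [0, 10, 25],
    "chunking_strategy": ["langchain", "semchunk"],
    "similarity_metrics": ["cosine", "dot_product"],
}
models = ["BAAI/bge-small-en-v1.5", "all-MiniLM-L6-v2"]


def count_combinations(grid: dict, models: list) -> int:
    """Count combinations, assuming a full Cartesian product is evaluated."""
    axes = list(grid.values()) + [models]
    return sum(1 for _ in product(*axes))


total = count_combinations(grid, models)
print(total)  # 3 * 3 * 2 * 2 * 2 = 72
```

Adding one more chunk size would raise the total from 72 to 96 under the same assumption, which is why small grids are a sensible starting point.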
Step 3: Run the Analysis
Using Python API
from src.core.core import ForzaEmbed
# Initialize
app = ForzaEmbed(
    db_path="reports/my_analysis.db",
    config_path="configs/config.yml"
)
# Run grid search
app.run_grid_search(data_source="markdowns/")
# Generate interactive HTML reports
app.generate_reports(top_n=25)
Using Command Line
python main.py --config-path configs/config.yml --data-source markdowns/ --run
Generate reports only (after running analysis):
python main.py --config-path configs/config.yml --generate-reports --top-n 25
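When the two commands are scripted together, building the argument lists in one place keeps the paths consistent. The sketch below only assembles the commands from the flags shown above; `build_commands` is a hypothetical helper, and it assumes `main.py` is in the current working directory.

```python
import sys


def build_commands(config_path: str, data_source: str, top_n: int = 25) -> list[list[str]]:
    """Return two argv lists: run the analysis, then generate reports."""
    base = [sys.executable, "main.py", "--config-path", config_path]
    run_cmd = base + ["--data-source", data_source, "--run"]
    report_cmd = base + ["--generate-reports", "--top-n", str(top_n)]
    return [run_cmd, report_cmd]


for cmd in build_commands("configs/config.yml", "markdowns/"):
    # Printing only; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Passing each list to `subprocess.run` with `check=True` would stop the sequence if the analysis step fails, so reports are never generated from a partial run.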
Step 4: Explore Results
After the analysis completes, you’ll find:
SQLite Database: reports/my_analysis_ForzaEmbed.db
Stores all embeddings, similarities, and metrics
Enables caching for subsequent runs
Interactive HTML Report: reports/my_analysis_index.html
Textual heatmap showing similarity scores
Interactive controls to switch between configurations
t-SNE visualization of embedding clusters
CSV Reports: reports/similarity_report.csv
Tabular comparison of all configurations
Metrics for each combination
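The results database is a regular SQLite file, so it can be inspected with Python's standard library. The sketch below lists the tables without assuming anything about the schema, which this guide does not describe. `list_tables` is a hypothetical helper; it opens the file read-only so that inspection cannot modify stored results.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    # Build a file: URI with mode=ro so the database is opened read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("reports/my_analysis_ForzaEmbed.db")` would return whatever table names the analysis created; from there, `PRAGMA table_info(<table>)` shows each table's columns.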
Understanding the Heatmap
The textual heatmap is the key visualization in ForzaEmbed:
Red text: High similarity with theme keywords (relevant chunks)
Blue text: Low similarity (less relevant chunks)
Interactive sliders: Change model, chunk size, and other parameters in real time
Hover tooltips: See exact similarity scores
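To make the color coding concrete, the sketch below maps a similarity score to a color between blue and red. It is an illustration only: the color scale the report actually uses is not documented here, so the linear blend, the default score range, and the name `score_to_rgb` are all assumptions.

```python
def score_to_rgb(score: float, low: float = 0.0, high: float = 1.0) -> tuple[int, int, int]:
    """Linearly blend from blue (low similarity) to red (high similarity)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp so out-of-range scores map to the pure end colors.
    t = min(max((score - low) / (high - low), 0.0), 1.0)
    red = round(255 * t)
    blue = round(255 * (1.0 - t))
    return (red, 0, blue)


print(score_to_rgb(0.0))  # (0, 0, 255), pure blue
print(score_to_rgb(1.0))  # (255, 0, 0), pure red
```

Under this mapping, a chunk scoring near the configured `similarity_threshold` of 0.6 would appear as a purple leaning toward red.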
Next Steps
Learn about configuration options in the Configuration Guide
Explore Examples for common use cases
Check the Core Module for advanced usage