################################### StellaScript Project Architecture ################################### This document details the structure of the StellaScript project, the role of each file, and how the modules interact to perform audio transcription and diarization. Overview ======== The project is structured around a main module, ``stellascript``, which contains all the application logic. Execution is initiated by ``main.py`` at the project root, which acts as the entry point. Root Files ========== - ``main.py``: **Application entry point.** It is responsible for parsing command-line arguments, initializing the orchestrator, and launching the transcription process (either live or from a file). - ``README.md``: **Main documentation.** Provides an overview of the project, installation instructions, and usage guidelines. - ``pyproject.toml`` & ``uv.lock``: **Dependency management.** These files define the Python libraries required for the project to function. - ``.gitignore``: Configuration file for Git, specifying files and folders to be ignored. - ``LICENSE``: Contains the MIT license under which the project is distributed. ``stellascript`` Module (Application Core) ============================================ The ``stellascript/`` directory contains the main source code of the application, organized into several modules and sub-modules. - ``orchestrator.py``: **The conductor.** This is the most important file in the project. The ``StellaScriptTranscription`` class manages the entire processing pipeline. It initializes the various components (transcriber, diarizer, etc.) and coordinates their interactions, whether for real-time or file-based processing. - ``config.py``: **Central configuration.** This file centralizes all technical constants and parameters used in the application (e.g., sampling rate, audio buffer duration, voice detection thresholds). This allows for easy modification of the application's behavior from a single location. - ``cli.py``: **Command-line interface.** Defines all the arguments that the user can pass to the program (such as ``--file``, ``--language``, ``--mode``) and ensures they are correctly interpreted. - ``logging_config.py``: **Logging configuration.** Sets up the logging system to display informational messages, warnings, or errors during execution, which is crucial for debugging. ``stellascript/audio`` Sub-module ------------------------------------ This module is dedicated to handling raw audio data. - ``capture.py``: **Audio capture.** Manages interaction with the microphone to record the audio stream in real-time. - ``enhancement.py``: **Audio enhancement.** Contains the logic for applying audio cleaning models, such as ``DeepFilterNet`` or ``Demucs``, to reduce background noise and improve voice clarity before transcription. ``stellascript/processing`` Sub-module ----------------------------------------- This module contains the components responsible for the intelligent analysis and processing of audio. - ``transcriber.py``: **Transcription module.** Encapsulates the speech recognition model (Whisper via ``whisperx``). Its sole responsibility is to take an audio segment and convert it into text. - ``diarizer.py``: **Diarization module.** Its role is to answer the question: "who is speaking and when?". It uses models like ``pyannote.audio`` or a combination of VAD (Voice Activity Detection) and clustering to segment the audio based on speakers. - ``speaker_manager.py``: **Speaker manager.** Works closely with the ``diarizer``, especially for the ``cluster`` method. It is responsible for creating and managing "voiceprints" (embeddings) to identify and differentiate speakers consistently.