Project Modules
main
stellascript.orchestrator
Main orchestrator for the Stellascript transcription pipeline.
This module contains the StellaScriptTranscription class, which coordinates audio capture, enhancement, diarization, and transcription to provide both real-time and file-based transcription.
- class stellascript.orchestrator.StellaScriptTranscription(model_id: str = 'large-v3', language: str = 'fr', similarity_threshold: float = 0.7, mode: str = 'block', min_speakers: int | None = None, max_speakers: int | None = None, diarization_method: str = 'pyannote', enhancement_method: str = 'none', save_enhanced_audio: bool = False, save_recorded_audio: bool = False)
Bases:
object
Orchestrates the entire transcription process.
This class manages the audio stream (from microphone or file), applies audio enhancement, performs speaker diarization, and uses a transcription model to convert speech to text. It handles different transcription modes (block, segment, word) and coordinates the various sub-modules.
stellascript.config
Configuration settings for the Stellascript application.
This module defines various constants that control the behavior of the audio processing, transcription, and diarization pipeline. These settings are organized into sections for clarity and can be tuned to optimize performance for different use cases.
- stellascript.config.FORMAT
The audio format used for recording, corresponding to PyAudio’s paFloat32.
- Type:
str
- stellascript.config.CHANNELS
The number of audio channels (1 for mono).
- Type:
int
- stellascript.config.RATE
The sampling rate in Hz (16000 Hz is standard for speech).
- Type:
int
- stellascript.config.CHUNK
The number of samples per buffer, used for VAD processing.
- Type:
int
- stellascript.config.TRANSCRIPTION_MAX_BUFFER_DURATION
The maximum duration of the audio buffer for transcription in seconds.
- Type:
float
- stellascript.config.SUBTITLE_MAX_BUFFER_DURATION
The maximum duration of the audio buffer for subtitle generation in seconds.
- Type:
float
- stellascript.config.VAD_SPEECH_THRESHOLD
The sensitivity threshold for the Voice Activity Detection (VAD).
- Type:
float
- stellascript.config.VAD_SILENCE_DURATION_S
The duration of silence in seconds that triggers a segment split.
- Type:
float
- stellascript.config.VAD_MIN_SPEECH_DURATION_S
The minimum duration of speech in seconds to be considered a valid segment.
- Type:
float
- stellascript.config.SUBTITLE_MAX_LENGTH
The maximum number of characters per subtitle line.
- Type:
int
- stellascript.config.SUBTITLE_MAX_DURATION_S
The maximum duration of a single subtitle line in seconds.
- Type:
float
- stellascript.config.SUBTITLE_MAX_SILENCE_S
The maximum duration of silence to tolerate before creating a new subtitle line.
- Type:
float
- stellascript.config.MAX_MERGE_GAP_S
The maximum gap of silence in seconds between two speech segments to be merged into one.
- Type:
float
- stellascript.config.TARGET_CHUNK_DURATION_S
The target duration for audio chunks when processing a file.
- Type:
float
- stellascript.config.MAX_CHUNK_DURATION_S
The maximum allowed duration for an audio chunk.
- Type:
float
- stellascript.config.MIN_SILENCE_GAP_S
The minimum duration of silence to be considered a gap for chunking.
- Type:
float
- stellascript.config.TRANSCRIPTION_PADDING_S
The duration of silence padding added to audio segments before transcription.
- Type:
float
- stellascript.config.MODELS
A list of available Whisper models for transcription.
- Type:
list[str]
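As an illustration of how a setting such as MAX_MERGE_GAP_S might be applied, the sketch below merges consecutive speech segments whose silence gap does not exceed the limit. The merge_segments helper, the (start, end) tuple layout, and the value 0.5 are assumptions made for this example; they are not taken from the Stellascript source.

```python
from typing import List, Tuple

# Hypothetical value for illustration; the real setting lives in stellascript.config.
MAX_MERGE_GAP_S = 0.5

Segment = Tuple[float, float]  # (start_s, end_s)


def merge_segments(segments: List[Segment],
                   max_gap_s: float = MAX_MERGE_GAP_S) -> List[Segment]:
    """Merge segments separated by at most max_gap_s seconds of silence."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap_s:
            # The gap is small enough: extend the previous segment.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

A larger gap value yields fewer, longer segments; a smaller one keeps short pauses as segment boundaries.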
stellascript.cli
- stellascript.cli.parse_args() -> Namespace
Parses command-line arguments for the Stellascript application.
This function sets up an ArgumentParser to handle various command-line options for transcription, including language, model selection, input file, diarization, and audio enhancement. It also includes argument validation to ensure compatibility between different options.
- Returns:
An object containing the parsed command-line arguments.
- Return type:
argparse.Namespace
- stellascript.cli.validate_args(args: Namespace, parser: ArgumentParser) -> None
Validates the parsed command-line arguments to ensure they are consistent.
This function checks for invalid combinations of arguments, such as:
- Using speaker count constraints in live mode.
- Incompatible diarization and transcription modes.
- Misuse of the similarity threshold with certain diarization methods.
- Conflicting arguments for speaker count and similarity threshold.
- Parameters:
args (argparse.Namespace) – The parsed command-line arguments.
parser (argparse.ArgumentParser) – The argument parser, used to report errors.
- Raises:
SystemExit – If an invalid combination of arguments is found, the program exits with an error message.
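The validation pattern described above can be sketched with the standard argparse module, whose ArgumentParser.error method prints a usage message and raises SystemExit with status 2. The option names --file and --max-speakers and the single rule shown are hypothetical stand-ins; the actual Stellascript flags and checks may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="stellascript-sketch")
    # Hypothetical options, named for illustration only.
    parser.add_argument("--file", default=None)
    parser.add_argument("--max-speakers", type=int, default=None)
    return parser


def validate_args(args: argparse.Namespace,
                  parser: argparse.ArgumentParser) -> None:
    # Assumed rule: without an input file we are in live mode, where
    # speaker-count constraints are rejected.
    if args.file is None and args.max_speakers is not None:
        # parser.error() reports the problem and raises SystemExit(2).
        parser.error("--max-speakers cannot be used in live mode")
```

Reporting through the parser keeps error output consistent with argparse's own messages for malformed options.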
stellascript.logging_config
Centralized logging configuration for StellaScript. Professional, balanced, and non-verbose logging.
- class stellascript.logging_config.CustomFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)
Custom log formatter to remove the ‘stellascript.’ prefix from logger names.
This formatter simplifies the log output by stripping the base package name, making the logs cleaner and easier to read.
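One way such a formatter could work is to subclass logging.Formatter and shorten the record name before delegating to the base class. The class name PrefixStrippingFormatter and the decision to restore the original name afterwards are assumptions for this sketch, not a description of the actual CustomFormatter.

```python
import logging


class PrefixStrippingFormatter(logging.Formatter):
    """Formatter that drops a leading 'stellascript.' from logger names."""

    PREFIX = "stellascript."

    def format(self, record: logging.LogRecord) -> str:
        original_name = record.name
        if record.name.startswith(self.PREFIX):
            record.name = record.name[len(self.PREFIX):]
        try:
            return super().format(record)
        finally:
            # Restore the name so other handlers see the unmodified record.
            record.name = original_name
```

With the format string "%(name)s: %(message)s", a record from the logger stellascript.audio.capture would be rendered with the name audio.capture.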
- stellascript.logging_config.get_logger(name: str) -> Logger
Retrieves a logger instance for a specific module.
This is a convenience function to get a logger that is part of the ‘stellascript’ hierarchy. The logger inherits its configuration from the root logger set up by setup_logging.
- Parameters:
name (str) – The name of the logger, typically __name__ of the module.
- Returns:
A configured logger instance.
- Return type:
logging.Logger
- stellascript.logging_config.setup_logging(level: int = 20, log_file: str | None = None) -> Logger
Configures the logging for the entire application.
This function sets up a root logger for the ‘stellascript’ package with a custom formatter. It supports logging to both the console and an optional log file.
- Parameters:
level (int) – The logging level (e.g., logging.INFO, logging.DEBUG). The default, 20, is logging.INFO.
log_file (str | None) – Optional path to a file for logging.
- Returns:
The configured root logger for the application.
- Return type:
logging.Logger
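A minimal sketch of this kind of package-level setup, using only the standard logging module, is shown below. The function name configure_package_logging, the format string, and the choice to clear existing handlers are assumptions; the real setup_logging may be organized differently.

```python
import logging
from typing import List, Optional


def configure_package_logging(level: int = logging.INFO,
                              log_file: Optional[str] = None) -> logging.Logger:
    """Configure the 'stellascript' logger with console and optional file output."""
    logger = logging.getLogger("stellascript")
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers: List[logging.Handler] = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Because child loggers such as stellascript.cli have no level of their own, they inherit the effective level and handlers from the package logger configured here.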
stellascript.audio.capture
Handles audio capture from the microphone using PyAudio.
- class stellascript.audio.capture.AudioCapture(format: str, channels: int, rate: int, chunk: int)
Bases:
object
A class to manage audio recording from the microphone.
This class provides a context manager to handle the lifecycle of a PyAudio stream, ensuring that resources are properly opened and closed.
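The context-manager lifecycle described above can be sketched without audio hardware. FakeStream stands in for a PyAudio stream, and CaptureSketch and read_chunk are hypothetical names; the real AudioCapture API is not documented here beyond its constructor.

```python
from typing import Optional


class FakeStream:
    """Stand-in for a PyAudio stream so the sketch runs without a microphone."""

    def __init__(self) -> None:
        self.closed = False

    def read(self, num_frames: int) -> bytes:
        # 4 bytes per float32 mono sample; returns silence.
        return b"\x00" * (num_frames * 4)

    def close(self) -> None:
        self.closed = True


class CaptureSketch:
    """Opens the stream on entry and always closes it on exit."""

    def __init__(self, rate: int, chunk: int) -> None:
        self.rate = rate
        self.chunk = chunk
        self.stream: Optional[FakeStream] = None

    def __enter__(self) -> "CaptureSketch":
        self.stream = FakeStream()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if self.stream is not None:
            self.stream.close()
        return False  # do not swallow exceptions raised inside the block

    def read_chunk(self) -> bytes:
        assert self.stream is not None, "use inside a 'with' block"
        return self.stream.read(self.chunk)
```

Used as `with CaptureSketch(16000, 512) as cap: data = cap.read_chunk()`, the stream is released even if the body raises.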
stellascript.audio.enhancement
Handles audio enhancement using various methods like DeepFilterNet and Demucs.
- class stellascript.audio.enhancement.AudioEnhancer(enhancement_method: str, device: device, rate: int)
Bases:
object
A class to apply audio enhancement techniques to audio data.
This class supports multiple enhancement methods and handles the loading of the necessary models.
- apply(audio_data: ndarray, is_live: bool = False) -> ndarray
Apply the selected audio enhancement method.
- Parameters:
audio_data (np.ndarray) – The input audio data as a NumPy array.
is_live (bool) – Flag indicating if the processing is for a live stream.
- Returns:
The enhanced audio data.
- Return type:
np.ndarray
stellascript.processing.transcriber
Handles audio transcription using the WhisperX library.
This module provides a Transcriber class that encapsulates the logic for loading a Whisper model and using it to transcribe audio segments. It supports generating both full-text transcriptions and detailed word-level timestamps.
- class stellascript.processing.transcriber.Transcriber(model_id: str, device: device, language: str)
Bases:
object
A wrapper for the WhisperX transcription model.
This class manages the loading of the WhisperX model and provides a simple interface to transcribe audio data. It can be configured for different model sizes, languages, and devices.
- transcribe_segment(audio_data: ndarray, rate: int, padding_duration: float, word_timestamps: bool = False) -> str | Tuple[List[Any], str]
Transcribes a single audio segment.
The audio segment is padded with silence to improve transcription accuracy at the beginning and end of the speech.
- Parameters:
audio_data (np.ndarray) – The raw audio data of the segment.
rate (int) – The sample rate of the audio.
padding_duration (float) – The duration of silence padding in seconds.
word_timestamps (bool) – If True, returns word-level timestamps.
- Returns:
If word_timestamps is False, returns the transcribed text as a string.
If word_timestamps is True, returns a tuple containing a list of segment objects (with word details) and the full transcribed text.
- Return type:
Union[str, Tuple[List[Any], str]]
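Two aspects of this method can be illustrated in isolation: the silence padding and the two possible return shapes. In the sketch below, plain Python lists stand in for NumPy arrays so the code runs without dependencies, and the helper names pad_with_silence and extract_text are hypothetical.

```python
from typing import Any, List, Tuple, Union


def pad_with_silence(samples: List[float], rate: int,
                     padding_duration: float) -> List[float]:
    """Surround a segment with padding_duration seconds of zeros on each side."""
    pad = [0.0] * int(round(padding_duration * rate))
    # With NumPy arrays the equivalent would be a concatenation of zero arrays.
    return pad + samples + pad


def extract_text(result: Union[str, Tuple[List[Any], str]]) -> str:
    """Return the transcribed text regardless of the word_timestamps setting."""
    if isinstance(result, str):
        return result          # word_timestamps=False: plain text
    _segments, text = result   # word_timestamps=True: (segments, text)
    return text
```

At 16000 Hz, a padding of 0.5 seconds adds 8000 zero samples before and after the segment.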
stellascript.processing.diarizer
Handles speaker diarization using different methods like Pyannote and VAD with clustering.
- class stellascript.processing.diarizer.Diarizer(device: device, method: str, hf_token: str | None, rate: int)
Bases:
object
A class to perform speaker diarization on audio data.
This class supports multiple diarization methods, including the pre-trained Pyannote pipeline and a custom VAD-based clustering approach.
- apply_vad_to_chunk(audio_chunk: ndarray) -> float
Apply VAD to a small audio chunk for live subtitle mode.
- Parameters:
audio_chunk (np.ndarray) – The audio chunk to process.
- Returns:
The speech probability.
- Return type:
float
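A caller might combine the per-chunk speech probability with settings like VAD_SPEECH_THRESHOLD and VAD_SILENCE_DURATION_S to decide where a segment ends. The following sketch shows one such scheme over a list of probabilities; the function split_on_silence, its frame-index output, and the values in the usage note are assumptions, not Stellascript's actual segmentation logic.

```python
from typing import List, Optional, Tuple


def split_on_silence(probs: List[float], frame_s: float,
                     speech_threshold: float,
                     silence_duration_s: float) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) pairs for detected speech."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first speech frame of the open segment
    last_speech = -1             # most recent frame classified as speech
    for i, p in enumerate(probs):
        if p >= speech_threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and (i - last_speech) * frame_s >= silence_duration_s:
            # Enough silence has elapsed: close the current segment.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments
```

For example, with 0.25 s frames, a threshold of 0.5, and a silence duration of 0.5 s, two silent frames after speech are enough to close a segment.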
- diarize_cluster(audio_data: ndarray, speaker_manager: SpeakerManager, similarity_threshold: float, max_speakers: int | None = None) -> Tuple[List[Dict[str, Any]], int]
Diarize audio using VAD and clustering.
- Parameters:
audio_data (np.ndarray) – The audio data to diarize.
speaker_manager (SpeakerManager) – The speaker manager for embeddings.
similarity_threshold (float) – The similarity threshold for clustering.
max_speakers (Optional[int]) – The maximum number of speakers.
- Returns:
A tuple containing the list of diarized segments and the number of speakers found.
- Return type:
Tuple[List[Dict[str, Any]], int]
- diarize_pyannote(audio_data: ndarray, min_speakers: int | None = None, max_speakers: int | None = None) -> List[Tuple[Segment, str, str]]
Diarize audio using the Pyannote pipeline.
- Parameters:
audio_data (np.ndarray) – The audio data to diarize.
min_speakers (Optional[int]) – The minimum number of speakers.
max_speakers (Optional[int]) – The maximum number of speakers.
- Returns:
A list of diarized segments.
- Return type:
List[Tuple[Segment, str, str]]
stellascript.processing.speaker_manager
Manages speaker identification and embedding storage.
This module is responsible for loading a speaker recognition model, generating embeddings for audio segments, and assigning speaker IDs based on similarity. It maintains a registry of known speakers and their corresponding embeddings.
- class stellascript.processing.speaker_manager.SpeakerManager(device: device, similarity_threshold: float)
Bases:
object
Handles speaker embeddings and identification.
This class uses a pre-trained speaker recognition model to create vector embeddings from audio segments. It can then compare these embeddings to identify known speakers or register new ones.
- get_embeddings(audio_segments: List[ndarray]) -> ndarray
Gets embeddings for a batch of audio segments.
This method attempts to process all segments in a single batch for efficiency. If batch processing fails, it falls back to processing segments one by one.
- Parameters:
audio_segments (List[np.ndarray]) – A list of audio segments as NumPy arrays.
- Returns:
A NumPy array of embeddings for the processed segments.
- Return type:
np.ndarray
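The batch-then-fallback strategy described above can be sketched generically. Here embed_batch is a hypothetical callable standing in for the speaker model, and skipping segments that fail individually is an assumption about how "processed segments" should be read.

```python
from typing import Any, Callable, List, Sequence

Embedding = List[float]


def embed_with_fallback(
    segments: Sequence[Any],
    embed_batch: Callable[[Sequence[Any]], List[Embedding]],
) -> List[Embedding]:
    """Try one batched call; on failure, embed segments one at a time."""
    try:
        return embed_batch(segments)
    except Exception:
        results: List[Embedding] = []
        for segment in segments:
            try:
                results.extend(embed_batch([segment]))
            except Exception:
                # Assumed behavior: drop segments that cannot be embedded.
                continue
        return results
```

Note that when segments are dropped, the output no longer lines up index-for-index with the input, so a caller that needs the mapping would have to track which segments succeeded.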
- get_speaker_id(embedding: ndarray | Tensor) -> str | None
Gets or assigns a speaker ID based on embedding similarity.
Compares the provided embedding with stored embeddings of known speakers. If a match is found above the similarity threshold, the existing speaker ID is returned. Otherwise, a new speaker is registered.
- Parameters:
embedding (Union[np.ndarray, torch.Tensor]) – The speaker embedding to identify.
- Returns:
The assigned speaker ID, or None if the embedding is invalid.
- Return type:
Optional[str]
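The match-or-register behavior can be sketched with cosine similarity over plain Python lists. The similarity metric, the SPEAKER_NN naming scheme, the treatment of empty or all-zero embeddings as invalid, and the class name SpeakerRegistrySketch are all assumptions; only the threshold default of 0.7 comes from the orchestrator signature documented earlier.

```python
import math
from typing import Dict, List, Optional, Sequence


class SpeakerRegistrySketch:
    def __init__(self, similarity_threshold: float = 0.7) -> None:
        self.similarity_threshold = similarity_threshold
        self.speakers: Dict[str, List[float]] = {}

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get_speaker_id(self, embedding: Sequence[float]) -> Optional[str]:
        # Assumed validity check: reject empty or all-zero embeddings.
        if len(embedding) == 0 or not any(embedding):
            return None
        best_id, best_sim = None, -1.0
        for speaker_id, reference in self.speakers.items():
            sim = self._cosine(embedding, reference)
            if sim > best_sim:
                best_id, best_sim = speaker_id, sim
        if best_id is not None and best_sim >= self.similarity_threshold:
            return best_id
        # No sufficiently similar speaker: register a new one.
        new_id = f"SPEAKER_{len(self.speakers):02d}"
        self.speakers[new_id] = list(embedding)
        return new_id
```

A higher threshold makes the registry more likely to create new speakers; a lower one makes it more likely to merge distinct voices under one ID.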