Project Modules

main

main.main()[source]

Main function to run the transcription application.

stellascript.orchestrator

Main orchestrator for the Stellascript transcription pipeline.

This module contains the StellaScriptTranscription class, which coordinates various components like audio capture, enhancement, diarization, and transcription to provide a seamless real-time and file-based transcription service.

class stellascript.orchestrator.StellaScriptTranscription(model_id: str = 'large-v3', language: str = 'fr', similarity_threshold: float = 0.7, mode: str = 'block', min_speakers: int | None = None, max_speakers: int | None = None, diarization_method: str = 'pyannote', enhancement_method: str = 'none', save_enhanced_audio: bool = False, save_recorded_audio: bool = False)[source]

Bases: object

Orchestrates the entire transcription process.

This class manages the audio stream (from microphone or file), applies audio enhancement, performs speaker diarization, and uses a transcription model to convert speech to text. It handles different transcription modes (block, segment, word) and coordinates the various sub-modules.

get_transcription() str | None[source]

Retrieves a completed transcription line from the queue.

Returns:

A transcribed line of text, or None if the queue is empty.

Return type:

Optional[str]
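The non-blocking contract above lends itself to a drain-until-None loop. The sketch below uses a plain queue.Queue as a stand-in for the orchestrator's internal queue; the TranscriptionQueue class and its push method are illustrative, not part of the actual API.

```python
import queue

class TranscriptionQueue:
    """Stand-in for the orchestrator's internal line queue (illustrative only)."""

    def __init__(self) -> None:
        self._lines: "queue.Queue[str]" = queue.Queue()

    def push(self, line: str) -> None:
        self._lines.put(line)

    def get_transcription(self):
        # Mirrors the documented contract: a line, or None when the queue is empty.
        try:
            return self._lines.get_nowait()
        except queue.Empty:
            return None

q = TranscriptionQueue()
q.push("SPEAKER_00: hello")
q.push("SPEAKER_00: world")

# Drain completed lines until the queue reports None.
lines = []
while (line := q.get_transcription()) is not None:
    lines.append(line)

print(lines)  # ['SPEAKER_00: hello', 'SPEAKER_00: world']
```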

save_audio() None[source]

Saves the entire recorded audio session to a WAV file.

start_recording() None[source]

Initializes and starts the audio recording and processing threads.

stop_recording() None[source]

Stops the audio recording and waits for all processing to complete.

transcribe_file(file_path: str) None[source]

Transcribes an entire audio file.

This method loads an audio file, applies enhancement and diarization, and then transcribes the resulting audio segments.

Parameters:

file_path (str) – The path to the audio file to transcribe.

stellascript.config

Configuration settings for the Stellascript application.

This module defines various constants that control the behavior of the audio processing, transcription, and diarization pipeline. These settings are organized into sections for clarity and can be tuned to optimize performance for different use cases.

stellascript.config.FORMAT

The audio format used for recording, corresponding to PyAudio’s paFloat32.

Type:

str

stellascript.config.CHANNELS

The number of audio channels (1 for mono).

Type:

int

stellascript.config.RATE

The sampling rate in Hz (16000 Hz is standard for speech).

Type:

int

stellascript.config.CHUNK

The number of samples per buffer, used for VAD processing.

Type:

int

stellascript.config.TRANSCRIPTION_MAX_BUFFER_DURATION

The maximum duration of the audio buffer for transcription in seconds.

Type:

float

stellascript.config.SUBTITLE_MAX_BUFFER_DURATION

The maximum duration of the audio buffer for subtitle generation in seconds.

Type:

float

stellascript.config.VAD_SPEECH_THRESHOLD

The sensitivity threshold for the Voice Activity Detection (VAD).

Type:

float

stellascript.config.VAD_SILENCE_DURATION_S

The duration of silence in seconds that triggers a segment split.

Type:

float

stellascript.config.VAD_MIN_SPEECH_DURATION_S

The minimum duration of speech in seconds to be considered a valid segment.

Type:

float

stellascript.config.SUBTITLE_MAX_LENGTH

The maximum number of characters per subtitle line.

Type:

int

stellascript.config.SUBTITLE_MAX_DURATION_S

The maximum duration of a single subtitle line in seconds.

Type:

float

stellascript.config.SUBTITLE_MAX_SILENCE_S

The maximum duration of silence to tolerate before creating a new subtitle line.

Type:

float

stellascript.config.MAX_MERGE_GAP_S

The maximum gap of silence in seconds between two speech segments to be merged into one.

Type:

float
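The merge rule can be sketched in a few lines; the function name and the constant's value here are assumptions for illustration only.

```python
MAX_MERGE_GAP_S = 0.3  # illustrative value; see the actual config

def merge_segments(segments, max_gap=MAX_MERGE_GAP_S):
    """Merge (start, end) speech segments whose silence gap is at most max_gap seconds."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Gap small enough: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

merged = merge_segments([(0.0, 1.0), (1.2, 2.0), (3.0, 4.0)])
print(merged)  # [(0.0, 2.0), (3.0, 4.0)] — 0.2 s gap merged, 1.0 s gap kept
```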

stellascript.config.TARGET_CHUNK_DURATION_S

The target duration for audio chunks when processing a file.

Type:

float

stellascript.config.MAX_CHUNK_DURATION_S

The maximum allowed duration for an audio chunk.

Type:

float

stellascript.config.MIN_SILENCE_GAP_S

The minimum duration of silence to be considered a gap for chunking.

Type:

float

stellascript.config.TRANSCRIPTION_PADDING_S

The duration of silence padding added to audio segments before transcription.

Type:

float

stellascript.config.MODELS

A list of available Whisper models for transcription.

Type:

list[str]

stellascript.cli

stellascript.cli.parse_args() Namespace[source]

Parses command-line arguments for the Stellascript application.

This function sets up an ArgumentParser to handle various command-line options for transcription, including language, model selection, input file, diarization, and audio enhancement. It also includes argument validation to ensure compatibility between different options.

Returns:

An object containing the parsed command-line arguments.

Return type:

argparse.Namespace

stellascript.cli.validate_args(args: Namespace, parser: ArgumentParser) None[source]

Validates the parsed command-line arguments to ensure they are consistent.

This function checks for various invalid combinations of arguments, such as:

  • Using speaker count constraints in live mode.

  • Incompatible diarization and transcription modes.

  • Misuse of the similarity threshold with certain diarization methods.

  • Conflicting arguments for speaker count and similarity threshold.

Parameters:
  • args (argparse.Namespace) – The parsed command-line arguments.

  • parser (argparse.ArgumentParser) – The argument parser, used to report errors.

Raises:

SystemExit – If an invalid combination of arguments is found, the program exits with an error message.
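The first documented check (speaker count constraints in live mode) can be sketched with argparse's standard error-reporting path. The flag names and the check below are hypothetical; only the pattern of delegating to parser.error, which prints a message and raises SystemExit, reflects the documented behavior.

```python
import argparse

def validate_args(args, parser):
    # Hypothetical rule mirroring the docs: speaker-count constraints
    # only make sense when transcribing a file, not in live mode.
    if args.input_file is None and (
        args.min_speakers is not None or args.max_speakers is not None
    ):
        parser.error("--min-speakers/--max-speakers require an input file")

parser = argparse.ArgumentParser(prog="stellascript")
parser.add_argument("--input-file", default=None)
parser.add_argument("--min-speakers", type=int, default=None)
parser.add_argument("--max-speakers", type=int, default=None)

bad = parser.parse_args(["--min-speakers", "2"])
try:
    validate_args(bad, parser)
except SystemExit:
    print("rejected: speaker counts need an input file")
```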

stellascript.logging_config

Centralized logging configuration for StellaScript. Professional, balanced, and non-verbose logging.

class stellascript.logging_config.CustomFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)[source]

Custom log formatter to remove the ‘stellascript.’ prefix from logger names.

This formatter simplifies the log output by stripping the base package name, making the logs cleaner and easier to read.

format(record: LogRecord) str[source]

Formats the log record.

Parameters:

record (logging.LogRecord) – The original log record.

Returns:

The formatted log message.

Return type:

str
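The prefix-stripping behavior described above can be sketched with a minimal logging.Formatter subclass. This is an illustrative re-implementation, not the actual CustomFormatter; note it mutates record.name in place, which the real class may or may not do.

```python
import logging

class PrefixStrippingFormatter(logging.Formatter):
    """Minimal sketch of the documented behaviour (not the actual class)."""

    PREFIX = "stellascript."

    def format(self, record: logging.LogRecord) -> str:
        # Strip the base package name so logger names read as 'audio.capture'
        # instead of 'stellascript.audio.capture'.
        if record.name.startswith(self.PREFIX):
            record.name = record.name[len(self.PREFIX):]
        return super().format(record)

formatter = PrefixStrippingFormatter("%(name)s - %(message)s")
record = logging.LogRecord(
    name="stellascript.audio.capture", level=logging.INFO,
    pathname="app.py", lineno=0, msg="stream opened", args=None, exc_info=None,
)
print(formatter.format(record))  # audio.capture - stream opened
```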

stellascript.logging_config.get_logger(name: str) Logger[source]

Retrieves a logger instance for a specific module.

This is a convenience function to get a logger that is part of the ‘stellascript’ hierarchy. The logger inherits its configuration from the root logger set up by setup_logging.

Parameters:

name (str) – The name of the logger, typically __name__ of the module.

Returns:

A configured logger instance.

Return type:

logging.Logger

stellascript.logging_config.setup_logging(level: int = 20, log_file: str | None = None) Logger[source]

Configures the logging for the entire application.

This function sets up a root logger for the ‘stellascript’ package with a custom formatter. It supports logging to both the console and an optional log file.

Parameters:
  • level (int) – The logging level (e.g., logging.INFO, logging.DEBUG).

  • log_file (str | None) – Optional path to a file for logging.

Returns:

The configured root logger for the application.

Return type:

logging.Logger

stellascript.audio.capture

Handles audio capture from the microphone using PyAudio.

class stellascript.audio.capture.AudioCapture(format: str, channels: int, rate: int, chunk: int)[source]

Bases: object

A class to manage audio recording from the microphone.

This class provides a context manager to handle the lifecycle of a PyAudio stream, ensuring that resources are properly opened and closed.

audio_stream(callback: Callable) Generator[Stream | None, None, None][source]

A context manager for opening and managing a PyAudio stream.

Parameters:

callback (Callable) – The callback function to process audio chunks.

Yields:

Optional[pyaudio.Stream] – The PyAudio stream object.
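The guarantee that resources are closed regardless of how the caller exits is the classic contextmanager try/finally pattern. The sketch below substitutes a FakeStream for pyaudio.Stream so it runs without an audio device; in the real class the callback would be wired into the PyAudio stream.

```python
from contextlib import contextmanager
from typing import Callable, Generator, Optional

class FakeStream:
    """Stand-in for pyaudio.Stream so this sketch runs without PyAudio."""
    def __init__(self) -> None:
        self.active = True
        self.closed = False
    def stop_stream(self) -> None:
        self.active = False
    def close(self) -> None:
        self.closed = True

@contextmanager
def audio_stream(callback: Callable) -> Generator[Optional[FakeStream], None, None]:
    # The real implementation opens a PyAudio stream and passes `callback`
    # as its stream callback; here we only model the lifecycle.
    stream = FakeStream()
    try:
        yield stream
    finally:
        stream.stop_stream()
        stream.close()

with audio_stream(lambda in_data, frame_count, time_info, status: None) as stream:
    running = stream.active  # stream is open inside the with-block
```

After the with-block exits, the stream is guaranteed stopped and closed even if the body raised.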

stellascript.audio.enhancement

Handles audio enhancement using various methods like DeepFilterNet and Demucs.

class stellascript.audio.enhancement.AudioEnhancer(enhancement_method: str, device: device, rate: int)[source]

Bases: object

A class to apply audio enhancement techniques to audio data.

This class supports multiple enhancement methods and handles the loading of the necessary models.

apply(audio_data: ndarray, is_live: bool = False) ndarray[source]

Apply the selected audio enhancement method.

Parameters:
  • audio_data (np.ndarray) – The input audio data as a NumPy array.

  • is_live (bool) – Flag indicating if the processing is for a live stream.

Returns:

The enhanced audio data.

Return type:

np.ndarray

stellascript.processing.transcriber

Handles audio transcription using the WhisperX library.

This module provides a Transcriber class that encapsulates the logic for loading a Whisper model and using it to transcribe audio segments. It supports generating both full-text transcriptions and detailed word-level timestamps.

class stellascript.processing.transcriber.Transcriber(model_id: str, device: device, language: str)[source]

Bases: object

A wrapper for the WhisperX transcription model.

This class manages the loading of the WhisperX model and provides a simple interface to transcribe audio data. It can be configured for different model sizes, languages, and devices.

transcribe_segment(audio_data: ndarray, rate: int, padding_duration: float, word_timestamps: bool = False) str | Tuple[List[Any], str][source]

Transcribes a single audio segment.

The audio segment is padded with silence to improve transcription accuracy at the beginning and end of the speech.

Parameters:
  • audio_data (np.ndarray) – The raw audio data of the segment.

  • rate (int) – The sample rate of the audio.

  • padding_duration (float) – The duration of silence padding in seconds.

  • word_timestamps (bool) – If True, returns word-level timestamps.

Returns:

  • If word_timestamps is False, returns the transcribed text as a string.

  • If word_timestamps is True, returns a tuple containing a list of segment objects (with word details) and the full transcribed text.

Return type:

Union[str, Tuple[List[Any], str]]
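The silence-padding step described above amounts to concatenating zero-valued samples on both sides of the segment. A minimal sketch (the helper name is illustrative, not the library's API):

```python
import numpy as np

def pad_segment(audio: np.ndarray, rate: int, padding_duration: float) -> np.ndarray:
    """Surround a segment with silence, as transcribe_segment is documented to do."""
    pad = np.zeros(int(rate * padding_duration), dtype=audio.dtype)
    return np.concatenate([pad, audio, pad])

rate = 16000
segment = np.ones(rate, dtype=np.float32)      # 1 s of dummy audio
padded = pad_segment(segment, rate, padding_duration=0.5)
print(padded.shape)  # (32000,) — 0.5 s of silence on each side of 1 s of audio
```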

stellascript.processing.diarizer

Handles speaker diarization using different methods like Pyannote and VAD with clustering.

class stellascript.processing.diarizer.Diarizer(device: device, method: str, hf_token: str | None, rate: int)[source]

Bases: object

A class to perform speaker diarization on audio data.

This class supports multiple diarization methods, including the pre-trained Pyannote pipeline and a custom VAD-based clustering approach.

apply_vad_to_chunk(audio_chunk: ndarray) float[source]

Apply VAD to a small audio chunk for live subtitle mode.

Parameters:

audio_chunk (np.ndarray) – The audio chunk to process.

Returns:

The speech probability.

Return type:

float

diarize_cluster(audio_data: ndarray, speaker_manager: SpeakerManager, similarity_threshold: float, max_speakers: int | None = None) Tuple[List[Dict[str, Any]], int][source]

Diarize audio using VAD and clustering.

Parameters:
  • audio_data (np.ndarray) – The audio data to diarize.

  • speaker_manager (SpeakerManager) – The speaker manager for embeddings.

  • similarity_threshold (float) – The similarity threshold for clustering.

  • max_speakers (Optional[int]) – The maximum number of speakers.

Returns:

A tuple containing the list of diarized segments and the number of speakers found.

Return type:

Tuple[List[Dict[str, Any]], int]

diarize_pyannote(audio_data: ndarray, min_speakers: int | None = None, max_speakers: int | None = None) List[Tuple[Segment, str, str]][source]

Diarize audio using the Pyannote pipeline.

Parameters:
  • audio_data (np.ndarray) – The audio data to diarize.

  • min_speakers (Optional[int]) – The minimum number of speakers.

  • max_speakers (Optional[int]) – The maximum number of speakers.

Returns:

A list of diarized segments.

Return type:

List[Tuple[Segment, str, str]]

stellascript.processing.speaker_manager

Manages speaker identification and embedding storage.

This module is responsible for loading a speaker recognition model, generating embeddings for audio segments, and assigning speaker IDs based on similarity. It maintains a registry of known speakers and their corresponding embeddings.

class stellascript.processing.speaker_manager.SpeakerManager(device: device, similarity_threshold: float)[source]

Bases: object

Handles speaker embeddings and identification.

This class uses a pre-trained speaker recognition model to create vector embeddings from audio segments. It can then compare these embeddings to identify known speakers or register new ones.

get_embeddings(audio_segments: List[ndarray]) ndarray[source]

Gets embeddings for a batch of audio segments.

This method attempts to process all segments in a single batch for efficiency. If batch processing fails, it falls back to processing segments one by one.

Parameters:

audio_segments (List[np.ndarray]) – A list of audio segments as NumPy arrays.

Returns:

A NumPy array of embeddings for the processed segments.

Return type:

np.ndarray

get_speaker_id(embedding: ndarray | Tensor) str | None[source]

Gets or assigns a speaker ID based on embedding similarity.

Compares the provided embedding with stored embeddings of known speakers. If a match is found above the similarity threshold, the existing speaker ID is returned. Otherwise, a new speaker is registered.

Parameters:

embedding (Union[np.ndarray, torch.Tensor]) – The speaker embedding to identify.

Returns:

The assigned speaker ID, or None if the embedding is invalid.

Return type:

Optional[str]
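The match-or-register rule described above can be sketched with plain cosine similarity. The class name, ID format, and linear scan below are illustrative assumptions, not the actual implementation:

```python
import numpy as np

class SpeakerRegistry:
    """Toy sketch of the documented matching logic (not the actual class)."""

    def __init__(self, similarity_threshold: float = 0.7) -> None:
        self.threshold = similarity_threshold
        self.embeddings: dict = {}

    def get_speaker_id(self, embedding: np.ndarray) -> str:
        # Compare against every known speaker; return the first match
        # above the threshold, otherwise register a new speaker.
        for speaker_id, known in self.embeddings.items():
            cos = float(np.dot(embedding, known) /
                        (np.linalg.norm(embedding) * np.linalg.norm(known)))
            if cos >= self.threshold:
                return speaker_id
        new_id = f"SPEAKER_{len(self.embeddings):02d}"
        self.embeddings[new_id] = embedding
        return new_id

reg = SpeakerRegistry()
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.9, 0.1, 0.0])   # nearly parallel to a
c = np.array([0.0, 1.0, 0.0])   # orthogonal to a

id_a = reg.get_speaker_id(a)  # SPEAKER_00 (first speaker registered)
id_b = reg.get_speaker_id(b)  # SPEAKER_00 (similarity above threshold)
id_c = reg.get_speaker_id(c)  # SPEAKER_01 (no match, new speaker)
```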