TorchSpeaker

class TorchSpeaker(language: str = 'ru', speaker_model: str = 'v5_5_ru', speaker: str = 'xenia', sample_rate: int = 48000, device: str | object = 'cpu', repo_or_dir: str = 'snakers4/silero-models', model: str = 'silero_tts', source: str = 'github', trust_repo: bool | str | None = None, skip_validation: bool | None = None, **settings: object)

Bases: BaseSpeaker

Generate speech by using a Torch-backed Silero model.

speak(text: str, **settings: object) → tuple[ndarray, int]

Convert text into audio data.

Parameters:
  • text (str) – Text to synthesize.

  • settings (dict[str, object]) – Additional synthesis settings.

Returns:

Generated audio samples and their sample rate in Hz.

Return type:

tuple[numpy.ndarray, int]
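
The returned pair can be written to disk with the standard library alone. The sketch below is illustrative, not part of this API: write_wav is a hypothetical helper, and it assumes the samples are mono floats in the range [-1.0, 1.0], which this section does not state. Because the import path of TorchSpeaker is not shown here, a synthetic tone stands in for the result of speak().

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples to a 16-bit PCM WAV file (hypothetical helper).

    Assumes samples lie in [-1.0, 1.0]; values outside that range are clipped.
    """
    # Clip each sample, then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumption)
        wav_file.setsampwidth(2)            # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)  # second element of the returned tuple
        # WAV data is little-endian, hence the "<" in the format string.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)


# Stand-in for `samples, sample_rate = speaker.speak("...")`:
# a 0.1 s, 440 Hz tone at the default 48000 Hz sample rate.
sample_rate = 48000
samples = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(4800)]

wav_path = os.path.join(tempfile.gettempdir(), "speech.wav")
frames_written = write_wav(wav_path, samples, sample_rate)
```

With a real speaker instance, the two stand-in lines would be replaced by unpacking the tuple that speak() returns. A numpy.ndarray can be passed as samples unchanged, since the helper only iterates over the values.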