Voicebox - Open Source Voice Cloning Desktop App
Source-code: https://github.com/jamiepine/voicebox
deezer/spleeter: Deezer source separation library including pretrained models
Deezer source separation library including pretrained models.
Recordly - Open-source app for incredible screen recordings
Source-code: https://github.com/webadderall/Recordly
NVIDIA/personaplex: PersonaPlex code.
PersonaPlex code
BetterAudio — Master your Mac's Audio
Per-app volume control, local AI transcription, and professional audio routing for macOS. Native, fast, and private.
Monologue | Effortless voice dictation so you can work 3x faster
Write 3x faster with AI voice dictation that gets your intent right. Say it once, see it written as you meant.
gaheldev/Millisecond: Optimize your Linux system for low latency audio
Optimize your Linux system for low latency audio.
NVIDIA PersonaPlex: Natural Conversational AI With Any Role and Voice - NVIDIA ADLR
We introduce PersonaPlex, a full-duplex conversational AI model that enables natural conversations with customizable voices and roles. PersonaPlex handles interruptions and backchannels while maintaining any chosen persona, outperforming existing systems on conversational dynamics and task adherence.
MelogenAI - Convert Sheet Music to Midi Online with Ai
Convert your sheet music from various image formats to digital MIDI for your musical workflow.
QwenLM/Qwen3-TTS: Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice cloning.
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice...
Pipit — #1 Voice-to-Text App for macOS
The best voice-to-text app for macOS. Lightning-fast, private, and free.
OpenWhispr | Open Source AI Voice Dictation
Source-code: https://github.com/HeroTools/open-whispr
ggml-org/whisper.cpp: Port of OpenAI's Whisper model in C/C++
Port of OpenAI's Whisper model in C/C++.
Apple - Accessibility
Explore built-in accessibility features to help you create, connect, and do what you love, your way.
Emacspeak - The Complete Audio Desktop
Source-code: https://github.com/tvraman/emacspeak
Dolphin - ScreenReader
Dolphin ScreenReader is fast and reliable screen reading software for blind people. Customise for a fully accessible screen reading experience.
Narrator for Accessibility | Microsoft Windows
Explore Windows 11 accessibility with Narrator. Narrator is a built-in screen reader that speaks out loud what's on your screen so you can read or browse the web.
rwth-i6/rasr: The RWTH ASR Toolkit.
The RWTH ASR Toolkit.
HTK Speech Recognition Toolkit
HTK - Hidden Markov Model Toolkit - Speech Recognition toolkit
CMUSphinx Open Source Speech Recognition
Source-code: https://github.com/cmusphinx/pocketsphinx/
Screen reader on your Chromebook - Chromebook Help
Chromebooks have a built-in screen reader called ChromeVox, which enables people with visual impairments to use the Chrome operating system.
Microsoft Speech SDK 5.1 - Microsoft Download Center
The Microsoft Speech SDK 5.1 adds Automation support to the features of the previous version of the Speech SDK. You can now use the Win32 Speech API (SAPI) to develop speech applications with Visual Basic®, ECMAScript, and other Automation languages.
Cloud Text-to-Speech (HD voices) | Google Cloud Documentation
Learn about Chirp 3: HD voices, the latest generation of Text-to-Speech technology. Powered by our latest generation of generative models, these voices deliver realism and emotional resonance.
Azure Speech in Foundry Tools | Microsoft Azure
Explore Azure AI Speech for speech recognition, text to speech, and translation. Build multilingual AI apps with powerful, customizable speech models.
Otosaku/OtosakuTTS-iOS · GitHub
Swift library for offline text-to-speech synthesis on iOS/macOS. Generate natural speech directly on device using CoreML-optimized FastPitch and HiFiGAN models. No internet required, fully priv...
Free Voice Reader - 87 Hours TTS for $249/year
Best value text-to-speech: 900+ voices, 87 hours/year for $2.86/hour. 89% cheaper than Eleven Labs. Try free!