- GitHub - FunAudioLLM/CosyVoice: Multi-lingual large voice generation ...
CosyVoice 2.0 has been released! Compared to version 1.0, the new version offers more accurate, more stable, faster, and better speech generation capabilities. Cross-lingual and mix-lingual: supports zero-shot voice cloning for cross-lingual and code-switching scenarios.
- [2412.10117] CosyVoice 2: Scalable Streaming Speech Synthesis with ...
Therefore, in this report, we present an improved streaming speech synthesis model, CosyVoice 2, which incorporates comprehensive and systematic optimizations. Specifically, we introduce finite-scalar quantization to improve the codebook utilization of speech tokens.
- CosyVoice2.0 - funaudiollm.github.io
By training on a large-scale multilingual dataset, CosyVoice 2 achieves human-comparable synthesis quality with very low response latency and real-time factor.
- CosyVoice2-0.5B · Models
CosyVoice 2.0 has been released! Compared to version 1.0, the new version offers more accurate, more stable, faster, and better speech generation capabilities. Cross-lingual and mix-lingual: supports zero-shot voice cloning for cross-lingual and code-switching scenarios.
- README.md · FunAudioLLM/CosyVoice2-0.5B at main
We strongly recommend that you download our pretrained CosyVoice-300M, CosyVoice-300M-SFT, and CosyVoice-300M-Instruct models and the CosyVoice-ttsfrd resource. If you are an expert in this field and are only interested in training your own CosyVoice model from scratch, you can skip this step.
- CosyVoice | Multilingual TTS Model
CosyVoice 2.0 available now! Introducing CosyVoice, a state-of-the-art multilingual voice generation model for high-fidelity text-to-speech synthesis. Experience seamless voice cloning and ultra-fast streaming, now supporting a variety of languages. What is CosyVoice?
- FunAudioLLM/CosyVoice2-0.5B - Model Info, Parameters, Benchmarks ...
Discover CosyVoice 2, a cutting-edge streaming speech synthesis model offering ultra-low latency, improved pronunciation, and multilingual support for seamless text-to-speech experiences.