Speech to Text Recognition Basic Code of Python

Alibaba Triples Down on Speech, Translation, Multimodal AI, New Model Launches Show

Alibaba rolls out models for speech recognition, speech synthesis, AI live speech translation, audio captioning, and multilingual OCR.

IEEE

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC

Abstract: Multi-talker speech recognition (MTASR) faces unique challenges in disentangling and transcribing overlapping speech. To address these challenges, this paper investigates the role of ...

IEEE

Dynamic Language Group-based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing

Abstract: The Mixture of Experts (MoE) model is a promising approach for handling code-switching speech recognition (CS-ASR) tasks. However, the existing CS-ASR work on MoE has yet to leverage the ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results

Alibaba Triples Down on Speech, Translation, Multimodal AI, New Model Launches Show

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC

Dynamic Language Group-based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing

Trending now