Deep Speaker Embedding Across Languages

Speaker Embedding for Speaker Verification

  • This was my final year project. It won the Honours Project - Technical Excellence Award.
  • A deep speaker embedding model for speaker verification, trained with a domain loss to alleviate the language mismatch problem.
  • With the MMD-based domain loss, the performance of ECAPA-TDNN (pre-trained on an English dataset) on the unlabelled Chinese dataset improved by 10%.

This project proposes a language-independent speaker verification system. It builds on the state-of-the-art ECAPA-TDNN model and adds two domain losses applied at different levels. While the baseline model is trained only on English speech, the proposed model is trained on labelled English speech and unlabelled Chinese speech. On Chinese speech, the proposed system outperforms the original ECAPA-TDNN model.
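To illustrate the idea of an MMD-based domain loss, here is a minimal NumPy sketch. It is not the project's implementation: the Gaussian kernel, the single fixed `bandwidth`, the biased estimator, and the `weight` used to combine the terms are all assumptions chosen for brevity, and the function names are hypothetical. It computes the squared maximum mean discrepancy between a batch of English (source) embeddings and a batch of Chinese (target) embeddings. No target labels are needed, which is why unlabelled Chinese speech can be used.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * (x @ y.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two embedding batches."""
    k_ss = gaussian_kernel(source, source, bandwidth)
    k_tt = gaussian_kernel(target, target, bandwidth)
    k_st = gaussian_kernel(source, target, bandwidth)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


def total_loss(speaker_loss, source_emb, target_emb, weight=1.0):
    """Speaker classification loss on labelled source data plus the
    weighted domain loss (the weighting scheme is an assumption)."""
    return speaker_loss + weight * mmd_squared(source_emb, target_emb)
```

The domain term is close to zero when the two batches of embeddings follow the same distribution and grows as they drift apart, so minimising it alongside the speaker loss pushes the model toward embeddings that look similar across languages.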

An additional exploration of a VAE-based model is also worth mentioning. Although the VAE-based model as designed performs poorly, the attempt was still worthwhile, and its architecture or training scheme could be adjusted in future work.

Jiaying Fang
Electrical Engineering Master Student

My research interests include computer vision, deep learning, robotics perception, and autonomous vehicles.