[WIP] Zero-Shot Multi-Speaker Tacotron2#2120
mravanelli merged 36 commits into speechbrain:develop
Conversation
Thank you @pradnya-git-dev for submitting this PR! Your contribution is greatly appreciated as it adds a valuable feature to SpeechBrain. Below are my comments and suggestions:

Readme Updates:

Recipe Test Failures:
Running

```
python -c 'from tests.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=[["LibriTTS"]], do_checks=True, run_opts="--device=cuda")) else print("TEST PASSED")'
```

fails with:

```
ERROR: Error in LibriTTS_row_03 (recipes/LibriTTS/TTS/mstacotron2/hparams/train.yaml). Check tests/tmp/LibriTTS_row_03/stderr.txt and tests/tmp/LibriTTS_row_03/stdout.txt for more info.
TEST FAILED!
```

The stderr log points to a missing speaker-embedding entry:

```
spk_emb = speaker_embeddings[raw_batch[idx]["uttid"]]
KeyError: 'LJ050-0131'
```
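A more defensive lookup would make this easier to diagnose. Below is a minimal sketch, assuming `speaker_embeddings` is a plain dict keyed by utterance ID; `get_speaker_embedding` and its `fallback` hook are hypothetical helpers, not existing recipe or SpeechBrain code:

```python
# Hypothetical guard around the failing lookup above (not part of the recipe).
# Assumes speaker_embeddings is a dict mapping utterance IDs to embeddings.
def get_speaker_embedding(speaker_embeddings, uttid, fallback=None):
    """Return the precomputed embedding for uttid, or delegate to fallback.

    fallback could recompute the embedding from the waveform; leaving it as
    None turns the bare KeyError into a more informative error message.
    """
    if uttid in speaker_embeddings:
        return speaker_embeddings[uttid]
    if fallback is not None:
        return fallback(uttid)
    raise KeyError(
        f"No precomputed speaker embedding for utterance '{uttid}'. "
        "Check that the embeddings file covers the evaluation split."
    )
```

Either way, the underlying issue seems to be that the precomputed embeddings file does not cover utterance IDs such as 'LJ050-0131' used by the recipe test.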
Script Redundancy:

Code Optimization:
Thank you @pradnya-git-dev for working on this PR! It is an important first step toward zero-shot TTS in SpeechBrain. The quality of the generated speech can still be improved, but we will address that in a follow-up PR.
Contribution in a nutshell
Hey, this could help our community work with zero-shot multi-speaker text-to-speech.
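To make the intended workflow concrete, here is a rough sketch of the zero-shot flow, assuming a multi-speaker Tacotron2 that can be conditioned on a speaker embedding. The `ms_tacotron2` call is only a commented placeholder, since its inference interface is not shown in this excerpt; `EncoderClassifier` (ECAPA speaker encoder) and `HIFIGAN` are existing SpeechBrain pretrained interfaces, and the reference wav path is made up:

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier, HIFIGAN

# 1) Extract a speaker embedding from a short reference recording
#    (the file path here is a placeholder).
spk_encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb"
)
ref_wav, sample_rate = torchaudio.load("reference_speaker.wav")
spk_emb = spk_encoder.encode_batch(ref_wav)

# 2) Condition the multi-speaker Tacotron2 on that embedding
#    (hypothetical call; the real interface may differ):
# mel_outputs, mel_lengths, alignments = ms_tacotron2.encode_text(
#     "Mary had a little lamb.", spk_emb
# )

# 3) Vocode the predicted mel spectrogram with an existing HiFi-GAN vocoder:
hifi_gan = HIFIGAN.from_hparams(source="speechbrain/tts-hifigan-ljspeech")
# waveforms = hifi_gan.decode_batch(mel_outputs)
```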
Scope