Adding no_sync and on_fit_batch_end method to core #1449
pplantinga merged 3 commits into speechbrain:develop
Conversation
pplantinga left a comment
This will be a great addition once the comments are addressed!
I could also add a …
Hmm, that's a great question! On the one hand, I imagine in nearly every case this would have the desired behavior. On the other hand, if it wasn't the desired behavior, it might be tricky to debug. Why don't you go ahead and add it for now, and we'll get feedback from one more person before we merge it. @mravanelli @TParcollet @Gastron do you have opinions about this question?
Hmm, I have no idea what no_sync will do outside of DDP, though.
It won't do anything; it will just be an empty context manager.
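For illustration, a minimal sketch of why it degrades to a no-op outside DDP (a hypothetical helper, not the code in this PR):

```python
from contextlib import nullcontext

import torch


def maybe_no_sync(module: torch.nn.Module):
    """Hypothetical helper: return DDP's no_sync() context manager when the
    module is DDP-wrapped, otherwise an empty context manager (a no-op)."""
    if isinstance(module, torch.nn.parallel.DistributedDataParallel):
        return module.no_sync()
    return nullcontext()
```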
Then let's have it only for DDP. |
I'm not sure exactly what you mean; there isn't an easy way to apply this only when DDP is active. I think this is a good addition because it speeds up computation by 10-20% in cases where DDP and gradient accumulation are used. Compare the code without and with it, sketched below.
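Illustrative sketches only (not the exact snippets from this comment), assuming a SpeechBrain-style `Brain.fit_batch()` and hypothetical `grad_accumulation_factor` / `should_step` names.

Code without:

```python
import speechbrain as sb


class WithoutNoSync(sb.Brain):
    # Illustrative sketch: gradient accumulation under DDP without no_sync().
    # Every backward() triggers a gradient all-reduce across processes, even
    # on accumulation steps where optimizer.step() is not called.
    def fit_batch(self, batch):
        outputs = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(outputs, batch, sb.Stage.TRAIN)
        (loss / self.grad_accumulation_factor).backward()
        if self.step % self.grad_accumulation_factor == 0:
            self.optimizer.step()
            self.optimizer.zero_grad()
        return loss.detach()
```

Code with:

```python
class WithNoSync(sb.Brain):
    # Illustrative sketch: same loop, but gradient synchronization is skipped
    # on accumulation-only steps and happens once, on the step that actually
    # calls optimizer.step().
    def fit_batch(self, batch):
        should_step = self.step % self.grad_accumulation_factor == 0
        outputs = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(outputs, batch, sb.Stage.TRAIN)
        with self.no_sync(not should_step):
            (loss / self.grad_accumulation_factor).backward()
        if should_step:
            self.optimizer.step()
            self.optimizer.zero_grad()
        return loss.detach()
```

With an accumulation factor of N, this cuts the gradient all-reduces per optimizer step from N down to 1, which is where the 10-20% speedup comes from.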
What's our conclusion here?
I'd say let's go ahead with the …
Feel free to merge @pplantinga
Okay great, will add the final touches today/tomorrow |
Force-pushed from 3300b4c to 0b0ec9d
The wav2vec PR would change some things in `core.py`, so this is a new PR to integrate the changes with a wider impact (the ones in `core.py`) first.

The main change is a new `no_sync()` method. It is for when one is doing DDP and gradient accumulation, in which case it is wasteful to sync gradients on every forward-backward pass. One would normally use the `torch.nn.parallel.DistributedDataParallel.no_sync()` context manager for that, but this is very awkward to do in SpeechBrain since one tends to have multiple modules. This method simplifies things: in `fit_batch()` you can just do `with self.no_sync(not should_step):` and everything is taken care of.

Two other minor things are:

- `on_fit_batch_end()`: upon a suggestion from @TParcollet, this keeps the logging separate from the training-related code in `fit_batch()` (see the sketch below). Not sure if some arguments should be passed from `fit_batch()` by default?
- `clip_grad_norm_` is only called if `max_grad_norm != 0`.
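For example, an override of the hook might look like the sketch below; the argument list and the `train_logger` attribute are assumptions here, since the default arguments are still an open question.

```python
import speechbrain as sb


class LoggingBrain(sb.Brain):
    # Hypothetical override: the exact signature is an assumption, as the PR
    # description leaves the arguments passed from fit_batch() open.
    def on_fit_batch_end(self, batch, outputs, loss, should_step):
        # Keep per-batch logging out of fit_batch() itself.
        if should_step and self.step % 100 == 0:
            self.hparams.train_logger.log_stats(
                stats_meta={"step": self.step},
                train_stats={"loss": loss.item()},
            )
```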