Adding no_sync and on_fit_batch_end method to core #1449

Merged: pplantinga merged 3 commits into speechbrain:develop from RuABraun:feature/no-sync-method, Jul 11, 2022

Conversation

@RuABraun
Collaborator

The wav2vec PR would change some things in core.py; this is a new PR to integrate the changes with wider impact (those in core.py) first.

The main change is a new no_sync() method, for when one is doing DDP with gradient accumulation, in which case it is wasteful to sync gradients on every forward-backward pass. Normally one would use DistributedDataParallel's no_sync() context manager to avoid that, but this is awkward to do in SpeechBrain since one tends to have multiple modules. This method simplifies things: in fit_batch you can just do with self.no_sync(not should_step): and everything is taken care of.
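
As an illustration only, a Brain-level no_sync() could be sketched roughly as below; the use of contextlib.ExitStack and the exact handling of self.modules are assumptions for this sketch, not necessarily what the merged code does.

from contextlib import ExitStack, contextmanager

from torch.nn.parallel import DistributedDataParallel as DDP


@contextmanager
def no_sync(self, use=True):
    """Skip gradient synchronization on DDP-wrapped modules while active."""
    with ExitStack() as stack:
        if use:
            for module in self.modules.values():
                if isinstance(module, DDP):
                    # DDP's own no_sync() defers the gradient all-reduce for
                    # backward passes executed inside this context.
                    stack.enter_context(module.no_sync())
        # If use is False or no module is wrapped in DDP, nothing is entered
        # and this behaves like an empty context manager.
        yield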

Two other minor changes are:

  1. A new method on_fit_batch_end(), added on a suggestion from @TParcollet, to keep the logging separate from the training-related work happening in fit_batch(). Not sure if some arguments should be passed from fit_batch() by default?
  2. clip_grad_norm_ is only called if max_grad_norm != 0 (see the sketch below for how these pieces could fit together).
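
A hedged sketch of a fit_batch that ties the pieces above together; the definition of should_step, the optimizer handling, and the arguments passed to on_fit_batch_end are illustrative assumptions, not necessarily the merged signatures.

import torch


def fit_batch(self, batch):
    # Assumed accumulation logic: step the optimizer every
    # grad_accumulation_factor batches.
    should_step = self.step % self.grad_accumulation_factor == 0

    outputs = self.compute_forward(batch, Stage.TRAIN)
    loss = self.compute_objectives(outputs, batch, Stage.TRAIN)

    # Skip the gradient all-reduce on pure accumulation steps (only has an
    # effect when DDP is active).
    with self.no_sync(not should_step):
        (loss / self.grad_accumulation_factor).backward()

    if should_step:
        # Only clip when a non-zero max_grad_norm is configured.
        if self.max_grad_norm != 0:
            torch.nn.utils.clip_grad_norm_(
                self.modules.parameters(), self.max_grad_norm
            )
        self.optimizer.step()
        self.optimizer.zero_grad()

    # New hook: logging and other bookkeeping live here, outside the
    # training step itself.
    self.on_fit_batch_end(batch, outputs, loss, should_step)
    return loss.detach().cpu()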
Collaborator

@pplantinga left a comment


This will be a great addition once the comments are addressed!

@RuABraun
Collaborator Author

I could also add a with no_sync(not should_step) to fit_batch. The way it's implemented, it will only skip syncing if DDP is used and should_step is False, which is the desired behaviour. What do you think @pplantinga?

@pplantinga
Collaborator

Hmm, that's a great question! On the one hand, I imagine in nearly every case this would have the desired behavior. On the other hand, if it wasn't the desired behavior, it might be tricky to debug. Why don't you go ahead and add it for now, and we'll get feedback from one more person before we merge it. @mravanelli @TParcollet @Gastron do you have opinions about this question?

@TParcollet
Collaborator

Hmm, I have no idea what no_sync will do outside of DDP though.

@RuABraun
Collaborator Author

It won't do anything; it will just be an empty context manager.
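
For a concrete (purely illustrative) picture of what that means, an empty context manager changes nothing about the backward pass run inside it:

from contextlib import contextmanager

import torch


@contextmanager
def empty_no_sync():
    # Outside DDP there is nothing to defer, so the context manager does
    # nothing before or after the block.
    yield


x = torch.ones(3, requires_grad=True)
with empty_no_sync():
    (2 * x.sum()).backward()
print(x.grad)  # tensor([2., 2., 2.])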

@TParcollet
Collaborator

Then let's have it only for DDP.

@pplantinga
Collaborator

Then let's have it only for DDP.

I'm not sure exactly what you mean; there isn't an easy way to apply this only when DDP is active. Code without:

outputs = self.compute_forward(batch, Stage.TRAIN)
loss = self.compute_objectives(outputs, batch, Stage.TRAIN)
# Gradients are all-reduced across DDP processes on every backward pass.
(loss / self.grad_accumulation_factor).backward()

Code with:

outputs = self.compute_forward(batch, Stage.TRAIN)
loss = self.compute_objectives(outputs, batch, Stage.TRAIN)
# Defer the gradient all-reduce while accumulating; sync only when stepping.
with self.no_sync(not should_step):
    (loss / self.grad_accumulation_factor).backward()

I think this is a good addition because it speeds up computation by 10-20% in cases where DDP and grad accumulation are used.

@RuABraun
Collaborator Author

RuABraun commented Jul 8, 2022

What's our conclusion here?

@pplantinga
Collaborator

I'd say let's go ahead with the self.no_sync(not should_step) feature.

@TParcollet
Collaborator

Feel free to merge @pplantinga

@RuABraun
Collaborator Author

RuABraun commented Jul 8, 2022

Okay great, will add the final touches today/tomorrow

@RuABraun force-pushed the feature/no-sync-method branch from 3300b4c to 0b0ec9d on July 10, 2022, 17:04
Collaborator

@pplantinga left a comment


Looks good!

@pplantinga merged commit bc3c728 into speechbrain:develop on Jul 11, 2022