Adding no_sync and on_fit_batch_end method to core #1449
pplantinga merged 3 commits into speechbrain:develop
Conversation
pplantinga left a comment
This will be a great addition once the comments are addressed!
I could also add a …
Hmm, that's a great question! On the one hand, I imagine in nearly every case this would have the desired behavior. On the other hand, if it wasn't the desired behavior, it might be tricky to debug. Why don't you go ahead and add it for now, and we'll get feedback from one more person before we merge it. @mravanelli @TParcollet @Gastron do you have opinions about this question?
Hmm, I have no idea what no_sync will do outside of DDP, though.
It won't do anything; it will just be an empty context manager.
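For illustration, a minimal sketch of why it degrades to a no-op outside DDP (a hypothetical helper, not the code in this PR):

```python
from contextlib import nullcontext

import torch


def maybe_no_sync(module: torch.nn.Module):
    """Hypothetical helper: return DDP's no_sync() context manager when the
    module is DDP-wrapped, otherwise an empty context manager (a no-op)."""
    if isinstance(module, torch.nn.parallel.DistributedDataParallel):
        return module.no_sync()
    return nullcontext()
```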
Then let's have it only for DDP. |
I'm not sure exactly what you mean; there isn't an easy way to apply this only when DDP is active. I think this is a good addition because it speeds up computation by 10-20% in cases where DDP and gradient accumulation are used. Compare the code without and with it, sketched below.
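Illustrative sketches only (not the exact snippets from this comment), assuming a SpeechBrain-style `Brain.fit_batch()` and hypothetical `grad_accumulation_factor` / `should_step` names.

Code without:

```python
import speechbrain as sb


class WithoutNoSync(sb.Brain):
    # Illustrative sketch: gradient accumulation under DDP without no_sync().
    # Every backward() triggers a gradient all-reduce across processes, even
    # on accumulation steps where optimizer.step() is not called.
    def fit_batch(self, batch):
        outputs = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(outputs, batch, sb.Stage.TRAIN)
        (loss / self.grad_accumulation_factor).backward()
        if self.step % self.grad_accumulation_factor == 0:
            self.optimizer.step()
            self.optimizer.zero_grad()
        return loss.detach()
```

Code with:

```python
class WithNoSync(sb.Brain):
    # Illustrative sketch: same loop, but gradient synchronization is skipped
    # on accumulation-only steps and happens once, on the step that actually
    # calls optimizer.step().
    def fit_batch(self, batch):
        should_step = self.step % self.grad_accumulation_factor == 0
        outputs = self.compute_forward(batch, sb.Stage.TRAIN)
        loss = self.compute_objectives(outputs, batch, sb.Stage.TRAIN)
        with self.no_sync(not should_step):
            (loss / self.grad_accumulation_factor).backward()
        if should_step:
            self.optimizer.step()
            self.optimizer.zero_grad()
        return loss.detach()
```

With an accumulation factor of N, this cuts the gradient all-reduces per optimizer step from N down to 1, which is where the 10-20% speedup comes from.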
What's our conclusion here?
I'd say let's go ahead with the …
Feel free to merge @pplantinga
Okay great, will add the final touches today/tomorrow |
Force-pushed from 3300b4c to 0b0ec9d
The wav2vec PR would change some things in `core.py`, so this is a new PR to integrate the changes with a wider impact (the ones in `core.py`) first.

The main change is a new `no_sync()` method. It is for when one is doing DDP and gradient accumulation, in which case it is wasteful to sync gradients on every forward-backward pass. One would normally use the `torch.nn.parallel.DistributedDataParallel.no_sync()` context manager for that, but this is very awkward to do in SpeechBrain since one tends to have multiple modules. This method simplifies things: in `fit_batch()` you can just do `with self.no_sync(not should_step):` and everything is taken care of.

Two other minor things are:

- `on_fit_batch_end()`: upon a suggestion from @TParcollet, this keeps the logging separate from the training-related code in `fit_batch()` (see the sketch below). Not sure if some arguments should be passed from `fit_batch()` by default?
- `clip_grad_norm_` is only called if `max_grad_norm != 0`.
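For example, an override of the hook might look like the sketch below; the argument list and the `train_logger` attribute are assumptions here, since the default arguments are still an open question.

```python
import speechbrain as sb


class LoggingBrain(sb.Brain):
    # Hypothetical override: the exact signature is an assumption, as the PR
    # description leaves the arguments passed from fit_batch() open.
    def on_fit_batch_end(self, batch, outputs, loss, should_step):
        # Keep per-batch logging out of fit_batch() itself.
        if should_step and self.step % 100 == 0:
            self.hparams.train_logger.log_stats(
                stats_meta={"step": self.step},
                train_stats={"loss": loss.item()},
            )
```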