Bad result with vqgan #418
Comments
Yes. If you can use such a large batch size, it means either that you have hundreds of GPUs or that your model is too small.
Thanks for your quick reply!
To speed up training, I use many A100 GPUs and train for only 15 epochs. The first setting can finish in several hours. The loss goes 7.0 -> 6.5 -> 5.4 and has been stuck at 5.4 since epoch 5.
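For context on these loss values, here is a rough sketch that converts a cross-entropy loss into a perplexity and compares it with the loss of a uniform guess over the image codebook. It assumes the reported loss is a mean cross-entropy in nats dominated by image tokens, and it assumes a codebook of 1024 entries; the thread does not state the codebook size, so `CODEBOOK_SIZE` is a placeholder to adjust.

```python
import math


def uniform_baseline_loss(num_tokens: int) -> float:
    """Cross-entropy (in nats) of a uniform guess over num_tokens classes."""
    return math.log(num_tokens)


def perplexity(loss_nats: float) -> float:
    """Effective number of equally likely choices implied by a loss in nats."""
    return math.exp(loss_nats)


# Assumption: the codebook size is not given in the thread; 1024 is a placeholder.
CODEBOOK_SIZE = 1024

baseline = uniform_baseline_loss(CODEBOOK_SIZE)
print(f"uniform baseline over {CODEBOOK_SIZE} tokens: {baseline:.2f}")

# Loss values quoted in the thread.
for loss in (7.0, 6.5, 5.4, 4.5):
    print(f"loss {loss:.1f} -> perplexity {perplexity(loss):.1f}")
```

Under these assumptions, a starting loss near 7.0 is about what a model that has learned nothing would score, so the plateau at 5.4 can be read against that baseline rather than against zero.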
What can I do if I want both the speedup from a large batch size and a Transformer with only 6 layers?
Try the default lr and depth 16. Increasing depth usually gets much better results.
Neither, I am using NCCL constructed by a company. |
Hi, I am using VQGAN on the MSCOCO training dataset (I also tried adding Visual Genome to build a dataset of 1 million samples), but I get bad results. The pixels look weird.
Here are my settings, thanks!
transformer_dim = 512
rotary_emb = False
image_fmap_size = 32

self.transformer = Transformer(
    dim = transformer_dim,
    causal = True,
    seq_len = seq_len,
    depth = self.config_visual_decoder.num_hidden_layers,
    heads = 8,
    dim_head = 64,
    reversible = False,
    attn_dropout = 0.0,
    ff_dropout = 0.0,
    attn_types = 'full',
    image_fmap_size = image_fmap_size,
    sparse_attn = False,
    stable = False,
    sandwich_norm = False,
    shift_tokens = False,
    rotary_emb = rotary_emb,
    # shared_attn_ids = None,
    # shared_ff_ids = None,
    # optimize_for_inference = False,
)
I looked through several previous issues and reports and noticed that people usually get a loss below 4.5, while my loss is around 5.4.
I use a large batch size (more than 3000), while others use a far smaller one (like 16). Does that matter?
Thanks
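One way to see why the batch size may matter is to count optimizer updates. The sketch below is only back-of-the-envelope arithmetic using the numbers quoted in this issue (about 1 million samples, 15 epochs, batch sizes 16 and 3000). It assumes one optimizer step per global batch, no gradient accumulation, and that the last partial batch is kept; `optimizer_steps` is a hypothetical helper, not part of any library.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer updates, assuming one step per batch and no accumulation."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # keep the last partial batch
    return steps_per_epoch * epochs


NUM_SAMPLES = 1_000_000  # the roughly 1 million dataset mentioned above
EPOCHS = 15

for batch_size in (16, 3000):  # 3000 stands in for "more than 3000"
    steps = optimizer_steps(NUM_SAMPLES, batch_size, EPOCHS)
    print(f"batch size {batch_size}: {steps} optimizer steps")
```

With these numbers the large-batch run makes roughly 5,000 updates versus over 900,000 for batch size 16, so the two runs see the same data but the model gets far fewer chances to update its weights.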