
Training does not converge on the Amazon Beauty dataset with the paper's hyperparameter setting #8

Open
lipingcoding opened this issue Nov 13, 2020 · 3 comments
Labels: question (Further information is requested)

Comments

@lipingcoding

No description provided.

pmixer (Owner) commented Nov 13, 2020

@lipingcoding thanks for reporting the issue. I also observed some problems and am still wondering what's wrong with the PyTorch version compared with the TF implementation; sorry to say I haven't figured it out yet. The only thing I can be sure of currently is that the original paper's hyperparameter setting cannot be directly used for this codebase, as I fixed a leaky-attention issue by using PyTorch's MHA. The parameter initialization also still needs to be examined, but I haven't done that yet.
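For context, here is a minimal sketch of the causal (left-to-right) masking involved when using torch.nn.MultiheadAttention; the shapes and names are illustrative assumptions, not this repo's actual code:

```python
import torch

seq_len, batch, dim, heads = 50, 2, 64, 1

# torch.nn.MultiheadAttention expects (seq_len, batch, embed_dim) inputs by default.
mha = torch.nn.MultiheadAttention(embed_dim=dim, num_heads=heads)

# Causal mask: True marks positions a timestep is NOT allowed to attend to,
# so position i only sees positions <= i and future items cannot "leak" in.
attn_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

x = torch.rand(seq_len, batch, dim)
out, _ = mha(x, x, x, attn_mask=attn_mask)
print(out.shape)  # torch.Size([50, 2, 64])
```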

pmixer (Owner) commented Nov 13, 2020

btw @lipingcoding could you please provide a bit more information, such as logs, describing how the model does not converge? For example, uncommenting

# print("loss in epoch {} iteration {}: {}".format(epoch, step, loss.item())) # expected 0.4~0.6 after init few epochs

would print the loss for each iteration, which should be informative.
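For reference, a minimal sketch of where such a per-iteration print typically sits in a training loop; the model, optimizer, and data below are illustrative stand-ins, not this repo's code:

```python
import torch

# Hypothetical stand-ins so the loop runs self-contained.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.BCEWithLogitsLoss()
data = [(torch.rand(8, 10), torch.randint(0, 2, (8, 1)).float()) for _ in range(5)]

for epoch in range(3):
    for step, (x, y) in enumerate(data):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        # Per-iteration loss: a healthy run should trend downward,
        # not hover flat or blow up.
        print("loss in epoch {} iteration {}: {}".format(epoch, step, loss.item()))
```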

pmixer (Owner) commented Nov 13, 2020

@lipingcoding could you try the newly updated code?

@pmixer added the question (Further information is requested) label on Mar 28, 2021