Is there any explanation for adjusting seqs after embedding? #18
Comments
@baiyuting It's a kind of normalization operation inherited from the original BERT paper (https://arxiv.org/abs/1810.04805); check Prof. Lee's (https://speech.ee.ntu.edu.tw/~hylee/index.php) Transformer+BERT lectures on YouTube/Bilibili if you are interested. BTW, you are encouraged to remove it if you want to try; SASRec is not that deep compared to BERT.
I cloned BERT from https://github.com/google-research/bert.git but found no related code in modeling.py:embedding_lookup(). Did I miss something? Could you give me a more specific explanation, since this is a trick I had not noticed before?
Oops, my fault: BERT is just the encoder of the Transformer. You should refer to https://arxiv.org/pdf/1706.03762.pdf, Section 3.2.1 on Scaled Dot-Product Attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. For me, it's just a normalization operation; if you are very interested in it, please try playing with it based on the math, as in https://medium.com/@shoray.goel/kaiming-he-initialization-a8d9ed0b5899, on your own.
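For reference, here is a minimal, self-contained sketch (not from this repo) of the scaled dot-product attention in that formula; the 1/√d_k factor is the normalization being discussed:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k). Dividing by sqrt(d_k) keeps the dot
    # products from growing with the dimension, so the softmax stays well-scaled.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)
    return weights @ V

Q = K = V = torch.randn(2, 10, 64)
out = scaled_dot_product_attention(Q, K, V)  # shape: (2, 10, 64)
```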
I think the numerical range of the position embeddings is relatively large, while the range of the item embeddings is relatively small, so the position embeddings may suppress the signal of the item embeddings. By scaling the item embedding vectors, the item embeddings and the position embeddings become more consistent in numerical range. For a specific reference, see https://datascience.stackexchange.com/questions/87906/transformer-model-why-are-word-embeddings-scaled-before-adding-positional-encod/88159#88159
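As a rough illustration of that argument, the sketch below compares the typical magnitude of item embeddings before and after the √d scaling against a fixed sinusoidal positional encoding. The init std of 1/√d and the sinusoidal encoding are assumptions made for the illustration only (SASRec.pytorch actually learns its positional embeddings):

```python
import math
import torch

torch.manual_seed(0)
d = 50  # hidden size; SASRec's default is 50, but any even value works here

# Hypothetical item table initialized with std = 1/sqrt(d), a common Transformer choice.
item_emb = torch.nn.Embedding(1000, d)
torch.nn.init.normal_(item_emb.weight, std=d ** -0.5)

# Fixed sinusoidal positional encoding as in "Attention Is All You Need".
pos = torch.arange(20, dtype=torch.float32).unsqueeze(1)
div = torch.exp(torch.arange(0, d, 2, dtype=torch.float32) * (-math.log(10000.0) / d))
pos_enc = torch.zeros(20, d)
pos_enc[:, 0::2] = torch.sin(pos * div)
pos_enc[:, 1::2] = torch.cos(pos * div)

seqs = item_emb(torch.randint(0, 1000, (20,)))
print(seqs.abs().mean())                # roughly 1/sqrt(d): raw item values are small
print((seqs * d ** 0.5).abs().mean())   # after scaling: same order as the positional values
print(pos_enc.abs().mean())             # sinusoidal values lie in [-1, 1]
```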
I found
`seqs *= self.item_emb.embedding_dim ** 0.5`
in the function `log2feats(self, log_seqs)`. Is there any reason for adjusting `seqs` after embedding?

```python
seqs = self.item_emb(torch.LongTensor(log_seqs).to(self.dev))
seqs *= self.item_emb.embedding_dim ** 0.5
```
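For context, a simplified sketch of the embedding step around those two lines might look like the following. The class name, constructor arguments, and the dummy call are made up for illustration; the real `log2feats()` in SASRec.pytorch additionally applies dropout, padding masks, and the self-attention blocks:

```python
import numpy as np
import torch

class TinySASRecEmbedding(torch.nn.Module):
    """Sketch of the embedding step only; not the full SASRec model."""

    def __init__(self, item_num, maxlen, hidden_units, dev="cpu"):
        super().__init__()
        self.dev = dev
        self.item_emb = torch.nn.Embedding(item_num + 1, hidden_units, padding_idx=0)
        self.pos_emb = torch.nn.Embedding(maxlen, hidden_units)

    def log2feats(self, log_seqs):
        seqs = self.item_emb(torch.LongTensor(log_seqs).to(self.dev))
        # The lines in question: scale item embeddings by sqrt(d) before adding
        # positional embeddings, following the Transformer convention.
        seqs *= self.item_emb.embedding_dim ** 0.5
        positions = np.tile(np.arange(log_seqs.shape[1]), [log_seqs.shape[0], 1])
        seqs += self.pos_emb(torch.LongTensor(positions).to(self.dev))
        return seqs

model = TinySASRecEmbedding(item_num=1000, maxlen=50, hidden_units=50)
feats = model.log2feats(np.random.randint(1, 1001, size=(4, 50)))
print(feats.shape)  # torch.Size([4, 50, 50])
```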