Same output when right padding #24
Comments
@siciliano-diag As I recall, item 0 represents nothing: its embedding is fixed to an all-zero vector and is used to mask unused positions in the sequence. This is also a causal model, so the prediction is read from the last position, and masking that position effectively throws away the contribution of all previous steps. Putting the "nothing" item at the end of the sequence is therefore unreasonable in practice. So, although I would not claim the model is robust to this, what you observed is expected.
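For illustration, here is a minimal sketch of that masking scheme in PyTorch (the sizes, names, and example sequence are made up, and this is a paraphrase of a SASRec-style setup, not the repository's exact code):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; index 0 is reserved as the padding item.
num_items, hidden_dim = 1000, 64
item_emb = nn.Embedding(num_items + 1, hidden_dim, padding_idx=0)  # row 0 is kept all-zero

# A right-padded sequence: the trailing 0s are "nothing" and should carry no signal.
log_seqs = torch.tensor([[3, 17, 42, 0, 0]])        # (batch, seq_len)
seqs = item_emb(log_seqs)                           # padded positions map to the zero vector

# Padded positions are forced back to zero with a timeline mask.
timeline_mask = (log_seqs == 0)
seqs = seqs * ~timeline_mask.unsqueeze(-1)

# In a causal model the next-item prediction is read from the last position; if that
# slot is padding, its masked feature is all zeros, so the resulting scores do not
# depend on the rest of the sequence.
final_feat = seqs[:, -1, :]
print(final_feat.abs().sum())                       # tensor(0.)
```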
Thank you very much for your response. I understand why it doesn't make sense to have 0 at the last position, but if I needed the model to work in these cases as well, without modifying the sample itself, would you happen to know which part of the code I should change? Also, regarding fixing the padding embedding to the zero vector: it wasn't actually working for me, because this part:
somehow re-initializes the Embedding layer, including the embedding for item 0.
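For reference, this is the kind of fix I was attempting (a rough sketch assuming the item embedding is a plain nn.Embedding; the re-zeroing step is my own workaround, not code from the repository):

```python
import torch
import torch.nn as nn

num_items, hidden_dim = 1000, 64
item_emb = nn.Embedding(num_items + 1, hidden_dim, padding_idx=0)

# Suppose a later initialization pass sweeps over all parameters and overwrites row 0 too.
nn.init.xavier_normal_(item_emb.weight.data)

# Re-zero the padding row afterwards; padding_idx also keeps its gradient at zero,
# so the row stays zero during training.
with torch.no_grad():
    item_emb.weight[item_emb.padding_idx].zero_()

assert item_emb.weight[0].abs().sum() == 0
```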
Hi @siciliano-diag, thanks. My previous response was not only pointing out that item 0 should not occupy the last position; it was also suggesting that what you observed may simply be the expected behaviour of a causal model. For what you want, please dig into the details of causal models and the transformer itself. Here is a video by Prof. Li on the transformer: https://www.youtube.com/watch?v=ugWDIIOHtPA. You can also find plenty of material online to help you understand and customize the model.
I am experiencing an issue when I give the network sequences in which the last object is replaced by padding (0).
In this case, the trained model always outputs the same sequence, regardless of the other values present in the sequence.
Is this by any chance a known problem?
From what I understand, the emb_dropout in log2feats should make the model robust against this type of sequence. Am I wrong?
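For context, I am assuming emb_dropout is a standard nn.Dropout applied to the input embeddings, roughly as in this sketch (shapes and dropout rate are made up):

```python
import torch
import torch.nn as nn

emb_dropout = nn.Dropout(p=0.2)   # my assumption of what emb_dropout is

seqs = torch.randn(2, 5, 64)      # (batch, seq_len, hidden) input embeddings
seqs = emb_dropout(seqs)          # randomly zeroes units during training, so the model
                                  # occasionally sees "missing" information at any position
```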
Thank you in advance for your response