Empirically, the attention layers could not be compressed using the TT (tensor-train) decomposition, but with the Tucker decomposition we achieve the same quality as the full model.
BLEU = 0.44; compression rate: 3.175; compression rate without embeddings: 9.174
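
For readers unfamiliar with how such a figure can arise, here is a minimal sketch of a compression-rate calculation for a Tucker-format tensor. It assumes the rate is defined as the ratio of original to compressed parameter counts, which the text above does not state explicitly. The shape and ranks are purely illustrative and are not the ones used in these experiments; the function names are hypothetical.

```python
from math import prod


def full_param_count(shape):
    """Number of parameters in a dense tensor of the given shape."""
    return prod(shape)


def tucker_param_count(shape, ranks):
    """Parameters in a Tucker format: one core of size r_1 x ... x r_d
    plus one factor matrix of size n_k x r_k for each mode k."""
    if len(shape) != len(ranks):
        raise ValueError("shape and ranks must have the same length")
    core = prod(ranks)
    factors = sum(n * r for n, r in zip(shape, ranks))
    return core + factors


def compression_rate(shape, ranks):
    """Assumed definition: dense parameter count / Tucker parameter count."""
    return full_param_count(shape) / tucker_param_count(shape, ranks)


if __name__ == "__main__":
    # Illustrative values only (not taken from the experiments above).
    shape = (8, 8, 8, 8)
    ranks = (2, 2, 2, 2)
    print("dense parameters: ", full_param_count(shape))
    print("Tucker parameters:", tucker_param_count(shape, ranks))
    print("compression rate: ", compression_rate(shape, ranks))
```

Under this definition, a rate computed over the whole model would be lower than one computed without the embeddings whenever the embeddings are left uncompressed, because they add the same parameter count to both the numerator and the denominator.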