sharing weight matrix between the two embedding layers and the pre-softmax linear transformation  #7

Open
@nataly-obr

Description

Hi, thanks for your repo, it helps a lot!
In the paper, the same weight matrix is shared between the two embedding layers and the pre-softmax linear transformation:
"In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to [30]." (page 5, section 3.4, Embeddings and Softmax)
Would it be correct to modify the following rows in transformer_model.py to something like this?

rows 32-33 -> self.src_embedding = self.trg_embedding = Embedding(src_vocab_size, model_dimension)
row 50 -> self.decoder_generator = DecoderGenerator(self.src_embedding.embeddings_table.weight)
row 221 -> def __init__(self, shared_embedding_weights):
row 224 -> self.linear = nn.Linear(shared_embedding_weights.size()[1], shared_embedding_weights.size()[0], bias=False)
           del self.linear.weight
           self.shared_embedding_weights = shared_embedding_weights
row 232 -> self.linear.weight = self.shared_embedding_weights
row 233 -> return self.log_softmax(self.linear(trg_representations_batch) * math.sqrt(self.shared_embedding_weights.size()[1]))
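For reference, here is a minimal, self-contained sketch of the parameter sharing I have in mind. The class and attribute names mirror the rows above, but this is not the repo's actual code: it omits the sqrt(d_model) scaling and assumes a shared source/target vocabulary.

```python
import torch
import torch.nn as nn

class DecoderGenerator(nn.Module):
    """Pre-softmax projection tied to a shared embedding table (illustrative sketch)."""
    def __init__(self, shared_embedding_weights):
        super().__init__()
        vocab_size, model_dimension = shared_embedding_weights.shape
        self.linear = nn.Linear(model_dimension, vocab_size, bias=False)
        # Re-point the projection at the embedding table instead of keeping a separate matrix.
        self.linear.weight = shared_embedding_weights  # same nn.Parameter, not a copy
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, trg_representations_batch):
        return self.log_softmax(self.linear(trg_representations_batch))

# Toy usage (assumes src and trg share one vocabulary):
vocab_size, model_dimension = 100, 16
shared_embedding = nn.Embedding(vocab_size, model_dimension)
generator = DecoderGenerator(shared_embedding.weight)
out = generator(torch.randn(2, 5, model_dimension))        # shape: (2, 5, vocab_size)
assert generator.linear.weight is shared_embedding.weight  # a single shared parameter
```

Assigning the embedding's nn.Parameter to linear.weight replaces the layer's own parameter, so only one weight matrix is stored and updated.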
