TransWikia.com

What's the input dimension for transformer decoder during TRAINING?

Data Science Asked by user99347 on January 13, 2021

For example, consider translating an English sentence A into a French sentence B.
During training, when predicting the i-th word of B, all words of B before position i are fed to the decoder, so the input length changes with i. How is this handled so that the input fits the fixed dimension of the final linear layer during TRAINING?

One Answer

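During training we do not actually feed the words one by one. We pass the whole target sentence to the decoder in a single pass, together with a mask. The mask does the job of revealing one new word at a time: position i can attend only to positions up to and including i, never to later ones. The final linear layer is applied to each position separately, and its input size is the model dimension, not the sentence length, so a varying length is not a problem.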
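To make the masking idea concrete, here is a minimal NumPy sketch. It is not a full decoder: it assumes a single attention head with no learned query/key/value projections, and the names `causal_mask`, `masked_self_attention`, `W_out`, `d_model` and `vocab_size` are made up for illustration. It only shows that one lower-triangular mask covers every prefix length at once, and that the output layer's weight shape does not depend on the sentence length.

```python
import numpy as np

def causal_mask(seq_len):
    # True where attention is allowed: position i may see positions 0..i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_self_attention(x, mask):
    # x has shape (seq_len, d_model).
    # Simplification: single head, no learned projections (illustrative only).
    d_model = x.shape[-1]
    scores = x @ x.T / np.sqrt(d_model)
    scores = np.where(mask, scores, -np.inf)   # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)                   # exp(-inf) = 0 for hidden positions
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20                    # hypothetical sizes
# Final linear layer: its shape depends on d_model only, not on seq_len.
W_out = rng.normal(size=(d_model, vocab_size))

for seq_len in (3, 5):
    x = rng.normal(size=(seq_len, d_model))    # stand-in for embedded target words
    h = masked_self_attention(x, causal_mask(seq_len))
    logits = h @ W_out                         # one prediction per position
    print(seq_len, logits.shape)
```

The same `W_out` handles both sentence lengths, giving logits of shape `(3, 20)` and `(5, 20)`. Each row of `logits` is the prediction made from one prefix of the sentence, so a single forward pass trains on all prefixes at once.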

Answered by SrJ on January 13, 2021
