Artificial Intelligence Asked by thepacker on January 5, 2022

Let’s assume we embed a sequence of 49 tokens using 512-dimensional embeddings, giving a 49 × 512 matrix. If we then multiply this matrix by its transpose, we get a 49 × 49 matrix, which is symmetric. Let’s also assume we do not add positional encoding and have only one attention head in the first layer of the transformer architecture.

What would the result of applying softmax to this 49 × 49 matrix look like? Is it still symmetric, or is the softmax applied to each row of the matrix, yielding a non-symmetric result? My guess is that the matrix should no longer be symmetric, but I’m unsure about that.

I’m asking this to verify whether my implementation is correct and what the output should look like. I have seen so many sophisticated and differing implementations of the transformer architecture, across different frameworks, that I can’t answer this question for myself right now. I’m still trying to understand the basic building blocks of the transformer architecture.

I compared my results visually against a second implementation known to work, "The Annotated Transformer", checking the PyTorch results of its attention method against my own implementation’s results.

The answer is: the softmax is applied row by row. Therefore the resulting matrix p_attn is not equal to its transpose.
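A minimal NumPy sketch of this point, using a small 4 × 8 toy matrix in place of the 49 × 512 one (the shapes are placeholders, not from the original question): the score matrix E·Eᵀ is symmetric, but applying softmax independently to each row generally breaks that symmetry, because each row is divided by its own normalizing sum.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((4, 8))   # 4 tokens with 8-d embeddings (stand-in for 49 x 512)

scores = E @ E.T                  # 4 x 4 score matrix; symmetric by construction
assert np.allclose(scores, scores.T)

def softmax_rows(x):
    # Softmax applied independently to each row, as in attention
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract row max for stability
    return e / e.sum(axis=-1, keepdims=True)

p_attn = softmax_rows(scores)
print(np.allclose(p_attn, p_attn.T))   # generally False: symmetry is lost
print(p_attn.sum(axis=-1))             # each row sums to 1
```

Entry (i, j) of the result is exp(s_ij) / Σ_k exp(s_ik), while entry (j, i) is exp(s_ij) / Σ_k exp(s_jk); these differ whenever rows i and j have different normalizing sums, so symmetry of the scores does not survive the row-wise softmax.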

Answered by thepacker on January 5, 2022
