Artificial Intelligence Asked by thepacker on January 5, 2022
Let’s assume that we embedded a sequence of length 49 into a 49×512 matrix using 512-dimensional embeddings. If we then multiply the matrix by its transpose, we get a 49×49 matrix, which is symmetric. Let’s also assume we do not add the positional encoding and we have only one attention head in the first layer of the transformer architecture.
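The setup above can be sketched as follows (a minimal NumPy illustration with random embeddings; note that the symmetry only arises because queries and keys here are the same matrix, i.e. no separate learned Q/K projections are applied):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(49, 512))  # 49 tokens, 512-d embeddings, no positional encoding

scores = X @ X.T                # 49 x 49 score matrix

# Symmetric, since (X X^T)^T = X X^T
print(np.allclose(scores, scores.T))  # True
```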
What would the result of the softmax on this 49×49 matrix look like? Is it still symmetric, or is the softmax applied independently to each row of the matrix, resulting in a non-symmetric matrix? My guess is that the matrix should no longer be symmetric, but I’m unsure about that.
I ask this to verify whether my implementation is wrong, and what the output should look like. I have seen so many sophisticated and different implementations of the transformer architecture in different frameworks that I can’t answer this question for myself right now. I am still trying to understand the basic building blocks of the transformer architecture.
I compared my results visually to a second implementation known to be working, "The Annotated Transformer". I compared the PyTorch calculation results of its attention method to my implementation's results.
The answer is: the softmax is applied row by row. Therefore the resulting matrix p_attn is not equal to its transpose.
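This can be checked directly with a small example. Below is a sketch (using a hypothetical 3×3 symmetric score matrix rather than the full 49×49 one) showing that applying softmax row by row breaks the symmetry, because each row is normalized by a different denominator:

```python
import numpy as np

# A symmetric score matrix, as produced by X @ X.T (small 3x3 example)
scores = np.array([[1.0, 2.0, 0.5],
                   [2.0, 3.0, 1.0],
                   [0.5, 1.0, 2.0]])

def softmax_rows(m):
    # Softmax applied independently to each row (numerically stabilized)
    e = np.exp(m - m.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

p_attn = softmax_rows(scores)

print(np.allclose(scores, scores.T))  # True: the input is symmetric
print(np.allclose(p_attn, p_attn.T))  # False: row-wise softmax breaks symmetry
```

Each row of `p_attn` still sums to 1, but entry (i, j) is divided by row i's sum while entry (j, i) is divided by row j's sum, so the two generally differ.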
Answered by thepacker on January 5, 2022