SAGAN - what is the correct architecture?

Data Science: Asked by Ilya.K. on May 1, 2021

Hi, in the original paper the following scheme of the self-attention module appears:
https://arxiv.org/pdf/1805.08318.pdf
[figure: the self-attention module as drawn in the original SAGAN paper]

In a later overview, https://arxiv.org/pdf/1906.01529.pdf, the following scheme appears, citing the original paper:

[figure: the self-attention scheme as redrawn in the overview paper]

My understanding agrees more with the scheme in the second paper:

[figure: my sketch of the module, with two dot-product operations and three projection matrices]

There are two dot-product operations and three learned parameter matrices,
$$W_k, W_v, W_q,$$
which correspond to $W_f, W_g, W_h$ in the original paper, i.e. without the extra output projection $W_v$ that appears in the original paper's explanation, shown below:

[figure: excerpt from the original paper defining the module: $f(x) = W_f x$, $g(x) = W_g x$, $h(x) = W_h x$, attention weights $\beta_{j,i} = \operatorname{softmax}_i\big(f(x_i)^\top g(x_j)\big)$, and output $o_j = W_v\big(\sum_i \beta_{j,i}\, h(x_i)\big)$, followed by $y_i = \gamma o_i + x_i$]

Is this a mistake in the original paper?
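To make the comparison concrete, here is a minimal PyTorch sketch of the self-attention block as the original paper describes it, with all four projections $W_f, W_g, W_h, W_v$. The class name, channel-reduction factors, and zero-initialized $\gamma$ follow the paper's equations, but the code itself is my illustrative reconstruction, not code from either paper. Dropping the final `v` projection (or folding it into `h`) gives the three-matrix $W_q, W_k, W_v$ form of the second scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention block (illustrative sketch)."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        c = in_channels
        # Channel sizes follow the original paper: C/8 for f and g,
        # C/2 for h, restored to C by the extra projection v.
        self.f = nn.Conv2d(c, c // reduction, kernel_size=1)  # W_f (query)
        self.g = nn.Conv2d(c, c // reduction, kernel_size=1)  # W_g (key)
        self.h = nn.Conv2d(c, c // 2, kernel_size=1)          # W_h (value)
        self.v = nn.Conv2d(c // 2, c, kernel_size=1)          # W_v (output)
        self.gamma = nn.Parameter(torch.zeros(1))             # learned scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, height, width = x.shape
        n = height * width  # number of spatial locations N

        q = self.f(x).view(b, -1, n)    # B x C/8 x N
        k = self.g(x).view(b, -1, n)    # B x C/8 x N
        val = self.h(x).view(b, -1, n)  # B x C/2 x N

        # First dot product: scores s_ij = f(x_i)^T g(x_j),
        # normalized over i to give the attention weights beta_{j,i}.
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=1)  # B x N x N

        # Second dot product: o_j = sum_i beta_{j,i} h(x_i).
        o = torch.bmm(val, attn).view(b, c // 2, height, width)

        # The fourth projection W_v from the original paper; without it
        # (or with it merged into W_h) the module matches the second scheme.
        o = self.v(o)
        return self.gamma * o + x  # y_i = gamma * o_i + x_i
```

For example, `SelfAttention(64)(torch.randn(1, 64, 32, 32))` returns a tensor of shape `(1, 64, 32, 32)`, since the residual connection keeps the output shape equal to the input shape.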
