
What is the difference between BERT architecture and vanilla Transformer architecture

Data Science · Asked by Luong Minh Tam on February 25, 2021

I’m doing some research on the summarization task and found out that BERT is derived from the Transformer model. Every blog about BERT that I have read focuses on explaining what a bidirectional encoder is, so I think this is what makes BERT different from the vanilla Transformer model. But as far as I know, the Transformer reads the entire sequence of words at once, so it is considered bidirectional too. Can someone point out what I’m missing?

One Answer

The name provides a clue. BERT stands for Bidirectional Encoder Representations from Transformers, so essentially BERT = Transformer minus the decoder.

BERT ends with the final representations of the tokens once the encoder stack has finished processing them.

In the original Transformer, those encoder outputs are fed into the decoder through cross-attention; that piece of the architecture is simply not present in BERT. You are right that the Transformer's encoder is bidirectional, since it attends over the whole sequence at once. The decoder, however, is not: it applies a causal mask so each position can only attend to earlier positions. BERT keeps only the bidirectional encoder half.
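To make the "encoder only" point concrete, here is a minimal PyTorch sketch. This is not the real BERT code (the actual model adds segment embeddings, GELU activations, a masked-LM head, and other details); the class name TinyBert is made up, and the layer sizes are just illustrative values roughly matching BERT-base. The point is that the whole model is an embedding layer plus a stack of Transformer encoder layers, with no decoder and no causal mask anywhere.

```python
import torch
import torch.nn as nn

class TinyBert(nn.Module):
    """Hypothetical BERT-like model: just the encoder half of a Transformer."""
    def __init__(self, vocab_size=30522, d_model=768, n_heads=12,
                 n_layers=12, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        # Encoder stack only -- there is no nn.TransformerDecoder here.
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        # No causal mask is passed in, so every token attends to every
        # other token in both directions -- the "bidirectional" part.
        return self.encoder(x)

model = TinyBert()
out = model(torch.randint(0, 30522, (1, 16)))
print(out.shape)  # torch.Size([1, 16, 768])
```

Running this prints one 768-dimensional contextual vector per token, which is exactly where BERT stops. In the full Transformer, these vectors would instead be handed to a decoder via cross-attention to generate an output sequence.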

Correct answer by Allohvk on February 25, 2021
