Cross Validated
Asked on January 3, 2022
I have seen that BERT was one of the state-of-the-art word embedding methods in 2018, and that XLNet was proposed in 2019 to address some of BERT's limitations. One limitation of BERT I have seen is the maximum length of input tokens, which is 512 (see this link). Does anyone know the reason?
It's an arbitrary value. It is the longest input length the authors assumed would be needed; presumably, they didn't have longer sequences in the training set. Moreover, you can always truncate a sequence and ignore the more distant history, so in that case the length would simply be the farthest-back history you consider useful. The fact that 512 is a power of two also suggests the value was chosen somewhat arbitrarily by a computer-science-minded person.
Answered by Tim on January 3, 2022
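To see the limit in practice, here is a minimal sketch using the Hugging Face transformers library (the model name bert-base-uncased and the toy text are illustrative, not from the original thread). Asking the tokenizer to truncate at max_length=512 yields exactly 512 token IDs, which is as many positions as BERT's learned position embeddings cover:

```python
# Minimal sketch: BERT's 512-token input limit, assuming the
# Hugging Face `transformers` library is installed.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

long_text = "statistics " * 1000  # far more than 512 tokens

# BERT's learned position embeddings cover only 512 positions, so
# inputs longer than that must be truncated before reaching the model.
encoded = tokenizer(
    long_text,
    truncation=True,   # drop tokens beyond max_length
    max_length=512,    # BERT's fixed maximum input length
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 512])
```

Without truncation=True, an input this long would exceed the model's position embedding table and fail at inference time, which is why truncating to the most recent or most relevant 512 tokens is the usual workaround.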