
Computing Attention on multi-dimensional sequences?

Data Science Asked on June 10, 2021

Is it possible to compute attention over multi-dimensional sequence input, or to adapt existing transformer architectures (like Longformer) to handle it?

That is, instead of a 1D array of tokens (like a Python list of tokens over which attention is computed), I want to feed in an array of 2D/3D/4D tokens and pre-train my language model on it with the masked language modelling technique (i.e. predicting masked tokens).

Is it even possible to do this? Any idea what modifications I would have to make?
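
For context, one common way to handle this kind of input is to flatten the multi-dimensional token grid into a 1D sequence while adding a separate positional embedding per axis, so that a standard (or sparse, Longformer-style) attention layer can be applied unchanged. Below is a minimal PyTorch sketch for the 2D case; the module name Grid2DEmbedding and the vocabulary/dimension sizes are illustrative assumptions, not part of any specific library.

    import torch
    import torch.nn as nn

    class Grid2DEmbedding(nn.Module):
        """Embed a 2D grid of token ids and flatten it into a 1D sequence,
        adding separate row and column position embeddings so the model can
        still recover the 2D structure. All sizes here are illustrative."""
        def __init__(self, vocab_size=30522, d_model=768, max_h=64, max_w=64):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, d_model)
            self.row_emb = nn.Embedding(max_h, d_model)
            self.col_emb = nn.Embedding(max_w, d_model)

        def forward(self, token_grid):
            # token_grid: (batch, H, W) of token ids
            b, h, w = token_grid.shape
            rows = torch.arange(h, device=token_grid.device)
            cols = torch.arange(w, device=token_grid.device)
            x = self.tok_emb(token_grid)                      # (b, H, W, d_model)
            x = x + self.row_emb(rows)[None, :, None, :]      # add row position
            x = x + self.col_emb(cols)[None, None, :, :]      # add column position
            return x.reshape(b, h * w, -1)                    # (b, H*W, d_model)

    # Usage: the flattened sequence can go into any standard encoder.
    emb = Grid2DEmbedding()
    grid = torch.randint(0, 30522, (2, 16, 16))               # batch of 2, 16x16 token grids
    seq = emb(grid)                                           # (2, 256, 768)
    attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)
    out, _ = attn(seq, seq, seq)                              # full self-attention over the flattened grid

A Longformer-style model would additionally restrict the attention pattern (e.g. to a local window along each axis), but the embedding-and-flattening step sketched above stays the same.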
