Class of deep learning models that apply the transformer architecture, originally designed for natural language processing, to computer vision tasks.
Generality: 705
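
To make the definition concrete, the sketch below shows one common way an image can be turned into the kind of token sequence a transformer expects: cutting it into fixed-size patches and flattening each patch into a vector. This is an illustrative assumption in the style of the original ViT design, not something the entry itself specifies, and other models in this class tokenize images differently. The names `image_to_patch_tokens`, `patch_size`, and `toy_image` are hypothetical and chosen only for this example.

```python
from typing import List

# Assumed layout for this sketch: height x width x channels, as nested lists.
Image = List[List[List[float]]]


def image_to_patch_tokens(image: Image, patch_size: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping square patches and
    flatten each patch into one vector (one "token")."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")

    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch row by row, pixel by pixel, channel by channel.
            token = [
                value
                for row in image[top:top + patch_size]
                for pixel in row[left:left + patch_size]
                for value in pixel
            ]
            tokens.append(token)
    return tokens


# Toy 4 x 4 RGB image whose pixel values encode their (row, column) position.
toy_image = [[[float(r), float(c), 0.0] for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(toy_image, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 2 * 2 * 3 = 12
```

In a full model of this kind, each flattened patch would typically be passed through a learned linear projection and combined with a position embedding before entering the transformer encoder. Those steps are omitted here to keep the sketch dependency-free.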