

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo

Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks.
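
The windowing idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the 8x8 single-channel feature map, the window size of 4, and the shift of half a window are all assumptions chosen for the example. The sketch leaves out the attention computation itself and any masking of positions that wrap around after the shift. It only shows how non-overlapping windows are formed, how a shifted partition lets a window straddle the previous window borders, and why the number of query-key pairs grows linearly with the number of pixels when the window size is fixed.

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping (window, window, C) tiles."""
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "map size must be divisible by window"
    x = x.reshape(h // window, window, w // window, window, c)
    # Bring the two tile-index axes to the front, then flatten them into one axis.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window, window, c)


def shifted_window_partition(x, window, shift):
    """Cyclically shift the map up and left by `shift`, then partition it.

    Each resulting window covers pixels from several of the unshifted windows,
    which is what provides cross-window connection.
    """
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window)


def attention_pairs(h, w, window=None):
    """Count query-key pairs for global attention (window=None) or windowed attention."""
    n = h * w
    if window is None:
        return n * n              # every token attends to every token
    return n * window * window    # every token attends only within its own window


# Hypothetical 8x8 map with one channel; each entry stores its own flat index.
feature_map = np.arange(64).reshape(8, 8, 1)
regular = window_partition(feature_map, window=4)
shifted = shifted_window_partition(feature_map, window=4, shift=2)
```

With these assumed sizes, `regular` holds four 4x4 windows, and the first window of `shifted` covers rows and columns 2 to 5 of the original map, so it overlaps all four regular windows. Doubling the side length of the map multiplies `attention_pairs(h, w, window=4)` by 4, the same factor as the pixel count, whereas the global count grows by a factor of 16.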
