Structural magnetic resonance imaging (sMRI) is widely used in clinical disease diagnosis owing to its high resolution and noninvasive nature, and computer-aided diagnosis based on sMRI images is therefore broadly applied to the classification of Alzheimer’s disease (AD). Given the excellent performance of Transformers in computer vision, the Vision Transformer (ViT) has been adopted for AD classification in recent years. However, ViT relies on access to large training datasets, whereas brain imaging datasets are comparatively small. Moreover, the preprocessing pipelines for brain sMRI images are complex and labor-intensive. To overcome these limitations, we propose the Resizer Swin Transformer (RST), a deep learning model that extracts multi-scale and cross-channel features from brain sMRI images that have undergone only minimal preprocessing. In addition, we pre-train RST on a natural image dataset, which further improves performance. Experimental results on the ADNI and AIBL datasets show that RST achieves better classification performance in AD prediction than CNN-based and Transformer-based models.
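To make the idea of a resizer front-end concrete, the following is a minimal, hypothetical PyTorch sketch of a learnable resizer of the kind the model name suggests: a bilinear resize refined by a small residual convolutional branch, so that downstream layers receive task-adapted inputs instead of a fixed interpolation. The module name, channel count, and target size below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableResizer(nn.Module):
    """Hypothetical sketch: a bilinear resize plus a small learned
    residual refinement, standing in for the resizer front-end."""

    def __init__(self, out_size=(224, 224), channels=16):
        super().__init__()
        self.out_size = out_size
        # Small convolutional branch that learns a correction to the
        # plain bilinear resize (assumed design, not the paper's).
        self.refine = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # Fixed bilinear resize to the backbone's expected input size.
        base = F.interpolate(x, size=self.out_size, mode="bilinear",
                             align_corners=False)
        # Residual connection: learned refinement on top of the resize.
        return base + self.refine(base)

# Example: resize a batch of single-channel MRI slices to 224x224
# before feeding them to a Transformer backbone.
resizer = LearnableResizer()
slices = torch.randn(2, 1, 160, 192)
out = resizer(slices)
print(out.shape)  # torch.Size([2, 1, 224, 224])
```

In this sketch the resizer's output would then be passed to a Swin Transformer backbone; because the refinement branch is trained jointly with the backbone, the resize itself adapts to the classification task.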