Real-time object detection remains a pivotal topic in computer vision. Balancing accuracy and speed in object detectors poses a formidable challenge for both academic researchers and industry practitioners. While recent transformer-based models have showcased the prowess of the attention mechanism, delivering remarkable performance improvements over CNNs, their computational demands can hinder their effectiveness in real-time detection tasks. In this paper, we adopt YOLOX as a robust starting point and introduce a series of effective enhancements, culminating in a new high-performance detector named YOLOAX. To further exploit the power of the attention mechanism, we devise multi-dimensional attention-based modules that activate CNNs, emphasizing the most pertinent regions and strengthening the network's capacity to learn informative image representations from feature maps. Moreover, we introduce a novel leading label assignment strategy called STA, along with a new loss function named GEIOU Loss, to further refine the detector's performance. We provide extensive ablation studies on the COCO and PASCAL VOC 2012 detection datasets to validate the proposed methods. Remarkably, our YOLOAX, trained from scratch solely on the MS-COCO dataset without leveraging any prior knowledge, achieves an impressive 54.2% AP on the COCO 2017 test set while maintaining a real-time speed of 72.3 fps, surpassing YOLOX by a significant margin of 3.0% AP. Our source code and pre-trained models are openly available at https://github.com/KejianXu/yoloax.