Deep learning methods for skin cancer detection have advanced dramatically, largely on the basis of high-resolution dermoscopic images. A low-cost alternative based on low-resolution clinical skin images is promising but remains technically challenging because of their degraded image quality. In this paper, we propose a coarse-to-fine efficient super-resolution transformer (CF-ESRT) network that reconstructs a dermoscopy-level high-resolution skin image from a low-resolution clinical image. By attaching a refinement network to the original efficient super-resolution transformer (ESRT) and training with perceptual and gradient losses, our framework noticeably improves the fine texture details of skin lesions in the super-resolution (SR) images and raises their perceptual quality. Quantitative and qualitative evaluations show that our method outperforms ESRT, its base model, as well as other state-of-the-art SR models.
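To make the role of a gradient loss concrete, the following is a minimal sketch of one common formulation: the mean absolute difference between the finite-difference gradients of the SR image and those of the high-resolution reference. The abstract does not specify the exact form used in CF-ESRT, so the function names (`image_gradients`, `mean_abs_diff`, `gradient_loss`), the forward-difference operator, and the L1 distance are all illustrative assumptions. Images are plain 2D lists of grayscale values to keep the example dependency-free.

```python
# Hypothetical sketch of a gradient loss; the paper's actual definition may differ.
# Images are 2D lists of floats (grayscale), at least 2x2 in size.

def image_gradients(img):
    """Return horizontal and vertical forward differences of a 2D image."""
    h, w = len(img), len(img[0])
    dx = [[img[y][x + 1] - img[y][x] for x in range(w - 1)] for y in range(h)]
    dy = [[img[y + 1][x] - img[y][x] for x in range(w)] for y in range(h - 1)]
    return dx, dy


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped 2D lists."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count if count else 0.0


def gradient_loss(sr, hr):
    """L1 distance between the gradient maps of an SR image and its HR reference."""
    sr_dx, sr_dy = image_gradients(sr)
    hr_dx, hr_dy = image_gradients(hr)
    return mean_abs_diff(sr_dx, hr_dx) + mean_abs_diff(sr_dy, hr_dy)


# A sharp vertical edge versus a blurred version of the same edge.
hr_patch = [[0.0, 1.0], [0.0, 1.0]]
sr_patch = [[0.25, 0.75], [0.25, 0.75]]
loss = gradient_loss(sr_patch, hr_patch)
```

A loss of this kind penalizes softened edges rather than overall brightness offsets, which is why it is typically paired with pixel-wise and perceptual terms when sharper texture is the goal.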