Multi-Object Tracking (MOT) is a key technology for Unmanned Aerial Vehicles (UAVs). Traditional tracking-by-detection methods first employ an object detector to locate targets in each image and then track them with a matching algorithm. Recently, multi-task learning methods have come to dominate this area, since they can detect targets and extract Re-Identification (Re-ID) features in a computationally efficient way. However, the detection and tracking tasks place conflicting requirements on image features, so joint learning models perform worse than separate detection and tracking methods. The problem is more severe for UAV images due to the irregular motion of a large number of small targets. In this paper, we propose a balanced Joint Detection and Re-ID learning (JDR) network to address the MOT problem in UAV vision. To better handle the non-uniform motion of objects in UAV videos, we apply a Set-Membership Filter, which describes the object state as a bounded set. We then propose an appearance matching cascade based on the target state set. Furthermore, we design a Motion-Mutation module to address the challenges posed by the abrupt motion of the UAV. Extensive experiments on the VisDrone-MOT2019 dataset demonstrate that our proposed model, termed SMFMOT, outperforms the state of the art by a large margin and achieves superior performance on MOT tasks in UAV videos.
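The abstract only names the Set-Membership Filter idea of representing an object's state as a bounded set rather than a Gaussian point estimate. As a rough, hypothetical illustration (not the paper's SMFMOT implementation), the sketch below models the state as an axis-aligned interval box over (x, y, vx, vy) with bounded acceleration and measurement error; all function names, bounds, and the box parameterization are our assumptions.

```python
import numpy as np

# Toy set-membership state estimation for one tracked object.
# State is an interval box [lo, hi] over (x, y, vx, vy); process and
# measurement noise are bounded, not Gaussian. Illustrative only.

def predict(lo, hi, dt=1.0, accel_bound=0.5):
    """Propagate the state box under constant velocity, then inflate it
    by the worst-case effect of a bounded acceleration."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    # F has nonnegative entries, so mapping the corners bounds the image box.
    lo_p, hi_p = F @ lo, F @ hi
    w = np.array([0.5 * accel_bound * dt**2] * 2 + [accel_bound * dt] * 2)
    return lo_p - w, hi_p + w

def update(lo, hi, z, meas_bound=2.0):
    """Intersect the predicted box with the measurement box over (x, y).
    An empty intersection means the detection is incompatible with the set."""
    z_lo = np.array([z[0] - meas_bound, z[1] - meas_bound, -np.inf, -np.inf])
    z_hi = np.array([z[0] + meas_bound, z[1] + meas_bound,  np.inf,  np.inf])
    new_lo, new_hi = np.maximum(lo, z_lo), np.minimum(hi, z_hi)
    if np.any(new_lo > new_hi):
        return None  # measurement lies outside the feasible state set
    return new_lo, new_hi
```

Because the feasible set only grows by a bounded amount per step and shrinks via intersection, a detection falling outside the box can be rejected outright, which is one plausible way a bounded-set description could gate an appearance matching cascade.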