Multivariate Time Series Classification (MTSC) is one of the most important tasks in time series analysis, supporting applications such as human motion recognition and medical diagnostics. Existing MTSC methods neither explicitly model temporal differences nor generalize the idea of temporal difference into an efficient temporal module. In addition, they do not yet capture cross-variable relationships well during network training. As a result, they fail to learn convincing feature representations, which leads to suboptimal classification accuracy. In this paper, we propose a novel MTSC model called Temporal Difference and Cross-variate Fusion Network (TDCFN), which integrates a two-stream differential LSTM network and a cross-variate feature extraction network to enhance feature representation. TDCFN achieves superior classification accuracy by capturing dynamic temporal evolution and inter-variable relationships. Experimental results show that TDCFN achieves performance competitive with state-of-the-art multivariate time series classification approaches.