Modern multimedia applications place increasingly high requirements on video encoders, demanding both high coding efficiency and low computational complexity to ensure low latency and high transmission speed in real-time applications. To meet this demand, researchers have been committed to designing efficient, low-complexity video encoders. Although there has been some research on reducing the computational complexity of VVC encoders, most of it has focused on accelerating early decisions in the partitioning process [14,15,16,17,18,19,20]. Reference [14] proposed a fast partitioning algorithm for intra- and inter-frame encoding. For intra-frame encoding, the Canny edge detection algorithm is used to extract coding features of the image, and these features are used to decide whether to skip vertical or horizontal partitioning, thereby achieving early termination. For inter-frame encoding, the three-frame difference method is used to determine whether an object is a moving target.
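As a rough illustration of the inter-frame part of this idea, the following Python sketch implements a generic three-frame difference test for detecting moving content; the threshold values and the ratio-based decision rule are illustrative assumptions, not the exact criterion used in [14].

```python
import numpy as np

def is_moving_region(prev_frame, curr_frame, next_frame, diff_thresh=15, ratio_thresh=0.02):
    """Generic three-frame difference test (illustrative, not the exact rule of [14]).

    A pixel is marked as moving when it differs from both the previous and the
    next frame; the region is treated as a moving target when the fraction of
    such pixels exceeds ratio_thresh.
    """
    d1 = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16)) > diff_thresh
    d2 = np.abs(next_frame.astype(np.int16) - curr_frame.astype(np.int16)) > diff_thresh
    motion_mask = np.logical_and(d1, d2)
    return motion_mask.mean() > ratio_thresh, motion_mask
```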
Reference [15] proposes a fast texture-based CU partitioning method, which evaluates the texture complexity of the current CU to determine whether to skip subsequent partitioning; in addition, an improved Canny operator is used to extract edge information and exclude either the horizontal or the vertical partitioning mode.
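A minimal sketch of the complexity-evaluation step is given below, using the Canny edge density of the CU luma block as a texture-complexity proxy; the Canny thresholds and the density threshold are placeholders for illustration and do not reproduce the tuned decision rule of [15].

```python
import cv2
import numpy as np

def can_skip_further_split(cu_luma, low=50, high=150, density_thresh=0.05):
    """Edge-density proxy for CU texture complexity (illustrative sketch).

    cu_luma: 2-D uint8 array of luma samples of the current CU.
    If the Canny edge density is low, the CU is treated as smooth and further
    partitioning is skipped; the thresholds here are assumed values.
    """
    edges = cv2.Canny(cu_luma, low, high)
    edge_density = np.count_nonzero(edges) / edges.size
    return edge_density < density_thresh
```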
Reference [16] analyzes the probability of the affine motion estimation mode in bidirectional prediction, explores the mutual exclusion between the skip mode and the affine mode, and proposes a fast VVC affine motion estimation mode decision based on neighboring coding information. Reference [17] studied a fast motion estimation algorithm that terminates partial blocks within a CU early, applying the skip mode to CUs that do not require affine transformation. In Reference [18], Zhao et al. extract the standard deviation and edge ratio to accelerate CU partitioning. The CU split information and the temporal position of the encoded frame are used to build a low-complexity encoder in [19]. Reference [20] checks whether the optimal prediction mode of the current coding block is the skip mode; if so, the entire affine motion estimation process is skipped, and the direction of the optimal prediction is checked, with the detection result determining whether to reduce the size of the reference sequence, thereby reducing computational complexity. Reference [21] proposes an adaptive affine four-parameter and six-parameter coding architecture, in which the encoder can adaptively select between the two affine motion models.
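For context, in VVC the motion vector of a sample at position (x, y) inside a W x H CU is derived from the control-point motion vectors (CPMVs). The four-parameter model uses the top-left and top-right CPMVs mv0 and mv1:

\[
mv_x(x,y) = \frac{mv_{1x}-mv_{0x}}{W}\,x - \frac{mv_{1y}-mv_{0y}}{W}\,y + mv_{0x}, \qquad
mv_y(x,y) = \frac{mv_{1y}-mv_{0y}}{W}\,x + \frac{mv_{1x}-mv_{0x}}{W}\,y + mv_{0y},
\]

while the six-parameter model additionally uses the bottom-left CPMV mv2:

\[
mv_x(x,y) = \frac{mv_{1x}-mv_{0x}}{W}\,x + \frac{mv_{2x}-mv_{0x}}{H}\,y + mv_{0x}, \qquad
mv_y(x,y) = \frac{mv_{1y}-mv_{0y}}{W}\,x + \frac{mv_{2y}-mv_{0y}}{H}\,y + mv_{0y}.
\]

The adaptive architecture of [21] lets the encoder choose between these two models on a per-CU basis.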
Reference [22] proposes an affine motion estimation model that iteratively searches for affine motion vectors, together with a method for constructing the affine advanced motion vector prediction (AAMVP) candidate list, which has been adopted by the VVC standard. Reference [23] proposes affine motion compensation based on feature matching, which can further improve video coding efficiency. Reference [24] performs affine motion estimation through block division and predicts each pixel using a reference coordinate system in order to predict the affine transformation. Reference [25] proposes an affine motion estimation scheme that does not require additional alternating segmentation and estimation; the motion is described by a piecewise function over the parameter field, and a specific splitting optimization scheme is derived. Reference [26] proposed a method that uses rate-distortion theory and the displacement estimation error to determine the minimum bit rate required to transmit the prediction error during encoding. Reference [27] proposed a method that solves the relative pose estimation problem using the affine transformation between feature points. Reference [28] proposed a motion compensation scheme based on three-zone segmentation: according to the segmentation information, three motion compensation regions are obtained, namely the edge region, the foreground region, and the background region, and the information from these regions is used to improve the accuracy and coding efficiency of motion compensation. Reference [29] proposed an edge video compression texture synthesis method based on a generative adversarial network to obtain the most realistic texture information. Reference [30] proposes an affine parameter model that uses matching algorithms to discover and extract feature point pairs from edges in consecutive frames and selects the optimal set of three point pairs to describe the global motion.
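To make the underlying principle concrete, the sketch below estimates the six global affine parameters from matched point pairs by solving a small linear system; the function name and the least-squares formulation are illustrative assumptions and do not reproduce the point-pair selection strategy of [30].

```python
import numpy as np

def affine_from_point_pairs(src_pts, dst_pts):
    """Estimate the six affine parameters (a, b, c, d, e, f) such that
    x' = a*x + b*y + c and y' = d*x + e*y + f, from N >= 3 matched point pairs.

    src_pts, dst_pts: arrays of shape (N, 2). With exactly three non-collinear
    pairs the system is solved exactly; with more pairs it is a least-squares fit.
    """
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)          # [x'_0, y'_0, x'_1, y'_1, ...]
    A[0::2, 0:2] = src           # a*x + b*y
    A[0::2, 2] = 1.0             # + c
    A[1::2, 3:5] = src           # d*x + e*y
    A[1::2, 5] = 1.0             # + f
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params                # (a, b, c, d, e, f)
```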
Reference [31] proposes linear applications of traditional intra-prediction modes based on a mode-correlation processing order, region-based template matching prediction methods, and neural-network-based intra-prediction modes. Reference [32] proposes a context-based inter-mode decision method that skips affine modes by determining whether radial motion estimation is performed during the rate-distortion optimization of the optimal CU mode decision. Reference [33] adds momentum parameters to accelerate the iterative process, exploiting the symmetry of the affine motion estimation iteration.
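As a generic illustration (the exact formulation in [33] may differ), a heavy-ball momentum update for the affine parameter vector p at iteration k can be written as

\[
v^{(k)} = \mu\, v^{(k-1)} + \Delta p^{(k)}, \qquad p^{(k)} = p^{(k-1)} + v^{(k)},
\]

where \(\Delta p^{(k)}\) is the update produced by the iterative affine search at iteration k and \(\mu \in [0,1)\) is the momentum coefficient, so that successive updates pointing in a consistent direction accumulate and fewer iterations are needed to converge.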
Overall, most current research focuses on skip decisions for the affine motion estimation module, while few works improve or optimize the architecture of affine motion estimation itself; as a result, the high computational complexity of the affine motion estimation module has not been well addressed. This is the problem that current research work needs to solve.