Semantic segmentation of remote sensing urban-scene imagery is a pixel-wise prediction task used to identify land-cover or land-use categories. However, semantic segmentation demands a high computational cost. A common way to reduce this cost is to adopt a hybrid CNN-transformer design that achieves a good trade-off between accuracy and computation. However, recent CNN-transformer hybrid methods often capture local-global context and cross-window interaction simultaneously, and then fuse the resulting local-global context with the local feature again; this repeated fusion of local and global features incurs additional computational cost. Moreover, previous methods often neglect to filter the local and global features before fusing them. To address these problems, we design a lightweight decoder, EDDformer, which is an Efficient Global Value Transformer with a Dynamic Gatefusion module. The Efficient Global Value Transformer is responsible only for extracting global features, while the Dynamic Gatefusion module filters the local and global semantic features and fuses them, capturing the local-global context in a single step. Extensive experiments show that our method not only runs faster but also achieves higher accuracy than other state-of-the-art lightweight models.
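To illustrate the idea of gated filtering-and-fusion described above, here is a minimal, hypothetical sketch in pure Python: a sigmoid gate computed from both features decides, per element, how much of the local versus global feature to keep, so filtering and fusion happen in one pass. The function name, scalar weights, and per-element formulation are illustrative assumptions, not the paper's actual (learned, tensor-valued) module.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dynamic_gate_fusion(local_feat, global_feat, w_local=1.0, w_global=1.0, bias=0.0):
    """Sketch of one-step gated fusion of local and global features.

    A gate in (0, 1), computed from both features, filters each pair
    and blends them as a convex combination, so local-global context
    is captured in a single fusion step. The weights here are fixed
    placeholders; in a real module they would be learned parameters.
    """
    fused = []
    for l, g in zip(local_feat, global_feat):
        gate = sigmoid(w_local * l + w_global * g + bias)  # filtering weight in (0, 1)
        fused.append(gate * l + (1.0 - gate) * g)          # single fusion step
    return fused

# Example: blend a local and a global feature vector element-wise.
fused = dynamic_gate_fusion([0.5, -1.0], [1.0, 2.0])
```

Because the gate lies strictly in (0, 1), each fused value is a convex combination of the corresponding local and global values, which is one simple way to realize "filter then fuse" without a second, separate fusion stage.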