In recent years, computer vision technology has developed rapidly, intelligent hardware has become widespread, and demand for intelligent human-computer interaction products has grown. Visual grounding (VG) technology can help machines and humans identify and locate objects, thereby promoting human-computer interaction and intelligent manufacturing. At the same time, human-computer interaction continues to evolve, becoming increasingly intelligent, humane, and efficient. This paper proposes a new VG model and designs a language verification module that treats language information as the primary information, increasing the model’s interactivity. Additionally, we propose combining visual grounding with human-computer interaction and explore the research status and development trends of both technologies, as well as their applications in practical scenarios and directions for optimization, to provide references and guidance for researchers and to promote the development and application of these technologies.