This research implements the computation of a virtual-real fusion hardware system on an ARM-based FPGA. It also evaluates the speedup of virtual-real fusion execution after attaching the relevant hardware accelerators and provides a reference for real-time information fusion system design. The hardware platform is the Xilinx Zynq UltraScale+ MPSoC ZCU102 Evaluation Kit, and color and depth images are captured with an Intel® RealSense™ D435i depth camera. The system receives image data through a Linux system running on the ARM cores and feeds it to the FPGA for object detection using CNN models. In addition, a module is developed to rapidly register the color images to the depth images. From the pixel coordinates and depth values of the human eye and the object, the system calculates the size and position of the information shown on the transparent display. The fusion takes about 47 ms on a personal computer with an RTX 2060 GPU and 25 ms on the ZCU102, a speedup of roughly 1.9× (47/25 ≈ 1.88). Finally, the results were successfully transplanted into a retail display system for demonstration.
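
The on-display placement described above can be sketched with simple pinhole geometry. This is an illustrative sketch, not the paper's implementation: a pixel-plus-depth measurement is back-projected to a 3-D point, and the eye-to-object line of sight is intersected with the display plane. All intrinsics, coordinates, and the assumption that the display lies on the plane z = 0 are hypothetical.

```python
# Illustrative sketch (hypothetical values, not the paper's code):
# place information on a transparent display where the viewer's line of
# sight to a detected object crosses the display plane.

def deproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of a pixel (u, v) with a depth value (meters)
    to a 3-D point in camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def display_point(eye, obj, plane_z=0.0):
    """Intersect the eye->object ray with the display plane z = plane_z.
    Assumes the eye and the object lie on opposite sides of the plane."""
    t = (plane_z - eye[2]) / (obj[2] - eye[2])
    return (eye[0] + t * (obj[0] - eye[0]),
            eye[1] + t * (obj[1] - eye[1]))

if __name__ == "__main__":
    # Hypothetical intrinsics roughly in the range of a D435i color stream.
    fx = fy = 600.0
    cx, cy = 320.0, 240.0
    # Back-project an object pixel with its measured depth.
    obj3d = deproject(380, 240, 1.0, fx, fy, cx, cy)   # (0.1, 0.0, 1.0)
    # Eye position expressed in display coordinates (viewer side: z < 0).
    eye = (0.1, 0.0, -0.5)
    obj = (-0.3, 0.2, 1.5)
    print(display_point(eye, obj))                     # (0.0, 0.05)
```

The same intersection, evaluated for the four corners of an object's bounding box, yields the on-screen size of the overlay as well as its position.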