In the tracking-by-detection paradigm for multi-target tracking, target association is modeled as an optimization problem that is usually solved through a network flow formulation. In this paper, we propose a combinatorial optimization formulation and use bipartite graph matching to associate targets across consecutive frames. Usually, the target of interest is represented by a bounding box, and the whole box is tracked as a single entity. However, in the case of humans, the body undergoes complex articulation and occlusion that severely degrade tracking performance. To partially tackle the problem of occlusion, we argue that tracking a rigid body part could lead to better tracking performance than tracking the whole body. Based on this assumption, we generate target hypotheses only from the spatial locations of people's heads in every frame. After head localization, a constant velocity motion model is used to describe the temporal evolution of the targets in the visual scene. Qualitative results are evaluated on four challenging video surveillance datasets, and promising results have been achieved.
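The association step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `predict` and `associate`, the Euclidean distance cost, and the `gate` threshold are assumptions introduced here for illustration. Each track's head position is advanced with a constant velocity model, and the predictions are then matched one-to-one to the head detections of the next frame by minimizing the total distance. The matching is found by exhaustive search, which is only practical for a handful of targets; a Hungarian-type solver would normally replace it.

```python
from itertools import permutations
from math import hypot


def predict(position, velocity, dt=1.0):
    """Constant velocity prediction of a head centre, dt frames ahead."""
    return (position[0] + velocity[0] * dt, position[1] + velocity[1] * dt)


def associate(predicted, detections, gate=50.0):
    """Minimum-cost one-to-one matching of predicted tracks to detections.

    Returns (track_index, detection_index) pairs. The cost is the Euclidean
    distance (an assumption), and pairs farther apart than `gate` pixels
    (a hypothetical threshold) are discarded after matching.
    """
    n_tracks, n_dets = len(predicted), len(detections)
    if n_tracks == 0 or n_dets == 0:
        return []
    cost = [[hypot(p[0] - d[0], p[1] - d[1]) for d in detections]
            for p in predicted]
    # Enumerate every way of pairing the smaller set with the larger one.
    if n_tracks <= n_dets:
        candidates = ([(t, d) for t, d in enumerate(perm)]
                      for perm in permutations(range(n_dets), n_tracks))
    else:
        candidates = ([(t, d) for d, t in enumerate(perm)]
                      for perm in permutations(range(n_tracks), n_dets))
    best = min(candidates, key=lambda pairs: sum(cost[t][d] for t, d in pairs))
    return [(t, d) for t, d in best if cost[t][d] <= gate]


# Two tracks given as (head position, velocity in pixels per frame).
tracks = [((100.0, 50.0), (5.0, 0.0)), ((200.0, 80.0), (-4.0, 1.0))]
# Head detections in the next frame; the third has no matching track.
detections = [(197.0, 80.0), (104.0, 51.0), (400.0, 300.0)]

predicted = [predict(pos, vel) for pos, vel in tracks]
matches = associate(predicted, detections)
```

In this toy example, track 0 is matched to detection 1 and track 1 to detection 0, while the unmatched third detection would be a candidate for starting a new track.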
Subject: Computer Science and Mathematics - Mathematics
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.