Version 1
: Received: 26 October 2024 / Approved: 26 October 2024 / Online: 28 October 2024 (13:39:24 CET)
How to cite:
Chen, C.-S.; Yang, Y.-H.; Chen, G.-Y.; Chang, S.-H. Food Classification for Dietary Support Using Fine-Grained Visual Recognition with the HERBS Network. Preprints 2024, 2024102080. https://doi.org/10.20944/preprints202410.2080.v1
APA Style
Chen, C. S., Yang, Y. H., Chen, G. Y., & Chang, S. H. (2024). Food Classification for Dietary Support Using Fine-Grained Visual Recognition with the HERBS Network. Preprints. https://doi.org/10.20944/preprints202410.2080.v1
Chicago/Turabian Style
Chen, C.-S., Y.-H. Yang, G.-Y. Chen, and S.-H. Chang. 2024. "Food Classification for Dietary Support Using Fine-Grained Visual Recognition with the HERBS Network." Preprints. https://doi.org/10.20944/preprints202410.2080.v1
Abstract
Background: Food properties can directly influence individual dietary intake; therefore, computer vision-based food recognition could be used to estimate meal contents for patients with metabolic diseases. Deep learning-based food recognition creates opportunities for breakthroughs in dietary interventions for personal health management and has rapidly emerged as a dietary assessment strategy. Methods: This study proposed a methodology for automatic food recognition based on fine-grained visual classification, the High-temperaturE Refinement and Background Suppression (HERBS) network. A technical investigation of the HERBS model was conducted on the CNFOOD241 benchmark to verify the model's effectiveness. Visualization analyses of HERBS were compared with those of VGG16 and RepViT on CNFOOD241. Results: The system achieved classification accuracies of 82.72% and 97.19% for Top-1 and Top-5 accuracy, respectively. The results showed that the HERBS structure widened the attention area and outperformed VGG16 and RepViT on feature maps. Conclusions: Our findings show that the proposed methodology using the HERBS model establishes a new state-of-the-art (SOTA) benchmark for food recognition on the CNFOOD241 dataset. This study demonstrated the feasibility of the proposed approach on a challenging food dataset. We further describe a vision-language model structure consisting of a multi-modal vision model, a language encoder, and a multilayer perceptron classifier for food composition recognition.
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.