Preprint Article Version 1 This version is not peer-reviewed

Food Classification for Dietary Support Using Fine-Grained Visual Recognition with the HERBS Network

Version 1 : Received: 26 October 2024 / Approved: 26 October 2024 / Online: 28 October 2024 (13:39:24 CET)

How to cite: Chen, C.-S.; Yang, Y.-H.; Chen, G.-Y.; Chang, S.-H. Food Classification for Dietary Support Using Fine-Grained Visual Recognition with the HERBS Network. Preprints 2024, 2024102080. https://doi.org/10.20944/preprints202410.2080.v1

Abstract

Background: Food properties can directly influence individual dietary intake, so computer vision-based food recognition could be used to estimate meal contents for patients with metabolic diseases. Deep learning-based food recognition has rapidly emerged as a dietary assessment strategy and creates opportunities for breakthroughs in dietary interventions for personal health management. Methods: This study proposes a methodology for automatic food recognition based on fine-grained visual classification using the High-temperaturE Refinement and Background Suppression network (HERBS). We evaluated the HERBS model on the CNFOOD241 benchmark to verify its effectiveness, and compared its feature-map visualizations with those of VGG16 and RepViT on the same dataset. Results: The system achieved a Top-1 accuracy of 82.72% and a Top-5 accuracy of 97.19%. The visualizations showed that the HERBS structure widened the attention area and outperformed VGG16 and RepViT on feature maps. Conclusions: Our findings indicate that the proposed HERBS-based methodology establishes a new state-of-the-art (SOTA) benchmark for food recognition on the CNFOOD241 dataset and demonstrate the feasibility of the approach on a challenging food dataset. We further describe a vision-language model structure consisting of a multi-modal vision model, a language encoder, and a multilayer perceptron classifier for food composition recognition.
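
For readers unfamiliar with the Top-1 and Top-5 metrics reported above, the following is a minimal sketch of how top-k accuracy is commonly computed from per-class scores. It is an illustration only, not the evaluation code used in this study: the function name `top_k_accuracy` and the toy scores and labels are hypothetical and are not drawn from CNFOOD241.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep only the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (made-up scores, not CNFOOD241 data).
toy_scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked last
]
toy_labels = [1, 2, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

On this toy input only the first sample counts as a top-1 hit, while the first two count for k=2, which is why Top-5 accuracy is always at least as high as Top-1 accuracy on the same predictions.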

Keywords

Food dataset; HERBS; fine-grained food recognition; dietary assessment

Subject

Computer Science and Mathematics, Computer Science
