Preprint Article, Version 1 (preserved in Portico). This version is not peer-reviewed.

PointBLIP: Zero-Training Point Cloud Classification Network Based on BLIP-2 Model

Version 1 : Received: 8 January 2024 / Approved: 8 January 2024 / Online: 8 January 2024 (17:03:40 CET)

A peer-reviewed article of this Preprint also exists.

Xiao, Y.; Dou, Y.; Yang, S. PointBLIP: Zero-Training Point Cloud Classification Network Based on BLIP-2 Model. Remote Sensing 2024, 16, 2453, doi:10.3390/rs16132453.

Abstract

Leveraging the open-world understanding capacity of large-scale visual-language pre-trained models has become a research hotspot in point cloud classification. Recent approaches rely on transferable visual-language pre-trained models, classifying point clouds by projecting them into 2D images and evaluating their consistency with textual prompts. These methods benefit from the robust open-world understanding capabilities of visual-language pre-trained models and require no additional training. However, they face several challenges, which can be summarized as prompt ambiguity, image domain gap, view-weight confusion, and feature deviation. In response to these challenges, we propose PointBLIP, a zero-training point cloud classification network based on the recently introduced BLIP-2 visual-language model. PointBLIP is adept at processing similarities between multiple images and multiple prompts. We introduce separate novel methods for point cloud zero-shot and few-shot classification, each of which compares multiple features to achieve effective classification. Simultaneously, we enhance the quality of the input data on both the image and text sides of PointBLIP. In point cloud zero-shot classification tasks, we outperform state-of-the-art methods on three benchmark datasets. For few-shot classification tasks, to the best of our knowledge, we present the first zero-training few-shot point cloud method, surpassing previous works under the same conditions and showing performance comparable to full-training methods.
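The multi-image/multi-prompt similarity scoring described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes that per-view image features (one per rendered 2D projection of the point cloud) and per-class prompt features have already been extracted by a vision-language model such as BLIP-2, and shows one plausible aggregation (best prompt per view, averaged over views). The function name and aggregation choice are assumptions for illustration.

```python
import numpy as np

def zero_shot_classify(view_feats, class_prompt_feats):
    """Aggregate multi-view x multi-prompt similarities into class scores.

    view_feats: (V, D) array, one feature per rendered view of the point cloud.
    class_prompt_feats: list of (P_c, D) arrays, one per class, with P_c prompts each.
    Returns the index of the best-scoring class.
    Illustrative sketch only; PointBLIP's actual aggregation may differ.
    """
    # L2-normalize so dot products become cosine similarities
    v = view_feats / np.linalg.norm(view_feats, axis=1, keepdims=True)
    scores = []
    for prompts in class_prompt_feats:
        p = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
        sim = v @ p.T                          # (V, P_c) view-prompt similarities
        scores.append(sim.max(axis=1).mean())  # best prompt per view, mean over views
    return int(np.argmax(scores))
```

With features in hand, classification reduces to this comparison step, which is why no additional training is needed: all learned knowledge lives in the frozen vision-language model.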

Keywords

point cloud classification; zero-training; large-scale vision-and-language model; zero-shot classification; few-shot classification

Subject

Computer Science and Mathematics, Computer Vision and Graphics
