1. Introduction
A point cloud is a set of points in 3D space, where each point carries its own attributes such as color, normal direction, or light intensity. Point clouds are commonly acquired by sensors such as LiDAR or cameras and preserve the geometric structure and surface features of objects well, making them well suited to 3D scene understanding. Point cloud classification has been an active research topic in photogrammetry and remote sensing for decades, and it has become an important component of intelligent vehicles [1], autonomous driving [2], 3D reconstruction [3], forest monitoring [4], robot perception [5], traffic signage extraction [6], and other applications. Owing to the irregularity and disorder of point clouds, high sensor noise, complex scenes, and non-homogeneous density, point cloud classification remains challenging. Most earlier works focused on manually designed point cloud features [7], which are limited to specific tasks and cannot be easily adapted to new ones. Deep learning has since become the leading approach to point cloud classification because of its efficiency on large data sets and its ability to learn features automatically.
PointNet, proposed by Qi et al. [8], was the first groundbreaking deep neural network to process unordered point clouds directly. PointNet mainly uses multi-layer perceptrons (MLPs) to lift point coordinates into a high-dimensional feature space and then obtains a representative global feature vector by global pooling. However, PointNet focuses heavily on the global features of point clouds, and its ability to exploit local features is poor. The subsequently proposed PointNet++ [9] is a hierarchical network whose set abstraction layer consists of a sampling layer, a grouping layer, and a PointNet layer for non-linear mapping, allowing it to learn local features better.
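For context, the following PyTorch-style sketch illustrates the core PointNet idea of a shared per-point MLP followed by symmetric global pooling; it is a simplified illustration, not the original implementation, and the layer widths and class name are assumptions.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Illustrative PointNet-style classifier: shared per-point MLP + max pooling."""
    def __init__(self, num_classes=40):
        super().__init__()
        # Shared MLP applied to every point independently (1x1 convolutions over points).
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, xyz):            # xyz: (B, 3, N) point coordinates
        feat = self.point_mlp(xyz)     # (B, 1024, N) per-point features
        glob = feat.max(dim=2).values  # symmetric max pooling -> (B, 1024) global feature
        return self.classifier(glob)

logits = TinyPointNet()(torch.randn(8, 3, 1024))  # 8 clouds of 1024 points each
```

Because the max pooling is permutation invariant, the network is unaffected by the ordering of the input points, which is what makes this design applicable to unordered point clouds.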
Inspired by PointNet++, some works attempt to extract local geometric features of point clouds by using convolution [10,11,12], graph convolution [13,14], or transformers [15,16] in the non-linear mapping layers. Other methods improve classification performance by deepening the network [17,18,19] or stacking network structures [20,21,22]. In addition, many researchers convert irregular point clouds into regular data before processing [23,24], for example by voxelizing point clouds into uniformly distributed grids and then processing the grid data with a 3D CNN [25]. These approaches suffer from high computational complexity and quantization noise, which limit the accuracy and efficiency of point cloud classification.
Furthermore, the sampling-grouping operation can lose point cloud information and neglect detailed regions, and even complex feature extractors in the non-linear mapping layer cannot compensate for its failure to capture multi-scale detailed features. Meanwhile, [26] shows, both in theory and in practice, that the standard MLPs used in low-dimensional visual and graphics tasks cannot learn high-frequency information. Deepening or stacking network structures can also lead to network degradation, saturation of model performance, and inference delay.
To address the above issues, this paper proposes a simple and intuitive point cloud analysis network that requires neither delicate local feature exploration algorithms nor deeper networks: a detail activation operation, similar to positional encoding, is inserted between the sampling-grouping operation and the non-linear mapping layer so that the subsequent coordinate-based MLPs can learn higher-frequency features. We develop a novel Detail Activation (DA) module based on the supporting theory of Neural Tangent Kernels [26]. Our strategy is to gradually activate the high-frequency channels of the DA module's inputs and successively unfold more complex details as training proceeds. Specifically, after sampling and grouping, the DA module first recovers the detailed features of the point cloud at a low-frequency level. Then, in different network layers, we recover feature information at different scales by activating channels at different frequency levels in the DA module. In this way, more intricate details are progressively exposed as training continues, which helps the subsequent MLP layers learn detailed features.
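A minimal sketch of this idea is given below. It only illustrates the general mechanism of a positional-encoding-style mapping whose high-frequency channels are masked until they are activated; the function name, the frequency schedule, and where the mask is applied are assumptions, not the actual design of the DA module.

```python
import torch

def detail_activation(xyz, num_freqs=8, active_freqs=2):
    """Sketch of a positional-encoding-style detail activation.

    xyz: (B, N, 3) grouped point coordinates.
    num_freqs: total number of frequency bands.
    active_freqs: number of (lowest) bands currently activated; higher bands
        are zeroed out and can be switched on in later layers or training stages.
    """
    freqs = 2.0 ** torch.arange(num_freqs, dtype=xyz.dtype)          # 1, 2, 4, ...
    angles = xyz.unsqueeze(-1) * freqs                                # (B, N, 3, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)   # (B, N, 3, 2F)
    # Zero the channels of frequency bands that are not yet activated.
    mask = (torch.arange(num_freqs) < active_freqs).to(xyz.dtype)
    enc = enc * mask.repeat(2)                                        # apply to sin and cos bands
    return torch.cat([xyz, enc.flatten(start_dim=2)], dim=-1)         # (B, N, 3 + 6F)

feats = detail_activation(torch.rand(4, 64, 3), num_freqs=8, active_freqs=3)
```

Increasing `active_freqs` from layer to layer (or over training) mimics the coarse-to-fine exposure of details described above: early on, the coordinate-based MLPs only see low-frequency structure, and finer geometric detail is unfolded progressively.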
Additionally, inspired by ResNet [27], DenseNet [28], and PointMLP [18], we develop an improved MLP module, named ResDMLP, that integrates the design concepts of residual connections and dense connections. Ablation experiments show that our network, which couples these two designs (ResDMLP and DA), learns and processes point clouds more efficiently than standard MLPs.
Finally, to let each layer retain the details activated by the preceding layers, we introduce residual connections between the network layers so that the entire feature extraction process can be jointly supervised by the details from the farthest scale to the current scale. The main contributions of this paper are as follows:
(1) We propose a new framework named Point-MDA, which progressively exposes more intricate details as training progresses. Point-MDA can jointly supervise the entire feature extraction process with the activated details from the farthest scale to the current scale.
(2) We introduce a novel Detail Activation (DA) module. In different layers of Point-MDA, we activate different frequency levels in the DA module. By progressively recovering detailed features at different scales, we address the information loss caused by the conventional sampling-grouping operation, which fails to fully explore the geometry of irregular point clouds. In addition, the DA module enables the subsequent MLPs to learn high-frequency information better, thereby improving the accuracy and robustness of point cloud classification.
(3) We design a plug-and-play MLP-based module (ResDMLP) that combines the ideas of residual and dense connections, aiming to reuse features, alleviate gradient vanishing, and make training more stable, accurate, and effective.
Our method achieves superior overall and class-average accuracy on the point cloud classification task on the ModelNet40 [29] and ScanObjectNN [30] datasets without complex feature extraction algorithms or deepened and stacked network structures, extensively tapping the potential of PointNet++.
Author Contributions
Conceptualization, M.L. and Y.Z.; methodology, M.L. and Y.Z.; software, M.L.; validation, M.L. and Y.Z.; formal analysis, M.L.; investigation, M.L.; resources, Y.Z. and W.J.; data curation, M.L. and Y.Z.; writing—original draft preparation, M.L. and Y.Z.; writing—review and editing, M.L. and Y.Z.; visualization, M.L.; supervision, Y.Z. and W.J.; project administration, Y.Z. and W.J.; funding acquisition, Y.Z. and W.J. All authors have read and agreed to the published version of the manuscript.