RGB-D Feature Extraction Network Based on Lightweight Replacement of ResNet18 in XYZNet
Abstract: To address the excessive parameter count of feature extraction networks in RGB-D pose estimation, a lightweight subnetwork, the light weight net (LWNet), is proposed to replace the ResNet18 backbone in XYZNet. First, an improved C2f module is designed and combined with the partial convolution (PConv) module to form the feature extraction structure for the shallow layers of the network, which improves feature reuse. Second, a feature fusion efficient layer aggregation networks (FF-ELAN) module is designed for feature extraction in the deep layers, integrating global features in a lightweight manner. Finally, to accommodate the change in network depth, three downsampling structures different from those of ResNet18 are adopted, reducing the parameter count while improving performance. Experimental results show that the optimized network reduces floating point operations (FLOPs) by 58.1% and parameter count by 57.3%, and improves inference speed by 10.2%, while gaining 1.3 and 0.8 percentage points of accuracy on the LineMOD and YCB-Video datasets, respectively. The architecture significantly improves the potential for deployment in edge computing scenarios.
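To give a sense of why a PConv-based shallow stage can cut parameters and FLOPs, the following is a minimal cost sketch. It assumes that PConv here refers to the FasterNet-style partial convolution, in which a dense k x k convolution is applied to only a fraction of the channels and the remaining channels pass through unchanged. The function names, the 1/4 channel split, and the 64-channel 56 x 56 feature map are illustrative assumptions, not values reported for LWNet.

```python
# Illustrative cost model only. The channel split and layer sizes are
# assumptions for this sketch, not figures taken from LWNet.

def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a dense k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def conv_macs(c_in: int, c_out: int, h: int, w: int, k: int = 3) -> int:
    """Multiply-accumulate count of a dense k x k conv on an h x w output map."""
    return h * w * conv_params(c_in, c_out, k)


def pconv_params(channels: int, split: int = 4, k: int = 3) -> int:
    """Weights of a partial conv that convolves only channels // split channels."""
    c_p = channels // split  # channels that are actually convolved
    return conv_params(c_p, c_p, k)


def pconv_macs(channels: int, h: int, w: int, split: int = 4, k: int = 3) -> int:
    """Multiply-accumulate count of the same partial conv on an h x w map."""
    return h * w * pconv_params(channels, split, k)


if __name__ == "__main__":
    c, h, w = 64, 56, 56  # hypothetical shallow-stage feature map
    print("dense 3x3 params:", conv_params(c, c))
    print("PConv 3x3 params:", pconv_params(c))
    print("MAC ratio dense/PConv:", conv_macs(c, c, h, w) / pconv_macs(c, h, w))
```

Under these assumptions, convolving one quarter of the channels reduces both the weight count and the multiply-accumulate count of that layer by a factor of 16. The network-level reductions reported in the abstract (58.1% FLOPs, 57.3% parameters) depend on the full LWNet design and cannot be derived from this single-layer model.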