    Mobile Video Perceptual Quality Assessment Model With ResNet-TSM and BiGRU Network

    • Abstract: Considering that the effects of stalling, quality switching, content characteristics, and other influencing factors on the quality of user experience are directly reflected in the distorted video received at the client, a client-side mobile video perceptual quality assessment model was proposed. Instead of characterizing and measuring each influencing factor separately, the model follows the idea of deep feature extraction plus regression and directly establishes a mapping between the distorted video and the mean opinion score (MOS). First, a ResNet-TSM network was constructed to extract deep spatio-temporal features from each distorted video segment. To avoid the curse of dimensionality, the LargeVis algorithm was then used to reduce the dimensionality of the extracted deep features while improving their representation and discrimination capabilities. Afterwards, a bidirectional gated recurrent unit (BiGRU) network was used to model the long-term temporal dependencies of the video and obtain a score for each video segment, and temporal mean pooling was adopted to aggregate the segment scores into an overall video score. Experimental results on the WaterlooSQoE-III and LIVE-NFLX-II datasets show that the proposed model achieves higher prediction accuracy.
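    The abstract outlines a pipeline of TSM-augmented ResNet features, LargeVis dimensionality reduction, BiGRU segment scoring, and temporal mean pooling, but gives no implementation details. The following PyTorch sketch only illustrates two of those components under stated assumptions: a standard temporal channel shift of the kind used in TSM networks, and a BiGRU head that scores segments and averages the scores. The class names, the frame count per segment (8), the feature size (128), and the hidden size (64) are illustrative assumptions rather than values from the paper, and the LargeVis reduction step is omitted.

import torch
import torch.nn as nn


class TemporalShift(nn.Module):
    # Channel-wise temporal shift (the "TSM" part of ResNet-TSM): a fraction of
    # channels is shifted one step forward/backward in time so that a 2D ResNet
    # block can mix information across neighbouring frames at negligible cost.
    def __init__(self, n_segment=8, shift_div=8):
        super().__init__()
        self.n_segment = n_segment    # frames per video segment (assumed value)
        self.shift_div = shift_div    # 1/shift_div of channels shifted each way

    def forward(self, x):
        nt, c, h, w = x.size()        # input is (batch * frames, C, H, W)
        n = nt // self.n_segment
        x = x.view(n, self.n_segment, c, h, w)
        fold = c // self.shift_div
        out = torch.zeros_like(x)
        out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift one step back in time
        out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift one step forward in time
        out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels untouched
        return out.view(nt, c, h, w)


class BiGRUScorer(nn.Module):
    # Bidirectional GRU over the per-segment feature sequence, one score per
    # segment, followed by temporal mean pooling to obtain the overall video score.
    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, 1)

    def forward(self, feats):
        h, _ = self.gru(feats)                 # feats: (videos, segments, feat_dim)
        seg_scores = self.fc(h).squeeze(-1)    # (videos, segments)
        return seg_scores.mean(dim=1)          # temporal mean pooling -> predicted MOS


# Toy check with hypothetical sizes: 2 clips of 8 frames with 64-channel feature
# maps for the shift, and 2 videos of 10 segments with 128-D reduced features.
x = torch.randn(2 * 8, 64, 14, 14)
print(TemporalShift(n_segment=8)(x).shape)       # torch.Size([16, 64, 14, 14])
scorer = BiGRUScorer(feat_dim=128, hidden_dim=64)
print(scorer(torch.randn(2, 10, 128)).shape)     # torch.Size([2])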

       
