Abstract:
Because the original (reference) video is often unavailable in practical applications, this paper proposes a no-reference video multimethod assessment fusion (VMAF) prediction model. First, the VMAF scores of distorted video frames are predicted by a frame-level no-reference VMAF prediction model built on a convolutional neural network with a multi-mode bilinear pooling operation. Second, the frame-level VMAF predictions are aggregated by three different temporal pooling methods, and the aggregation results are fused into a quality feature vector. Finally, nu support vector regression (NuSVR) is used to model the mapping between the quality feature vector and the video-level VMAF score. The practical value of the proposed model is that it can predict the VMAF score of a distorted video without access to the original video. Experimental results show that the proposed model can obtain higher prediction accuracy.
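
The second stage, in which per-frame VMAF predictions are pooled over time and fused into one quality feature vector, might be sketched as follows. This is a minimal illustration and not the implementation used in the paper: the abstract does not name the three temporal pooling methods, so the arithmetic mean, the harmonic mean, and the mean of the lowest-scoring frames are assumptions chosen for illustration, and the function name `pooled_feature_vector` and the parameter `worst_fraction` are hypothetical.

```python
import statistics


def pooled_feature_vector(frame_scores, worst_fraction=0.2):
    """Fuse three temporal poolings of per-frame scores into one vector.

    The three pooling methods are assumptions made for illustration;
    the abstract does not specify which ones the model uses.
    Scores are expected to be positive (VMAF lies in [0, 100]).
    """
    if not frame_scores:
        raise ValueError("frame_scores must be non-empty")
    scores = [float(s) for s in frame_scores]

    # Pooling 1 (assumed): arithmetic mean over all frames.
    mean_score = statistics.fmean(scores)

    # Pooling 2 (assumed): harmonic mean, which weights low-quality
    # frames more heavily than the arithmetic mean does.
    harmonic_score = statistics.harmonic_mean(scores)

    # Pooling 3 (assumed): mean of the worst `worst_fraction` of frames,
    # using at least one frame.
    k = max(1, int(len(scores) * worst_fraction))
    worst_score = statistics.fmean(sorted(scores)[:k])

    return [mean_score, harmonic_score, worst_score]


# Hypothetical per-frame predictions from the frame-level model.
features = pooled_feature_vector([92.0, 88.5, 40.0, 85.0, 90.0])
print(features)
```

In the final stage, vectors of this kind, one per training video, would be paired with the corresponding video-level VMAF scores to fit a regressor such as `sklearn.svm.NuSVR`; the kernel and hyperparameters are not given in the abstract and would have to be chosen separately.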