Online Handwritten Mathematical Expression Recognition Based on a Dual-mode Encoder-decoder Framework
Abstract: To make full use of both the handwriting trajectory features and the global two-dimensional structural features of online handwritten mathematical expressions, a dual-mode recognition model based on an encoder-decoder framework is designed by combining the online and offline modes. The model accepts handwritten mathematical expressions both as one-dimensional coordinate point sequences and as two-dimensional static images. An online encoder extracts handwriting trajectory features from the input coordinate point sequence, and an offline encoder extracts two-dimensional structural features from the static image, so that both kinds of information are retained. In the encoder stage, sinusoidal encoding is proposed for the online mode: it encodes the input coordinate point sequence and supplements it with stroke-level information, which avoids the loss of stroke information caused by ambiguous stroke boundaries. For the offline mode, a smooth attention mechanism is proposed: by introducing a smoothing window, it adaptively adjusts the receptive field of each pixel feature in the feature map. This alleviates, to some extent, the inability of an ordinary attention mechanism to select effective features for handwritten symbols of widely differing sizes at the same time, and improves the ability of the attention mechanism to capture the effective handwritten regions. Experimental results show that the model reaches an expression recognition accuracy of 58.76%, which is 1.56% to 4.71% higher than that of other recognition models in the same field.
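As an illustration of how stroke-level information might be attached to a coordinate point sequence, the following minimal sketch appends a sinusoidal code of the stroke index to every point. The abstract does not give the exact formulation, so the Transformer-style sine/cosine form, the function names `sinusoidal_code` and `encode_points`, and the parameters `dim` and `base` are all assumptions made for illustration, not the method of the paper.

```python
import math


def sinusoidal_code(index, dim=8, base=10000.0):
    """Return a sine/cosine code for an integer index.

    Assumption: this borrows the Transformer positional-encoding form;
    the encoding actually used in the paper may differ.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    code = []
    for i in range(dim // 2):
        # Each sine/cosine pair uses a different wavelength.
        angle = index / (base ** (2 * i / dim))
        code.extend([math.sin(angle), math.cos(angle)])
    return code


def encode_points(strokes, dim=8):
    """Flatten strokes into per-point features [x, y, *stroke_code].

    `strokes` is a list of strokes, each a list of (x, y) points.
    Points of the same stroke share one code, so stroke membership
    is carried explicitly by every point in the flattened sequence.
    """
    features = []
    for stroke_index, stroke in enumerate(strokes):
        code = sinusoidal_code(stroke_index, dim)
        for x, y in stroke:
            features.append([x, y] + code)
    return features


# Hypothetical input: two strokes, with two points and one point.
strokes = [[(0.0, 0.0), (1.0, 1.0)], [(2.0, 0.0)]]
features = encode_points(strokes)
```

In this sketch, points belonging to the same stroke carry identical codes while points of different strokes carry different ones, so stroke membership remains recoverable even when consecutive strokes lie close together.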