Citation: HAN Jidong, LI Yujian. Survey of Catastrophic Forgetting Research in Neural Network Models[J]. Journal of Beijing University of Technology, 2021, 47(5): 551-564. DOI: 10.11936/bjutxb2020120014

Survey of Catastrophic Forgetting Research in Neural Network Models

More Information
  • Received Date: December 20, 2020
  • Available Online: August 03, 2022
  • Published Date: May 09, 2021
  • Abstract: In recent years, neural network models have achieved great success in fields such as image segmentation, object detection, and natural language processing (NLP). However, many key problems of neural network models remain unsolved, among them catastrophic forgetting. Human beings can learn continuously without catastrophic forgetting, but neural network models cannot: they almost completely forget previously learned tasks when adapting to a new one. Many methods have been proposed to solve this problem, and this paper summarizes them to promote further research on the issue. The existing methods for mitigating catastrophic forgetting in neural network models are introduced in detail and divided into four categories, namely exemplar-based methods, parameter-based methods, distillation-based methods, and other methods. Evaluation schemes used to assess how well different methods alleviate catastrophic forgetting are also introduced. Finally, an open discussion of the catastrophic forgetting problem in neural network models is given, along with some suggestions for future research.
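
To make the parameter-based category in the abstract concrete, the sketch below shows one well-known representative idea in PyTorch: an EWC-style quadratic penalty (in the spirit of elastic weight consolidation) that estimates how important each weight was for an old task and then penalizes changes to the important weights while the network trains on a new task. This is a minimal illustration only; the toy network, the random data, and the loss weight `lam` are hypothetical placeholders rather than settings taken from any method covered by the survey.

```python
# Minimal sketch (not from the surveyed works) of a parameter-based method:
# an EWC-style quadratic penalty keeps weights that matter for an old task
# close to their old values while the network is trained on a new task.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def fisher_diagonal(model, loader):
    """Empirical diagonal Fisher estimate on the old task (importance per weight)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in loader:
        model.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(loader), 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, old_params):
    """Quadratic penalty: sum_i F_i * (theta_i - theta_old_i)^2."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return penalty


def train_new_task(model, new_loader, fisher, old_params, lam=100.0, epochs=1):
    """Train on the new task; the penalty discourages forgetting the old one."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    model.train()
    for _ in range(epochs):
        for x, y in new_loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y) + lam * ewc_penalty(model, fisher, old_params)
            loss.backward()
            opt.step()


if __name__ == "__main__":
    # Toy model and random data; in practice the model would first be trained on the old task.
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
    old_task = DataLoader(TensorDataset(torch.randn(64, 20), torch.randint(0, 5, (64,))), batch_size=16)
    new_task = DataLoader(TensorDataset(torch.randn(64, 20), torch.randint(0, 5, (64,))), batch_size=16)
    old_params = {n: p.detach().clone() for n, p in model.named_parameters()}  # treated as the old-task optimum
    fisher = fisher_diagonal(model, old_task)
    train_new_task(model, new_task, fisher, old_params)
```

Exemplar-based and distillation-based methods keep the same new-task training loop but replace the quadratic penalty with, respectively, rehearsal on a small memory of stored (or generated) old-task samples, and a distillation term that keeps the new model's outputs close to those of the model as it was before learning the new task.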

  • Related Articles

    [1] WANG Ding, LI Xin. Transferable Incremental Heuristic Dynamic Programming With Wastewater Treatment Applications[J]. Journal of Beijing University of Technology, 2025, 51(3): 277-283. DOI: 10.11936/bjutxb2023080013
    [2] GAO Tiaokang, JIN Xiaoning, LAI Yingxu. Model Heterogeneous Federated Learning for Intrusion Detection[J]. Journal of Beijing University of Technology, 2024, 50(5): 543-557. DOI: 10.11936/bjutxb2022060002
    [3] WANG Dan, WANG Meng, WANG Xiaoxi, YANG Ping. Ensemble of Incremental Learning Algorithm for Flight Delay Prediction[J]. Journal of Beijing University of Technology, 2020, 46(11): 1239-1245. DOI: 10.11936/bjutxb2019030009
    [4] ZHANG Yanhua, YANG Le, LI Meng, WU Wenjun, YANG Ruizhe, SI Pengbo. Optimization of Resource Allocation for Industrial Internet Based on Q-learning[J]. Journal of Beijing University of Technology, 2020, 46(11): 1213-1221. DOI: 10.11936/bjutxb2019070011
    [5] YU Jianjun, ZHENG Yijia, RUAN Xiaogang, ZHAO Shaoqiong. Parameter Optimization of Trajectory Imitation Learning Characterization Based on Gaussian Mixture Model[J]. Journal of Beijing University of Technology, 2017, 43(5): 719-728. DOI: 10.11936/bjutxb2016060071
    [6] YIN Bao-cai, WANG Wen-tong, WANG Li-chun. Review of Deep Learning[J]. Journal of Beijing University of Technology, 2015, 41(1): 48-59. DOI: 10.11936/bjutxb2014100026
    [7] LIU Bao, LIU Qun-feng, WANG Jun-hong, LI Liang-chuan, WANG Lei. Nonlinear-Dynamic-Incremental Internal Model Control Algorithm and Its Application[J]. Journal of Beijing University of Technology, 2014, 40(7): 1001-1005. DOI: 10.3969/j.issn.0254-0037.2014.07.008
    [8] WU Jing, LIU Yan-heng, MENG Fan-xue. Algorithm of Multi-category SVM Incremental Learning in Application of Intrusion Detection[J]. Journal of Beijing University of Technology, 2009, 35(12): 1697-1702. DOI: 10.3969/j.issn.0254-0037.2009.12.021
    [9] FU Yan-yan, JIANG Dai-mei, ZHOU Xiao-bing. Multidimensional Data Model the Building of Incremental Data Warehouse[J]. Journal of Beijing University of Technology, 2005, 31(4): 399-404. DOI: 10.3969/j.issn.0254-0037.2005.04.014
    [10] ZENG Yan-jun. Incremental Bulk Moduli of Elasticity of the Lung[J]. Journal of Beijing University of Technology, 1985, 11(2): 53-57.
  • Cited by

    Periodical citations (4)

    1. LI Li, LIANG Zhenglin. Large language models for discipline inspection and supervision: application scenarios, algorithmic logic, and governance challenges. Journal of Chengdu University of Technology (Social Sciences). 2025(03): 1-11.
    2. CHEN Yadang, YANG Gang, WANG Duolin, YU Wenbin. Enhancing the comprehension ability of BERT based on prompt learning. Information Technology. 2024(06): 87-93.
    3. XIAO Jianping, ZHU Yongli, ZHANG Yi, PAN Xinpeng. Partial discharge pattern recognition of transformers based on incremental learning. Electric Machines and Control. 2023(02): 9-16.
    4. YAO Guangle, ZHU Juntao, ZHOU Wenlong, ZHANG Guiyu, ZHANG Wei, ZHANG Qian. Few-shot class-incremental learning based on feature distribution learning. Computer Engineering and Applications. 2023(14): 151-157.

    Other citation types (22)
