MMNMT: Modularizing Multilingual Neural Machine Translation with Flexibly Assembled MoE and Dense Blocks, EMNLP, 2023. (top-tier international conference)
Abstract: This paper proposes a modularized framework for large-scale multilingual neural machine translation (MNMT) that can flexibly combine dense and Mixture-of-Experts (MoE)-based sparse modules to achieve the best of both worlds. The training strategy of the framework consists of three stages: (1) pre-training basic MNMT models with various training objectives or model architectures; (2) initializing the framework's modules with pre-trained counterparts, such as encoders, decoders, and embedding layers; (3) fine-tuning the modularized MNMT framework to integrate modules from different models. The researchers pre-trained three basic MNMT models from scratch: a dense model, an MoE-based sparse model, and a novel MoE model termed MoE-LGR, which explores multiple Language-Group-specific Routers to incorporate language group knowledge into MNMT. The modularized MNMT framework aims to integrate the strengths of these pre-trained models into a single model through proper initialization and fine-tuning.
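To make the two module types concrete, below is a minimal PyTorch sketch (not the authors' released code) of the ideas summarized in the abstract: a dense feed-forward block, a sparse MoE block with one router per language group in the spirit of MoE-LGR, and a stage-(2)-style assembly step that collects pre-trained blocks into one modular model before fine-tuning. Names such as `MoEWithGroupRouters`, `group_id`, and `assemble` are illustrative assumptions, not identifiers from the paper.

```python
# Illustrative sketch only; hyperparameters and module granularity are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseFFN(nn.Module):
    """Standard Transformer feed-forward block (the 'dense' module)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoEWithGroupRouters(nn.Module):
    """Sparse MoE block with one router per language group (MoE-LGR-style sketch).

    Each token is dispatched to its top-1 expert by the router belonging to the
    sentence's language group, so routing can reflect language-group knowledge.
    """
    def __init__(self, d_model: int, d_ff: int, num_experts: int, num_groups: int):
        super().__init__()
        self.experts = nn.ModuleList(DenseFFN(d_model, d_ff) for _ in range(num_experts))
        # One routing matrix per language group.
        self.routers = nn.ModuleList(nn.Linear(d_model, num_experts) for _ in range(num_groups))

    def forward(self, x: torch.Tensor, group_id: int) -> torch.Tensor:
        logits = self.routers[group_id](x)          # [batch, seq, num_experts]
        weights = F.softmax(logits, dim=-1)
        top_w, top_idx = weights.max(dim=-1)        # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                     # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out


def assemble(dense_ffn: DenseFFN, moe_ffn: MoEWithGroupRouters) -> nn.ModuleDict:
    """Stage (2) in miniature: initialize the modular model from pre-trained blocks.

    In the paper this initialization covers encoders, decoders, and embedding
    layers of the pre-trained dense, MoE, and MoE-LGR models; stage (3) then
    fine-tunes the assembled model jointly.
    """
    return nn.ModuleDict({"dense_block": dense_ffn, "moe_block": moe_ffn})


if __name__ == "__main__":
    x = torch.randn(2, 5, 64)                       # [batch, seq, d_model]
    model = assemble(DenseFFN(64, 256),
                     MoEWithGroupRouters(64, 256, num_experts=4, num_groups=3))
    y = model["moe_block"](model["dense_block"](x), group_id=1)
    print(y.shape)                                  # torch.Size([2, 5, 64])
```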