arXiv:2512.10772

Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation

Published on Dec 11, 2025
Authors:

Abstract

Scaling pretrained models improves data efficiency and reduces catastrophic forgetting when adapting to new languages, and upscaled merges can enhance modular multilingual systems.

AI-generated summary

Building high-performing language models that cover medium- and lower-resource languages remains a challenge. Massively multilingual models still underperform compared to language-specific adaptations, especially at smaller model scales. In this work, we investigate scaling as an efficient strategy for adapting pretrained models to new target languages. Through comprehensive scaling ablations with approximately FLOP-matched models, we test whether upscaling an English base model enables more effective and resource-efficient adaptation than standard continued pretraining. We find that, once exposed to sufficient target-language data, larger upscaled models can match or surpass the performance of smaller models continually pretrained on much more data, demonstrating the benefits of scaling for data efficiency. Scaling also helps preserve the base model's capabilities in English, thus reducing catastrophic forgetting. Finally, we explore whether such scaled, language-specific models can be merged to construct modular and flexible multilingual systems. We find that while merging remains less effective than joint multilingual training, upscaled merges perform better than smaller ones. We observe large performance differences across merging methods, suggesting potential for improvement through merging approaches specialized for language-level integration.
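This page does not reproduce the paper's upscaling recipe, so the following is only a minimal sketch of one common way to "grow" a pretrained English base model before continued pretraining on the target language: depth upscaling by duplicating decoder layers. The checkpoint name, the `depth_upscale` helper, and the choice of duplicating the last eight layers are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of depth upscaling before continued pretraining.
# The paper's exact recipe is not given on this page; this only illustrates
# the general idea of growing an English base model by duplicating layers.
import copy

import torch
from transformers import AutoModelForCausalLM


def depth_upscale(model, repeat_last_n=8):
    """Duplicate the last `repeat_last_n` decoder layers and append the copies.

    Assumes a LLaMA-style architecture where decoder layers live in
    `model.model.layers` (true for many Hugging Face causal LMs, but not all).
    """
    layers = model.model.layers
    copies = [copy.deepcopy(layer) for layer in layers[-repeat_last_n:]]
    for layer in copies:
        layers.append(layer)
    # Re-number attention layer indices so KV caching still works at inference.
    for i, layer in enumerate(layers):
        if hasattr(layer.self_attn, "layer_idx"):
            layer.self_attn.layer_idx = i
    # Keep the config consistent so the upscaled model can be saved and reloaded.
    model.config.num_hidden_layers = len(layers)
    return model


if __name__ == "__main__":
    # "TinyLlama/TinyLlama-1.1B-Chat-v1.0" is just an example checkpoint.
    base = AutoModelForCausalLM.from_pretrained(
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16
    )
    upscaled = depth_upscale(base, repeat_last_n=8)
    print(f"Layers after upscaling: {upscaled.config.num_hidden_layers}")
    # Continued pretraining on target-language data would start from this checkpoint.
    upscaled.save_pretrained("upscaled-base")
```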
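The abstract compares several merging methods without detailing them here. As a reference point, below is a minimal sketch of the simplest one: uniform parameter averaging of language-specific models that were all grown from the same base. The model identifiers and the `average_merge` helper are placeholders, not artifacts released with the paper.

```python
# Hypothetical sketch of a simple merging method (uniform parameter averaging)
# for language-specific models fine-tuned from the same upscaled base model.
import torch
from transformers import AutoModelForCausalLM


def average_merge(model_ids, out_dir="merged-multilingual", weights=None):
    """Average the parameters of models that share one architecture and base."""
    models = [
        AutoModelForCausalLM.from_pretrained(m, torch_dtype=torch.float32)
        for m in model_ids
    ]
    if weights is None:
        weights = [1.0 / len(models)] * len(models)

    merged = models[0]
    merged_state = merged.state_dict()
    state_dicts = [m.state_dict() for m in models]
    for name, tensor in merged_state.items():
        if not torch.is_floating_point(tensor):
            continue  # skip integer buffers, if any
        merged_state[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    merged.load_state_dict(merged_state)
    merged.save_pretrained(out_dir)
    return merged


if __name__ == "__main__":
    # Placeholder model names for two language-specific upscaled adaptations.
    average_merge(["org/upscaled-german", "org/upscaled-finnish"])
```

More elaborate schemes (e.g., task arithmetic or TIES-style merging) differ mainly in how parameter deltas relative to the base are combined; the abstract's observation of large gaps between merging methods suggests that this choice matters for language-level integration.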

Models citing this paper 35

Datasets citing this paper 2

Spaces citing this paper 0

Collections including this paper 1