Are all of the differences in config.json between scout and maverick intended?

#42
by sbranco - opened

Specifically, Scout has rope_scaling configured, use_qk_norm enabled, a 10 times higher max_position_embeddings, an interleaved_moe_layer_step of only 1, and an empty array for no_rope_layers.

At least the max_position_embeddings difference seems a bit suspicious, no?
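For anyone who wants to reproduce this kind of comparison without a screenshot, here is a minimal sketch of a recursive diff over two config.json files. The helper names `load_config` and `diff_configs` are made up for illustration, and the two dictionaries are toy stand-ins that only mirror the kinds of differences listed above; they are not the real values from either model's config.

```python
import json
from pathlib import Path


def load_config(path):
    """Load a config.json file into a dict (hypothetical helper)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def diff_configs(a, b, prefix=""):
    """Return {dotted_key: (value_in_a, value_in_b)} for every differing key."""
    missing = "<missing>"
    changes = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, missing)
        right = b.get(key, missing)
        path = f"{prefix}{key}"
        if isinstance(left, dict) and isinstance(right, dict):
            # Recurse into nested sections such as a nested text config.
            changes.update(diff_configs(left, right, prefix=f"{path}."))
        elif left != right:
            changes[path] = (left, right)
    return changes


# Toy stand-ins only: the numbers and nested keys are assumptions for
# illustration, NOT the actual Scout or Maverick settings.
scout_like = {
    "text_config": {
        "max_position_embeddings": 1000,
        "use_qk_norm": True,
        "no_rope_layers": [],
        "rope_scaling": {"toy_key": 1},
        "shared_key": "same",
    }
}
maverick_like = {
    "text_config": {
        "max_position_embeddings": 100,
        "use_qk_norm": False,
        "no_rope_layers": [1, 1, 1, 0],
        "rope_scaling": None,
        "shared_key": "same",
    }
}

changes = diff_configs(scout_like, maverick_like)
for dotted_key, (left, right) in changes.items():
    print(f"{dotted_key}: {left!r} != {right!r}")
```

With real files, the two dictionaries would instead come from something like `load_config("scout/config.json")` and `load_config("maverick/config.json")`, where both paths are placeholders for wherever the files were downloaded.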

sbranco changed discussion title from Are the differences in config.json between scout and maverick intended? to Are all of the differences in config.json between scout and maverick intended?

read the docs

Oops, of course. For some reason I always think of Maverick, as the larger model, as having the longer context, even though that doesn't make much sense. Is the difference in the rope scaling settings due to the same thing as well? I know people asking basic questions can be frustrating, but I'm just trying to eliminate as many things as possible, because I can't get the model to perform anywhere close to how it should with default settings.

sbranco changed discussion status to closed

There's no problem at all with asking basic questions. It seems HF is the home of "taking things for granted." I just wonder, then, why the platform says it's committed to education and the democratization of AI, ML and NLP if the majority of answers, when there are any, are like that.
