---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
- open-web-math/open-web-math
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

## Model Details

### Model Description

- **Developed by:** xTimeCrystal
- **Model type:** RWKV 7 **(NOTE: the decay is computed using -F.softplus instead of -0.606*torch.sigmoid, all LoRAs use Tanh, and LoRA weights are stored like nn.Linear)**
- **Language(s) (NLP):** English
- **License:** MIT

## Uses

### Direct Use

Fast autocomplete model.

### Out-of-Scope Use

Don't use it for anything serious; it lacks any form of intelligence.

## Bias, Risks, and Limitations

The training run was limited to roughly a couple of exaFLOPs of compute, so don't expect coherent output beyond a couple of sentences.

## How to Get Started with the Model

Official example code has not been provided yet ([More Information Needed]); a hedged byte-level sampling sketch appears under Code Sketches at the end of this card.

## Training Details

### Training Data

50B bytes of a custom FineWeb-Edu and Open Web Math mixture.

#### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision, trained with a custom version of Muon with the learning rate going from 5e-3 to 1e-3 (see the schedule sketch at the end of this card).

#### Speeds, Sizes, Times

Throughput is about 350 characters/second using unoptimized inference code. Prompt processing is essentially instantaneous, so generation is likely bottlenecked by memory bandwidth and per-step overhead.

## Evaluation

### Results

- Bits-per-byte: ~1
- HellaSwag accuracy: 33.4% (with WikiHow entries removed)

## Technical Specifications

### Model Architecture and Objective

Modded RWKV 7 (see the note under Model Description; a sketch of the modified decay appears under Code Sketches below).

### Compute Infrastructure

1 x RTX 4080 for 1 week.
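## Code Sketches

For concreteness, here is a minimal PyTorch sketch of the decay modification noted under Model Description. The module name, hidden size, LoRA rank, and exact wiring are illustrative assumptions, not values read from this checkpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecayHead(nn.Module):
    """Hypothetical sketch of the modified RWKV-7 decay path.

    Assumed (not from this card): hidden size, LoRA rank, and the exact
    placement of the LoRA inside the block.
    """

    def __init__(self, dim: int = 768, rank: int = 64):
        super().__init__()
        # LoRA weights stored like nn.Linear, i.e. shape (out_features, in_features)
        self.lora_down = nn.Linear(dim, rank, bias=False)
        self.lora_up = nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Tanh non-linearity between the two LoRA projections
        w_raw = self.lora_up(torch.tanh(self.lora_down(x)))
        # Modified decay: -F.softplus(w_raw) instead of -0.606 * torch.sigmoid(w_raw).
        # -softplus is unbounded below, so the per-channel decay exp(-softplus(w))
        # can cover all of (0, 1), whereas -0.606 * sigmoid(w) confines it to
        # roughly (0.545, 1).
        return torch.exp(-F.softplus(w_raw))
```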
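Since the model is evaluated in bits-per-byte and throughput is quoted in characters/second, a byte-level sampling loop like the following is a plausible way to run it. The `model` argument is a stand-in for this repo's actual model class and checkpoint loading, which this card does not document; an RWKV-style model would normally carry its recurrent state forward instead of re-running the whole prefix, but this loop keeps the sketch model-agnostic.

```python
import torch

@torch.no_grad()
def generate(model: torch.nn.Module, prompt: str,
             n_bytes: int = 256, temperature: float = 1.0) -> str:
    # Encode the prompt as raw UTF-8 bytes (a 256-entry byte vocabulary is assumed).
    ids = torch.tensor(list(prompt.encode("utf-8")), dtype=torch.long)[None, :]
    for _ in range(n_bytes):
        logits = model(ids)[:, -1, :]                 # next-byte logits, shape (1, 256)
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
    return bytes(ids[0].tolist()).decode("utf-8", errors="replace")
```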
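The custom Muon variant used for training is not published here. As a stand-in, the sketch below implements the stated 5e-3 to 1e-3 learning-rate range with `AdamW` and a linear decay; the optimizer choice, schedule shape, and step count are all assumptions.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(10, 10))]  # placeholder parameters
opt = torch.optim.AdamW(params, lr=5e-3)            # stand-in for the custom Muon
total_steps = 10_000                                # assumed step count

# Linearly scale the base lr of 5e-3 down by a factor of 0.2, i.e. to 1e-3.
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda step: 1.0 - 0.8 * min(step / total_steps, 1.0))

for step in range(total_steps):
    # ... forward pass and loss.backward() would go here ...
    opt.step()
    sched.step()
```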
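Finally, the bits-per-byte figure under Evaluation is the standard conversion of byte-level cross-entropy from nats to bits, shown here for clarity:

```python
import math

def bits_per_byte(nats_per_byte: float) -> float:
    # Divide the per-byte cross-entropy (in nats) by ln(2) to get bits;
    # ~1 bit/byte therefore corresponds to a loss of ~0.693 nats per byte.
    return nats_per_byte / math.log(2)

print(bits_per_byte(0.693))  # ≈ 1.0
```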