TranslateGemma 4B IT — Android / Google AI Edge Bundles

On-device translation model for Android using Google AI Edge. Converts google/translategemma-4b-it (55 languages, 4B params) into formats that run locally on Android without internet or cloud APIs.

Google publishes only WebGPU-targeted TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible .litertlm bundles (LiteRT-LM format) that include an embedded chat template.


Files

| File | Size | Notes |
|------|------|-------|
| artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm | ~2 GB | INT4 blockwise quant — faster, lower RAM |
| artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm | ~4 GB | Dynamic INT8 — better quality |

Start with INT4 if you're unsure — it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.


Quick Start — Google AI Edge Gallery (Android)

  1. Download a .litertlm file above
  2. Open Google AI Edge Gallery
  3. Import the model → select your .litertlm file
  4. Use AI Chat mode

Input format

The embedded template supports structured input for any language pair:

<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>

Examples:

<src>he</src><dst>en</dst><text>שלום עולם</text>
<src>en</src><dst>he</dst><text>good morning</text>
<src>en</src><dst>fr</dst><text>hello world</text>
<src>ja</src><dst>en</dst><text>ありがとうございます</text>

Use standard ISO 639-1 language codes: en, he, fr, es, de, ar, zh, ja, ko, ru, pt, etc.

Plain text (no tags) is also accepted — the model will attempt translation based on context.
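The tagged format above is easy to build programmatically. A minimal sketch in Python (the `build_prompt` helper name is illustrative, not part of this repo's code):

```python
def build_prompt(src: str, dst: str, text: str) -> str:
    """Wrap input in the <src>/<dst>/<text> tags the embedded template expects.

    `src` and `dst` should be ISO 639-1 codes, e.g. "en", "he", "ja".
    """
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"


print(build_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```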


Device Requirements

| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |

What's Different From Google's Official Files

Google's official TranslateGemma TFLite files target WebGPU only — they don't work with MediaPipe LLM inference on Android CPU.

This repo's files use native conversion via litert-torch with a custom build_translategemma_4b() builder that:

  • Produces proper prefill + decode signatures with KV cache (required by LiteRT-LM)
  • Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
  • Fixes qkv_fused_interleaved=False (critical — the wrong default produced garbage output in all early builds)
  • Handles the language_model. weight prefix in TranslateGemma's multimodal safetensors
  • Embeds a generic Jinja chat template for any language pair via <src>/<dst>/<text> tags
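To illustrate the prefix handling mentioned above: TranslateGemma's multimodal safetensors prefix the text-model weights with `language_model.`, which must be stripped before they can be loaded into a text-only builder. A minimal sketch of that kind of key remapping (the function name and sample keys are hypothetical, not the repo's actual builder code):

```python
def strip_language_model_prefix(state_dict: dict) -> dict:
    """Remove the 'language_model.' prefix from checkpoint keys.

    Keys without the prefix (e.g. vision-tower weights) pass through
    unchanged; a real converter would typically drop them instead.
    """
    prefix = "language_model."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Hypothetical sample keys for demonstration only
weights = {
    "language_model.model.embed_tokens.weight": "tensor_a",
    "vision_tower.patch_embed.weight": "tensor_b",
}
print(strip_language_model_prefix(weights))
```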

Conversion Scripts

The scripts/ folder contains the full conversion pipeline:

| Script | Purpose |
|--------|---------|
| scripts/convert_translategemma_android.py | Single-quant conversion via litert-torch native strategy |
| scripts/bundle_litertlm.py | Bundle a TFLite model + SentencePiece tokenizer into .litertlm with embedded Jinja template |
| scripts/multi_quant_build_upload.py | Batch conversion + Hugging Face upload |

Reproduce a build

Requirements: ~128 GB RAM, Python 3.12, litert-torch==0.8.0

# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm

pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub

# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it

# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token

# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8

Supported Languages

TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See google/translategemma-4b-it for the full list.


License

Model weights: Google Gemma Terms of Use
Conversion scripts: Apache 2.0
