TranslateGemma 4B IT — Android / Google AI Edge Bundles

On-device translation model for Android using Google AI Edge. Converts google/translategemma-4b-it (55 languages, 4B params) into formats that run locally on Android without internet or cloud APIs.

Google publishes only WebGPU-targeted TFLite files. This repo bridges that gap with CPU/XNNPACK-compatible .litertlm bundles (LiteRT-LM format) that include an embedded chat template.


Files

| File | Size | Notes |
|------|------|-------|
| artifacts/int4-generic/translategemma-4b-it-int4-generic.litertlm | ~2 GB | INT4 blockwise quant — faster, lower RAM |
| artifacts/dynamic_int8-generic/translategemma-4b-it-dynamic_int8-generic.litertlm | ~4 GB | Dynamic INT8 — better quality |

Start with INT4 if you're unsure — it loads faster and uses less RAM. Use dynamic_int8 for better translation quality.


Quick Start — Google AI Edge Gallery (Android)

  1. Download a .litertlm file above
  2. Open Google AI Edge Gallery
  3. Import the model → select your .litertlm file
  4. Use AI Chat mode

Input format

The embedded template supports structured input for any language pair:

<src>LANG</src><dst>LANG</dst><text>YOUR TEXT HERE</text>

Examples:

<src>he</src><dst>en</dst><text>שלום עולם</text>
<src>en</src><dst>he</dst><text>good morning</text>
<src>en</src><dst>fr</dst><text>hello world</text>
<src>ja</src><dst>en</dst><text>ありがとうございます</text>

Use standard ISO 639-1 language codes: en, he, fr, es, de, ar, zh, ja, ko, ru, pt, etc.

Plain text (no tags) is also accepted — the model will attempt translation based on context.
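The tagged format above is easy to build programmatically. A minimal sketch in Python (the `build_prompt` helper name is illustrative, not part of this repo's code):

```python
def build_prompt(src: str, dst: str, text: str) -> str:
    """Wrap input in the <src>/<dst>/<text> tags the embedded template expects.

    `src` and `dst` should be ISO 639-1 codes, e.g. "en", "he", "ja".
    """
    return f"<src>{src}</src><dst>{dst}</dst><text>{text}</text>"


print(build_prompt("en", "fr", "hello world"))
# <src>en</src><dst>fr</dst><text>hello world</text>
```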


Device Requirements

| Spec | Minimum |
|------|---------|
| RAM | 6 GB free (INT4) / 8 GB free (dynamic_int8) |
| Storage | 2 GB (INT4) / 4 GB (dynamic_int8) |
| OS | Android 10+ |
| Runtime | Google AI Edge Gallery or LiteRT-LM SDK |

What's Different From Google's Official Files

Google's official TranslateGemma TFLite files target WebGPU only — they don't work with MediaPipe LLM inference on Android CPU.

This repo's files use native conversion via litert-torch with a custom build_translategemma_4b() builder that:

  • Produces proper prefill + decode signatures with KV cache (required by LiteRT-LM)
  • Uses the correct architecture: 34 layers, 2560 dim, 8 heads, 4 KV heads, sliding-window + global every 6th layer
  • Fixes qkv_fused_interleaved=False (critical — the wrong default produced garbage output in all early builds)
  • Handles the language_model. weight prefix in TranslateGemma's multimodal safetensors
  • Embeds a generic Jinja chat template for any language pair via <src>/<dst>/<text> tags
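To illustrate the prefix handling mentioned above: TranslateGemma's multimodal safetensors prefix the text-model weights with `language_model.`, which must be stripped before they can be loaded into a text-only builder. A minimal sketch of that kind of key remapping (the function name and sample keys are hypothetical, not the repo's actual builder code):

```python
def strip_language_model_prefix(state_dict: dict) -> dict:
    """Remove the 'language_model.' prefix from checkpoint keys.

    Keys without the prefix (e.g. vision-tower weights) pass through
    unchanged; a real converter would typically drop them instead.
    """
    prefix = "language_model."
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


# Hypothetical sample keys for demonstration only
weights = {
    "language_model.model.embed_tokens.weight": "tensor_a",
    "vision_tower.patch_embed.weight": "tensor_b",
}
print(strip_language_model_prefix(weights))
```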

Conversion Scripts

The scripts/ folder contains the full conversion pipeline:

| Script | Purpose |
|--------|---------|
| scripts/convert_translategemma_android.py | Single-quant conversion via litert-torch native strategy |
| scripts/bundle_litertlm.py | Bundle a TFLite model + SentencePiece tokenizer into .litertlm with embedded Jinja template |
| scripts/multi_quant_build_upload.py | Batch conversion + Hugging Face upload |

Reproduce a build

Requirements: ~128 GB RAM, Python 3.12, litert-torch==0.8.0

# Clone LiteRT-LM builder (needed by bundle_litertlm.py)
git clone --depth=1 https://github.com/google-ai-edge/LiteRT-LM /tmp/litert-lm

pip install litert-torch==0.8.0 mediapipe transformers huggingface-hub

# Download model
huggingface-cli download google/translategemma-4b-it --local-dir ./translategemma-4b-it

# Convert to TFLite with KV cache (~30-60 min, needs ~128 GB RAM)
python scripts/convert_translategemma_android.py \
  --model-dir ./translategemma-4b-it \
  --tflite-dir ./tflite_output/dynamic_int8 \
  --output-dir ./output \
  --task-file ./output/translategemma-4b-it-dynamic_int8.task \
  --quantize dynamic_int8 \
  --prefill-seq-len 1024 --kv-cache-max-len 1024 --allow-no-token

# Bundle as .litertlm
python scripts/bundle_litertlm.py \
  --tflite ./tflite_output/dynamic_int8/*.tflite \
  --tokenizer ./translategemma-4b-it/tokenizer.model \
  --output ./output/translategemma-4b-it-dynamic_int8-generic.litertlm \
  --quant dynamic_int8

Supported Languages

TranslateGemma supports 55 languages including Arabic, Chinese, French, German, Hebrew, Hindi, Japanese, Korean, Portuguese, Russian, Spanish, and more. See google/translategemma-4b-it for the full list.


License

Model weights: Google Gemma Terms of Use
Conversion scripts: Apache 2.0
