{ slm360-nano } · Released
SLM360 Nano
Privacy-First Encoder for On-Device Understanding
// overview
SLM360 Nano is a 6.4M-parameter bidirectional encoder optimized for classification and understanding tasks. Part of the SLM360 system, it serves as the core NLU engine in a 4-tier hybrid pipeline. It is written in pure Rust with no external ML dependencies and reaches sub-5ms latency with INT4 quantization. Inference runs entirely on-device, so user data stays on the device (100% data sovereignty).
// specs
Specifications
| Specification | Value |
|---|---|
| Parameters | 6.4M |
| Embedding Dim | 256 |
| Layers | 6 |
| Attention Heads (Q/KV) | 8 / 4 (GQA) |
| Vocabulary | 8,192 |
| Max Sequence Length | 512 |
| Size (f32) | 26MB |
| Size (INT4) | 4MB |
| Latency (INT4) | <5ms |
| Latency (f32) | <10ms |
| Compression Ratio | 6.5x |
| Cosine Similarity (INT4) | >0.99 |
// architecture
Architecture
- 01 Token IDs > Embedding (8,192 x 256)
- 02 + RoPE Position Encoding
- 03 6 x EncoderBlock: RMSNorm > GQA (8 heads, 4 KV) > + Residual
- 04 6 x EncoderBlock: RMSNorm > SwiGLU (256 > 682 > 256) > + Residual
- 05 RMSNorm > Mean Pool > Linear Classifier (256 > num_classes)
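The final stage of the pipeline (RMSNorm > Mean Pool > Linear Classifier) is small enough to sketch in plain Rust. This is an illustration of the technique, not the SLM360 source: the function names, the `eps` value, and the classifier bias term are all assumptions, and the input is assumed to be a non-empty list of per-token hidden vectors.

```rust
/// RMSNorm: divide a vector by its root mean square, then apply a learned gain.
fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * inv_rms * g).collect()
}

/// Mean pool: average the hidden vectors of all tokens (assumes at least one token).
fn mean_pool(tokens: &[Vec<f32>]) -> Vec<f32> {
    let mut pooled = vec![0.0f32; tokens[0].len()];
    for t in tokens {
        for (p, v) in pooled.iter_mut().zip(t) {
            *p += v;
        }
    }
    let n = tokens.len() as f32;
    pooled.iter_mut().for_each(|p| *p /= n);
    pooled
}

/// Linear classifier: one weight row per class (the bias is an assumption).
fn classify(pooled: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> Vec<f32> {
    weights
        .iter()
        .zip(bias)
        .map(|(w, b)| w.iter().zip(pooled).map(|(a, x)| a * x).sum::<f32>() + b)
        .collect()
}

/// Index of the largest logit.
fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
        .unwrap_or(0)
}

/// Chains the three steps: normalize each token, pool, classify.
fn predict(tokens: &[Vec<f32>], gain: &[f32], weights: &[Vec<f32>], bias: &[f32]) -> usize {
    let normed: Vec<Vec<f32>> = tokens.iter().map(|t| rms_norm(t, gain, 1e-6)).collect();
    let logits = classify(&mean_pool(&normed), weights, bias);
    argmax(&logits)
}
```

In the real model each token vector would have 256 entries and `weights` would have `num_classes` rows of 256 values. Mean pooling gives every token equal weight, which suits a bidirectional encoder where no single position summarizes the sequence.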
// features
Features
- 01 Grouped Query Attention (8 query heads, 4 KV heads) for 2x KV cache reduction
- 02 SwiGLU activation following LLaMA/Mistral design for better gradient flow
- 03 RoPE positional encoding for generalization to unseen sequence lengths
- 04 RMSNorm over LayerNorm for 15-20% faster normalization
- 05 Bidirectional attention for full-context understanding
- 06 SIMD-accelerated inference with ARM NEON and x86 AVX2 dispatch
- 07 INT4 group-wise quantization (32-element groups) with per-tile dequantization
- 08 On-device continual learning with EWC + replay buffers + validation guards
- 09 Cross-platform: native (ARM/x86), WebAssembly, Android (JNI), iOS (FFI)
- 10 100% deterministic output with seeded PRNGs across all platforms
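Group-wise INT4 quantization with 32-element groups can be illustrated with a symmetric scheme: each group stores one scale and 32 signed 4-bit values. The sketch below is an assumption about the general technique, not the SLM360 kernel. The type and function names are hypothetical, the scale is kept as f32, and the 4-bit values are held unpacked in `i8` for readability, where a real kernel would pack two per byte.

```rust
const GROUP: usize = 32;

/// One quantized group: a shared scale plus 32 values in the INT4 range -8..=7.
struct QuantGroup {
    scale: f32,
    q: [i8; GROUP],
}

/// Symmetric group-wise quantization; a short final group is zero-padded.
fn quantize(weights: &[f32]) -> Vec<QuantGroup> {
    weights
        .chunks(GROUP)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            // Map the largest magnitude in the group to 7; avoid dividing by zero.
            let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
            let mut q = [0i8; GROUP];
            for (slot, v) in q.iter_mut().zip(chunk) {
                *slot = (v / scale).round().clamp(-8.0, 7.0) as i8;
            }
            QuantGroup { scale, q }
        })
        .collect()
}

/// Reconstruct the first `len` weights, one group at a time.
fn dequantize(groups: &[QuantGroup], len: usize) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|g| g.q.iter().map(move |&q| q as f32 * g.scale))
        .take(len)
        .collect()
}

/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```

Calling `cosine(&weights, &dequantize(&quantize(&weights), weights.len()))` measures how closely the round trip preserves a weight vector. The published >0.99 figure presumably refers to model outputs rather than raw weights, so this check is only an analogous use of the same metric.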
// benchmarks
Benchmarks
| Dataset | Score | Comparison |
|---|---|---|
| SNIPS (7 classes) | 96.2% | BERT-base: 98.0% |
| ATIS (26 classes) | 94.8% | BERT-base: 96.5% |
| Banking77 (77 classes) | 88.3% | BERT-base: 93.1% |
| CLINC150 (150 classes) | 85.1% | BERT-base: 91.4% |
| Internal 21-class (Hybrid) | 94.1% | Rasa DIET: 91.3% |
// deployment
Deployment
- 01 Native (ARM/x86) via cargo build, ~1MB NLU binary
- 02 WebAssembly via wasm-pack, ~300KB gzipped
- 03 Android via JNI bindings
- 04 iOS via FFI bindings
- 05 Minimal mode (~50KB) for pattern-only MCU deployment