SLM360 Nano
Status: Released
Privacy-First Encoder for On-Device Understanding
SLM360 Nano is a 6.4M-parameter bidirectional encoder optimized for classification and understanding tasks. As part of the SLM360 system, it serves as the core NLU engine in a 4-tier hybrid pipeline. It is written in pure Rust with no external ML dependencies and reaches sub-5ms latency with INT4 quantization. Inference runs entirely on-device, so data never leaves the user's hardware.
Specifications
| Specification | Value |
|---|---|
| Parameters | 6.4M |
| Embedding Dim | 256 |
| Layers | 6 |
| Attention Heads (Q/KV) | 8 / 4 (GQA) |
| Vocabulary | 8,192 |
| Max Sequence Length | 512 |
| Size (f32) | 26MB |
| Size (INT4) | 4MB |
| Latency (INT4) | <5ms |
| Latency (f32) | <10ms |
| Compression Ratio | 6.5x |
| Cosine Similarity (INT4) | >0.99 |
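The size figures can be roughly reproduced from the parameter count. The sketch below is a back-of-the-envelope check, not the project's own accounting. It assumes 4 bytes per f32 weight, two INT4 weights per byte, and one f32 scale per 32-element group. The group size comes from the feature list, but the scale width is an assumption the source does not confirm. The function names are illustrative.

```rust
/// Bytes needed to store `params` weights as f32 (4 bytes each).
fn f32_bytes(params: u64) -> u64 {
    params * 4
}

/// Bytes needed for INT4 group-wise storage: two weights per byte,
/// plus one scale of `scale_bytes` per `group_size` weights.
/// The scale width is an assumption; the source does not state it.
fn int4_bytes(params: u64, group_size: u64, scale_bytes: u64) -> u64 {
    let packed = (params + 1) / 2; // round up for an odd weight count
    let groups = (params + group_size - 1) / group_size; // ceiling division
    packed + groups * scale_bytes
}

fn main() {
    let params: u64 = 6_400_000; // rounded figure from the specifications
    let full = f32_bytes(params);
    let quant = int4_bytes(params, 32, 4);
    println!("f32:   {:.1} MB", full as f64 / 1e6);
    println!("int4:  {:.1} MB", quant as f64 / 1e6);
    println!("ratio: {:.1}x", full as f64 / quant as f64);
}
```

Under these assumptions the result is 25.6 MB for f32 and 4.0 MB for INT4, a ratio of 6.4x. That is consistent with the listed 26MB, 4MB, and 6.5x if the published ratio was computed from the rounded sizes.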
Architecture
1. Token IDs > Embedding (8,192 x 256)
2. + RoPE Position Encoding
3. 6 x EncoderBlock: RMSNorm > GQA (8 heads, 4 KV) > + Residual
4. 6 x EncoderBlock: RMSNorm > SwiGLU (256 > 682 > 256) > + Residual
5. RMSNorm > Mean Pool > Linear Classifier (256 > num_classes)
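As a sanity check on the 6.4M figure, the layer shapes listed above can be tallied directly. The sketch below makes several assumptions the source does not state: no bias terms, a head dimension of 256 / 8 = 32, three SwiGLU matrices (gate, up, down), two RMSNorm gain vectors per block plus a final one, and no learned parameters for RoPE. The task-specific classifier head is excluded. The `Config` struct and its field names are hypothetical.

```rust
/// Hypothetical config; field names are illustrative, values come from the spec.
struct Config {
    vocab: usize,
    dim: usize,
    layers: usize,
    q_heads: usize,
    kv_heads: usize,
    ffn_hidden: usize,
}

/// Count encoder weights, assuming bias-free linear layers.
fn encoder_params(c: &Config) -> usize {
    let head_dim = c.dim / c.q_heads;
    let kv_dim = head_dim * c.kv_heads;
    let embedding = c.vocab * c.dim;
    // Q and output projections are dim x dim; K and V shrink to dim x kv_dim under GQA.
    let attention = 2 * c.dim * c.dim + 2 * c.dim * kv_dim;
    // SwiGLU: gate and up project dim -> hidden, down projects hidden -> dim.
    let ffn = 3 * c.dim * c.ffn_hidden;
    // One RMSNorm gain vector before attention and one before the FFN.
    let norms_per_block = 2 * c.dim;
    // The trailing `c.dim` is the final RMSNorm before pooling.
    embedding + c.layers * (attention + ffn + norms_per_block) + c.dim
}

fn main() {
    let cfg = Config {
        vocab: 8192,
        dim: 256,
        layers: 6,
        q_heads: 8,
        kv_heads: 4,
        ffn_hidden: 682,
    };
    println!("total parameters: {}", encoder_params(&cfg));
}
```

With these assumptions the total is 6,422,784, which rounds to the stated 6.4M. The classifier adds only 256 x num_classes weights on top, for example 38,400 for a 150-class task.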
Features
- Grouped Query Attention (8 query heads, 4 KV heads), halving the size of the key/value projections
- SwiGLU activation following LLaMA/Mistral design for better gradient flow
- RoPE positional encoding for generalization to unseen sequence lengths
- RMSNorm over LayerNorm for 15-20% faster normalization
- Bidirectional attention for full-context understanding
- SIMD-accelerated inference with ARM NEON and x86 AVX2 dispatch
- INT4 group-wise quantization (32-element groups) with per-tile dequantization
- On-device continual learning with EWC + replay buffers + validation guards
- Cross-platform: native (ARM/x86), WebAssembly, Android (JNI), iOS (FFI)
- 100% deterministic output with seeded PRNGs across all platforms
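To make the INT4 group-wise quantization item concrete, here is a minimal sketch of quantizing one 32-element group and measuring the cosine similarity after a round trip. It assumes a symmetric absmax scheme with one f32 scale per group and values clamped to [-7, 7]. The source specifies only the 4-bit width and the group size, so the scheme, the `Int4Group` type, and all function names are illustrative rather than the project's actual implementation.

```rust
const GROUP: usize = 32;

/// One quantized group: 32 signed 4-bit values packed two per byte, plus a scale.
struct Int4Group {
    scale: f32,
    packed: [u8; GROUP / 2],
}

/// Map a float to a 4-bit two's-complement nibble representing a value in [-7, 7].
fn quant(v: f32, scale: f32) -> u8 {
    let q = (v / scale).round().clamp(-7.0, 7.0) as i8;
    (q as u8) & 0x0F
}

/// Sign-extend a nibble back to i8.
fn nibble_to_i8(n: u8) -> i8 {
    ((n << 4) as i8) >> 4
}

/// Symmetric absmax quantization (an assumed scheme, see the lead-in).
fn quantize_group(x: &[f32; GROUP]) -> Int4Group {
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
    let mut packed = [0u8; GROUP / 2];
    for (i, pair) in x.chunks_exact(2).enumerate() {
        // Even-indexed value goes in the low nibble, odd-indexed in the high nibble.
        packed[i] = quant(pair[0], scale) | (quant(pair[1], scale) << 4);
    }
    Int4Group { scale, packed }
}

fn dequantize_group(g: &Int4Group) -> [f32; GROUP] {
    let mut out = [0.0f32; GROUP];
    for (i, byte) in g.packed.iter().enumerate() {
        out[2 * i] = nibble_to_i8(byte & 0x0F) as f32 * g.scale;
        out[2 * i + 1] = nibble_to_i8(byte >> 4) as f32 * g.scale;
    }
    out
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(p, q)| p * q).sum();
    let na: f32 = a.iter().map(|p| p * p).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|q| q * q).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Synthetic weights for illustration only; not taken from the model.
    let mut x = [0.0f32; GROUP];
    for (i, v) in x.iter_mut().enumerate() {
        *v = (i as f32 * 0.37).sin();
    }
    let restored = dequantize_group(&quantize_group(&x));
    println!("cosine similarity: {:.4}", cosine(&x, &restored));
}
```

Keeping a separate scale for every 32 weights limits the damage a single outlier can do to its neighbours, which is one plausible reason small groups help preserve a high cosine similarity.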
Benchmarks
| Dataset | Score | Comparison |
|---|---|---|
| SNIPS (7 classes) | 96.2% | BERT-base: 98.0% |
| ATIS (26 classes) | 94.8% | BERT-base: 96.5% |
| Banking77 (77 classes) | 88.3% | BERT-base: 93.1% |
| CLINC150 (150 classes) | 85.1% | BERT-base: 91.4% |
| Internal 21-class (Hybrid) | 94.1% | Rasa DIET: 91.3% |
Deployment Targets
- Native (ARM/x86) via cargo build, ~1MB NLU binary
- WebAssembly via wasm-pack, ~300KB gzipped
- Android via JNI bindings
- iOS via FFI bindings
- Minimal mode (~50KB) for pattern-only MCU deployment