SLM360 Nano

Privacy-First Encoder for On-Device Understanding

// overview

SLM360 Nano is a 6.4M-parameter bidirectional encoder optimized for classification and understanding tasks. As part of the SLM360 system, it serves as the core NLU engine in a 4-tier hybrid pipeline. It is written in pure Rust with no external ML dependencies, reaches sub-5 ms latency with INT4 quantization, and keeps all data on the device for full data sovereignty.

// specs

Specifications

Property	Value
Parameters	6.4M
Embedding Dim	256
Layers	6
Attention Heads (Q/KV)	8 / 4 (GQA)
Vocabulary	8,192
Max Sequence Length	512
Size (f32)	26MB
Size (INT4)	4MB
Latency (INT4)	<5ms
Latency (f32)	<10ms
Compression Ratio (f32 to INT4)	6.5x
Cosine Similarity (INT4)	>0.99
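The size and compression figures follow from the parameter count. The sketch below reproduces the arithmetic under two assumptions not stated in the spec: unquantized weights are stored as 4-byte f32, and INT4 storage is half a byte per weight plus one f32 scale per 32-element group. The helper names are hypothetical.

```rust
/// Approximate parameter count from the spec table.
const PARAMS: f64 = 6.4e6;
/// Quantization group size from the feature list.
const GROUP: f64 = 32.0;

/// Bytes needed to store every weight as f32.
fn f32_bytes(params: f64) -> f64 {
    params * 4.0
}

/// Bytes for INT4 weights (0.5 byte each) plus one f32 scale per group.
/// The f32 scale format is an assumption of this sketch.
fn int4_bytes(params: f64, group: f64) -> f64 {
    params * 0.5 + (params / group) * 4.0
}

fn main() {
    let f32_mb = f32_bytes(PARAMS) / 1e6;
    let int4_mb = int4_bytes(PARAMS, GROUP) / 1e6;
    println!("f32:   {:.1} MB", f32_mb);
    println!("INT4:  {:.1} MB", int4_mb);
    println!("ratio: {:.1}x", f32_mb / int4_mb);
}
```

Under these assumptions the result is about 25.6 MB, 4.0 MB and a 6.4x ratio, i.e. 0.625 bytes per weight; the table's 6.5x corresponds to the rounded 26 MB / 4 MB figures.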

// architecture

Architecture

Token IDs > Embedding (8,192 x 256)
+ RoPE Position Encoding
6 x EncoderBlock:
    RMSNorm > GQA (8 query heads, 4 KV heads) > + Residual
    RMSNorm > SwiGLU (256 > 682 > 256) > + Residual
RMSNorm > Mean Pool > Linear Classifier (256 > num_classes)
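The 6.4M parameter figure can be checked against the architecture above. This sketch counts weights assuming bias-free projections, a head dimension of 256 / 8 = 32, and one learnable gain vector per RMSNorm; those details are assumptions, and the task-specific classifier head is excluded.

```rust
// Hyperparameters from the spec table and architecture diagram.
const VOCAB: usize = 8_192;
const D_MODEL: usize = 256;
const N_LAYERS: usize = 6;
const N_HEADS: usize = 8;
const N_KV_HEADS: usize = 4;
const D_FF: usize = 682;

/// Weights in one encoder block (bias-free projections assumed).
fn block_params() -> usize {
    let head_dim = D_MODEL / N_HEADS; // 32
    let q = D_MODEL * D_MODEL; // 8 query heads
    let kv = 2 * D_MODEL * (N_KV_HEADS * head_dim); // 4 shared K heads + 4 shared V heads
    let o = D_MODEL * D_MODEL; // attention output projection
    let ffn = 3 * D_MODEL * D_FF; // SwiGLU gate, up and down projections
    let norms = 2 * D_MODEL; // pre-attention and pre-FFN RMSNorm gains
    q + kv + o + ffn + norms
}

/// Backbone total: embedding + 6 blocks + final RMSNorm (classifier excluded).
fn backbone_params() -> usize {
    let embedding = VOCAB * D_MODEL;
    let final_norm = D_MODEL;
    embedding + N_LAYERS * block_params() + final_norm
}

fn main() {
    println!("per block: {}", block_params());
    println!("backbone:  {}", backbone_params());
}
```

Under these assumptions each block holds 720,896 weights and the backbone totals 6,422,784, consistent with 6.4M; the linear classifier adds 256 x num_classes weights on top.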
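The RoPE step can be illustrated with the standard rotary formulation: adjacent pairs of a query or key vector are rotated by an angle proportional to the token position, so the query-key dot product depends only on the relative offset. This is a generic sketch, not SLM360's code; the base of 10,000, the interleaved pairing and the function names are assumptions.

```rust
/// Rotate adjacent pairs (x[2i], x[2i+1]) of one head vector by
/// angle pos * base^(-2i/d). Assumes the standard RoFormer layout.
fn apply_rope(x: &mut [f32], pos: usize, base: f32) {
    let d = x.len();
    for i in 0..d / 2 {
        let theta = pos as f32 * base.powf(-2.0 * i as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = a * cos - b * sin;
        x[2 * i + 1] = a * sin + b * cos;
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let q = vec![0.3f32, -1.2, 0.8, 0.5];
    let k = vec![1.0f32, 0.2, -0.4, 0.9];
    // Same offset (2) at different absolute positions gives the same score.
    for (m, n) in [(3usize, 1usize), (10, 8)] {
        let (mut qm, mut kn) = (q.clone(), k.clone());
        apply_rope(&mut qm, m, 10_000.0);
        apply_rope(&mut kn, n, 10_000.0);
        println!("pos {m} vs {n}: score {:.5}", dot(&qm, &kn));
    }
}
```

Because the rotation is applied to queries and keys rather than added to embeddings, attention scores encode relative rather than absolute position.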
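The GQA sub-layer pairs 8 query heads with 4 key/value heads, so each K/V head is shared by two query heads. The sketch below shows that head mapping for bidirectional (unmasked) attention over one sequence. It assumes Q, K and V have already been projected and takes plain nested vectors; the function name and layout are hypothetical, and a production kernel would use the SIMD paths instead.

```rust
/// Bidirectional grouped-query attention for one sequence (no causal mask).
/// q: [seq][n_heads * head_dim]; k, v: [seq][n_kv_heads * head_dim].
/// Query head h reads K/V head h / (n_heads / n_kv_heads).
fn gqa_attention(
    q: &[Vec<f32>],
    k: &[Vec<f32>],
    v: &[Vec<f32>],
    n_heads: usize,
    n_kv_heads: usize,
    head_dim: usize,
) -> Vec<Vec<f32>> {
    let seq = q.len();
    let group = n_heads / n_kv_heads; // query heads per shared K/V head
    let scale = 1.0 / (head_dim as f32).sqrt();
    let mut out = vec![vec![0.0f32; n_heads * head_dim]; seq];

    for h in 0..n_heads {
        let kv = h / group;
        let (qo, ko) = (h * head_dim, kv * head_dim);
        for i in 0..seq {
            // Score position i against every position j (full context).
            let scores: Vec<f32> = (0..seq)
                .map(|j| (0..head_dim).map(|d| q[i][qo + d] * k[j][ko + d]).sum::<f32>() * scale)
                .collect();
            // Numerically stable softmax over the scores.
            let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
            let total: f32 = exps.iter().sum();
            // Weighted sum of the shared value head.
            for j in 0..seq {
                let w = exps[j] / total;
                for d in 0..head_dim {
                    out[i][qo + d] += w * v[j][ko + d];
                }
            }
        }
    }
    out
}

fn main() {
    // Toy example: 3 positions, 8 query heads, 4 KV heads, head_dim 2.
    let (seq, n_heads, n_kv, hd) = (3, 8, 4, 2);
    let q: Vec<Vec<f32>> = (0..seq).map(|i| vec![0.1 * i as f32; n_heads * hd]).collect();
    let k: Vec<Vec<f32>> = (0..seq).map(|i| vec![0.2 * i as f32; n_kv * hd]).collect();
    let v = k.clone();
    let out = gqa_attention(&q, &k, &v, n_heads, n_kv, hd);
    println!("output shape: {} x {}", out.len(), out[0].len());
}
```

K and V are only half as wide as Q, which is where the 2x saving in key/value memory comes from.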
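The feed-forward half of each block is a pre-norm residual: RMSNorm, then SwiGLU (256 > 682 > 256), then a residual add. A minimal single-token sketch follows, assuming row-major weight matrices, bias-free projections and an epsilon of 1e-6; the function names are illustrative, not SLM360's API.

```rust
/// RMSNorm: divide by the root-mean-square, then apply a learned gain.
/// Unlike LayerNorm there is no mean subtraction and no bias.
fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * inv_rms * g).collect()
}

/// Row-major matrix-vector product: `w` has `rows` rows of length x.len().
fn matvec(w: &[f32], x: &[f32], rows: usize) -> Vec<f32> {
    let cols = x.len();
    (0..rows)
        .map(|r| w[r * cols..(r + 1) * cols].iter().zip(x).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

fn silu(v: f32) -> f32 {
    v / (1.0 + (-v).exp())
}

/// SwiGLU: down( silu(gate(x)) * up(x) ).
fn swiglu(x: &[f32], w_gate: &[f32], w_up: &[f32], w_down: &[f32], d_ff: usize) -> Vec<f32> {
    let gate = matvec(w_gate, x, d_ff);
    let up = matvec(w_up, x, d_ff);
    let hidden: Vec<f32> = gate.iter().zip(&up).map(|(g, u)| silu(*g) * u).collect();
    matvec(w_down, &hidden, x.len())
}

/// Pre-norm residual sub-layer: x + SwiGLU(RMSNorm(x)).
fn ffn_sublayer(x: &[f32], gain: &[f32], w_gate: &[f32], w_up: &[f32], w_down: &[f32], d_ff: usize) -> Vec<f32> {
    let h = swiglu(&rms_norm(x, gain, 1e-6), w_gate, w_up, w_down, d_ff);
    x.iter().zip(&h).map(|(a, b)| a + b).collect()
}

fn main() {
    // Toy sizes; the real model uses d_model = 256 and d_ff = 682.
    let (d_model, d_ff) = (4, 8);
    let x = vec![0.5, -1.0, 2.0, 0.25];
    let gain = vec![1.0; d_model];
    let w_gate = vec![0.1; d_ff * d_model];
    let w_up = vec![0.2; d_ff * d_model];
    let w_down = vec![0.05; d_model * d_ff];
    println!("{:?}", ffn_sublayer(&x, &gain, &w_gate, &w_up, &w_down, d_ff));
}
```

RMSNorm skips the mean and bias that LayerNorm computes, which is the source of its lower cost; the hidden width 682 is roughly 8/3 x 256, the usual choice for keeping a three-matrix SwiGLU close in size to a 4x GELU MLP.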

// features

Features

01 Grouped Query Attention (8 query heads, 4 KV heads) for a 2x reduction in key/value memory
02 SwiGLU activation, following the LLaMA/Mistral design, for better gradient flow
03 RoPE positional encoding for generalization to unseen sequence lengths
04 RMSNorm instead of LayerNorm for 15-20% faster normalization
05 Bidirectional attention for full-context understanding
06 SIMD-accelerated inference with ARM NEON and x86 AVX2 dispatch
07 INT4 group-wise quantization (32-element groups) with per-tile dequantization
08 On-device continual learning with EWC (Elastic Weight Consolidation), replay buffers, and validation guards
09 Cross-platform: native (ARM/x86), WebAssembly, Android (JNI), iOS (FFI)
10 100% deterministic output with seeded PRNGs across all platforms
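Group-wise INT4 quantization stores each weight as a signed 4-bit integer and each group of 32 weights shares one scale. The sketch below uses symmetric quantization (scale = max|w| / 7), packs two nibbles per byte, and checks the round trip with cosine similarity. The symmetric scheme, the f32 scale and all names are assumptions; the spec only fixes the 4-bit width and the 32-element group.

```rust
/// Group size from the feature list.
const GROUP: usize = 32;

/// Two signed 4-bit values per byte, one f32 scale per group (assumed format).
struct Int4Tensor {
    packed: Vec<u8>,
    scales: Vec<f32>,
    len: usize,
}

/// Symmetric group-wise quantization into the signed range [-8, 7].
fn quantize(w: &[f32]) -> Int4Tensor {
    let mut packed = vec![0u8; (w.len() + 1) / 2];
    let mut scales = Vec::with_capacity((w.len() + GROUP - 1) / GROUP);
    for (g, chunk) in w.chunks(GROUP).enumerate() {
        let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        let scale = if max_abs > 0.0 { max_abs / 7.0 } else { 1.0 };
        scales.push(scale);
        for (j, v) in chunk.iter().enumerate() {
            let i = g * GROUP + j;
            let q = (v / scale).round().clamp(-8.0, 7.0) as i8;
            let nibble = (q as u8) & 0x0F; // low 4 bits of the two's-complement value
            packed[i / 2] |= if i % 2 == 0 { nibble } else { nibble << 4 };
        }
    }
    Int4Tensor { packed, scales, len: w.len() }
}

/// Sign-extend each nibble and multiply by its group's scale.
fn dequantize(t: &Int4Tensor) -> Vec<f32> {
    (0..t.len)
        .map(|i| {
            let byte = t.packed[i / 2];
            let nibble = if i % 2 == 0 { byte & 0x0F } else { byte >> 4 };
            let q = ((nibble << 4) as i8) >> 4; // arithmetic shift sign-extends 4 -> 8 bits
            q as f32 * t.scales[i / GROUP]
        })
        .collect()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Deterministic toy weights.
    let w: Vec<f32> = (0..256).map(|i| (i as f32 * 0.37).sin()).collect();
    let t = quantize(&w);
    let w_hat = dequantize(&t);
    println!("bytes: {} packed + {} scales", t.packed.len(), t.scales.len() * 4);
    println!("cosine similarity: {:.4}", cosine(&w, &w_hat));
}
```

For these 256 toy weights the tensor takes 128 bytes of nibbles plus 32 bytes of scales, versus 1,024 bytes as f32, and every element lands within half a scale step of its original value.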
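EWC protects weights that mattered for earlier data by adding a quadratic penalty that pulls each parameter toward its previous value, weighted by an estimate of its importance. The sketch uses the textbook form, (lambda/2) * sum F_i (theta_i - theta*_i)^2, with the diagonal Fisher estimated as the mean squared per-sample gradient. How SLM360 estimates importance, chooses lambda, or combines EWC with its replay buffers and validation guards is not stated, so the struct and method names here are hypothetical.

```rust
/// EWC state captured after training on earlier data.
struct Ewc {
    anchor: Vec<f32>, // theta*: parameters after the previous update round
    fisher: Vec<f32>, // diagonal Fisher information estimate per parameter
    lambda: f32,      // penalty strength
}

impl Ewc {
    /// Estimate the diagonal Fisher as the mean squared gradient over samples.
    fn new(anchor: Vec<f32>, per_sample_grads: &[Vec<f32>], lambda: f32) -> Self {
        let n = per_sample_grads.len() as f32;
        let mut fisher = vec![0.0f32; anchor.len()];
        for g in per_sample_grads {
            for (f, gi) in fisher.iter_mut().zip(g) {
                *f += gi * gi / n;
            }
        }
        Ewc { anchor, fisher, lambda }
    }

    /// Penalty added to the task loss: (lambda/2) * sum F_i (theta_i - theta*_i)^2.
    fn penalty(&self, theta: &[f32]) -> f32 {
        let s: f32 = theta
            .iter()
            .zip(&self.anchor)
            .zip(&self.fisher)
            .map(|((t, a), f)| f * (t - a).powi(2))
            .sum();
        0.5 * self.lambda * s
    }

    /// Gradient of the penalty, added to the task gradient: lambda * F_i (theta_i - theta*_i).
    fn penalty_grad(&self, theta: &[f32]) -> Vec<f32> {
        theta
            .iter()
            .zip(&self.anchor)
            .zip(&self.fisher)
            .map(|((t, a), f)| self.lambda * f * (t - a))
            .collect()
    }
}

fn main() {
    let ewc = Ewc::new(vec![0.5, -0.25], &[vec![0.1, 0.4], vec![0.3, -0.2]], 10.0);
    let theta = [0.6, -0.25];
    println!("penalty = {}", ewc.penalty(&theta));
    println!("grad = {:?}", ewc.penalty_grad(&theta));
}
```

Parameters with near-zero Fisher values stay free to adapt to new on-device data, while high-importance parameters are held close to their earlier values.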

// benchmarks

Benchmarks

Dataset	Score	Baseline
SNIPS (7 classes)	96.2%	BERT-base: 98.0%
ATIS (26 classes)	94.8%	BERT-base: 96.5%
Banking77 (77 classes)	88.3%	BERT-base: 93.1%
CLINC150 (150 classes)	85.1%	BERT-base: 91.4%
Internal 21-class (Hybrid)	94.1%	Rasa DIET: 91.3%

// deployment

Deployment

01 Native (ARM/x86) via cargo build, ~1MB NLU binary
02 WebAssembly via wasm-pack, ~300KB gzipped
03 Android via JNI bindings
04 iOS via FFI bindings
05 Minimal mode (~50KB) for pattern-only MCU deployment
