[360Labs.ai]
// slm360-nano

SLM360 Nano

Released

Privacy-First Encoder for On-Device Understanding

SLM360 Nano is a 6.4M-parameter bidirectional encoder optimized for classification and understanding tasks. As part of the SLM360 system, it serves as the core NLU engine in a 4-tier hybrid pipeline. Built in pure Rust with zero external ML dependencies, it achieves sub-5ms latency with INT4 quantization while keeping all data on-device for 100% data sovereignty.

Specifications

  • Parameters: 6.4M
  • Embedding Dim: 256
  • Layers: 6
  • Attention Heads (Q/KV): 8 / 4 (GQA)
  • Vocabulary: 8,192
  • Max Sequence Length: 512
  • Size (f32): 26MB
  • Size (INT4): 4MB
  • Latency (INT4): <5ms
  • Latency (f32): <10ms
  • Compression Ratio: 6.5x
  • Cosine Similarity (INT4): >0.99
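The specification numbers above can be cross-checked with a little arithmetic. A sketch in Rust, assuming a LLaMA-style decomposition (bias-free GQA and SwiGLU projections, RMSNorm gain vectors); this breakdown is an inference from the table, not a published one:

```rust
// Back-of-envelope check that the spec table is internally consistent.
// The per-layer decomposition (GQA projections, SwiGLU matrices, RMSNorm
// gains, no biases) is an assumption about how the 6.4M total breaks down.

fn param_count() -> u64 {
    let d_model: u64 = 256;
    let vocab: u64 = 8_192;
    let n_layers: u64 = 6;
    let q_heads: u64 = 8;
    let kv_heads: u64 = 4;
    let head_dim = d_model / q_heads; // 32
    let ffn_hidden: u64 = 682;

    let embedding = vocab * d_model; // 2,097,152

    // GQA: full-width Q and output projections, narrower shared K/V.
    let attn = 2 * d_model * d_model + 2 * d_model * (kv_heads * head_dim);

    // SwiGLU carries three matrices: gate, up, down.
    let ffn = 3 * d_model * ffn_hidden;

    // Two RMSNorm gain vectors per block.
    let norms = 2 * d_model;

    embedding + n_layers * (attn + ffn + norms) + d_model // + final norm
}

fn main() {
    let total = param_count();
    println!("parameters: {total}"); // ~6.4M
    println!("f32 size:   {:.1} MB", total as f64 * 4.0 / 1e6);
    // 4 bits per weight plus one f32 scale per 32-element group = 5 bits,
    // close to the 6.5x compression figure in the spec table.
    println!("theoretical ratio: {:.1}x", 32.0 / (4.0 + 32.0 / 32.0));
}
```

Under these assumptions the count lands at 6,422,784 parameters, matching the 6.4M and 26MB (f32) figures.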

Architecture

1 Token IDs > Embedding (8,192 x 256)
2 + RoPE Position Encoding
3 6 x EncoderBlock, attention sub-layer: RMSNorm > GQA (8 Q heads, 4 KV heads) > + Residual
4 6 x EncoderBlock, FFN sub-layer: RMSNorm > SwiGLU (256 > 682 > 256) > + Residual
5 RMSNorm > Mean Pool > Linear Classifier (256 > num_classes)
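The two sub-layers in steps 3-4 can be sketched as follows. Attention is omitted for brevity, the weights are toy values, and `rms_norm`/`swiglu` here follow the standard textbook definitions, not the project's actual code:

```rust
// Minimal per-token sketch of an encoder block's pre-norm sub-layers.

fn rms_norm(x: &[f32], gain: &[f32], eps: f32) -> Vec<f32> {
    // RMSNorm: scale by 1/sqrt(mean(x^2) + eps), then apply the learned gain.
    let ms = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv = 1.0 / (ms + eps).sqrt();
    x.iter().zip(gain).map(|(v, g)| v * inv * g).collect()
}

fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

// SwiGLU FFN: down(silu(gate(x)) * up(x)); real shapes are 256 > 682 > 256.
fn swiglu(x: &[f32], w_gate: &[Vec<f32>], w_up: &[Vec<f32>], w_down: &[Vec<f32>]) -> Vec<f32> {
    let hidden: Vec<f32> = w_gate
        .iter()
        .zip(w_up)
        .map(|(gr, ur)| {
            let g: f32 = gr.iter().zip(x).map(|(w, v)| w * v).sum();
            let u: f32 = ur.iter().zip(x).map(|(w, v)| w * v).sum();
            silu(g) * u
        })
        .collect();
    w_down
        .iter()
        .map(|dr| dr.iter().zip(&hidden).map(|(w, h)| w * h).sum())
        .collect()
}

fn main() {
    let d = 4; // toy embedding dim instead of 256
    let x = vec![0.5, -1.0, 2.0, 0.25];
    let gain = vec![1.0; d];
    let identity: Vec<Vec<f32>> = (0..d)
        .map(|i| (0..d).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect();

    // Pre-norm FFN sub-layer with residual: out = x + SwiGLU(RMSNorm(x)).
    let normed = rms_norm(&x, &gain, 1e-6);
    let out: Vec<f32> = x
        .iter()
        .zip(swiglu(&normed, &identity, &identity, &identity))
        .map(|(r, f)| r + f)
        .collect();
    println!("{out:?}");
}
```

The attention sub-layer in step 3 has the same shape: `x + gqa(rms_norm(x))`, with the residual added after each sub-layer.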

Features

  • Grouped Query Attention (8 query heads, 4 KV heads) for 2x KV cache reduction
  • SwiGLU activation following LLaMA/Mistral design for better gradient flow
  • RoPE positional encoding for generalization to unseen sequence lengths
  • RMSNorm over LayerNorm for 15-20% faster normalization
  • Bidirectional attention for full-context understanding
  • SIMD-accelerated inference with ARM NEON and x86 AVX2 dispatch
  • INT4 group-wise quantization (32-element groups) with per-tile dequantization
  • On-device continual learning with EWC + replay buffers + validation guards
  • Cross-platform: native (ARM/x86), WebAssembly, Android (JNI), iOS (FFI)
  • 100% deterministic output with seeded PRNGs across all platforms
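The INT4 scheme in the feature list can be illustrated with a minimal sketch. Symmetric absmax scaling per 32-element group is an assumption (the real packing and scale format may differ); the cosine-similarity check mirrors the >0.99 figure in the spec table:

```rust
// Sketch of group-wise INT4 quantization with 32-element groups.
// One f32 scale per group; values are mapped to the symmetric range -7..=7.

const GROUP: usize = 32;

fn quantize_int4(w: &[f32]) -> (Vec<i8>, Vec<f32>) {
    let mut q = Vec::with_capacity(w.len());
    let mut scales = Vec::new();
    for group in w.chunks(GROUP) {
        let absmax = group.iter().fold(0f32, |m, v| m.max(v.abs()));
        let scale = if absmax == 0.0 { 1.0 } else { absmax / 7.0 };
        scales.push(scale);
        for &v in group {
            q.push((v / scale).round().clamp(-8.0, 7.0) as i8);
        }
    }
    (q, scales)
}

fn dequantize_int4(q: &[i8], scales: &[f32]) -> Vec<f32> {
    q.chunks(GROUP)
        .zip(scales)
        .flat_map(|(group, &s)| group.iter().map(move |&v| v as f32 * s))
        .collect()
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Deterministic pseudo-random weights via a tiny seeded LCG, in the
    // spirit of the project's reproducibility goal.
    let mut state: u64 = 42;
    let w: Vec<f32> = (0..256)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            ((state >> 33) as f32 / (1u64 << 31) as f32) - 0.5
        })
        .collect();
    let (q, scales) = quantize_int4(&w);
    let w2 = dequantize_int4(&q, &scales);
    println!("cosine similarity: {:.4}", cosine(&w, &w2));
}
```

Per-group scales keep the quantization error proportional to each group's local magnitude, which is why round-trip fidelity stays high even at 4 bits.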

Benchmarks

Dataset                      Score    Comparison
SNIPS (7 classes)            96.2%    BERT-base: 98.0%
ATIS (26 classes)            94.8%    BERT-base: 96.5%
Banking77 (77 classes)       88.3%    BERT-base: 93.1%
CLINC150 (150 classes)       85.1%    BERT-base: 91.4%
Internal 21-class (Hybrid)   94.1%    Rasa DIET: 91.3%

Deployment Targets

  • Native (ARM/x86) via cargo build, ~1MB NLU binary
  • WebAssembly via wasm-pack, ~300KB gzipped
  • Android via JNI bindings
  • iOS via FFI bindings
  • Minimal mode (~50KB) for pattern-only MCU deployment
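For the FFI-based targets, the exported C surface plausibly looks like the following. The symbol name `slm360_classify` and its signature are hypothetical, and the body is placeholder logic standing in for the real encoder:

```rust
// Hypothetical shape of a C-callable entry point for the iOS (FFI) and
// Android (JNI-wrapped) targets; names and layout are assumptions, not
// the published API.

use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Classify a NUL-terminated UTF-8 string, returning the winning class id
/// or -1 on null/invalid input. (Placeholder body: real code would run the
/// encoder and linear classifier head.)
#[no_mangle]
pub extern "C" fn slm360_classify(text: *const c_char) -> c_int {
    if text.is_null() {
        return -1;
    }
    let s = unsafe { CStr::from_ptr(text) };
    match s.to_str() {
        Ok(t) if !t.is_empty() => (t.len() % 21) as c_int, // placeholder logic
        _ => -1,
    }
}

fn main() {
    let s = std::ffi::CString::new("turn on the kitchen lights").unwrap();
    println!("class id:   {}", slm360_classify(s.as_ptr()));
    println!("null input: {}", slm360_classify(std::ptr::null()));
}
```

Returning a sentinel instead of panicking matters at an FFI boundary: unwinding across `extern "C"` is undefined behavior, so errors must be encoded in the return value.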