The Hottest Path in Our Stack
JWT verification runs on every single authenticated request. Every one. If you are building a multi-tenant SaaS platform that handles millions of auth requests per day, the middleware chain sitting between "HTTP request arrives" and "route handler executes" is, by definition, your hottest code path.
We run Auth1, a multi-tenant authentication platform. At our scale, we noticed that authentication middleware was consuming a disproportionate share of CPU time on our API servers. Not database queries. Not business logic. The middleware chain that runs before any of that even starts.
What the Middleware Actually Does
A typical Express or Fastify auth middleware is not just one function. It is a chain of four to six discrete operations that execute sequentially on every request. Ours has six:
- Extract the Bearer token from the `Authorization` header
- Verify the JWT signature (HMAC-SHA256) and check expiration
- Enforce token type — reject refresh tokens used as access tokens (a common attack vector in token confusion attacks)
- Resolve tenant context — in a multi-tenant system, determine which tenant this request belongs to from JWT claims, the `X-Tenant-ID` header, or the subdomain
- Rate limit check — per-user, per-tenant token bucket
- Generate a request ID (UUID v4) for tracing and logging
In a standard Node.js implementation, each of these is a separate middleware function.
The request object gets passed through the chain, each function reads from it, writes to it,
and calls next(). This is clean, composable architecture. It is also slow.
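For concreteness, the traditional chain looks something like this sketch (function names like `extractToken` are illustrative, not the actual Auth1 code):

```javascript
// Simplified sketch of the traditional chain: each concern is its own
// middleware, and the request object is mutated as it passes through.
function extractToken(req, res, next) {
  const header = req.headers.authorization || '';
  if (!header.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Invalid Authorization header format' });
  }
  req.token = header.slice('Bearer '.length);
  next();
}

function requestId(req, res, next) {
  req.id = require('crypto').randomUUID(); // one JS-to-native crossing
  next();
}

// ...verifyJwt, enforceTokenType, resolveTenant, and rateLimit follow the same shape:
// app.use('/api', extractToken, verifyJwt, enforceTokenType, resolveTenant, rateLimit, requestId);
```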
JWT verification alone using the popular jsonwebtoken npm package takes roughly
180 microseconds per call. Add crypto.randomUUID() for the request ID, a Map
lookup for rate limiting, string parsing for tenant resolution, and the overhead of multiple
JS-to-native boundary crossings for the crypto operations, and you are looking at around
184 microseconds per request.
At 5,000 requests per second, that is 920 milliseconds of CPU time per second spent purely on authentication overhead. At 50,000 requests per second, it becomes the bottleneck.
Why Rust + napi-rs (and Not a Full Rewrite)
We considered several approaches. Rewriting the entire API server in Rust was the nuclear option — high risk, months of work, and most of our code is I/O-bound business logic that does not benefit from Rust's performance characteristics. Writing a separate Rust microservice for auth verification would add network latency and operational complexity.
The insight was simpler: we did not need to replace Express. We needed to replace the one function that runs on every request.
napi-rs is a framework for building compiled Node.js native
addons in Rust. You write Rust functions annotated with #[napi], and the build
tool generates a .node binary that you can require() from
JavaScript like any other module.
This gave us the architecture we wanted:
- Keep Express/Fastify for routing, error handling, and business logic
- Keep all I/O-bound code in JavaScript (database queries, HTTP calls, template rendering)
- Replace only the CPU-bound hot path with a single native Rust call
The compiled binary is about 3.1 MB. No runtime dependencies. No Rust toolchain required on the deployment target.
The Implementation
The core idea is operation fusion. Instead of a chain of separate middleware functions that each cross the JS-to-native boundary for crypto operations, we do everything in a single native call. One boundary crossing. One function. All six operations.
Here is the Rust function signature:
```rust
#[napi]
pub fn authenticate_request(
    auth_header: Option<String>,
    tenant_header: Option<String>,
    host: Option<String>,
    config: MiddlewareConfig,
) -> AuthResult {
    let request_id = Uuid::new_v4().to_string();

    // 1. Extract Bearer token from Authorization header
    let token = match &auth_header {
        Some(header) => match header.strip_prefix("Bearer ") {
            Some(token) => token.to_string(),
            None => {
                return AuthResult {
                    authenticated: false,
                    error: Some("Invalid Authorization header format".into()),
                    error_status: Some(401),
                    request_id,
                    ..Default::default()
                }
            }
        },
        None => {
            return AuthResult {
                authenticated: false,
                error: Some("No Authorization header provided".into()),
                error_status: Some(401),
                request_id,
                ..Default::default()
            }
        }
    };

    // 2. Verify JWT and enforce token type (access vs refresh)
    let payload = match verify_access_token(&token, &config) {
        Ok(p) => p,
        Err(e) => {
            return AuthResult {
                authenticated: false,
                error: Some(e.to_string()),
                error_status: Some(401),
                request_id,
                ..Default::default()
            }
        }
    };

    // 3. Resolve tenant ID (JWT claim > header > subdomain)
    let resolved_tenant = payload
        .tenant_id
        .clone()
        .or_else(|| tenant_header.clone())
        .or_else(|| /* subdomain extraction from `host` */ None);

    // 4. Validate tenant match if enforced
    // 5. Rate limiting (per user + tenant, DashMap-backed token bucket)
    // 6. Return AuthResult with all fields populated
    AuthResult {
        authenticated: true,
        user_id: Some(payload.sub),
        email: payload.email,
        role: payload.role,
        tenant_id: resolved_tenant,
        request_id,
        rate_limit_remaining: Some(rate_result.1),
        ..Default::default()
    }
}
```
The AuthResult struct is annotated with #[napi(object)], which means
napi-rs automatically converts it to a plain JavaScript object when it crosses the boundary.
No manual serialization. No JSON.parse. The JavaScript side receives a normal
object with all the fields populated.
The rate limiter uses a global DashMap<String, TokenBucket> — a lock-free
concurrent hashmap from the dashmap crate. Each key (formatted as
userId:tenantId) gets its own token bucket that refills at a configurable rate.
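The bucket logic itself is the standard token-bucket algorithm. A JavaScript sketch of the same idea, with a plain `Map` standing in for `DashMap` and illustrative parameter names:

```javascript
// Token bucket keyed by `userId:tenantId`, refilling at `refillPerSec`
// tokens per second up to a cap of `max`. Returns [allowed, remaining].
const buckets = new Map();

function checkRateLimit(key, max, refillPerSec, now = Date.now()) {
  let b = buckets.get(key);
  if (!b) {
    b = { tokens: max, last: now };
    buckets.set(key, b);
  }
  // Refill proportionally to the time elapsed since the last check.
  const elapsedSec = (now - b.last) / 1000;
  b.tokens = Math.min(max, b.tokens + elapsedSec * refillPerSec);
  b.last = now;
  if (b.tokens >= 1) {
    b.tokens -= 1;
    return [true, Math.floor(b.tokens)];
  }
  return [false, 0];
}
```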
The Express Middleware Wrapper
On the JavaScript side, the integration is minimal:
```js
const { authenticateRequest } = require('auth-shield');

const config = {
  accessSecret: process.env.JWT_ACCESS_SECRET,
  refreshSecret: process.env.JWT_REFRESH_SECRET,
  rateLimitMax: 100,
  rateLimitRefill: 10.0,
  enforceTenantMatch: true,
  issuer: 'auth1',
};

function rustAuth(req, res, next) {
  const result = authenticateRequest(
    req.headers.authorization || null,
    req.headers['x-tenant-id'] || null,
    req.headers.host || null,
    config
  );

  req.id = result.requestId;
  res.setHeader('X-Request-Id', result.requestId);

  if (!result.authenticated) {
    return res.status(result.errorStatus || 401).json({ error: result.error });
  }

  req.user = {
    userId: result.userId,
    email: result.email,
    role: result.role,
  };
  req.appId = result.tenantId;
  next();
}

app.use('/api', rustAuth);
```
That is the entire integration. One middleware function that calls one native function.
Everything that used to be a chain of separate `app.use()` calls is now a single
synchronous call that returns a result object.
The Benchmark
We benchmarked both implementations with 100,000 iterations on the same machine, same
JWT token, same configuration. The Rust middleware performs all six operations. The JavaScript
baseline performs JWT verification via the jsonwebtoken package plus
crypto.randomUUID().
```
Benchmarking 100,000 iterations...

Rust fused middleware:
  Total:    435.6ms
  Per-op:   4.36us
  Ops/sec:  229,574

JS middleware chain (jwt.verify + crypto.randomUUID):
  Total:    18,418.6ms
  Per-op:   184.19us
  Ops/sec:  5,429

Speedup: 42.28x faster with Rust
```
The Rust implementation does more work (rate limiting, tenant resolution, token type enforcement) and is still 42x faster than the JavaScript version that does less work.
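The harness is nothing exotic. A minimal sketch of the kind of loop that produces numbers like these (the timed closures are placeholders, not the actual benchmark code):

```javascript
// Minimal benchmark loop: run `fn` for `n` iterations and report
// total time, per-op latency, and throughput.
function bench(name, fn, n = 100000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) fn();
  const totalMs = Number(process.hrtime.bigint() - start) / 1e6;
  return {
    name,
    totalMs,
    perOpUs: (totalMs * 1000) / n,
    opsPerSec: Math.round(n / (totalMs / 1000)),
  };
}

// Usage, assuming both implementations are in scope:
// const rust = bench('Rust fused middleware', () => authenticateRequest(header, null, host, config));
// const js   = bench('JS middleware chain', () => { jwt.verify(token, secret); crypto.randomUUID(); });
// console.log(`Speedup: ${(js.perOpUs / rust.perOpUs).toFixed(2)}x`);
```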
Why Is It 42x Faster?
The speedup is not from any single optimization. It compounds from five factors:
1. No garbage collector pauses. Every JavaScript middleware invocation allocates objects — the decoded JWT payload, the rate limit result, intermediate strings. These accumulate and trigger GC pauses. Rust allocates on the stack where possible, and heap allocations are freed deterministically when they go out of scope. Zero GC pressure from the auth path.
2. Fused operations eliminate boundary crossings. In the JavaScript version,
jwt.verify() calls into OpenSSL via Node's crypto bindings (JS to native and back),
then crypto.randomUUID() does another round trip. Each crossing has overhead.
Our Rust middleware does everything in a single native call. One crossing instead of four or five.
3. DashMap vs JavaScript Map. The rate limiter uses DashMap, a
lock-free concurrent hashmap that uses fine-grained sharding. It does not need the event loop.
It does not need async. JavaScript's Map is single-threaded and relies on the
event loop for any form of concurrency control.
4. UUID generation. Rust's uuid crate generates v4 UUIDs roughly
5-10x faster than Node's crypto.randomUUID(). Both use the OS CSPRNG, but the
Rust version has less overhead in formatting the output string.
5. JWT verification. The jsonwebtoken Rust crate uses ring
for HMAC-SHA256 computation. The npm package uses Node's built-in crypto bindings. Both ultimately
call optimized C code, but the Rust version avoids the V8 binding overhead and the JavaScript
object allocation for the decoded claims.
What We Kept in JavaScript
Being precise about what deserves to be in Rust is just as important as knowing what to move there:
- Route handlers and business logic. These are I/O-bound. Making them "faster" with Rust would save microseconds on code that spends milliseconds waiting for I/O.
- Database queries. PostgreSQL query time dominates. The overhead of building a SQL string in JavaScript vs Rust is noise.
- Middleware ordering and composition. Express's `app.use()` pattern is well-understood and easy to reason about.
- Error formatting and HTTP response building. These run once per request at most. Not a hot path.
Profile first, then replace only the code that is both CPU-bound and runs on every request. If it is I/O-bound, Rust will not help. If it runs once per deploy, the optimization does not matter.
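One cheap way to get that per-middleware profile in Express is a timing wrapper (a sketch; a real investigation would also use `node --prof` or a flame graph):

```javascript
// Wrap a middleware so every invocation accumulates its synchronous
// duration under `name`. The totals reveal which middleware dominates.
const timings = new Map();

function timed(name, middleware) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    middleware(req, res, (...args) => {
      const us = Number(process.hrtime.bigint() - start) / 1000;
      timings.set(name, (timings.get(name) || 0) + us);
      next(...args);
    });
  };
}

// app.use(timed('verifyJwt', verifyJwt));
```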
Beyond JWT: What Else We Moved to Rust
Once the napi-rs infrastructure was in place, we identified other CPU-bound security operations
that benefited from native performance. The auth-shield library now includes:
- Argon2id password hashing — replaces `bcryptjs`. Uses the `argon2` crate with recommended parameters (m=19456 KiB, t=2, p=1).
- Timing-safe comparison — a `timing_safe_equal()` function using the `subtle` crate's `ConstantTimeEq`. Prevents timing side-channel attacks.
- OTP verification with attempt tracking — generates cryptographically random numeric codes and verifies them with constant-time comparison.
- Input sanitization — HTML sanitization via the `ammonia` crate, SQL interval validation, filename sanitization.
- CSP header builder — a fluent API for constructing Content-Security-Policy headers with strict defaults.
- API key generation and validation — generates prefixed API keys (`auth1_pk_...`) with HMAC-SHA256 hashes for storage.
The Results in Production
After deploying auth-shield behind a feature flag (USE_RUST_AUTH=true), we observed:
- Auth middleware latency dropped from ~184 microseconds to ~4.4 microseconds per request. Consistent with the benchmark results.
- CPU usage on auth servers dropped significantly during peak traffic periods.
- Zero behavioral changes. Every error message, every HTTP status code, every header — identical. The Rust middleware is a drop-in replacement.
- The binary is 3.1 MB with LTO enabled, symbols stripped, and
opt-level = 3. No runtime dependencies beyond Node.js itself.
We ran both implementations in parallel for two weeks, comparing outputs on every request, before switching fully. Flipping one environment variable switches to the Rust path. If anything goes wrong, flip it back. No deployment needed.
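The flag itself is just a branch at middleware registration time (a sketch; `jsAuthChain` stands in for the original JavaScript middleware chain):

```javascript
// Select the auth implementation at process startup from an environment
// flag. Flipping USE_RUST_AUTH and restarting switches paths; no code
// deploy is required.
function selectAuthMiddleware(env, rustAuth, jsAuthChain) {
  return env.USE_RUST_AUTH === 'true' ? [rustAuth] : jsAuthChain;
}

// app.use('/api', ...selectAuthMiddleware(process.env, rustAuth, jsAuthChain));
```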
How to Do This Yourself
Here is the step-by-step process to replace a hot path in your Node.js application with Rust via napi-rs.
Step 1: Set Up the Project
```sh
mkdir my-native-module && cd my-native-module
npm init -y
cargo init --lib
```
Step 2: Configure Cargo.toml
```toml
[package]
name = "my-native-module"
version = "1.0.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
napi = { version = "2", default-features = false, features = ["napi4"] }
napi-derive = "2"

[build-dependencies]
napi-build = "2"

[profile.release]
lto = true
opt-level = 3
codegen-units = 1
strip = "symbols"
```
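The standard napi-rs setup also expects a `build.rs` at the crate root so napi-build can configure the platform-specific linker flags (this is boilerplate from the napi-rs template):

```rust
// build.rs — lets napi-build set up the linker flags for the Node addon.
fn main() {
    napi_build::setup();
}
```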
Step 3: Write Your Hot-Path Function
```rust
use napi_derive::napi;

#[napi(object)]
pub struct MyResult {
  pub success: bool,
  pub value: String,
}

#[napi]
pub fn my_hot_function(input: String) -> MyResult {
  // Your CPU-bound logic here
  MyResult {
    success: true,
    value: input.to_uppercase(),
  }
}
```
Step 4: Build and Use
```sh
npm install -D @napi-rs/cli
npx napi build --release --platform
```
```js
const { myHotFunction } = require('./my-native-module');

const result = myHotFunction('hello');
console.log(result); // { success: true, value: 'HELLO' }
```
Common Pitfalls
- Do not pass complex nested objects across the boundary. Flat structs with primitive fields are fast. Deeply nested objects with arrays of objects require serialization and will eat your performance gains.
- Do not make the Rust function async unless you need to. Synchronous napi functions run on the main thread with no overhead. Async functions spawn a thread pool task, which adds scheduling latency.
- LTO matters. Link-Time Optimization (`lto = true`) gave us roughly 15-20% improvement. Worth the longer compile time for release builds.
- Test the binary on your deployment platform. A `.node` binary compiled on macOS will not work on Linux. Use napi-rs's cross-compilation support.
Closing Thoughts
The best performance optimization is not rewriting your entire application. It is finding the one function that runs on every single request and making that function as fast as possible.
For us, that function was authentication. Six discrete operations — token extraction, JWT verification, token type enforcement, tenant resolution, rate limiting, and request ID generation — fused into a single native call that completes in 4.36 microseconds.
The tooling has matured to the point where this is a weekend project, not a quarter-long initiative.
napi-rs handles the V8 bindings. Cargo handles the build. The result is a single .node
file you can require() like any other module.
auth-shield is open source: github.com/auth1/auth-shield