The Hottest Path in Our Stack
JWT verification runs on every single authenticated request. Every one. If you are building a multi-tenant SaaS platform that handles millions of auth requests per day, the middleware chain sitting between "HTTP request arrives" and "route handler executes" is, by definition, your hottest code path.
We run Auth1, a multi-tenant authentication platform. At our scale, we noticed that authentication middleware was consuming a disproportionate share of CPU time on our API servers. Not database queries. Not business logic. The middleware chain that runs before any of that even starts.
What the Middleware Actually Does
A typical Express or Fastify auth middleware is not just one function. It is a chain of four to six discrete operations that execute sequentially on every request. Ours has six:
- Extract the Bearer token from the `Authorization` header
- Verify the JWT signature (HMAC-SHA256) and check expiration
- Enforce token type — reject refresh tokens used as access tokens (a common attack vector in token confusion attacks)
- Resolve tenant context — in a multi-tenant system, determine which tenant this request belongs to from JWT claims, the `X-Tenant-ID` header, or the subdomain
- Rate limit check — per-user, per-tenant token bucket
- Generate a request ID (UUID v4) for tracing and logging
In a standard Node.js implementation, each of these is a separate middleware function.
The request object gets passed through the chain, each function reads from it, writes to it,
and calls next(). This is clean, composable architecture. It is also slow.
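For concreteness, the traditional chain looks something like this sketch (function names like `extractToken` are illustrative, not the actual Auth1 code):

```javascript
// Simplified sketch of the traditional chain: each concern is its own
// middleware, and the request object is mutated as it passes through.
function extractToken(req, res, next) {
  const header = req.headers.authorization || '';
  if (!header.startsWith('Bearer ')) {
    return res.status(401).json({ error: 'Invalid Authorization header format' });
  }
  req.token = header.slice('Bearer '.length);
  next();
}

function requestId(req, res, next) {
  req.id = require('crypto').randomUUID(); // one JS-to-native crossing
  next();
}

// ...verifyJwt, enforceTokenType, resolveTenant, and rateLimit follow the same shape:
// app.use('/api', extractToken, verifyJwt, enforceTokenType, resolveTenant, rateLimit, requestId);
```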
JWT verification alone using the popular jsonwebtoken npm package takes roughly
180 microseconds per call. Add crypto.randomUUID() for the request ID, a Map
lookup for rate limiting, string parsing for tenant resolution, and the overhead of multiple
JS-to-native boundary crossings for the crypto operations, and you are looking at around
184 microseconds per request.
At 5,000 requests per second, that is 920 milliseconds of CPU time per second spent purely on authentication overhead. At 50,000 requests per second, it becomes the bottleneck.
Why Rust + napi-rs (and Not a Full Rewrite)
We considered several approaches. Rewriting the entire API server in Rust was the nuclear option — high risk, months of work, and most of our code is I/O-bound business logic that does not benefit from Rust's performance characteristics. Writing a separate Rust microservice for auth verification would add network latency and operational complexity.
The insight was simpler: we did not need to replace Express. We needed to replace the one function that runs on every request.
napi-rs is a framework for building compiled Node.js native
addons in Rust. You write Rust functions annotated with #[napi], and the build
tool generates a .node binary that you can require() from
JavaScript like any other module.
This gave us the architecture we wanted:
- Keep Express/Fastify for routing, error handling, and business logic
- Keep all I/O-bound code in JavaScript (database queries, HTTP calls, template rendering)
- Replace only the CPU-bound hot path with a single native Rust call
The compiled binary is about 3.1 MB. No runtime dependencies. No Rust toolchain required on the deployment target.
The Implementation
The core idea is operation fusion. Instead of a chain of separate middleware functions that each cross the JS-to-native boundary for crypto operations, we do everything in a single native call. One boundary crossing. One function. All six operations.
Here is the Rust function signature:
```rust
#[napi]
pub fn authenticate_request(
    auth_header: Option<String>,
    tenant_header: Option<String>,
    host: Option<String>,
    config: MiddlewareConfig,
) -> AuthResult {
    let request_id = Uuid::new_v4().to_string();

    // 1. Extract Bearer token from Authorization header
    let token = match &auth_header {
        Some(header) => match header.strip_prefix("Bearer ") {
            Some(token) => token.to_string(),
            None => {
                return AuthResult {
                    authenticated: false,
                    error: Some("Invalid Authorization header format".into()),
                    error_status: Some(401),
                    request_id,
                    ..Default::default()
                }
            }
        },
        None => {
            return AuthResult {
                authenticated: false,
                error: Some("No Authorization header provided".into()),
                error_status: Some(401),
                request_id,
                ..Default::default()
            }
        }
    };

    // 2. Verify JWT and enforce token type (access vs refresh)
    let payload = match verify_access_token(&token, &config) {
        Ok(p) => p,
        Err(e) => {
            return AuthResult {
                authenticated: false,
                error: Some(e.to_string()),
                error_status: Some(401),
                request_id,
                ..Default::default()
            }
        }
    };

    // 3. Resolve tenant ID (JWT claim > header > subdomain)
    let resolved_tenant = payload
        .tenant_id
        .clone()
        .or_else(|| tenant_header.clone())
        .or_else(|| /* subdomain extraction from `host` */ None);

    // 4. Validate tenant match if enforced
    // 5. Rate limiting (per user + tenant, DashMap-backed token bucket)
    // 6. Return AuthResult with all fields populated
    AuthResult {
        authenticated: true,
        user_id: Some(payload.sub),
        email: payload.email,
        role: payload.role,
        tenant_id: resolved_tenant,
        request_id,
        rate_limit_remaining: Some(rate_result.1),
        ..Default::default()
    }
}
```
The AuthResult struct is annotated with #[napi(object)], which means
napi-rs automatically converts it to a plain JavaScript object when it crosses the boundary.
No manual serialization. No JSON.parse. The JavaScript side receives a normal
object with all the fields populated.
The rate limiter uses a global DashMap<String, TokenBucket> — a lock-free
concurrent hashmap from the dashmap crate. Each key (formatted as
userId:tenantId) gets its own token bucket that refills at a configurable rate.
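The bucket logic itself is the standard token-bucket algorithm. A JavaScript sketch of the same idea, with a plain `Map` standing in for `DashMap` and illustrative parameter names:

```javascript
// Token bucket keyed by `userId:tenantId`, refilling at `refillPerSec`
// tokens per second up to a cap of `max`. Returns [allowed, remaining].
const buckets = new Map();

function checkRateLimit(key, max, refillPerSec, now = Date.now()) {
  let b = buckets.get(key);
  if (!b) {
    b = { tokens: max, last: now };
    buckets.set(key, b);
  }
  // Refill proportionally to the time elapsed since the last check.
  const elapsedSec = (now - b.last) / 1000;
  b.tokens = Math.min(max, b.tokens + elapsedSec * refillPerSec);
  b.last = now;
  if (b.tokens >= 1) {
    b.tokens -= 1;
    return [true, Math.floor(b.tokens)];
  }
  return [false, 0];
}
```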
The Express Middleware Wrapper
On the JavaScript side, the integration is minimal:
```js
const { authenticateRequest } = require('auth-shield');

const config = {
  accessSecret: process.env.JWT_ACCESS_SECRET,
  refreshSecret: process.env.JWT_REFRESH_SECRET,
  rateLimitMax: 100,
  rateLimitRefill: 10.0,
  enforceTenantMatch: true,
  issuer: 'auth1',
};

function rustAuth(req, res, next) {
  const result = authenticateRequest(
    req.headers.authorization || null,
    req.headers['x-tenant-id'] || null,
    req.headers.host || null,
    config
  );

  req.id = result.requestId;
  res.setHeader('X-Request-Id', result.requestId);

  if (!result.authenticated) {
    return res.status(result.errorStatus || 401).json({ error: result.error });
  }

  req.user = {
    userId: result.userId,
    email: result.email,
    role: result.role,
  };
  req.appId = result.tenantId;
  next();
}

app.use('/api', rustAuth);
```
That is the entire integration. One middleware function that calls one native function.
Everything that used to be a chain of separate `app.use()` calls is now a single
synchronous call that returns a result object.
The Benchmark
We benchmarked both implementations with 100,000 iterations on the same machine, same
JWT token, same configuration. The Rust middleware performs all six operations. The JavaScript
baseline performs JWT verification via the jsonwebtoken package plus
crypto.randomUUID().
```
Benchmarking 100,000 iterations...

Rust fused middleware:
  Total:    435.6ms
  Per-op:   4.36us
  Ops/sec:  229,574

JS middleware chain (jwt.verify + crypto.randomUUID):
  Total:    18,418.6ms
  Per-op:   184.19us
  Ops/sec:  5,429

Speedup: 42.28x faster with Rust
```
The Rust implementation does more work (rate limiting, tenant resolution, token type enforcement) and is still 42x faster than the JavaScript version that does less work.
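The harness is nothing exotic. A minimal sketch of the kind of loop that produces numbers like these (the timed closures are placeholders, not the actual benchmark code):

```javascript
// Minimal benchmark loop: run `fn` for `n` iterations and report
// total time, per-op latency, and throughput.
function bench(name, fn, n = 100000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) fn();
  const totalMs = Number(process.hrtime.bigint() - start) / 1e6;
  return {
    name,
    totalMs,
    perOpUs: (totalMs * 1000) / n,
    opsPerSec: Math.round(n / (totalMs / 1000)),
  };
}

// Usage, assuming both implementations are in scope:
// const rust = bench('Rust fused middleware', () => authenticateRequest(header, null, host, config));
// const js   = bench('JS middleware chain', () => { jwt.verify(token, secret); crypto.randomUUID(); });
// console.log(`Speedup: ${(js.perOpUs / rust.perOpUs).toFixed(2)}x`);
```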
Why Is It 42x Faster?
The speedup is not from any single optimization. It compounds from five factors:
1. No garbage collector pauses. Every JavaScript middleware invocation allocates objects — the decoded JWT payload, the rate limit result, intermediate strings. These accumulate and trigger GC pauses. Rust allocates on the stack where possible, and heap allocations are freed deterministically when they go out of scope. Zero GC pressure from the auth path.
2. Fused operations eliminate boundary crossings. In the JavaScript version,
jwt.verify() calls into OpenSSL via Node's crypto bindings (JS to native and back),
then crypto.randomUUID() does another round trip. Each crossing has overhead.
Our Rust middleware does everything in a single native call. One crossing instead of four or five.
3. DashMap vs JavaScript Map. The rate limiter uses DashMap, a
lock-free concurrent hashmap that uses fine-grained sharding. It does not need the event loop.
It does not need async. JavaScript's Map is single-threaded and relies on the
event loop for any form of concurrency control.
4. UUID generation. Rust's uuid crate generates v4 UUIDs roughly
5-10x faster than Node's crypto.randomUUID(). Both use the OS CSPRNG, but the
Rust version has less overhead in formatting the output string.
5. JWT verification. The jsonwebtoken Rust crate uses ring
for HMAC-SHA256 computation. The npm package uses Node's built-in crypto bindings. Both ultimately
call optimized C code, but the Rust version avoids the V8 binding overhead and the JavaScript
object allocation for the decoded claims.
What We Kept in JavaScript
Being precise about what deserves to be in Rust is just as important as knowing what to move there:
- Route handlers and business logic. These are I/O-bound. Making them "faster" with Rust would save microseconds on code that spends milliseconds waiting for I/O.
- Database queries. PostgreSQL query time dominates. The overhead of building a SQL string in JavaScript vs Rust is noise.
- Middleware ordering and composition. Express's `app.use()` pattern is well-understood and easy to reason about.
- Error formatting and HTTP response building. These run once per request at most. Not a hot path.
Profile first, then replace only the code that is both CPU-bound and runs on every request. If it is I/O-bound, Rust will not help. If it runs once per deploy, the optimization does not matter.
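One cheap way to get that per-middleware profile in Express is a timing wrapper (a sketch; a real investigation would also use `node --prof` or a flame graph):

```javascript
// Wrap a middleware so every invocation accumulates its synchronous
// duration under `name`. The totals reveal which middleware dominates.
const timings = new Map();

function timed(name, middleware) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    middleware(req, res, (...args) => {
      const us = Number(process.hrtime.bigint() - start) / 1000;
      timings.set(name, (timings.get(name) || 0) + us);
      next(...args);
    });
  };
}

// app.use(timed('verifyJwt', verifyJwt));
```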
Beyond JWT: What Else We Moved to Rust
Once the napi-rs infrastructure was in place, we identified other CPU-bound security operations
that benefited from native performance. The auth-shield library now includes:
- Argon2id password hashing — replaces `bcryptjs`. Uses the `argon2` crate with recommended parameters (m=19456 KiB, t=2, p=1).
- Timing-safe comparison — a `timing_safe_equal()` function using the `subtle` crate's `ConstantTimeEq`. Prevents timing side-channel attacks.
- OTP verification with attempt tracking — generates cryptographically random numeric codes and verifies them with constant-time comparison.
- Input sanitization — HTML sanitization via the `ammonia` crate, SQL interval validation, filename sanitization.
- CSP header builder — a fluent API for constructing Content-Security-Policy headers with strict defaults.
- API key generation and validation — generates prefixed API keys (`auth1_pk_...`) with HMAC-SHA256 hashes for storage.
The Results in Production
After deploying auth-shield behind a feature flag (USE_RUST_AUTH=true), we observed:
- Auth middleware latency dropped from ~184 microseconds to ~4.4 microseconds per request. Consistent with the benchmark results.
- CPU usage on auth servers dropped significantly during peak traffic periods.
- Zero behavioral changes. Every error message, every HTTP status code, every header — identical. The Rust middleware is a drop-in replacement.
- The binary is 3.1 MB with LTO enabled, symbols stripped, and
opt-level = 3. No runtime dependencies beyond Node.js itself.
We ran both implementations in parallel for two weeks, comparing outputs on every request, before switching fully. Flipping one environment variable switches to the Rust path. If anything goes wrong, flip it back. No deployment needed.
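The flag itself is just a branch at middleware registration time (a sketch; `jsAuthChain` stands in for the original JavaScript middleware chain):

```javascript
// Select the auth implementation at process startup from an environment
// flag. Flipping USE_RUST_AUTH and restarting switches paths; no code
// deploy is required.
function selectAuthMiddleware(env, rustAuth, jsAuthChain) {
  return env.USE_RUST_AUTH === 'true' ? [rustAuth] : jsAuthChain;
}

// app.use('/api', ...selectAuthMiddleware(process.env, rustAuth, jsAuthChain));
```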
How to Do This Yourself
Here is the step-by-step process to replace a hot path in your Node.js application with Rust via napi-rs.
Step 1: Set Up the Project
```sh
mkdir my-native-module && cd my-native-module
npm init -y
cargo init --lib
```
Step 2: Configure Cargo.toml
```toml
[package]
name = "my-native-module"
version = "1.0.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
napi = { version = "2", default-features = false, features = ["napi4"] }
napi-derive = "2"

[build-dependencies]
napi-build = "2"

[profile.release]
lto = true
opt-level = 3
codegen-units = 1
strip = "symbols"
```
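The standard napi-rs setup also expects a `build.rs` at the crate root so napi-build can configure the platform-specific linker flags (this is boilerplate from the napi-rs template):

```rust
// build.rs — lets napi-build set up the linker flags for the Node addon.
fn main() {
    napi_build::setup();
}
```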
Step 3: Write Your Hot-Path Function
```rust
use napi_derive::napi;

#[napi(object)]
pub struct MyResult {
  pub success: bool,
  pub value: String,
}

#[napi]
pub fn my_hot_function(input: String) -> MyResult {
  // Your CPU-bound logic here
  MyResult {
    success: true,
    value: input.to_uppercase(),
  }
}
```
Step 4: Build and Use
```sh
npm install -D @napi-rs/cli
npx napi build --release --platform
```
```js
const { myHotFunction } = require('./my-native-module');

const result = myHotFunction('hello');
console.log(result); // { success: true, value: 'HELLO' }
```
Common Pitfalls
- Do not pass complex nested objects across the boundary. Flat structs with primitive fields are fast. Deeply nested objects with arrays of objects require serialization and will eat your performance gains.
- Do not make the Rust function async unless you need to. Synchronous napi functions run on the main thread with no overhead. Async functions spawn a thread pool task, which adds scheduling latency.
- LTO matters. Link-Time Optimization (`lto = true`) gave us roughly 15-20% improvement. Worth the longer compile time for release builds.
- Test the binary on your deployment platform. A `.node` binary compiled on macOS will not work on Linux. Use napi-rs's cross-compilation support.
Closing Thoughts
The best performance optimization is not rewriting your entire application. It is finding the one function that runs on every single request and making that function as fast as possible.
For us, that function was authentication. Six discrete operations — token extraction, JWT verification, token type enforcement, tenant resolution, rate limiting, and request ID generation — fused into a single native call that completes in 4.36 microseconds.
The tooling has matured to the point where this is a weekend project, not a quarter-long initiative.
napi-rs handles the V8 bindings. Cargo handles the build. The result is a single .node
file you can require() like any other module.
auth-shield is open source: github.com/auth1/auth-shield