Performance · Rust · 22 min read

How We Built a 42x Faster Auth Middleware in Rust

We replaced our Node.js auth middleware chain (JWT verification, rate limiting, tenant resolution, request ID generation) with a single Rust function compiled to a native Node.js addon via napi-rs. The fused middleware runs in 4.36 microseconds per request instead of 184 microseconds — a 42.28x speedup.

The Hottest Path in Our Stack

JWT verification runs on every single authenticated request. Every one. If you are building a multi-tenant SaaS platform that handles millions of auth requests per day, the middleware chain sitting between "HTTP request arrives" and "route handler executes" is, by definition, your hottest code path.

We run Auth1, a multi-tenant authentication platform. At our scale, we noticed that authentication middleware was consuming a disproportionate share of CPU time on our API servers. Not database queries. Not business logic. The middleware chain that runs before any of that even starts.


What the Middleware Actually Does

A typical Express or Fastify auth middleware is not just one function. It is a chain of 4-6 discrete operations that execute sequentially on every request:

  1. Extract the Bearer token from the Authorization header
  2. Verify the JWT signature (HMAC-SHA256) and check expiration
  3. Enforce token type — reject refresh tokens used as access tokens (a common attack vector in token confusion attacks)
  4. Resolve tenant context — in a multi-tenant system, determine which tenant this request belongs to from JWT claims, the X-Tenant-ID header, or the subdomain
  5. Rate limit check — per-user, per-tenant token bucket
  6. Generate a request ID (UUID v4) for tracing and logging

In a standard Node.js implementation, each of these is a separate middleware function. The request object gets passed through the chain, each function reads from it, writes to it, and calls next(). This is clean, composable architecture. It is also slow.

JWT verification alone using the popular jsonwebtoken npm package takes roughly 180 microseconds per call. Add crypto.randomUUID() for the request ID, a Map lookup for rate limiting, string parsing for tenant resolution, and the overhead of multiple JS-to-native boundary crossings for the crypto operations, and you are looking at around 184 microseconds per request.

At 5,000 requests per second, that is 920 milliseconds of CPU time per second spent purely on authentication overhead. At 50,000 requests per second, it becomes the bottleneck.
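The arithmetic behind that claim is straightforward:

```javascript
// CPU time spent in auth middleware per wall-clock second, at a given load.
const perRequestUs = 184; // measured cost per request, in microseconds

const cpuMsPerSecond = (rps) => (rps * perRequestUs) / 1000;

console.log(cpuMsPerSecond(5_000));  // 920 ms — most of a core, gone
console.log(cpuMsPerSecond(50_000)); // 9200 ms — roughly ten cores, just for auth
```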


Why Rust + napi-rs (and Not a Full Rewrite)

We considered several approaches. Rewriting the entire API server in Rust was the nuclear option — high risk, months of work, and most of our code is I/O-bound business logic that does not benefit from Rust's performance characteristics. Writing a separate Rust microservice for auth verification would add network latency and operational complexity.

The insight was simpler: we did not need to replace Express. We needed to replace the one function that runs on every request.

napi-rs is a framework for building compiled Node.js native addons in Rust. You write Rust functions annotated with #[napi], and the build tool generates a .node binary that you can require() from JavaScript like any other module.

This gave us the architecture we wanted: Express keeps handling routing and I/O-bound business logic, while the hot auth path runs as native code inside the same process, with no network hop and no second service to operate.

The compiled binary is about 3.1 MB. No runtime dependencies. No Rust toolchain required on the deployment target.


The Implementation

The core idea is operation fusion. Instead of five separate middleware functions that each cross the JS-to-native boundary for crypto operations, we do everything in a single native call. One boundary crossing. One function. All six operations.

Here is the Rust function signature:

Rust src/middleware.rs
#[napi]
pub fn authenticate_request(
    auth_header: Option<String>,
    tenant_header: Option<String>,
    host: Option<String>,
    config: MiddlewareConfig,
) -> AuthResult {
    let request_id = Uuid::new_v4().to_string();

    // 1. Extract Bearer token from Authorization header
    let token = match &auth_header {
        Some(header) => {
            if let Some(token) = header.strip_prefix("Bearer ") {
                token.to_string()
            } else {
                return AuthResult {
                    authenticated: false,
                    error: Some("Invalid Authorization header format".into()),
                    error_status: Some(401),
                    request_id,
                    ..Default::default()
                };
            }
        }
        None => {
            return AuthResult {
                authenticated: false,
                error: Some("No Authorization header provided".into()),
                error_status: Some(401),
                request_id,
                ..Default::default()
            };
        }
    };

    // 2. Verify JWT and enforce token type (access vs refresh)
    let payload = match verify_access_token(&token, &config) {
        Ok(p) => p,
        Err(e) => {
            return AuthResult {
                authenticated: false,
                error: Some(e.to_string()),
                error_status: Some(401),
                request_id,
                ..Default::default()
            };
        }
    };

    // 3. Resolve tenant ID (JWT claim > header > subdomain)
    let resolved_tenant = payload
        .tenant_id
        .clone()
        .or_else(|| tenant_header.clone());
    // Final fallback, elided here: extract the tenant from the `host` subdomain.

    // 4. Validate tenant match if enforced
    // 5. Rate limiting (per user + tenant, DashMap-backed token bucket)
    // 6. Return AuthResult with all fields populated

    AuthResult {
        authenticated: true,
        user_id: Some(payload.sub),
        email: payload.email,
        role: payload.role,
        tenant_id: resolved_tenant,
        request_id,
        rate_limit_remaining: None, // populated by the rate-limit step elided above
        ..Default::default()
    }
}

The AuthResult struct is annotated with #[napi(object)], which means napi-rs automatically converts it to a plain JavaScript object when it crosses the boundary. No manual serialization. No JSON.parse. The JavaScript side receives a normal object with all the fields populated.

The rate limiter uses a global DashMap<String, TokenBucket> — a lock-free concurrent hashmap from the dashmap crate. Each key (formatted as userId:tenantId) gets its own token bucket that refills at a configurable rate.
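For readers unfamiliar with the algorithm, here is the token-bucket logic in JavaScript (the production version is the Rust DashMap-backed implementation described above; the function name and defaults below are illustrative):

```javascript
const buckets = new Map(); // key: `${userId}:${tenantId}`

// Take one token from the bucket for `key`, refilling based on elapsed time.
// `max` is bucket capacity, `refillPerSec` the refill rate.
function takeToken(key, max = 100, refillPerSec = 10, now = Date.now()) {
  let b = buckets.get(key);
  if (!b) {
    b = { tokens: max, last: now };
    buckets.set(key, b);
  }
  // Refill proportionally to elapsed time, capped at capacity.
  b.tokens = Math.min(max, b.tokens + ((now - b.last) / 1000) * refillPerSec);
  b.last = now;
  if (b.tokens < 1) return { allowed: false, remaining: 0 };
  b.tokens -= 1;
  return { allowed: true, remaining: Math.floor(b.tokens) };
}
```

The Rust version is the same algorithm, with DashMap providing shard-level locking so concurrent requests for different keys never contend.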

The Express Middleware Wrapper

On the JavaScript side, the integration is minimal:

JavaScript middleware.js
const { authenticateRequest } = require('auth-shield');

const config = {
  accessSecret: process.env.JWT_ACCESS_SECRET,
  refreshSecret: process.env.JWT_REFRESH_SECRET,
  rateLimitMax: 100,
  rateLimitRefill: 10.0,
  enforceTenantMatch: true,
  issuer: 'auth1',
};

function rustAuth(req, res, next) {
  const result = authenticateRequest(
    req.headers.authorization || null,
    req.headers['x-tenant-id'] || null,
    req.headers.host || null,
    config
  );

  req.id = result.requestId;
  res.setHeader('X-Request-Id', result.requestId);

  if (!result.authenticated) {
    return res.status(result.errorStatus || 401).json({ error: result.error });
  }

  req.user = {
    userId: result.userId,
    email: result.email,
    role: result.role,
  };
  req.appId = result.tenantId;
  next();
}

app.use('/api', rustAuth);

That is the entire integration. One middleware function that calls one native function. Everything that used to be five separate app.use() calls is now a single synchronous call that returns a result object.


The Benchmark

We benchmarked both implementations with 100,000 iterations on the same machine, same JWT token, same configuration. The Rust middleware performs all six operations. The JavaScript baseline performs JWT verification via the jsonwebtoken package plus crypto.randomUUID().
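The harness was along these lines (a sketch; the actual benchmark script is not shown in this post, and the commented calls assume the function names used earlier):

```javascript
// Time `iterations` calls of `fn` and report total, per-op, and throughput.
function bench(name, fn, iterations = 100_000) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const totalMs = Number(process.hrtime.bigint() - start) / 1e6;
  const perOpUs = (totalMs * 1000) / iterations;
  console.log(`${name}: total ${totalMs.toFixed(1)}ms, per-op ${perOpUs.toFixed(2)}us`);
  return perOpUs;
}

// bench('Rust fused middleware', () =>
//   authenticateRequest(authHeader, tenantHeader, host, config));
// bench('JS middleware chain', () => {
//   jwt.verify(token, secret);
//   crypto.randomUUID();
// });
```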

Text benchmark-results.txt
Benchmarking 100,000 iterations...

Rust fused middleware:
  Total: 435.6ms
  Per-op: 4.36us
  Ops/sec: 229,574

JS middleware chain (jwt.verify + crypto.randomUUID):
  Total: 18,418.6ms
  Per-op: 184.19us
  Ops/sec: 5,429

Speedup: 42.28x faster with Rust

The Rust implementation does more work (rate limiting, tenant resolution, token type enforcement) and is still 42x faster than the JavaScript version that does less work.


Why Is It 42x Faster?

The speedup is not from any single optimization. It compounds from five factors:

1. No garbage collector pauses. Every JavaScript middleware invocation allocates objects — the decoded JWT payload, the rate limit result, intermediate strings. These accumulate and trigger GC pauses. Rust allocates on the stack where possible, and heap allocations are freed deterministically when they go out of scope. Zero GC pressure from the auth path.

2. Fused operations eliminate boundary crossings. In the JavaScript version, jwt.verify() calls into OpenSSL via Node's crypto bindings (JS to native and back), then crypto.randomUUID() does another round trip. Each crossing has overhead. Our Rust middleware does everything in a single native call. One crossing instead of four or five.

3. DashMap vs JavaScript Map. The rate limiter uses DashMap, a concurrent hashmap with fine-grained sharding, so rate-limit state can be read and updated from any thread without a global lock. A JavaScript Map lives on a single thread; sharing rate-limit state across worker threads or processes requires external coordination (or an external store like Redis).

4. UUID generation. Rust's uuid crate generates v4 UUIDs roughly 5-10x faster than Node's crypto.randomUUID(). Both use the OS CSPRNG, but the Rust version has less overhead in formatting the output string.

5. JWT verification. The jsonwebtoken Rust crate uses ring for HMAC-SHA256 computation. The npm package uses Node's built-in crypto bindings. Both ultimately call optimized C code, but the Rust version avoids the V8 binding overhead and the JavaScript object allocation for the decoded claims.


What We Kept in JavaScript

Knowing what to keep in JavaScript is just as important as knowing what to move to Rust. Everything I/O-bound — which is most of our business logic — stayed exactly where it was.

The Principle

Profile first, then replace only the code that is both CPU-bound and runs on every request. If it is I/O-bound, Rust will not help. If it runs once per deploy, the optimization does not matter.


Beyond JWT: What Else We Moved to Rust

Once the napi-rs infrastructure was in place, we identified other CPU-bound security operations that benefited from native performance and folded them into the auth-shield library as well.


The Results in Production

We deployed auth-shield to production behind a feature flag (USE_RUST_AUTH=true) rather than cutting over in one step.

Feature Flag Approach

We ran both implementations in parallel for two weeks, comparing outputs on every request, before switching fully. Flipping one environment variable switches to the Rust path. If anything goes wrong, flip it back. No deployment needed.
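The parallel comparison can be sketched as follows (`verifyRust` and `verifyJs` are stand-ins for the native call and the legacy chain, not actual auth-shield APIs):

```javascript
const useRust = process.env.USE_RUST_AUTH === 'true';

// Run both implementations on every request, log any divergence,
// and serve the response from whichever path the flag selects.
function dualVerify(verifyRust, verifyJs, req) {
  const rust = verifyRust(req);
  const js = verifyJs(req);
  if (rust.authenticated !== js.authenticated || rust.userId !== js.userId) {
    console.warn('auth implementation mismatch', { rust, js });
  }
  return useRust ? rust : js;
}
```

Because the Rust path is synchronous and cheap, running both side by side costs almost nothing compared to the JavaScript chain alone.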


How to Do This Yourself

Here is the step-by-step process to replace a hot path in your Node.js application with Rust via napi-rs.

Step 1: Set Up the Project

Bash terminal
mkdir my-native-module && cd my-native-module
npm init -y
cargo init --lib

Step 2: Configure Cargo.toml

TOML Cargo.toml
[package]
name = "my-native-module"
version = "1.0.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
napi = { version = "2", default-features = false, features = ["napi4"] }
napi-derive = "2"

[build-dependencies]
napi-build = "2"

[profile.release]
lto = true
opt-level = 3
codegen-units = 1
strip = "symbols"

Step 3: Write Your Hot-Path Function

One easy-to-miss prerequisite: napi-build (declared in [build-dependencies] above) expects a build.rs at the crate root whose main() calls napi_build::setup(), so the addon links against Node's N-API correctly.

Rust src/lib.rs
use napi_derive::napi;

#[napi(object)]
pub struct MyResult {
    pub success: bool,
    pub value: String,
}

#[napi]
pub fn my_hot_function(input: String) -> MyResult {
    // Your CPU-bound logic here
    MyResult {
        success: true,
        value: input.to_uppercase(),
    }
}

Step 4: Build and Use

Bash terminal
npm install -D @napi-rs/cli
npx napi build --release --platform

JavaScript index.js
const { myHotFunction } = require('./my-native-module');

const result = myHotFunction('hello');
console.log(result); // { success: true, value: 'HELLO' }

Common Pitfalls

  1. Synchronous native calls block the event loop just like slow JavaScript does. This pattern only works because the fused function completes in microseconds; long-running work belongs on a thread pool.
  2. Strings and objects are copied at the JS-to-native boundary. Passing large payloads back and forth can eat the savings — keep the interface small.
  3. You need a .node binary per platform (linux-x64, darwin-arm64, and so on). The --platform flag and the @napi-rs/cli release tooling exist for exactly this; build them in CI, not on the deployment target.
Closing Thoughts

The best performance optimization is not rewriting your entire application. It is finding the one function that runs on every single request and making that function as fast as possible.

For us, that function was authentication. Six discrete operations — token extraction, JWT verification, token type enforcement, tenant resolution, rate limiting, and request ID generation — fused into a single native call that completes in 4.36 microseconds.

The tooling has matured to the point where this is a weekend project, not a quarter-long initiative. napi-rs handles the V8 bindings. Cargo handles the build. The result is a single .node file you can require() like any other module.

auth-shield is open source: github.com/auth1/auth-shield

Drop-In Rust Auth Middleware

auth-shield gives you 42x faster JWT verification, Argon2id hashing, rate limiting, and PII encryption. One npm install. Zero Rust toolchain required.
