Reproducible, vendor-neutral performance benchmarks for production API gateways under a policy × protocol × load matrix.
We compare seven API gateways under identical conditions and let any independent reviewer:
- Re-run the benchmark locally or on AWS with a single command.
- Obtain a byte-for-byte equivalent ranking (within tolerance — see docs/REPRODUCIBILITY.md).
- Verify that all gateways treat the same request the same way (parity attestation).
- Inspect the full run manifest: image digests, git SHA, RNG seed, host info.
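As a quick illustration, the pinned identifiers can be pulled out of a finished run with `jq`. The path follows the `reports/<run-id>/` layout used below, but the field names here are assumptions; docs/REPRODUCIBILITY.md describes the real schema:

```bash
# Hypothetical field names (git SHA, RNG seed, host info, image digests are the
# documented contents); see docs/REPRODUCIBILITY.md for the actual manifest schema.
jq '{git_sha, rng_seed, host, image_digests}' reports/<run-id>/manifest.json
```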
This project is developed and maintained by Wallarm, Inc. — the author of one of the gateways under test. To neutralise the conflict of interest, we follow strict rules:
- All gateway configs, k6 scenarios, and infrastructure are open and frozen at report release (the git SHA is pinned in `manifest.json`).
- Parity attestation runs before every cell: the same request, the same JWT seed, the same rate-limit window — gateways either behave identically, or the cell is marked as a `deviation` and excluded from the aggregate.
- Reasonable external tuning of a competing gateway is accepted as a PR — see `CONTRIBUTING.md` § Gateway-tuning PRs and the PR template.
- All deviations are documented in docs/GATEWAYS.md with a reason and an upstream reference.
| Gateway | Language | Role |
|---|---|---|
| Wallarm API Gateway | Rust | subject under test |
| NGINX | C | baseline |
| Envoy | C++ | baseline |
| HAProxy (candidate) | C | baseline |
| Kong | Lua/OpenResty | baseline |
| Apache APISIX | Lua/OpenResty | baseline |
| Traefik | Go | baseline |
| Tyk | Go | baseline |
HAProxy is listed as a candidate and does not yet count toward the ranked matrix, which keeps the ranked set at seven gateways: (11 ranking policy profiles × 4 load profiles) + (2 HTTPS scenarios × 4 load profiles) = 52 cells per gateway × 7 gateways = 364 cells per run — plus the supplemental `p03-jwks-rs256-basic` capability scenario (parity-only, off-grid). See TASK.md §7.
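The arithmetic is easy to sanity-check:

```bash
# (11 policy × 4 load) + (2 HTTPS × 4 load) = 52 cells per gateway, times 7 gateways
echo $(( (11*4 + 2*4) * 7 ))   # prints 364
```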
Requirements: Linux/macOS host, Docker ≥ 24, 8+ physical cores, 16 GB RAM, `make`, `go` ≥ 1.23.
Two independent scenarios. Pick one — don't mix them. The Makefile has a preflight check that refuses to boot a second stack while another one is up and tells you exactly which command clears it.
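If the preflight check fires and you are unsure what is still up, one way to look is to filter on the `gwb-` container-name prefix (taken from the `perf-local-down` note below; treat it as an assumption, not a complete inventory):

```bash
# List leftover benchmark containers; the gwb- prefix comes from the
# "orphan gwb-<gw>* per-cell stacks" note below and may not cover every stack.
docker ps --filter "name=gwb-" --format '{{.Names}}\t{{.Status}}'
```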
Run the full matrix (the default workflow):
```bash
git clone https://github.com/wallarm/gateway-benchmarks
cd gateway-benchmarks
make prereqs-check   # verify the environment
make perf-local-run  # parity → load → aggregate → manifest → report
```

The orchestrator brings per-cell stacks up and down by itself — no separate `perf-local-up` is needed. Result: `reports/<run-id>/report.html` (`bench run` calls `bench report` automatically; `make bench-report BENCH_RUN_ID=<run-id>` and `make bench-report BENCH_REPORT_COMBINE=run-a,run-b` are also available).
Long-running smoke stack (only when you want to poke a live gateway by hand — parity, curl, logs):
```bash
make perf-local-up          # bring loadgen + gateway + backend up in separate namespaces
make perf-local-parity      # parity-check against localhost:9080
make perf-local-cycle-smoke # HTTP + HTTPS round-trip through the stack
make perf-local-down        # tear down (also cleans up any orphan gwb-<gw>* per-cell stacks)
```
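With the smoke stack up, the gateway can be poked by hand. A minimal sketch, assuming the `localhost:9080` listener that `perf-local-parity` targets and a stock go-httpbin route (backend/ is a forked go-httpbin; the exposed path may differ per policy profile):

```bash
# /get is a standard go-httpbin endpoint; whether the gateway routes it is an
# assumption — check gateways/ for the per-gateway configs.
curl -fsS http://localhost:9080/get
```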
Requirements for the AWS variant: AWS credentials, `tofu` ≥ 1.7 (or `terraform` ≥ 1.6), ~$15 per full run.
```bash
cd infra/aws
cp terraform.tfvars.example terraform.tfvars  # set your CIDR and region
tofu init && tofu apply -auto-approve
cd ../..
make perf-aws-deploy  # provision the stack on all 3 EC2 hosts
make perf-aws-run     # run the matrix (same orchestrator)
make perf-aws-report  # render reports/<run-id>/report.html (bench report)
make perf-aws-down    # tear down EC2 (edits tfvars and runs apply)
```
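To verify the reproducibility claim across two sweeps, the orchestrator ships a `bench compare-runs` gate (see docs/REPRODUCIBILITY.md). The invocation below is a sketch; the argument shape is an assumption and the real CLI is documented in orchestrator/README.md:

```bash
# Hypothetical argument shape; the gate passes when the two rankings agree
# within the documented tolerance (docs/REPRODUCIBILITY.md).
bench compare-runs <run-id-a> <run-id-b>
```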
Repository layout:

```text
├── TASK.md        # PRD — what we measure and why
├── Makefile       # single entry point
├── backend/       # forked go-httpbin — a predictable upstream
├── gateways/      # per-gateway configs × policy profile
├── k6/            # load profiles and scenarios
├── orchestrator/  # Go binary driving the run
├── infra/
│   ├── local/     # docker-compose + resource pins
│   └── aws/       # Terraform (3× EC2, cluster placement group)
├── scripts/       # prereqs, parity, deploy, fetch
├── reports/       # output runs (local-only, never tracked — see docs/REPORT.md)
└── docs/          # ARCHITECTURE / POLICIES / LOAD-PROFILES / GATEWAYS / REPRODUCIBILITY / REPORT
```
- TASK.md — PRD (mandatory properties of the benchmark)
- CHANGELOG.md — versioned release notes
- CONTRIBUTING.md — how to submit tuning PRs, what we review
- SECURITY.md — security policy + what is and isn't a secret in this tree
- docs/ARCHITECTURE.md — local/AWS topology and network path
- docs/POLICIES.md — 11 ranking + 1 supplemental policy profiles, parity requirements
- docs/LOAD-PROFILES.md — 4 load profiles
- docs/GATEWAYS.md — versions, digests, deviations
- docs/REPORT.md — how to read the HTML report
- docs/REPRODUCIBILITY.md — manifest, seeds, tolerance, `bench compare-runs` gate
- docs/CANONICAL-RUN-HANDOFF.md — executable playbook for the AWS canonical sweep
- docs/RELEASE.md — maintainer release process
- orchestrator/README.md — `bench` Go binary (Phases 6 + 7 + 8)
Apache 2.0 — see LICENSE.