🧪 Python Async Proxy | High-Performance Cythonized HTTP/HTTPS CONNECT Proxy

Language: Python 3.x | Status: Active | Optimization: Cython | License: MIT

A fast HTTP/HTTPS CONNECT proxy written in asynchronous Python, with its hot path compiled via Cython to sustain high connection concurrency.

🚀 Overview

This repository provides an HTTP/HTTPS CONNECT proxy built on async Python and accelerated with Cython. It targets C10K-scale loads (around 10,000 concurrent connections) while keeping latency low and throughput high.
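To make the CONNECT flow concrete, here is a minimal sketch of how an asyncio proxy can tunnel bytes after a CONNECT request. It is not the repository's proxy.pyx; the helper names (parse_connect_target, pipe, handle_client, main) are hypothetical, and authentication, timeouts, and plain HTTP forwarding are left out. The port 8888 matches the one used in the verification command later in this README.

```python
import asyncio


def parse_connect_target(request_line: bytes) -> tuple[str, int]:
    """Split b'CONNECT host:port HTTP/1.1' into (host, port)."""
    method, target, _version = request_line.decode("latin-1").split()
    if method.upper() != "CONNECT":
        raise ValueError(f"not a CONNECT request: {method}")
    host, _, port = target.rpartition(":")
    return host.strip("[]"), int(port)  # strip brackets from IPv6 literals


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then close the destination."""
    try:
        while chunk := await reader.read(65536):
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


async def handle_client(reader, writer) -> None:
    request_line = await reader.readline()
    # Discard the remaining request headers up to the blank line.
    while (await reader.readline()) not in (b"\r\n", b"\n", b""):
        pass
    try:
        host, port = parse_connect_target(request_line)
        up_reader, up_writer = await asyncio.open_connection(host, port)
    except (ValueError, OSError):
        writer.write(b"HTTP/1.1 502 Bad Gateway\r\n\r\n")
        await writer.drain()
        writer.close()
        return
    writer.write(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    await writer.drain()
    # Relay both directions until either side closes.
    await asyncio.gather(
        pipe(reader, up_writer), pipe(up_reader, writer), return_exceptions=True
    )


async def main(host: str = "127.0.0.1", port: int = 8888) -> None:
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()
```

Calling asyncio.run(main()) would start the sketch on 127.0.0.1:8888. Once the 200 response is sent, the proxy only copies opaque bytes, which is why TLS traffic passes through without being decrypted.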

Note

System Tuning: Running high-concurrency benchmarks on Linux may require raising the limit on open file descriptors (ulimit -n) and adjusting sysctl settings that govern the TCP listen backlog.
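As a rough illustration of the file-descriptor part of that tuning, a process can inspect and raise its own soft limit with the standard resource module. This is a sketch only: the function name raise_nofile_limit and the default target of 65536 are assumptions, not values taken from this repository, and raising the hard limit itself still requires root or a limits.conf change.

```python
import resource


def raise_nofile_limit(target: int = 65536) -> int:
    """Raise the soft RLIMIT_NOFILE toward `target`, capped by the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The hard limit is the ceiling an unprivileged process may request.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted
```

Each proxied connection uses two descriptors (client side and upstream side), so a 10,000-connection test needs a limit comfortably above 20,000.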

Example Bash Output

$ sudo tail -n 5 /var/log/nginx/access.log
127.0.0.1 - - [29/May/2026:21:21:43 -0600] "GET /index.html HTTP/1.1" 200 612 "-" "wrk"
127.0.0.1 - - [29/May/2026:21:21:43 -0600] "GET /index.html HTTP/1.1" 200 612 "-" "wrk"

💻 Installation & Setup

Prerequisites & Dependencies

  • Python Distribution: We recommend using Miniforge.
  • Core Libraries:
    mamba install cython uvloop
    (Alternatively, pip install cython uvloop)
  • Source Builds: If compiling Python from source, pass configure flags such as --enable-optimizations and --enable-experimental-jit for maximum speed, then install the build dependencies with pip3 install cython setuptools.
  • System Tools (Ubuntu):
    sudo apt update
    sudo apt -y install curl wrk siege

Build the Proxy

Compile the critical proxy.pyx path via Cython:

CFLAGS='-O3 -march=native' python setup.py build_ext --inplace --force

💡 Usage

Running the Proxy and Benchmarks

The built-in benchmark scripts start the proxy and drive load through it with wrk and siege.

  1. Prepare a Local Backend: Serve the provided index.html via Nginx (or Apache) at /var/www/html/.
    sudo cp index.html /var/www/html/index.html
    sudo systemctl start nginx
  2. Launch the Benchmark: The benchmark script (bench.sh) generates payload files (1KB, 16KB, 128KB, 1MB) using dd and runs tests for HTTP/HTTPS requests.
    bash bench.sh
    Override parameters with environment variables, e.g. DURATION=5 CONC=200 bash bench.sh.
  3. Advanced Toggles:
    • Multi-Core: Set WORKERS=<n> to leverage SO_REUSEPORT workers across cores.
    • C-Relay Offload: Set USE_C_RELAY=1 (build via make build) for poll + splice optimizations. Add CRELAY_THREADS=<n> to grow the thread pool.
    • CPU Pinning: Set PIN_WORKERS=1 together with WORKERS greater than 1 to reduce cache thrashing.
    • C10K Sweeps: Run make bench-c10k with variables like CONC_LIST="1000 5000 10000 15000" PROXY_ENV="WORKERS=4 PIN_WORKERS=1" WRK_THREADS=4 SYSCTL_TUNE=1.
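The multi-core and pinning toggles above rely on two Linux facilities that can be sketched in a few lines. This is an illustration under assumptions, not the repository's worker code: reuseport_listener and pin_to_cpu are hypothetical names, and the backlog of 1024 is an arbitrary example value.

```python
import os
import socket


def reuseport_listener(host: str, port: int, backlog: int = 1024) -> socket.socket:
    """Create a listening socket that other workers can bind to as well."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # With SO_REUSEPORT set on every socket, the kernel spreads incoming
    # connections across all sockets bound to the same address and port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def pin_to_cpu(worker_index: int) -> int:
    """Pin the calling process to one of the CPUs it is allowed to run on."""
    allowed = sorted(os.sched_getaffinity(0))
    cpu = allowed[worker_index % len(allowed)]
    os.sched_setaffinity(0, {cpu})
    return cpu
```

In such a design each worker process would call pin_to_cpu with its own index, open its own listener on the shared port, and hand the socket to asyncio.start_server through the sock= argument.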

Verifying Traffic Flow

Confirm the proxy correctly routes data by checking backend access logs before and after a quick burst:

sudo wc -l /var/log/nginx/access.log
wrk -t1 -c50 -d2 --latency -H "Proxy-Authorization: Basic $(echo -n 'username:password' | base64)" http://127.0.0.1:8888/index.html
sudo tail -n 5 /var/log/nginx/access.log
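The Proxy-Authorization value in the wrk command is standard HTTP Basic authentication: the string username:password encoded as Base64. The Python equivalent of the shell substitution looks like this (basic_proxy_auth is a hypothetical helper name, and the credentials are the same placeholders used above):

```python
import base64


def basic_proxy_auth(username: str, password: str) -> str:
    """Build the value of a Proxy-Authorization header for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


header = basic_proxy_auth("username", "password")
print(f"Proxy-Authorization: {header}")
# prints: Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Basic credentials are only encoded, not encrypted, so they are readable by anyone who can observe the connection to the proxy.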

Benchmarking Results

Tested with uvloop and a -O3 -march=native build, at 100 concurrent connections for 5 s per payload.

  • small (HTML) (/index.html): wrk 28,216.40 req/s (p50 3.24ms, p99 10.45ms); siege 4,061.90 trans/s, throughput 0.26 MB/sec
  • 1KB binary (/payload_1k.bin): wrk 25,993.51 req/s (p50 3.46ms, p99 10.76ms); siege 4,120.40 trans/s, throughput 4.02 MB/sec
  • 16KB binary (/payload_16k.bin): wrk 17,002.71 req/s (p50 5.09ms, p99 15.28ms); siege 3,381.71 trans/s, throughput 52.84 MB/sec
  • 128KB binary (/payload_128k.bin): wrk 6,739.34 req/s (p50 11.49ms, p99 31.57ms); siege 1,875.13 trans/s, throughput 234.39 MB/sec
  • 1024KB binary (/payload_1024k.bin): wrk 983.22 req/s (p50 78.61ms, p99 176.76ms); siege 415.80 trans/s, throughput 415.80 MB/sec

HTTPS CONNECT (via proxy to local nginx SSL on :8443)

  • CONNECT HTML (/index.html): 611.62 req/s, transfer 0.04 MB/s
  • CONNECT 1KB binary (/payload_1k.bin): 416.67 req/s, transfer 0.41 MB/s
  • CONNECT 16KB binary (/payload_16k.bin): 341.30 req/s, transfer 5.33 MB/s
  • CONNECT 128KB binary (/payload_128k.bin): 374.53 req/s, transfer 46.82 MB/s
  • CONNECT 1024KB binary (/payload_1024k.bin): 281.49 req/s, transfer 281.49 MB/s
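The throughput columns above can be cross-checked from the request rates. The sketch below assumes throughput counts payload bytes only and that 1 MB = 1024 KB; under those assumptions it reproduces the siege figures listed for the binary payloads.

```python
def throughput_mb_per_s(requests_per_s: float, payload_kb: int) -> float:
    """Payload throughput in MB/s, taking 1 MB as 1024 KB."""
    return requests_per_s * payload_kb / 1024


# siege transaction rates from the list above, keyed by payload size in KB
siege_rates = {1: 4120.40, 16: 3381.71, 128: 1875.13, 1024: 415.80}

for size_kb, rate in siege_rates.items():
    print(f"{size_kb:>5} KB: {throughput_mb_per_s(rate, size_kb):.2f} MB/s")
# prints 4.02, 52.84, 234.39 and 415.80 MB/s for the four sizes
```

The same arithmetic matches the CONNECT transfer figures, for example 341.30 req/s at 16 KB gives about 5.33 MB/s. It also shows that request rate falls as payloads grow while bytes moved per second keep rising.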

C10K sweep (wrk only)

  • best: wrk 1,743.01 req/s at C=1000
  • C=10000: wrk 898.55 req/s
  • sweep:
    • C=1000: wrk 1,743.01 req/s
    • C=5000: wrk 1,408.40 req/s
    • C=10000: wrk 898.55 req/s
    • C=15000: wrk 523.61 req/s

🐛 Issues & Support

If you encounter bugs, performance regressions, or simply have questions about the codebase, please open an issue in the repository. Provide reproducible steps and your system architecture details.

🤝 Contributing

We welcome community contributions to improve performance, documentation, and feature sets!

  1. Fork this repository.
  2. Create a new feature branch (git checkout -b feature/awesome-addition).
  3. Commit your changes (git commit -m 'Add awesome new feature').
  4. Push to the branch (git push origin feature/awesome-addition).
  5. Open a Pull Request for review.

📄 License

This project is open-sourced under the MIT License.


Explore more optimized projects on our main profile.