This document describes the key components and design decisions behind the rpycbench benchmark suite.
rpycbench compares the performance of RPyC (Remote Python Call) against HTTP/REST APIs across three server configurations:
- RPyC Threaded - Thread-per-connection model
- RPyC Forking - Process-per-connection model
- HTTP Threaded - Flask with threaded request handling
All servers run in separate processes (via multiprocessing.Process) to isolate them from the client's GIL and provide fair comparison.
RPyC Servers (rpyc_servers.py)
BenchmarkService (lines 12-59)
- RPyC service exposing remote methods for benchmarking
- Core methods: `ping()`, `echo()`, `upload()`, `download()`, `compute()`, `sleep()`
- File transfer methods: `upload_file()`, `download_file()`, `upload_file_chunked()`, `download_file_chunked()`
Server Modes (_run_rpyc_server, lines 61-109)
- ThreadedServer (lines 76-82): Spawns new thread for each client connection
- ForkingServer (lines 83-89): Spawns new process for each client connection
- Both use pickle serialization via RPyC protocol
- Configuration: `allow_public_attrs=True`, `allow_pickle=True`, `sync_request_timeout=30` (seconds)
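The shared settings above amount to a protocol-config dict; a minimal sketch (the constant name is illustrative, but the keys are the RPyC protocol options named in the text):

```python
# Protocol settings shared by both RPyC server modes (dict name is
# illustrative; in RPyC this is passed as the `protocol_config` argument).
PROTOCOL_CONFIG = {
    "allow_public_attrs": True,    # expose public attributes of remote objects
    "allow_pickle": True,          # permit pickle-based transfer of binary data
    "sync_request_timeout": 30,    # seconds before a synchronous call times out
}
```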
RPyCServer Wrapper (lines 111-186)
- Context manager for server lifecycle management
- Runs server in separate process to isolate from client
- Uses `multiprocessing.Event()` for readiness signaling
- Socket verification ensures the server is accepting connections before proceeding
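The two-step readiness check described above (event wait, then socket probe) can be sketched with the standard library; the function name and signature are illustrative:

```python
import multiprocessing  # the ready_event passed in is a multiprocessing.Event
import socket
import time

def wait_for_server(ready_event, host, port, timeout=10.0):
    """Block until the server signals readiness and the port accepts connections."""
    if not ready_event.wait(timeout):
        raise TimeoutError("server never signaled readiness")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True                  # port is accepting connections
        except OSError:
            time.sleep(0.05)                 # not listening yet; retry
    raise TimeoutError("server signaled ready but port never accepted connections")
```

The event alone is not enough: the server process may set it slightly before `bind()`/`listen()` complete, so the socket probe closes that race.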
Connection Factory (lines 188-199)
- `create_rpyc_connection()`: Creates a client connection with matching protocol config
- Returns an RPyC connection object with a `.root` attribute for remote method calls
HTTP Server (http_servers.py)
Flask Application (_run_http_server, lines 12-112)
- REST endpoints mirroring RPyC service methods
- `GET /ping` - latency testing
- `POST /upload`, `GET /download/<size>` - bandwidth testing
- `POST /upload-file`, `GET /download-file/<size>` - file transfer
- `POST /upload-file-chunked`, `GET /download-file-chunked/<size>/<chunk_size>` - chunked transfers
- Line 104: `threaded=True` enables Flask's threaded mode (thread-per-request)
HTTPBenchmarkServer Wrapper (lines 114-188)
- Context manager for server lifecycle management
- Runs Flask server in separate process
- Same readiness signaling pattern as RPyC server
Connection Factory (lines 190-203)
- `create_http_session()`: Creates a `requests.Session` with connection pooling
- `HTTPAdapter`: 10 connections, 100 max pool size, 3 retries
- Supports HTTP keep-alive for persistent connections
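A sketch of such a pooled-session factory, using the pool sizes stated above (the function body is illustrative, not the project's exact code):

```python
import requests
from requests.adapters import HTTPAdapter

def create_http_session():
    """Build a requests.Session with pooling and retries (sizes from the text)."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=10,   # number of connection pools to cache
                          pool_maxsize=100,      # max connections per pool
                          max_retries=3)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

Because the adapter is mounted for both schemes, every request through the session reuses pooled keep-alive connections.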
2. Benchmark Framework (benchmark.py)
The BenchmarkBase abstract class defines the lifecycle:

```python
def execute() -> BenchmarkMetrics:
    setup()          # Create connections, warmup
    metrics.start()
    run()            # Actual benchmark
    metrics.end()
    teardown()       # Cleanup connections
    return metrics
```

All benchmarks follow this pattern, with BenchmarkMetrics tracking results.
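A minimal runnable sketch of this lifecycle, assuming a simplified BenchmarkMetrics (the real class tracks more than start/end times):

```python
import time
from abc import ABC, abstractmethod

class BenchmarkMetrics:
    """Illustrative metrics holder; the real class records far more detail."""
    def __init__(self):
        self.start_time = None
        self.end_time = None

    def start(self):
        self.start_time = time.perf_counter()

    def end(self):
        self.end_time = time.perf_counter()

    @property
    def duration(self):
        return self.end_time - self.start_time

class BenchmarkBase(ABC):
    def setup(self):        # create connections, warmup
        pass

    @abstractmethod
    def run(self):          # the actual benchmark
        ...

    def teardown(self):     # cleanup connections
        pass

    def execute(self) -> BenchmarkMetrics:
        self.setup()
        metrics = BenchmarkMetrics()
        metrics.start()
        self.run()           # only run() is timed; setup/teardown are excluded
        metrics.end()
        self.teardown()
        return metrics
```

Keeping setup and teardown outside the timed window ensures connection creation and cleanup never pollute the measured duration.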
ConnectionBenchmark
Purpose: Measure connection establishment overhead
Process:
- Sequentially create `num_connections` (default 100) connections
- Time each: start → `connection_factory()` → duration
- Store all connections for cleanup
What it tests: Server's ability to accept and establish new connections quickly
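The sequential timing loop described above might look like this (function name and signature are illustrative):

```python
import time

def run_connection_benchmark(connection_factory, num_connections=100):
    """Time each connection attempt sequentially; keep connections for cleanup."""
    durations, connections = [], []
    for _ in range(num_connections):
        start = time.perf_counter()
        conn = connection_factory()               # e.g. create_rpyc_connection(...)
        durations.append(time.perf_counter() - start)
        connections.append(conn)                  # held open until teardown
    return durations, connections
```

Holding all connections open (rather than closing each immediately) means later iterations also measure how the server behaves as its connection count grows.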
LatencyBenchmark
Purpose: Measure request/response latency
Process:
- Create a single persistent connection
- Warmup: 10 requests to prime caches
- Timed requests: `num_requests` (default 1000) sequential calls
- Time each: start → `request_func(conn)` → duration
What it tests: Round-trip time for minimal requests under no load
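A sketch of the warmup-then-measure pattern (names are illustrative):

```python
import time

def run_latency_benchmark(conn, request_func, num_requests=1000, warmup=10):
    """Warm up, then time sequential round trips on one persistent connection."""
    for _ in range(warmup):                       # prime caches, untimed
        request_func(conn)
    latencies = []
    for _ in range(num_requests):
        start = time.perf_counter()
        request_func(conn)
        latencies.append(time.perf_counter() - start)
    return latencies
```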
BandwidthBenchmark
Purpose: Measure data transfer throughput
Process:
- Test multiple data sizes: 1 KB, 10 KB, 100 KB, 1 MB
- For each size, run 10 iterations:
  - Upload: `upload_func(conn, data)`
  - Download: `download_func(conn, size)`
- Calculate bandwidth: `bytes / duration`
What it tests: Protocol overhead and serialization cost at different data sizes
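The per-size measurement loop reduces to timing a transfer and dividing bytes by elapsed time; a sketch (helper name is illustrative):

```python
import time

def measure_bandwidth(conn, upload_func, size, iterations=10):
    """Time transfers of `size` bytes and convert each to bytes per second."""
    data = b"x" * size
    rates = []
    for _ in range(iterations):
        start = time.perf_counter()
        upload_func(conn, data)                   # e.g. conn.root.upload(data)
        duration = time.perf_counter() - start
        rates.append(size / duration)             # bandwidth in bytes/second
    return rates
```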
File Transfer Benchmark
Purpose: Test large file transfers with realistic sizes
- File sizes: 1.5 MB, 128 MB, 500 MB
- Chunk size: 64 KB (configurable)
- Iterations: 3 per test
Process:
- Non-chunked transfers (lines 376-426):
- Single call with entire file
- Tests maximum throughput with minimal call overhead
- Chunked transfers (lines 429-489):
- Multiple calls with 64KB chunks
- Tests latency impact when data must be split
What it tests:
- Raw transfer throughput for large files
- Impact of chunking on overall transfer time
- Tradeoff between latency and bandwidth
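The chunked path splits the payload into fixed-size pieces, each sent with its own call; a minimal sketch (function name is illustrative):

```python
def upload_chunked(send_chunk, data, chunk_size=64 * 1024):
    """Split `data` into fixed-size chunks and send each with a separate call."""
    for offset in range(0, len(data), chunk_size):
        send_chunk(data[offset:offset + chunk_size])
```

Each extra call adds one protocol round trip, which is exactly the latency-versus-bandwidth tradeoff this benchmark isolates.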
ConcurrentBenchmark
Purpose: Measure performance under concurrent load
Configuration:
- `num_clients`: Number of concurrent connections (default 10 for the suite; supports 128+)
- `requests_per_client`: Requests each client makes (default 100)
- `max_workers`: Thread pool size (capped at 128)
Process (lines 586-632):
- Create a `ThreadPoolExecutor` with `num_clients` threads
- Each worker thread (`_client_worker`, lines 537-584):
  - Establishes its own connection
  - Makes `requests_per_client` requests
  - Tracks per-client metrics (latencies, errors)
  - Closes the connection
- Aggregate all client metrics
Critical detail: All clients run in threads within the client process, not separate processes. This tests how well the server handles concurrent load from multiple simultaneous connections.
What it tests:
- Server's ability to handle multiple concurrent connections
- Performance degradation under load
- Connection establishment overhead under concurrent load
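The thread-per-client pattern described above can be sketched as follows (function names and return shape are illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_concurrent_benchmark(connection_factory, request_func,
                             num_clients=10, requests_per_client=100):
    """Each worker opens its own connection, issues requests, and reports metrics."""
    def _client_worker(_):
        conn = connection_factory()              # per-client connection
        latencies, errors = [], 0
        for _ in range(requests_per_client):
            start = time.perf_counter()
            try:
                request_func(conn)
                latencies.append(time.perf_counter() - start)
            except Exception:
                errors += 1
        if hasattr(conn, "close"):
            conn.close()
        return latencies, errors

    # All clients are threads inside the client process, capped at 128 workers.
    with ThreadPoolExecutor(max_workers=min(num_clients, 128)) as pool:
        return list(pool.map(_client_worker, range(num_clients)))
```

Note that client-side threads share the client's GIL, so this measures server-side concurrency handling rather than true parallel client CPU work.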
3. Orchestration (suite.py)
Configuration:
- `num_serial_connections`: Connections for ConnectionBenchmark (default 100)
- `num_requests`: Requests for LatencyBenchmark (default 1000)
- `num_parallel_clients`: Concurrent clients for ConcurrentBenchmark (default 10)
- `requests_per_client`: Requests per client in the concurrent test (default 100)
Sequential execution of three server configurations:
1. RPyC Threaded (lines 55-68):
   - `with RPyCServer(host, port, mode='threaded'): _run_rpyc_benchmarks(...)`
   - Starts ThreadedServer in a separate process
   - Runs all 5 benchmark types
   - Server context manager ensures cleanup
2. RPyC Forking (lines 71-84):
   - `with RPyCServer(host, port, mode='forking'): _run_rpyc_benchmarks(...)`
   - Starts ForkingServer in a separate process
   - Runs the same 5 benchmark types
3. HTTP Threaded (lines 87-99):
   - `with HTTPBenchmarkServer(host, port, threaded=True): _run_http_benchmarks(...)`
   - Starts the Flask server with `threaded=True`
   - Runs the same 5 benchmark types
RPyC connection pattern:

```python
connection_factory = lambda: create_rpyc_connection(host, port)
request_func = lambda conn: conn.root.ping()
```

- Direct method calls via `conn.root.method()`
- Binary data passed as Python objects (pickle serialization)
- Single persistent TCP connection per benchmark
HTTP connection pattern:

```python
connection_factory = lambda: create_http_session()
request_func = lambda session: session.get(f"{url}/ping")
```

- REST endpoints via `session.get/post(url)`
- Binary data sent as raw bytes in the request body
- Chunked data sent as JSON with hex-encoded chunks
- HTTP connection pooling via `requests.Session`
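The hex-in-JSON encoding used for HTTP chunked transfers can be sketched as a simple round trip (function names are illustrative):

```python
import json

def encode_chunk(chunk: bytes) -> str:
    """Wrap a binary chunk in a JSON payload using hex encoding."""
    return json.dumps({"chunk": chunk.hex()})

def decode_chunk(payload: str) -> bytes:
    """Recover the original bytes from the JSON payload."""
    return bytes.fromhex(json.loads(payload)["chunk"])
```

Hex encoding doubles the payload size on the wire, which is part of the protocol overhead the HTTP chunked benchmark measures.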
| Aspect | RPyC Threaded | RPyC Forking | HTTP Threaded |
|---|---|---|---|
| Server concurrency | Thread per connection | Process per connection | Thread per request |
| Server file | rpyc_servers.py:76-82 | rpyc_servers.py:83-89 | http_servers.py:104 |
| Isolation | Low (shared memory) | High (separate process) | Low (shared memory) |
| Connection cost | Low | High (fork overhead) | Low |
| Memory overhead | Low | High (per-process memory, despite copy-on-write) | Low |
| GIL impact | High (threads share GIL) | None (separate processes) | High (threads share GIL) |
| Serialization | pickle (binary) | pickle (binary) | JSON + HTTP headers |
| Transport | Raw TCP | Raw TCP | HTTP over TCP |
| Protocol overhead | Minimal (RPyC) | Minimal (RPyC) | High (HTTP headers, JSON) |
| Client API | conn.root.method() | conn.root.method() | session.get/post(url) |
| Keep-alive | Single persistent connection | Single persistent connection | HTTP connection pooling |
Based on the architecture:
Connection establishment:
- RPyC Threaded: Fast (thread creation)
- RPyC Forking: Slow (process fork overhead)
- HTTP: Fast (thread creation)

Request latency:
- RPyC Threaded: Fastest (minimal protocol overhead)
- RPyC Forking: Fast (minimal protocol overhead)
- HTTP: Slower (HTTP headers, JSON parsing)

Bandwidth:
- RPyC Threaded: Fast (pickle is efficient for binary data)
- RPyC Forking: Fast (pickle is efficient for binary data)
- HTTP: Slower for chunked transfers (JSON hex encoding), comparable for raw binary

Concurrent load:
- RPyC Threaded: May degrade due to GIL contention
- RPyC Forking: Better isolation, higher memory cost
- HTTP: May degrade due to GIL contention
All servers run in separate processes via multiprocessing.Process to:
- Isolate server from client's GIL
- Prevent client measurement from affecting server performance
- Provide fair comparison between protocols
Both server types use:
- `multiprocessing.Event()` - the server signals when it has started
- Socket verification - the client verifies the server accepts connections
This ensures benchmarks don't start until server is truly ready.
All servers and some benchmarks use context managers (`__enter__`/`__exit__`) to:
- Guarantee cleanup even on errors
- Simplify server lifecycle management
- Make benchmark code cleaner
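The server-lifecycle pattern can be sketched as a context manager around `multiprocessing.Process` (the class name and constructor are illustrative; the real wrappers also handle readiness signaling):

```python
import multiprocessing
import time

class ServerProcess:
    """Run a server function in a child process, guaranteeing cleanup on exit."""
    def __init__(self, target, *args):
        self._target = target
        self._args = args
        self._process = None

    def __enter__(self):
        self._process = multiprocessing.Process(target=self._target, args=self._args)
        self._process.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Cleanup runs even if the with-block raised.
        self._process.terminate()
        self._process.join()
        return False  # never swallow exceptions from the with-block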
- ConnectionBenchmark: Sequential (tests raw connection overhead)
- LatencyBenchmark: Sequential (tests baseline latency)
- BandwidthBenchmark: Sequential (tests maximum throughput)
- ConcurrentBenchmark: Parallel threads (tests under load)
This separation allows understanding both optimal and realistic performance.
HTTP chunked transfers use JSON with hex-encoded chunks rather than streaming because:
- Simplifies implementation for benchmarking
- Allows measuring serialization overhead
- Tests JSON parsing performance
- More comparable to RPyC's object serialization
This may not represent optimal HTTP performance but provides fair protocol comparison.
RPyC is better suited for:
- Minimal request latency (less protocol overhead)
- Binary data transfer (efficient pickle serialization)
- Persistent connections (no per-request HTTP header overhead)
- Python-to-Python communication (native object serialization)
HTTP is better suited for:
- Multi-language environments (standard protocol)
- Firewall-friendly deployments (standard ports 80/443)
- REST API compatibility (standard HTTP verbs)
- Caching and proxy support (HTTP infrastructure)
Forking is better suited for:
- CPU-bound server operations (no GIL contention)
- Isolation requirements (separate memory space)
- Per-connection state that shouldn't leak
Threading is better suited for:
- I/O-bound operations (less overhead than forking)
- Memory-constrained environments (shared memory)
- High connection rates (faster than forking)