This repository provides an inference-ready ONNX implementation of a Deep Q-Network (DQN) agent trained to select the optimal data center in a 5G / MEC latency optimization scenario.
The model is exported from Stable-Baselines3 (DQN) and distributed as a reproducible, lightweight artifact suitable for deployment, benchmarking, and research use.
This model is a Reinforcement Learning–based decision agent that predicts a discrete action:
Input:
A normalized tensor representing the current state of three candidate data centers, shaped as (3 × 10), where each row corresponds to a data center and each column represents a feature such as client identifier, resource utilization, network metrics, latency statistics, packet loss, and carbon intensity.
Output:
A discrete action in {0, 1, 2} corresponding to the selection of the optimal data center, computed as the index with the highest predicted Q-value.
```
.
├── model/
│   ├── 5g_latency_opt_dqn_model.onnx   # ONNX model (opset 18)
│   └── model_config.json               # Model metadata, I/O specs, preprocessing params
│
├── src/
│   ├── inference_engine.py             # ONNXRuntime inference wrapper
│   ├── state_serializer.py             # Builds (1,3,10) input tensor from JSON
│   ├── minmax_scaler.py                # MinMax scaling (training-fitted params)
│   └── action_interpreter.py           # Human-readable action decoding
│
├── demo.py                             # End-to-end inference demo
├── requirements.txt                    # Minimal inference dependencies
└── README.md
```
The model was trained on a tabular dataset containing per–data center telemetry and network metrics. Each decision step groups three rows (one per candidate data center) into a single observation.
After preprocessing, each data center is represented by 10 features:
- `client_id` (label-encoded)
- `cpu_usage_percent`
- `memory_usage_percent`
- `disk_usage_percent`
- `net_in_percent`
- `net_out_percent`
- `latency_avg`
- `latency_mdev`
- `lost_percent`
- `carbon_intensity`
All numeric features are MinMax-scaled using parameters learned on the training dataset.
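The scaling step can be sketched as follows; the function name and the way min/max parameters are passed are illustrative (the repository ships its own `minmax_scaler.py` with training-fitted values):

```python
import numpy as np

def minmax_scale(x, data_min, data_max):
    """Scale a feature vector to [0, 1] using training-fitted min/max values.

    `data_min` / `data_max` stand in for the parameters learned on the
    training dataset; the argument names here are hypothetical.
    """
    x = np.asarray(x, dtype=np.float32)
    rng = np.asarray(data_max, dtype=np.float32) - np.asarray(data_min, dtype=np.float32)
    rng[rng == 0] = 1.0  # guard against division by zero for constant features
    return (x - np.asarray(data_min, dtype=np.float32)) / rng
```

Applying the same formula with different (e.g. per-batch) min/max values would shift the model's input distribution, so the training-time parameters must be reused verbatim.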
- Algorithm: Deep Q-Network (DQN)
- Framework: Stable-Baselines3
- Policy network: MLP-based Q-network
- Export format: ONNX (opset 18)
The ONNX model contains only the Q-network, optimized for inference.
Inputs
- Name: `observation`
- Type: `float32`
- Shape: `(batch_size, 3, 10)`

Where:
- `3` = number of candidate data centers
- `10` = feature vector length per data center
Feature Order (Last Dimension)
The feature order must be exactly:
```
[
  client_id,
  cpu_usage_percent,
  memory_usage_percent,
  disk_usage_percent,
  net_in_percent,
  net_out_percent,
  latency_avg,
  latency_mdev,
  lost_percent,
  carbon_intensity
]
```
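Building the `(1, 3, 10)` observation tensor in this exact feature order can be sketched like this (the repository's `state_serializer.py` does this from JSON; the function name and input structure here are illustrative):

```python
import numpy as np

# Column order must match the model's training-time feature order exactly.
FEATURES = [
    "client_id", "cpu_usage_percent", "memory_usage_percent",
    "disk_usage_percent", "net_in_percent", "net_out_percent",
    "latency_avg", "latency_mdev", "lost_percent", "carbon_intensity",
]

def build_observation(data_center_states):
    """Stack three per-data-center feature dicts into a (1, 3, 10) float32 tensor.

    `data_center_states` is assumed to be a list of three dicts keyed by the
    (already preprocessed) feature names above.
    """
    rows = [[float(dc[f]) for f in FEATURES] for dc in data_center_states]
    return np.asarray(rows, dtype=np.float32)[np.newaxis, :, :]
```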
Outputs
- Name: `q_values`
- Type: `float32`
- Shape: `(batch_size, 3)`

Each output value represents the Q-value of selecting a specific data center:
- `0` → Data Center 0 (Milan)
- `1` → Data Center 1 (Rome)
- `2` → Data Center 2 (Cosenza)
The final decision is:
```
action = argmax(q_values)
```

Setup Environment
Create and activate a virtual environment, then install dependencies:
```
python -m venv .venv
source .venv/bin/activate   # Linux / macOS
# .venv\Scripts\activate    # Windows
pip install -r requirements.txt
```

Minimum runtime dependencies:
- `onnxruntime`
- `numpy`
- `pandas`
Run Inference Script
Run the demo script:
```
python demo.py
```

The demo will:
- Load a JSON scenario containing dataCenterStates
- Apply preprocessing (MinMax scaling + client_id encoding)
- Run inference using ONNXRuntime
- Print the selected data center and corresponding Q-values
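The inference and argmax steps above can be sketched with ONNX Runtime; the model path comes from the repository tree, while the function names are illustrative rather than the repo's actual API:

```python
import numpy as np

def select_action(q_values):
    """Map a (batch, 3) Q-value array to the argmax action of the first sample."""
    return int(np.argmax(q_values, axis=-1)[0])

def run_inference(observation, model_path="model/5g_latency_opt_dqn_model.onnx"):
    """Run the exported Q-network on a (1, 3, 10) observation.

    Sketch only: assumes the ONNX file from the repo tree is present locally.
    """
    import onnxruntime as ort  # imported lazily so select_action works without it
    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    (q_values,) = session.run(None, {input_name: observation.astype(np.float32)})
    return select_action(q_values), q_values
```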
- The model is trained for exactly three data centers; the input shape is fixed.
- Inference requires the same preprocessing parameters used during training:
  - MinMaxScaler `data_min` / `data_max`
  - `client_id` encoding mapping
- Unknown or unseen `client_id` values must be handled explicitly.
- Performance outside the training distribution is not guaranteed.
- This is a decision-support model, not a guaranteed optimal controller.
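One way to handle unseen `client_id` values explicitly is to fail fast unless a fallback code is provided; the function and parameter names below are hypothetical, not part of the repository's API:

```python
def encode_client_id(client_id, mapping, fallback=None):
    """Label-encode a client_id with the training-time mapping.

    Unseen ids either map to an explicit fallback code or raise, so they are
    never silently fed to the model. `mapping` and `fallback` are assumptions
    about how the encoding is stored, not the repo's actual interface.
    """
    if client_id in mapping:
        return mapping[client_id]
    if fallback is not None:
        return fallback
    raise KeyError(f"Unseen client_id {client_id!r}; model output would be unreliable")
```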
The ONNX model and inference bundle are archived on Zenodo for reproducibility and citation:
- Zenodo record: 10.5281/zenodo.18303750
- DOI: 10.5281/zenodo.18303750