Authors: Muhammad Sabeeh (23K-0002), Rayyan Merchant (23K-0073)
A Deep Reinforcement Learning (DRL) system that dynamically routes network traffic using the DQN and DDQN algorithms, built on ns-3.35, ns3-gym and PyTorch. The agent learns to minimize delay and packet loss while maximizing throughput, outperforming traditional Dijkstra routing.
```
CN Project/
├── agent/                  # PyTorch DRL agents
│   ├── network.py          # QNetwork architecture (Embedding → FC → 3 Q-values)
│   ├── dqn_agent.py        # DQNAgent + DDQNAgent classes
│   └── replay_buffer.py    # Experience Replay Buffer
├── baseline/               # Dijkstra baseline
│   ├── run_baseline.py     # Runs 3 scenarios × 3 runs via ns-3
│   └── parse_flowmon.py    # Parses FlowMonitor XML → DataFrame
├── configs/
│   └── hyperparams.py      # Single source of truth for ALL hyperparameters
├── env/                    # RL environment
│   ├── ns3_wrapper.py      # Gym wrapper around ns3-gym ZMQ interface
│   └── metrics.py          # DRSIR reward function (cost minimization)
├── training/               # Training & evaluation scripts
│   ├── train_dqn.py        # DQN training (500 episodes)
│   ├── train_ddqn.py       # DDQN training (500 episodes)
│   ├── run_inference.py    # Run trained agent in greedy mode
│   ├── evaluate.py         # Evaluate all algorithms × all scenarios
│   └── health_check.py     # 5-point training verification
├── results/                # Generated outputs
│   ├── checkpoints/        # Saved model weights (.pt files)
│   ├── logs/               # Training CSVs + comparison CSVs
│   ├── plots/              # PDF figures + generate_all.py
│   └── raw/                # Raw FlowMonitor XML files
├── routing_sim.cc          # ns-3 C++ simulation (topology + traffic + opengym hooks)
├── routing_env.cc          # ns-3 C++ RL environment (obs/action/reward interface)
├── routing_env.h           # Header for RoutingEnv class
└── README.md               # This file
```
The simulated topology (node IDs in parentheses):

```
S1(0)──────R1(2)──────D1(5)
│  \       │  \    /   |
│   \      │   \  /    |
│    R2(3)── R3(4)     |
│   /      │  /        |
S2(1)──────R2(3)───────┘
```
- 6 nodes: S1, S2 (sources), R1, R2, R3 (routers), D1 (destination)
- 10 point-to-point links with varying bandwidth (3-10 Mbps) and delay (2-12 ms)
- 3 candidate paths per source-destination pair
- Link failure: the R1-D1 link fails at t = 40 s in the failure scenario
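The Dijkstra baseline always sends traffic along the single lowest-cost path, whereas the agent chooses among `K_PATHS = 3` candidates. The sketch below contrasts the two on an adjacency map of the form `{node: {neighbor: weight}}`. It is illustrative only: real link weights would come from `routing_sim.cc`, the helper names are hypothetical, and the exhaustive loop-free path search (fine for a 6-node graph) is not necessarily how `routing_env.cc` builds its candidate set.

```python
import heapq


def dijkstra(adj, src, dst):
    """Lowest-cost path in a weighted graph given as {node: {neighbor: weight}}."""
    dist, prev = {src: 0.0}, {}
    heap, settled = [(0.0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u == dst:
            break
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:  # walk predecessors back to the source
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def k_cheapest_paths(adj, src, dst, k):
    """Enumerate every loop-free path by DFS and keep the k cheapest (tiny graphs only)."""
    found = []

    def dfs(node, path, cost):
        if node == dst:
            found.append((cost, list(path)))
            return
        for v, w in adj.get(node, {}).items():
            if v not in path:  # forbid revisiting a node
                path.append(v)
                dfs(v, path, cost + w)
                path.pop()

    dfs(src, [src], 0.0)
    found.sort(key=lambda item: item[0])
    return found[:k]
```

The cheapest entry returned by `k_cheapest_paths` is the Dijkstra path (up to ties); the remaining candidates are the alternatives the agent can switch to when congestion or the R1-D1 failure makes the static choice a poor one.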
The WSL environment is pre-configured with:
- ns-3.35 at `~/ns-allinone-3.35/ns-3.35/` (WAF-based build)
- ns3-gym (opengym) in `contrib/opengym/` (WAF-compatible `app` branch)
- Python 3.10 with `torch`, `gym`, `ns3gym`, `zmq`, `protobuf`, `pandas`, `matplotlib`, `numpy`
- C++ files compiled in `scratch/drl_routing/`
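A quick way to confirm that the Python packages in this list are visible to the interpreter is a `find_spec` scan. This is a hypothetical helper, not part of the project; the names are the import names of the packages above (protobuf is imported as `google.protobuf`).

```python
import importlib.util

# Import names for the packages listed above; protobuf is imported as google.protobuf
REQUIRED = ["torch", "gym", "ns3gym", "zmq", "google.protobuf", "pandas", "matplotlib", "numpy"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be located on the current sys.path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # raised when a parent package (e.g. google) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules() or "all listed modules found")
```

`find_spec` only locates a package without importing it, so a module that is installed but fails when imported or used still passes; actually importing it remains the definitive check.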
To set up the environment from scratch:

```bash
# 1. Install system dependencies
sudo apt update && sudo apt install -y gcc g++ python3 python3-pip \
    libzmq5 libzmq3-dev libprotobuf-dev protobuf-compiler
# 2. Download and extract ns-3.35
cd ~
wget https://www.nsnam.org/releases/ns-allinone-3.35.tar.bz2
tar xf ns-allinone-3.35.tar.bz2
# 3. Clone ns3-gym (WAF-compatible branch)
cd ~/ns-allinone-3.35/ns-3.35/contrib
git clone https://github.com/tkn-tub/ns3-gym.git opengym
cd opengym && git checkout app
# 4. Copy C++ simulation files (run from inside the "CN Project/" directory)
mkdir -p ~/ns-allinone-3.35/ns-3.35/scratch/drl_routing
cp routing_sim.cc routing_env.cc routing_env.h ~/ns-allinone-3.35/ns-3.35/scratch/drl_routing/
# 5. Configure and build ns-3
cd ~/ns-allinone-3.35/ns-3.35
./waf configure --build-profile=optimized --disable-examples --disable-tests --disable-python
./waf build -j4
# 6. Install Python dependencies
pip install torch numpy pandas matplotlib gym zmq protobuf
cd ~/ns-allinone-3.35/ns-3.35/contrib/opengym/model/ns3gym
pip install -e .
# 7. Patch ns3gym for NumPy 2.0 compatibility
#    (the np.float / np.int aliases were removed in NumPy 1.24)
#    In ns3gym/ns3env.py, replace:
#      np.float → np.float64
#      np.int   → np.int64
#      np.uint  → np.uint64
# 8. Copy Python project (run from the directory that contains "CN Project/")
cp -r "CN Project/" ~/drl_project/
```

Each training CSV in `results/logs/` has one row per episode with these columns:

| Column | Description |
|---|---|
| `episode` | Episode number (0-499) |
| `reward` | Total DRSIR cost for the episode (lower is better) |
| `avg_loss` | Average MSE loss for the episode |
| `epsilon` | Exploration rate (1.0 → 0.05) |
| `action{0,1,2}_frac` | Fraction of steps that used each path |

Key hyperparameters (all defined in `configs/hyperparams.py`):

| Parameter | Value | Description |
|---|---|---|
| `N_EPISODES` | 500 | Training episodes |
| `STEPS_PER_EP` | 20 | Steps per episode (100 s ÷ 5 s) |
| `GAMMA` | 0.1 | Discount factor (near-sighted) |
| `EPS_MAX` / `EPS_MIN` | 1.0 / 0.05 | Epsilon-greedy range |
| `REPLAY_START` | 200 | Steps before training begins |
| `BATCH_SIZE` | 15 | Replay buffer mini-batch size |
| `K_PATHS` | 3 | Candidate paths per SD pair |
| `HIDDEN_NEURONS` | 50 | Network hidden layer size |
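The table translates directly into module-level constants. The sketch below mirrors it and adds an epsilon schedule; the schedule itself is an assumption (the README only fixes the endpoints 1.0 and 0.05), chosen here as an exponential per-episode decay that lands exactly on `EPS_MIN` at the last episode. `EPS_DECAY` and `epsilon_for_episode` are hypothetical names.

```python
# Sketch of configs/hyperparams.py: constant values copied from the table above
N_EPISODES = 500
STEPS_PER_EP = 20          # 100 s simulation / 5 s per step
GAMMA = 0.1                # near-sighted: a cost two steps ahead is weighted 0.1**2 = 0.01
EPS_MAX, EPS_MIN = 1.0, 0.05
REPLAY_START = 200         # 200 steps / 20 steps per episode = first 10 episodes
BATCH_SIZE = 15
K_PATHS = 3
HIDDEN_NEURONS = 50

# ASSUMPTION: exponential per-episode decay chosen so that epsilon reaches
# EPS_MIN exactly at the last episode; the project's real schedule may differ.
EPS_DECAY = (EPS_MIN / EPS_MAX) ** (1.0 / (N_EPISODES - 1))


def epsilon_for_episode(episode):
    """Exploration rate used during `episode` (0-based)."""
    return max(EPS_MIN, EPS_MAX * EPS_DECAY ** episode)
```

A linear schedule, `EPS_MAX - (EPS_MAX - EPS_MIN) * episode / (N_EPISODES - 1)`, would pass the same start/end check; the exponential form drops faster at first and so spends more episodes exploiting near `EPS_MIN`.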
`training/health_check.py` verifies each training run with five checks:

| Check | What It Verifies |
|---|---|
| 1. Epsilon decay | Starts at 1.0, ends at 0.05 |
| 2. Loss non-zero | At least 30% of episodes have a non-zero training loss |
| 3. Cost trend | Later episodes cost less than early ones |
| 4. Path exploration | Agent uses all 3 paths |
| 5. No NaN | No corrupted values in logs |
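The script itself is not shown in this README. A minimal sketch of the same five checks over a training CSV could look like the following, assuming the log columns listed earlier and that `action{0,1,2}_frac` expands to `action0_frac`, `action1_frac`, `action2_frac`. The 50-episode comparison window for check 3 and the tolerances are assumptions, not values taken from the project.

```python
import csv
import math


def load_log(path):
    """Read a training CSV into a list of dicts with float values."""
    with open(path, newline="") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]


def _mean(values):
    return sum(values) / len(values)


def health_check(rows, k_paths=3, window=50, eps_start=1.0, eps_end=0.05):
    """Run the five checks from the table above; returns {check_name: passed}."""
    costs = [r["reward"] for r in rows]  # the 'reward' column holds the DRSIR cost
    losses = [r["avg_loss"] for r in rows]
    return {
        # 1. epsilon starts at EPS_MAX and ends at EPS_MIN
        "epsilon_decay": math.isclose(rows[0]["epsilon"], eps_start, abs_tol=1e-3)
        and math.isclose(rows[-1]["epsilon"], eps_end, abs_tol=1e-3),
        # 2. at least 30% of episodes recorded a non-zero training loss
        "loss_nonzero": sum(loss > 0 for loss in losses) / len(losses) >= 0.30,
        # 3. mean cost of the last `window` episodes is below that of the first `window`
        "cost_trend": _mean(costs[-window:]) < _mean(costs[:window]),
        # 4. every path was chosen in at least one episode
        "path_exploration": all(
            any(r[f"action{k}_frac"] > 0 for r in rows) for k in range(k_paths)
        ),
        # 5. no NaN anywhere in the log
        "no_nan": not any(math.isnan(v) for r in rows for v in r.values()),
    }
```

Running it once on the DQN log and once on the DDQN log gives the ten results counted in the summary at the end of this README.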
System architecture:

```
┌─────────────────────────────────────────────────────────┐
│ ns-3 Simulator                                          │
│   routing_sim.cc → topology, traffic, FlowMonitor       │
│   routing_env.cc → RoutingEnv (obs/action/reward)       │
│   ↕ ZeroMQ (port 5555) via ns3-gym                      │
├─────────────────────────────────────────────────────────┤
│ Python Agent                                            │
│   ns3_wrapper.py → Gym interface                        │
│   metrics.py → DRSIR reward computation                 │
│   dqn_agent.py → DQN/DDQN with experience replay        │
│   network.py → QNetwork (Embedding → FC → 3 Q-values)   │
└─────────────────────────────────────────────────────────┘
```
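On the Python side, the diagram reduces to an observe, act, store loop. The sketch below runs that loop against a hypothetical `FakeRoutingEnv` that imitates the old-gym `reset()`/`step()` interface (a 4-tuple `obs, reward, done, info`), with fixed per-path costs standing in for ns-3. Q-values are passed in as a plain list, so the network and the learning update are deliberately left out; none of these names come from the project.

```python
import random
from collections import deque

STEPS_PER_EP = 20  # from the hyperparameter table (100 s / 5 s)


class FakeRoutingEnv:
    """Hypothetical stand-in for the ns-3 environment: each path has a fixed mean cost."""

    def __init__(self, path_costs, seed=0):
        self.path_costs = path_costs
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # a single dummy state; ns-3 would report per-path BW/delay/loss

    def step(self, action):
        self.t += 1
        cost = self.path_costs[action] + self.rng.uniform(-0.05, 0.05)  # noisy cost
        done = self.t >= STEPS_PER_EP
        return 0, cost, done, {}  # old-gym 4-tuple: obs, reward (a cost here), done, info


def epsilon_greedy(q_values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise pick the lowest predicted cost."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=q_values.__getitem__)


def run_episode(env, q_values, epsilon, buffer, rng):
    """Play one episode, storing every transition in the replay buffer; returns total cost."""
    state, done, total_cost = env.reset(), False, 0.0
    while not done:
        action = epsilon_greedy(q_values, epsilon, rng)
        next_state, cost, done, _ = env.step(action)
        buffer.append((state, action, cost, next_state, done))
        total_cost += cost
        state = next_state
    return total_cost
```

Because the agent minimizes cost, the greedy branch uses `min` rather than the `max` found in reward-maximizing DQN code.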
- ns-3 simulates the network, generates traffic, measures throughput/delay/loss
- ns3-gym exposes observations (per-path BW, delay, loss) and accepts routing actions via ZMQ
- Python agent observes the network state, selects a path (ε-greedy), and receives the DRSIR cost
- DQN/DDQN learns to minimize cost using experience replay and target networks
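- DQN: the target uses `Q_target(s').min()` directly, so the same noisy estimates both pick and score the next action. With costs this biases the target low (the cost-minimizing mirror image of DQN's overestimation bias).
- DDQN: the online network selects the next action (`argmin`) and the target network evaluates it, which reduces that bias and makes training more stable.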
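The two target rules can be written out directly. This sketch uses plain Python lists so it runs without PyTorch; in `dqn_agent.py` the same arithmetic would be applied to batched tensors. The `(1 - done)` factor, which drops the bootstrap term at the end of an episode, is an assumption, since the README does not say how terminal steps are handled.

```python
def dqn_targets(costs, next_q_target, dones, gamma):
    """DQN: y = c + gamma * min_a' Q_target(s', a'), with no bootstrap at terminal steps."""
    return [c + gamma * (1 - d) * min(q) for c, q, d in zip(costs, next_q_target, dones)]


def ddqn_targets(costs, next_q_online, next_q_target, dones, gamma):
    """DDQN: the online net picks a* = argmin_a Q_online(s', a); the target net scores it."""
    targets = []
    for c, q_on, q_tg, d in zip(costs, next_q_online, next_q_target, dones):
        best = min(range(len(q_on)), key=q_on.__getitem__)  # action chosen by online net
        targets.append(c + gamma * (1 - d) * q_tg[best])    # value read from target net
    return targets
```

For example, with cost 1.0, γ = 0.1, online Q(s') = [2.0, 0.5, 4.0] and target Q(s') = [1.0, 3.0, 2.0], DQN's target is 1.0 + 0.1 × 1.0 = 1.1, while DDQN takes action 1 from the online network and gets 1.0 + 0.1 × 3.0 = 1.3, instead of trusting the lowest target estimate.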
| Problem | Solution |
|---|---|
| `Address already in use` (ZMQ port 5555) | Run `killall -9 drl_routing` in WSL to stop the leftover simulation |
| `ns3gym` import error | `cd ~/ns-allinone-3.35/ns-3.35/contrib/opengym/model/ns3gym && pip install -e .` |
| `np.float` errors (alias removed in NumPy 1.24) | Patch `ns3env.py`: `np.float` → `np.float64` |
| `gymnasium` not found | Use `import gym` (not `gymnasium`); ns3gym uses the old `gym` package |
| Build fails on Python bindings | Add `--disable-python` to `./waf configure` |
| ns-3 runs but Python doesn't connect | Make sure ns-3 is started with `--enableRL=true` |
| Training loss is 0 for early episodes | Normal: the buffer needs 200 steps (10 episodes) to warm up |
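Setup step 7 and the `np.float` row above describe a manual edit of `ns3env.py`. A small script can apply the same three substitutions safely, assuming the file refers to NumPy as `np`. The names `patch_source` and `patch_file` are hypothetical; the regex uses word boundaries so already-explicit types such as `np.float64`, `np.int32` or `np.int_` are left alone.

```python
import re
import shutil
from pathlib import Path

# Substitutions from setup step 7 (np.float / np.int were removed in NumPy 1.24)
REPLACEMENTS = {"float": "float64", "int": "int64", "uint": "uint64"}

# The trailing \b stops np.float64, np.int32, np.int_ etc. from matching again
_ALIAS = re.compile(r"\bnp\.(float|int|uint)\b")


def patch_source(text):
    """Return text with the bare NumPy aliases replaced by explicit-width types."""
    return _ALIAS.sub(lambda m: "np." + REPLACEMENTS[m.group(1)], text)


def patch_file(path):
    """Patch a file in place, keeping a .bak copy; returns True if anything changed."""
    path = Path(path)
    original = path.read_text()
    patched = patch_source(original)
    if patched != original:
        shutil.copyfile(path, path.with_name(path.name + ".bak"))
        path.write_text(patched)
    return patched != original
```

Combining the paths from steps 6 and 7, the target would be `Path.home() / "ns-allinone-3.35/ns-3.35/contrib/opengym/model/ns3gym/ns3gym/ns3env.py"`. Because ns3gym is installed with `pip install -e .`, the patched source is picked up without reinstalling, and running the script twice is harmless.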
- Problem: Static routing (Dijkstra) can't adapt to congestion or link failures
- Solution: a DRL agent learns, by trial and error, which candidate path to route traffic over
- Architecture: ns-3 (C++) ↔ ZMQ ↔ Python (PyTorch DQN/DDQN)
- Results: the agent trains for 500 episodes while epsilon decays from 1.0 to 0.05
- Baseline: Dijkstra's packet loss across the 3 scenarios is 0% (Normal), 10% (Congested) and 15% (Failure)
- Health: all 10 health checks (the 5-point check for both DQN and DDQN) pass
Happy Routing!