System monitoring is crucial for maintaining optimal server performance, identifying issues before they become critical, and ensuring high availability of services. Prometheus Node Exporter is a powerful, lightweight tool that collects comprehensive hardware and operating system metrics from Linux servers, providing deep insights into system health and performance.
In this hands-on project, you'll learn to:
- Install and configure Prometheus Node Exporter for comprehensive system monitoring
- Set up Prometheus server to collect and store metrics
- Create effective monitoring dashboards and alerting rules
- Understand key system metrics and their significance
- Implement monitoring best practices for production environments
- Scale monitoring across multiple servers and services
This project builds on your DevOps foundation and demonstrates practical system monitoring that can be extended for enterprise-grade observability solutions.
- Monitoring Server: A Linux server/VM to run Prometheus (Ubuntu/Debian preferred)
- Target Server(s): Linux server(s) to monitor with Node Exporter
- Network Connectivity: Both servers must be able to communicate (Prometheus → Node Exporter on port 9100)
- Storage Space: Sufficient space for Prometheus data storage (TSDB)
- Memory: At least 2GB RAM recommended for Prometheus server
- Basic Linux command line skills and system administration
- Understanding of networking concepts (ports, firewalls)
- Text editor familiarity (nano, vim, etc.)
- Previous completion of Ansible projects (Mini Projects 6-9)
- Screenshots of each major step and monitoring dashboards
- Configuration files (Prometheus config, systemd services)
- Command outputs showing successful Node Exporter and Prometheus setup
- Monitoring verification evidence (metrics queries, graphs, targets status)
- Troubleshooting evidence (if issues occurred)
Objective: Ensure your system is ready for monitoring setup.
- Check Linux Distribution:

```bash
cat /etc/os-release
```

Expected Output: Should show Ubuntu, Debian, or similar Linux distribution.
- Verify Sudo Access:

```bash
sudo whoami
```

Expected Output: Should return root (confirming sudo privileges).
- Check Network Connectivity:

```bash
ping -c 3 google.com
```

Expected Output: Successful ping responses.
- Check Available Resources:

```bash
# Memory check
free -h
# Disk space check
df -h
# CPU information
nproc
```

Expected Output: System resource information for capacity planning.
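The memory check can be scripted so that it fails loudly on an undersized VM. This is a minimal sketch that reads a /proc/meminfo-style file; the function names and the 1,900,000 kB cut-off (slightly under 2 GiB, on the assumption that the kernel reserves some RAM so a "2 GB" VM rarely reports a full 2 GiB) are illustrative choices, not part of Prometheus.

```bash
# Assumed threshold: a little under 2 GiB, expressed in kB (the unit /proc/meminfo uses).
MIN_MEM_KB=1900000

# Print MemTotal in kB from the given file (defaults to /proc/meminfo).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2; exit }' "${1:-/proc/meminfo}"
}

# Return 0 if MemTotal >= MIN_MEM_KB, 1 otherwise, and print a one-line verdict.
check_min_memory() {
    total=$(mem_total_kb "$1")
    if [ -z "$total" ]; then
        echo "FAIL: MemTotal not found"
        return 1
    fi
    if [ "$total" -ge "$MIN_MEM_KB" ]; then
        echo "OK: ${total} kB"
        return 0
    fi
    echo "FAIL: only ${total} kB (want >= ${MIN_MEM_KB} kB)"
    return 1
}
```

Run check_min_memory without arguments on the monitoring server to check the live /proc/meminfo.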
Objective: Set up Prometheus server to collect and store monitoring metrics.
Download and install Prometheus:

```bash
# Create prometheus user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
# Download Prometheus
curl -LO https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
# Extract and install
tar -xvf prometheus-2.45.0.linux-amd64.tar.gz
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/
sudo cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus
sudo cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/*
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
# Cleanup
rm -rf prometheus-2.45.0.linux-amd64*
```

Create Prometheus configuration file:
```bash
sudo tee /etc/prometheus/prometheus.yml > /dev/null << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 30s
EOF
```

Create Prometheus systemd service:
```bash
sudo tee /etc/systemd/system/prometheus.service > /dev/null << 'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file /etc/prometheus/prometheus.yml \
  --storage.tsdb.path /var/lib/prometheus/ \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --storage.tsdb.retention.time=200h \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=
ExecReload=/bin/kill -HUP $MAINPID
Restart=always

[Install]
WantedBy=multi-user.target
EOF
```

Start Prometheus service:
```bash
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
sudo systemctl status prometheus
```

Verify Prometheus is running:
```bash
# Check service status
sudo systemctl status prometheus
# Test the web server via its health endpoint (should report that the server is healthy)
curl http://localhost:9090/-/healthy
# Check targets (should show prometheus and node-exporter if installed locally)
curl http://localhost:9090/api/v1/targets
```

Objective: Install Node Exporter on the server you want to monitor.
```bash
# Create a dedicated system user for Node Exporter (no home directory, no login shell)
sudo useradd --no-create-home --shell /bin/false node_exporter
# Download Node Exporter
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
# Extract
tar -xvf node_exporter-1.6.1.linux-amd64.tar.gz
# Install binary (the node_exporter user must exist before this chown)
sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter
# Cleanup
rm -rf node_exporter-1.6.1.linux-amd64*
```

Create Node Exporter systemd service. The cpu, meminfo, diskstats, filesystem, loadavg, netstat and stat collectors are enabled by default, so their flags are explicit rather than required; systemd, processes, interrupts and ksmd are disabled by default and are switched on by these flags:

```bash
sudo tee /etc/systemd/system/node_exporter.service > /dev/null << 'EOF'
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
  --web.listen-address=:9100 \
  --web.telemetry-path=/metrics \
  --collector.systemd \
  --collector.processes \
  --collector.cpu \
  --collector.meminfo \
  --collector.diskstats \
  --collector.filesystem \
  --collector.loadavg \
  --collector.netstat \
  --collector.stat \
  --collector.interrupts \
  --collector.ksmd
ExecReload=/bin/kill -HUP $MAINPID
Restart=always

[Install]
WantedBy=multi-user.target
EOF
```

Reload systemd and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
sudo systemctl status node_exporter
```

Verify Node Exporter is working:
```bash
# Check service status
sudo systemctl status node_exporter
# Test metrics endpoint
curl -s http://localhost:9100/metrics | head -20
# Check listening port (ss ships with iproute2; netstat needs the net-tools package)
sudo ss -tlnp | grep :9100
```

Objective: Update Prometheus configuration to collect metrics from Node Exporter.
Update the Prometheus configuration file:

```bash
sudo nano /etc/prometheus/prometheus.yml
```

Enhanced configuration example:

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"  # resolved relative to /etc/prometheus/

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['TARGET_SERVER_IP:9100']  # Replace with your target server IP
    scrape_interval: 30s
    metrics_path: /metrics

  # Optional: Monitor multiple servers
  - job_name: 'web-servers'
    static_configs:
      - targets: ['web1.example.com:9100', 'web2.example.com:9100']
```

Create alert rules file:
```bash
sudo tee /etc/prometheus/alert_rules.yml > /dev/null << 'EOF'
groups:
  - name: server_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value }}% on {{ $labels.instance }}"

      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Only {{ $value }}% disk space left on {{ $labels.mountpoint }} ({{ $labels.instance }})"
EOF
```

Restart Prometheus to apply changes:
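```bash
# Validate the configuration first (this also checks the referenced rule files)
promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus
sudo systemctl status prometheus
```

Objective: Confirm that Prometheus is successfully collecting metrics from Node Exporter.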
Access Prometheus web interface:
- Open a web browser and navigate to http://PROMETHEUS_SERVER_IP:9090
- Go to Status → Targets and verify that Node Exporter is listed as "UP"
- Check that the endpoint shows http://TARGET_SERVER_IP:9100/metrics
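The same check can be run from a terminal against the targets API used earlier. This sketch counts targets by their "health" field without needing jq; the function names are hypothetical, and grepping JSON is only dependable here because "health" is a simple, flat string field.

```bash
# Count active targets with the given health ("up", "down" or "unknown")
# in /api/v1/targets JSON read from stdin.
count_targets_by_health() {
    grep -oE '"health": *"[a-z]+"' | grep -c "\"$1\"\$" || true   # grep -c exits 1 on zero matches
}

# Succeed only if at least one target is up and none are down.
all_targets_up() {
    json=$(cat)
    up=$(printf '%s' "$json" | count_targets_by_health up)
    down=$(printf '%s' "$json" | count_targets_by_health down)
    echo "up=$up down=$down"
    [ "$up" -gt 0 ] && [ "$down" -eq 0 ]
}
```

Usage: curl -s http://localhost:9090/api/v1/targets | all_targets_up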
Objective: Query and analyze key system metrics using Prometheus Query Language (PromQL).
Basic metric queries to try:

```promql
# CPU Usage (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory Usage (%)
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
# Disk Usage (%) per filesystem
100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
# Network Traffic (bytes received per second)
rate(node_network_receive_bytes_total[5m])
# Load Average (1 minute)
node_load1
# Filesystem size relative to available space (4 means a quarter of the filesystem is free)
node_filesystem_size_bytes / node_filesystem_avail_bytes
# Processes currently in the runnable state
node_procs_running
```
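To see what the CPU expression computes, here is a simplified worked version in awk. It assumes you already have each CPU's idle counter (in seconds) at the start and end of the window; the real rate() also extrapolates to the window edges and corrects for counter resets, which this sketch skips. The function name is hypothetical.

```bash
# Simplified version of:
#   100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Usage: cpu_usage_pct <window_seconds> <idle_start> <idle_end> [<idle_start> <idle_end> ...]
cpu_usage_pct() {
    window=$1
    shift
    printf '%s\n' "$@" | awk -v w="$window" '
        NR % 2 == 1 { start = $1; next }           # odd lines: idle counter at window start
        { sum += ($1 - start) / w; n++ }           # per-CPU idle rate = delta / seconds
        END {
            if (n == 0) exit 1                     # no complete start/end pair given
            printf "%.1f\n", 100 - (sum / n) * 100 # average across CPUs, then invert
        }
    '
}
```

For one CPU whose idle counter grew from 1000 to 1240 seconds over 300 seconds, cpu_usage_pct 300 1000 1240 prints 20.0: the CPU was idle 80% of the window, so usage is 20%.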
Advanced queries with time ranges:

```promql
# Per-CPU, per-mode CPU time rate over the last 5 minutes
rate(node_cpu_seconds_total[5m])
# Average available memory over the last hour
avg_over_time(node_memory_MemAvailable_bytes[1h])
# Network receive rate, excluding the loopback interface
rate(node_network_receive_bytes_total{device!="lo"}[5m])
# Disk I/O utilization (fraction of each second the device was busy)
rate(node_disk_io_time_seconds_total[5m])
```
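Counters such as node_network_receive_bytes_total drop back to zero when Node Exporter or the host restarts, which is why they are wrapped in rate() instead of being graphed raw. The sketch below mirrors how rate() treats a drop in value as a reset and adds the pre-reset value back, but it simply divides by the window you pass in and does not extrapolate; the function name is hypothetical.

```bash
# Per-second increase of a counter, tolerating resets.
# Usage: counter_rate <window_seconds> <sample1> <sample2> ...   (samples in time order)
counter_rate() {
    window=$1
    shift
    printf '%s\n' "$@" | awk -v w="$window" '
        NR == 1 { first = $1; prev = $1; resets = 0; next }
        {
            if ($1 < prev) resets += prev   # value went down: assume a reset to 0
            prev = $1
        }
        END { printf "%.2f\n", (prev + resets - first) / w }
    '
}
```

counter_rate 60 100 200 50 110 prints 3.50: the counter rose by 100 before the reset and by 110 after it, 210 in total over 60 seconds.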
Create custom dashboards:
- Go to Graph tab in Prometheus UI
- Enter queries and visualize metrics over time
- Use Console templates for advanced queries
Objective: Configure alerting rules for critical system conditions.
Test alert rules:

```bash
# Check if alert rules are loaded
curl http://localhost:9090/api/v1/rules
# View current alerts
curl http://localhost:9090/api/v1/alerts
# Simulate high CPU usage on the target server to test alerts
# (install first with: sudo apt install stress; one worker per core, stops after 15 minutes)
stress -c "$(nproc)" --timeout 900
```

Monitor alert status in Prometheus UI:
- Go to Alerts tab to view active alerts
- Check Status → Rules to see configured alert rules
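The for: 5m clause is why an alert sits in the Pending state before it fires. The sketch below simulates that state machine for a single series, assuming the 15s evaluation_interval from the configuration above: the alert turns pending on the first true evaluation and fires once it has stayed true for the whole hold duration. It is a simplification (one series only, no keep_firing_for), and the function name is hypothetical.

```bash
# Usage: alert_state <interval_seconds> <for_seconds> <result>...
# Each result is 1 (expression returned a value) or 0, one per rule evaluation.
alert_state() {
    interval=$1
    hold=$2
    shift 2
    printf '%s\n' "$@" | awk -v iv="$interval" -v hold="$hold" '
        BEGIN { state = "inactive" }
        {
            t = (NR - 1) * iv                         # timestamp of this evaluation
            if ($1 == 1) {
                if (state == "inactive") { state = "pending"; active_at = t }
                if (state == "pending" && t - active_at >= hold) state = "firing"
            } else {
                state = "inactive"                    # condition cleared: start over
            }
        }
        END { print state }
    '
}
```

With a 15-second interval and a 300-second hold, twenty consecutive true evaluations leave the alert pending; the twenty-first, exactly 300 seconds after the first, makes it fire.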
| Problem | Symptoms | Solution |
|---|---|---|
| Node Exporter Not Accessible | curl http://localhost:9100/metrics returns connection refused | Check service status: sudo systemctl status node_exporter, verify port 9100 is open in firewall |
| Prometheus Target Down | Target shows as "DOWN" in Prometheus UI | Verify network connectivity, check firewall rules, ensure Node Exporter is running |
| No Metrics in Prometheus | Queries return "No data" | Check Prometheus configuration, verify scrape targets, ensure Node Exporter is exporting metrics |
| High Memory Usage | Prometheus consuming too much memory | Adjust retention time in config, increase memory allocation, use remote storage |
| Service Startup Failures | Systemd service fails to start | Check logs: sudo journalctl -u prometheus -f, verify file permissions and paths |
| Configuration Errors | Prometheus fails to reload config | Test config: promtool check config /etc/prometheus/prometheus.yml, check YAML syntax |
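Before adjusting the retention time, it helps to estimate how much disk it implies. The Prometheus storage documentation gives a rough rule: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it; approximating samples per second as active series ÷ scrape interval, the default of 2 bytes per sample, and the per-host series count in the example are assumptions to replace with your own measurements (for example, the prometheus_tsdb_head_series metric). The function name is hypothetical.

```bash
# Usage: estimate_tsdb_bytes <active_series> <scrape_interval_s> <retention_hours> [bytes_per_sample]
estimate_tsdb_bytes() {
    awk -v s="$1" -v iv="$2" -v h="$3" -v b="${4:-2}" \
        'BEGIN { printf "%.0f\n", s / iv * h * 3600 * b }'   # samples/s * seconds * bytes
}
```

For three hosts at an assumed 1,000 series each, scraped every 30 seconds with the 200h retention set in the systemd unit, estimate_tsdb_bytes 3000 30 200 prints 144000000, roughly 144 MB.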
```bash
# Check Node Exporter service logs
sudo journalctl -u node_exporter -f
# Check Prometheus service logs
sudo journalctl -u prometheus -f
# Test Node Exporter metrics endpoint
curl -s http://localhost:9100/metrics | grep -E "(node_cpu|node_memory)" | head -10
# Check Prometheus targets via API
curl -s http://localhost:9090/api/v1/targets
# Verify Prometheus configuration
curl -s http://localhost:9090/api/v1/status/config
# Check active alerts
curl -s http://localhost:9090/api/v1/alerts
# Monitor system resources
htop  # Or top, free -h, df -h
# Check network connectivity from the Prometheus server to the target
nc -zv TARGET_IP 9100  # or: telnet TARGET_IP 9100
```

listen tcp :9100: bind: address already in use
- Cause: Another service is using port 9100
- Solution: Change Node Exporter port or stop conflicting service
Target down
- Cause: Network connectivity issues or firewall blocking
- Solution: Check the firewall (for example, sudo ufw allow 9100/tcp) and verify that the target server is reachable
No data in queries
- Cause: Metrics not being collected or configuration issues
- Solution: Verify Prometheus config, check target status, ensure Node Exporter is running
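When neither netstat nor ss is installed, the "address already in use" case can still be diagnosed by reading the kernel's socket table directly. This sketch assumes the /proc/net/tcp layout (hex IP:PORT in the second column, state 0A meaning LISTEN); IPv6 listeners appear in /proc/net/tcp6, which can be passed as the second argument. The function name is hypothetical.

```bash
# Usage: port_listening <port> [proc_file]   (exit 0 if something is listening)
port_listening() {
    hex=$(printf '%04X' "$1")                 # ports are stored as 4-digit uppercase hex
    awk -v p="$hex" '
        NR > 1 {                              # skip the header line
            n = split($2, addr, ":")          # local_address is IP:PORT
            if (addr[n] == p && $4 == "0A") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/net/tcp}"
}
```

Usage: port_listening 9100 && echo "port 9100 is already in use"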
- Prerequisites Verification
  - evidence-01-system-info.png - OS, resources, and network verification
  - evidence-02-prometheus-version.png - Prometheus installation verification
- Installation and Configuration
  - evidence-03-node-exporter-download.png - Node Exporter download and installation
  - evidence-04-node-exporter-service.png - Node Exporter service status
  - evidence-05-prometheus-service.png - Prometheus service status
  - evidence-06-prometheus-config.png - Prometheus configuration file
- Integration Verification
  - evidence-07-targets-status.png - Prometheus targets page showing Node Exporter as UP
  - evidence-08-metrics-endpoint.png - Node Exporter /metrics endpoint output
  - evidence-09-prometheus-ui.png - Prometheus web interface main page
- Metrics Exploration
  - evidence-10-cpu-query.png - CPU usage query in Prometheus
  - evidence-11-memory-query.png - Memory usage query in Prometheus
  - evidence-12-disk-query.png - Disk space query in Prometheus
  - evidence-13-network-query.png - Network traffic query in Prometheus
  - evidence-14-custom-graph.png - Custom time-series graph
- Advanced Features
  - evidence-15-alert-rules.png - Alert rules configuration (if implemented)
  - evidence-16-alert-status.png - Active alerts status (if triggered)
  - evidence-17-multi-target.png - Multiple targets configuration (if using multiple servers)
All screenshots should be saved in the img/ directory with descriptive names in the form evidence-XX-description.png:
- Include terminal prompts and outputs
- Ensure text is readable and URLs/commands are visible
- Capture both successful and failed attempts (for troubleshooting evidence)
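A quick way to catch misnamed screenshots before submission is to test every file in img/ against the naming convention. This is a sketch; the regular expression (two digits, then lowercase words joined by hyphens) is one reading of evidence-XX-description.png and may be stricter than your reviewer requires. The function name is hypothetical.

```bash
# Usage: check_evidence_names [dir]   (prints offending names; non-zero exit if any)
check_evidence_names() {
    bad=0
    for f in "${1:-img}"/*; do
        [ -e "$f" ] || continue               # empty directory: the glob matched nothing
        name=$(basename "$f")
        if ! printf '%s\n' "$name" | grep -Eq '^evidence-[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*\.png$'; then
            echo "bad name: $name"
            bad=1
        fi
    done
    return "$bad"
}
```

Usage: check_evidence_names img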
- Metrics Collection: Automated gathering of system performance data
- Time Series Data: Historical data storage and analysis
- Alerting: Proactive notification of system issues
- Dashboards: Visual representation of system health
- Service Discovery: Automatic detection of monitoring targets
- Pull-based Model: Prometheus actively scrapes metrics from targets
- TSDB (Time Series Database): Efficient storage of time-series data
- PromQL: Powerful query language for metrics analysis
- Exporters: Standardized metric collection from various systems
- Alertmanager: Handles alerting and notification routing
- Golden Signals: Latency, Traffic, Errors, Saturation
- Cardinality: Managing metric label combinations
- Retention Policies: Data lifecycle management
- High Availability: Redundant monitoring setup
- Security: Secure communication and access controls
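Cardinality is easy to measure on a live exporter, because every distinct label combination of a metric is a separate series. For example, node_cpu_seconds_total typically has one series per CPU per mode, so a 4-core host with 8 CPU modes contributes 32 series for that metric alone. The sketch below counts series per metric name in a text-format scrape; the function name is hypothetical.

```bash
# Usage: series_per_metric < metrics.txt   (prints "<count> <metric>", largest first)
series_per_metric() {
    awk '
        /^#/ || NF == 0 { next }          # skip HELP/TYPE comments and blank lines
        {
            name = $1
            sub(/[{].*/, "", name)        # strip the {label="..."} part, if any
            count[name]++                 # one line per series in the text format
        }
        END { for (m in count) print count[m], m }
    ' | sort -rn
}
```

Usage: curl -s http://localhost:9100/metrics | series_per_metric | head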
Multiple servers with custom labels:

```yaml
scrape_configs:
  - job_name: 'node-servers'
    static_configs:
      - targets: ['server1:9100', 'server2:9100', 'server3:9100']
        labels:
          environment: 'production'

  - job_name: 'web-servers'
    static_configs:
      - targets: ['web1:9100', 'web2:9100']
        labels:
          service: 'web'
```

Service discovery with Consul:

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    consul_sd_configs:
      - server: 'consul-server:8500'
        services: ['node-exporter']
```

Remote storage:

```yaml
remote_write:
  - url: "https://remote-storage.example.com/write"
    queue_config:
      max_samples_per_send: 1000

remote_read:
  - url: "https://remote-storage.example.com/read"
```

Recording rules:

```yaml
groups:
  - name: cpu_recording_rules
    rules:
      - record: cpu_usage_percent
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

- Prerequisites verified (OS, resources, network connectivity)
- Prometheus server installed and running on port 9090
- Node Exporter installed and running on port 9100
- Prometheus configured to scrape Node Exporter metrics
- Integration verified (targets showing as UP in Prometheus UI)
- Key metrics queried (CPU, memory, disk, network)
- Custom graphs created using PromQL expressions
- Prometheus web interface explored and documented
- All screenshots captured for evidence
- Troubleshooting documented (if applicable)
With system monitoring mastered, you can now:
- Grafana Integration: Add beautiful dashboards for visualization
- Alertmanager Setup: Configure email/Slack notifications for alerts
- Application Monitoring: Add custom application metrics
- Distributed Monitoring: Scale across multiple data centers
- Performance Optimization: Use metrics for capacity planning
- Compliance Reporting: Generate monitoring reports for audits
By completing this project, you have:
✅ Deployed comprehensive system monitoring using Prometheus and Node Exporter
✅ Configured automated metric collection from Linux servers
✅ Implemented alerting rules for critical system conditions
✅ Explored key system metrics and their significance
✅ Created monitoring dashboards for operational visibility
✅ Gained practical experience with production-grade monitoring tools
✅ Documented the entire process for submission and review
Congratulations on mastering Linux server monitoring with Prometheus! 🎉
This project demonstrates your ability to implement critical observability practices, making you ready for enterprise monitoring administration and DevOps observability roles.
For questions or issues, refer to the troubleshooting section or consult the official Prometheus documentation.
- Prometheus Official Documentation: https://prometheus.io/docs/
- Node Exporter Documentation: https://prometheus.io/docs/guides/node-exporter/
- PromQL Tutorial: https://prometheus.io/docs/prometheus/latest/querying/basics/
- Grafana Integration: https://grafana.com/docs/grafana/latest/datasources/prometheus/
- Monitoring Best Practices: https://sre.google/sre-book/monitoring-distributed-systems/