@sumit9165
Added the Day-03 DevOps learning file. It covers Linux system and network commands, plus user management and group management commands, each with a description and an example. Please review.

Document learning plan for DevOps skills and goals.
Linux architecture notes
Added comprehensive notes on Linux operating system, covering its components, distributions, installation methods, process states, and commonly used commands.
Refactor and clarify sections on Linux OS components, distributions, and process states.
Added a comprehensive cheatsheet for Linux commands, including usage and examples for various commands related to file management, user management, and system information.
Updated the Linux commands section and improved formatting.
Removed redundant 'Goals -' and 'Core DevOps Skills -' prefixes for clarity.
Updated current level description for clarity.
Documented various Linux process management commands including ps, top, and kill with examples and descriptions.
Updated Linux practice document with detailed command descriptions and examples for process management, memory usage, and disk space.
Added system management and logging commands to the Linux practice guide.
Added detailed troubleshooting steps for Linux server performance issues, including service/process checks, resource snapshots, and immediate actions.
Added detailed troubleshooting steps and observations for the OpenSSH Daemon (sshd).
Added a comprehensive troubleshooting runbook for Docker incidents, including steps for CPU and disk pressure investigations, immediate containment actions, and escalation triggers.
Documented a comprehensive troubleshooting runbook for nginx failures, including steps for simulating, diagnosing, and resolving issues.
Added a comprehensive NGINX troubleshooting runbook covering incident types, simulated failures, environment checks, and recovery steps.
Added a comprehensive runbook for troubleshooting database failures, including steps for PostgreSQL and nginx rate-limiting scenarios.
Added a markdown file with mock SRE interview questions and model answers.
Added a comprehensive Kubernetes troubleshooting runbook covering pod crashloop, Redis caching failures, and DNS outages.
Added a comprehensive on-call runbook template covering various operational aspects including CPU, memory, disk, network, and common failure patterns.
Added a visual cheat sheet decision tree for on-call incidents, detailing steps for diagnosing and resolving issues.
This cheat sheet provides a quick reference for emergency on-call procedures, including system checks, logs, common containment actions, and troubleshooting steps for various services.