Merged
24 changes: 24 additions & 0 deletions Dockerfile
@@ -0,0 +1,24 @@
# https://circleci.com/docs/2.0/circleci-images/#python
# We may as well use the same image we use for actually deploying our sites.
FROM cimg/python:3.9.2

# Dependencies
RUN pip3 install mkdocs pymdown-extensions pygments

# Install the PagerDuty theme.
WORKDIR /tmp
RUN git clone https://github.com/pagerduty/mkdocs-theme-pagerduty \
&& cd mkdocs-theme-pagerduty \
&& python3 setup.py install

# Set our working directory and user
WORKDIR /docs
RUN sudo useradd -m --uid 1000 mkdocs
USER mkdocs

# Expose MkDocs server
EXPOSE 8000

# Start the local MkDocs server.
ENTRYPOINT ["mkdocs"]
CMD ["serve", "--dev-addr=0.0.0.0:8000"]
30 changes: 22 additions & 8 deletions README.md
@@ -1,19 +1,33 @@
# Elimu Informatics Incident Response Documentation
This is a public version of the Incident Response process used at Elimu Informatics. It is also used to prepare new employees for on-call responsibilities, and provides information not only on preparing for an incident, but also what to do during and after. See the [about page](docs/about.md) for more information on what this documentation is and why it exists.

You can view the documentation [directly](/docs/index.md) in this repository, or rendered as a website at https://response.elimuinformatics.com.
You can view the documentation [directly](docs/index.md) in this repository, or rendered as a website at https://response.elimuinformatics.com.

[![Elimu Informatics Incident Response Documentation](screenshot.png)](https://response.elimuinformatics.com)

## Development
We use [MkDocs](http://www.mkdocs.org/) to create a static site from this repository. For local development,
We use [MkDocs](https://www.mkdocs.org/) to create a static site from this repository.

1. Install v0.1.0 of [MkDocs Bootswatch](https://github.com/mkdocs/mkdocs-bootswatch) `pip install mkdocs-bootswatch==0.1.0`
1. Install v0.1.1 of [MkDocs Bootstrap](https://github.com/mkdocs/mkdocs-bootstrap) `pip install mkdocs-bootstrap==0.1.1`
1. Install v0.15.3 of [MkDocs](http://www.mkdocs.org/#installation). `pip install mkdocs==0.15.3`
1. Install v0.2.4 of the [MkDocs Material theme](https://github.com/squidfunk/mkdocs-material). `pip install mkdocs-material==0.2.4`
### Native
For local development on your native device,

1. Install [MkDocs](https://www.mkdocs.org/user-guide/installation/). `pip install mkdocs`
1. Install [MkDocs PyMdown Extensions](https://squidfunk.github.io/mkdocs-material/extensions/pymdown/). `pip install pymdown-extensions`
1. Install [Pygments](https://pygments.org/) if you want syntax highlighting for any code examples. `pip install pygments`
1. Install the [PagerDuty MkDocs Theme](https://github.com/pagerduty/mkdocs-theme-pagerduty).
    1. `git clone https://github.com/pagerduty/mkdocs-theme-pagerduty`
    1. `cd mkdocs-theme-pagerduty && python3 setup.py install`
1. To test locally, run `mkdocs serve` from the project directory.
1. You can now view the website in your browser at `http://127.0.0.1:8000`. The site will automatically update as you edit the code.

### Docker
For local development using Docker,

1. Build the docker image and load it for immediate use. `docker build --load -t mkdocs .`
1. Run the container and pass through your current working directory. `docker run -v $(pwd):/docs -p 127.0.0.1:8000:8000 mkdocs`
1. You can now view the website in your browser at `http://127.0.0.1:8000`. The site will automatically update as you edit the code.

_Note: If you're using an Apple Silicon device, add `--platform linux/arm64/v8` to the `docker build` command to get a native Apple Silicon image. A native image runs faster than an amd64 image under emulation._
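
As a rough sketch of the note above, the platform flag could be chosen from the host architecture instead of being added by hand. The helper names `platform_flag` and `build_command` are hypothetical (they are not part of this repository), and the script only prints the `docker build` command rather than running it. It assumes `uname -m` reports `arm64` or `aarch64` on ARM hosts.

```shell
#!/bin/sh
# Hypothetical helper: map a machine architecture string to the extra
# `docker build` flag suggested in the README note. Prints an empty line
# for any architecture other than arm64/aarch64.
platform_flag() {
  case "$1" in
    arm64|aarch64) echo "--platform linux/arm64/v8" ;;
    *) echo "" ;;
  esac
}

# Hypothetical helper: print (do not run) the build command for an architecture.
build_command() {
  flag=$(platform_flag "$1")
  # ${flag:+$flag } expands to the flag plus a trailing space only when
  # the flag is non-empty, so other hosts get the plain command.
  echo "docker build ${flag:+$flag }--load -t mkdocs ."
}

# Show the command for the current machine.
build_command "$(uname -m)"
```

To actually build, the printed command could be copied into the terminal, or the last line could be wrapped as `eval "$(build_command "$(uname -m)")"` once the output looks right.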

## Deploying
1. Run `mkdocs build --clean` to produce the static site for upload.
@@ -25,14 +39,14 @@ We use [MkDocs](http://www.mkdocs.org/) to create a static site from this reposi
--delete

## License
[Apache 2](http://www.apache.org/licenses/LICENSE-2.0) (See [LICENSE](LICENSE) file)
[Apache 2](https://www.apache.org/licenses/LICENSE-2.0) (See [LICENSE](LICENSE) file)

## Contributing
Thank you for considering contributing! If you have any questions, just ask - or submit your issue or pull request anyway. The worst that can happen is we'll politely ask you to change something. We appreciate all friendly contributions.

Here is our preferred process for submitting a pull request,

1. Fork it ( https://github.com/PagerDuty/incident-response-docs/fork )
1. Fork it ( https://github.com/elimuinformatics/incident-response-docs/fork )
1. Create your feature branch (`git checkout -b my-new-feature`)
1. Commit your changes (`git commit -am 'Add some feature'`)
1. Push to the branch (`git push origin my-new-feature`)
22 changes: 0 additions & 22 deletions config/lamba_edge.js

This file was deleted.

5 changes: 3 additions & 2 deletions docs/about.md
@@ -1,7 +1,8 @@
---
cover: assets/img/covers/incident_response_docs.png
hero: assets/img/headers/iStock-1097331490-3992x2242-e4f3f2d.png
hero_alt_text: Elimu Informatics
---

This site documents parts of the Elimu Informatics Incident Response process. It is a cut-down version of our internal documentation, used at Elimu Informatics for any major incidents, and to prepare new employees for on-call responsibilities. It provides information not only on preparing for an incident, but also what to do during and after.

Few companies seem to talk about their internal processes for dealing with major incidents. We would like to change that by opening up our documentation to the community, in the hopes that it proves useful to others who may want to formalize their own processes. Additionally, it provides an opportunity for others to suggest improvements, which ends up helping everyone.
@@ -20,7 +21,7 @@ Incident response is something you hope to never need, but when you do, you want

## What is covered?

Anything from preparing to [go on-call](/oncall/being_oncall.md), definitions of [severities](/before/severity_levels.md), incident [call etiquette](/before/call_etiquette.md), all the way to how to run a [post-mortem](/after/post_mortem_process.md), and providing our [post-mortem template](/after/post_mortem_template.md). We even include our [security incident response process](/during/security_incident_response.md).
Anything from preparing to [go on-call](oncall/being_oncall.md), definitions of [severities](before/severity_levels.md), incident [call etiquette](before/call_etiquette.md), all the way to how to run a [postmortem](after/post_mortem_process.md), and providing our [postmortem template](after/post_mortem_template.md). We even include our [security incident response process](during/security_incident_response.md).

## What is missing?

30 changes: 15 additions & 15 deletions docs/after/after_an_incident.md
@@ -2,10 +2,10 @@
cover: assets/img/covers/resolved.png
description: Information on what to do after a major incident. Our followup and after action review procedures.
---
Information on what to do after a major incident. Our followup and after action review procedures.
Information on what to do after a major incident. Our follow-up and after-action review procedures.

## Followup Actions for Response Roles
In addition to any direct followup items generated from an incident, each of our response roles will have a few standard followup tasks. These are generally lightweight actions that ensure we organize information and followup with customers appropriately.
## Follow-up Actions for Response Roles
In addition to any direct follow-up items generated from an incident, each of our response roles will have a few standard follow-up tasks. These are generally lightweight actions that ensure we organize information and follow up with customers appropriately.

### Steps for Incident Commander

@@ -14,42 +14,42 @@ In addition to any direct followup items generated from an incident, each of our
* Set the final severity of the incident.
* Resolve the incident.

1. Create the post-mortem, and assign an owner to the post-mortem for the incident.
1. Create the postmortem, and assign an owner to the postmortem for the incident.

1. Send out an internal email to the relevant stakeholders explaining that we had an incident, provide a link to the post-mortem.
1. Send out an internal email to the relevant stakeholders explaining that we had an incident, and provide a link to the postmortem.

1. Occasionally check on the progress of the post-mortem to ensure that it is completed within the desired time frame.
1. Occasionally check on the progress of the postmortem to ensure that it is completed within the desired time frame.

### Steps for Deputy
There are no additional steps after an incident is resolved. However the IC may ask for your help with their steps.
There are no additional steps after an incident is resolved. However, the IC may ask for your help with their steps.

### Steps for Scribe

1. Review the chat communications and extract any relevant items from key events.

1. Collect all `TODO` items and add them to the post-mortem.
1. Collect all `TODO` items and add them to the postmortem.

### Steps for Subject Matter Experts
### Steps for Subject Matter Experts

1. Add any notes you think are relevant to the post-mortem.
1. Add any notes you think are relevant to the postmortem.

### Steps for Customer Liaison

1. Reply to any customer enquiries we received about the incident.

1. Follow the post-mortem progress, and update our status page with the external message once it is available.
1. Follow the postmortem progress, and update our status page with the external message once it is available.

### Steps for Internal Liaison
There are no additional steps after an incident is resolved. However, the IC may ask for your help with answering questions from internal stakeholders.

## Reviewing the Incident
It's important that we review the incident in detail to see exactly what went wrong, why it went wrong, and what we can do to make sure it doesn't happen again. These take many names; after-action reviews, incident review, followup review, etc. We use the term post-mortem.
It's important that we review the incident in detail to see exactly what went wrong, why it went wrong, and what we can do to make sure it doesn't happen again. These reviews go by many names: after-action review, incident review, follow-up review, etc. We use the term postmortem.

You can read all about our [post-mortem process](post_mortem_process.md), which goes over this in more detail.
You can read all about our [postmortem process](post_mortem_process.md), which goes over this in more detail.

## Reviewing the Process
As well as reviewing the incident, it's important to review our process. Did we handle the incident well, or are there things we could have done better?

This review isn't very formal yet, and typically involves a few of the incident commanders getting together to discuss how we might have done things differently, or if there are any tweaks we can make to our incident response process.
This review isn't very formal yet, and typically involves a few of the Incident Commanders getting together to discuss how we might have done things differently, or if there are any tweaks we can make to our incident response process.

If you're interested in joining these meetings, just let one of the incident commanders know and we'll be sure to invite you.
If you're interested in joining these meetings, just let one of the Incident Commanders know and we'll be sure to invite you.