
Storing and Retrieving Knowledge From Code

Peter Kofler edited this page Jun 23, 2020 · 4 revisions

Code is a very precise way to store knowledge, because it has to run on a computer.

What can we store and retrieve from code?

As discussed previously, an easy way to perform this analysis is to measure the knowledge stored by asking "how many types of questions can be answered by looking at the code?". This analysis matters because we often need to transfer the knowledge from code into a human brain before we can make changes.

The code contains all the necessary knowledge about a specific algorithm and about the way it should run (e.g. configuration files). Therefore the code can answer, more or less quickly, questions such as:

  • how does this specific behavior work?
  • what are the inputs, outputs, and configurations that allow it to work?

The code cannot answer questions such as:

  • why did we decide to implement this behavior with this particular code structure?
  • why did the code evolve in this direction?

How do we store knowledge in code?

Knowledge is stored in code in a few ways:

  • Names
  • Sequence of instructions
  • Meta text: white space, punctuation
  • Layout and symmetry (e.g. if/else versus guard clause)
  • Grouping: order of declarations, blocks, scopes
  • Relationships: delegate, inherit etc.
  • Code Comments
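To make the "layout and symmetry" item concrete, here is a minimal sketch in Python. The function names and the account example are hypothetical, chosen only for illustration. Both functions behave identically; only the layout differs, and with it what the reader is told about which case is the main path.

```python
def withdraw_if_else(balance: int, amount: int) -> int:
    """Symmetric layout: both branches look equally important."""
    if amount <= balance:
        return balance - amount
    else:
        raise ValueError("insufficient funds")


def withdraw_guard(balance: int, amount: int) -> int:
    """Guard clause: the exceptional case is dismissed first, so the
    remaining un-indented line reads as the main path."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount
```

Neither form is presented here as the better one. The point is that the choice itself carries information for the reader that the instructions alone do not.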

Unfortunately, each of these means can be used in a way that obfuscates knowledge. Names can be meaningless or misleading, sequences of instructions can be garbled in various ways, the meta text can be misleading, and the relationships can be arbitrary.
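As a small illustration of how names alone can store or hide knowledge, consider the following sketch in Python. The functions and the `(order_id, total)` data shape are assumptions invented for this example. The two functions execute exactly the same instructions, yet only the second one answers "what is this for?" without further investigation.

```python
def f(a, b):
    # Meaningless names: the reader must reverse-engineer the intent.
    return [x for x in a if x[1] > b]


def orders_above(orders, minimum_total):
    """Return the (order_id, total) pairs whose total exceeds minimum_total."""
    return [order for order in orders if order[1] > minimum_total]
```

Called with the same arguments, both return the same list, so the difference lies entirely in the knowledge available to the human reader.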

Aside: this description suggests that we can store knowledge in ways that introduce a lot of entropy or in ways that reduce it, and that we should look for ways to reduce information entropy.

Tests as a way to store knowledge

Tests are code, but they are special because they exercise the production code. Because of this, tests can answer, by showcasing usage examples, types of questions that are typically difficult to answer from the production code alone:

  • How can I use an existing object, method, or function?
  • What sequence of calls makes sense to implement a behavior?

While knowledge stored in passing tests is correct, it is almost never complete. It can also be obfuscated by the same means that obfuscate code: poor naming, excessive length, etc.
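The following sketch, written with Python's standard `unittest` module, shows how a test can double as a usage example. `ShoppingCart` and its methods are hypothetical, invented for this illustration and not taken from any real code base.

```python
import unittest


class ShoppingCart:
    """Hypothetical production class, invented for this illustration."""

    def __init__(self):
        self._prices = {}  # item name -> price in cents

    def add(self, name, price_cents):
        self._prices[name] = price_cents

    def total_cents(self):
        return sum(self._prices.values())


class ShoppingCartTest(unittest.TestCase):
    def test_new_cart_has_zero_total(self):
        # Answers "how do I create the object?"
        self.assertEqual(ShoppingCart().total_cents(), 0)

    def test_total_is_sum_of_added_item_prices(self):
        # Answers "what sequence of calls makes sense?"
        cart = ShoppingCart()
        cart.add("book", 1200)
        cart.add("pen", 300)
        self.assertEqual(cart.total_cents(), 1500)
```

Saved in a file, the tests can be run with `python -m unittest`. Note what they leave out: nothing here says what happens when the same item name is added twice, which is one example of the incompleteness mentioned above.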

How can we optimize knowledge retrieval from code?

TBD
