Source repository for the TTS Research Technology Guides -- a collection of resources, documentation, and asynchronous tutorials to advance computational research across disciplines. Developed and maintained by the Research Technology (RT) team within Tufts Technology Services (TTS) at Tufts University.
Important
See the published website for content. The following is a development guide intended for contributors. Feedback, suggestions, and error reports can be submitted via tts-research@tufts.edu or by creating an issue.
- Prerequisites
- Initial Setup
- Repository Overview
- Utility Scripts
- Structure Configuration
- Content Types
- Style Guidelines
- Accessibility
- Subject Tags
- Contribution Workflow
- Publishing Workflow
An installation of a Python distribution that supports virtual environments and can handle Conda environment specification YAML files is required. Miniforge is strongly recommended but other similar distributions like Anaconda are acceptable. It is assumed that a conda-enabled console is used to run all the following commands.
The following includes commands from the GitHub CLI (command-line interface), which can be used as a substitute for some git commands and browser-based GitHub workflows like creating or merging pull requests (PRs). Once installed, run the following command for initial setup. Authentication via SSH (secure shell) key is strongly recommended over other alternatives.
gh auth login
The majority of the content within the guides is written using MyST (Markedly Structured Text) Markdown, which is an extension of the popular CommonMark Markdown specification and heavily inspired by R Markdown. MyST Markdown contains several special directives that most Markdown editors are unable to properly format or preview. Hence the use of Visual Studio Code along with the MyST-Markdown extension is strongly recommended. Alternatively, the JupyterLab distribution included in the project environment is equipped with an extension that adds MyST Markdown support and can be used to edit and preview MyST Markdown files instead.
Clone the repository using the GitHub CLI or some other equivalent method.
gh repo clone tuftsrt/guides
Change into the directory containing the cloned repository.
cd guides
Create the required Python environment using the provided YAML configuration file. (The command provided uses the faster mamba package manager included in Miniforge, but the traditional conda package manager can be used as well.)
mamba env create --file environment.yaml --yes
Remember to activate the environment before proceeding. (In certain command line interfaces, the mamba command can also be used for environment activation.)
conda activate guides
It is assumed that all of the following commands are run from within the guides environment. Make sure to activate the environment whenever returning to work with the repository.
Select pre-commit hooks are used to ensure consistent style and formatting across all files, fix simple errors like common misspellings, and filter out content that should not be versioned like Jupyter Notebook cell outputs. These hooks are automatically run on all pull requests (PRs) and merging is not possible until all automated checks pass. The PR pre-commit process is able to automatically fix most minor issues that might prevent merging, and hence the use and configuration of pre-commit hooks in the local repository is not required. However, the local use of pre-commit is strongly recommended, as this ensures feature branches are clean, follow a consistent style, and are easier to work with. The pre-commit library is included in the development environment and can be set up as follows.
Set up all the pre-commit hooks specified in the repository configuration. Run this from the repository root.
pre-commit install
Test the installation by running the pre-commit hooks against all files in the repository.
pre-commit run --all-files
Now the hooks will run before every commit attempt. If any hooks fail or perform automatic fixes, the commit attempt is interrupted and a new commit attempt must be made.
The repository contains the following permanent (non-deletable) branches.
- `announcement` -- contains an HTML file used to specify the content of the announcement banner
- `develop` -- receiver of development PRs and the source for the development version of the built website
- `gh-pages` -- receiver of build artifacts and deployment source of the built website
- `main` -- source for the published version of the built website
- `switcher` -- contains a JSON file used to configure the version switcher dropdown
- `staging` -- staging area for content to be pushed to the `main` branch and published
All pushes to the gh-pages branch are automated, and developers should never interact with this branch directly. Pushes to the develop and main branches are only allowed via pull requests (PRs), which automatically trigger a new build of the website. The announcement and switcher branches accept direct pushes and do not trigger a rebuild of the website. However, the content of these branches directly affects the published website immediately after push. Therefore, any changes to the announcement and switcher branches should be made with care and the proper functioning of the website manually verified after every push.
The repository may also contain the following non-permanent (deletable) branches.
- `hotfix` -- used for critical changes pushed directly to `main` and then propagated elsewhere
- `pre-commit-ci-update-config` -- automated updates to the pre-commit configuration file

Updates to `pre-commit-ci-update-config` are automated and developers should not directly interact with this branch.
All other branches in this repository are feature branches forked from the develop branch. These are intended for the active development of a specific feature and all development activity should be confined to these branches. The term feature here could refer to any of the following.
- updates to a specific section of the website
- development of a new section of the website
- changes to configuration files or templates
- development of internal extensions or utility scripts
- updates to the README or other internal documentation
Note that the list above is not exhaustive. Feature branches should be named descriptively (so that it is immediately clear what kind of active development is taking place in the branch) but concisely (not exceeding a soft limit of roughly 32 characters). All branch names should only contain lowercase letters, hyphens (-), and numbers. Do not use more than one hyphen in a row. Avoid working on different sections of the website within a single feature branch.
All branches that contain source files (develop, hotfix, main, staging, and all feature branches) are structured similarly and contain the following.
- `.github/workflows` -- configuration files for automatic GitHub workflows
- `environment.yaml` -- build environment specification file
- `source/_ext` -- internally-developed and other custom Sphinx extensions
- `source/_static/css` -- cascading style sheet (CSS) files used to override default styling
- `source/_static` -- static HTML content like the website logo and favicon
- `source/_templates` -- custom Jinja templates including new templates and default overrides
- `source/_toc.yaml` -- Sphinx External ToC site map configuration file
- `source/404.md` -- custom 404 page template
- `source/conf.py` -- Sphinx configuration file
- `source/index.md` -- documentation/website root (default landing page)
- `utils` -- various internally-developed utility scripts
All other contents of source define the website structure and content, with the directory tree corresponding to the site map and files serving as content sources. Note that all files in source that are not listed above are automatically published online even if not linked to from anywhere in the content. Do not place any non-content files like utility scripts or developer-facing documentation within the source directory.
Caution
Do not include content unsuitable for version control like PDF documents or other binary files. Any non-plaintext files that are to be made available via the website should be uploaded to a designated directory on Tufts Box and made sharable via URL. Either the viewing or direct download URL for the file should be included on the website as a link. Remember to require a Tufts login and disable the download option for any resources that cannot be shared outside of the Tufts University community.
All file and directory names should be URL-friendly and hence only consist of lowercase letters, hyphens (-), and numbers. No more than one hyphen may be used in a row and file names should not exceed 32 characters (excluding extension). Prefixes consisting of periods (.) or underscores (_) are allowed to denote special files and any numerical prefixes should be formatted to two digits. Underscores (_) should be used instead of hyphens when naming Python (py) scripts. These rules are enforced via the AutoSlug pre-commit hook with automatic fixes applied whenever possible. Any files that do not have an extension recognized by AutoSlug are treated as directories. File extensions are case-sensitive and care should be taken to ensure that the extensions of R scripts (R) and R Markdown documents (Rmd) are properly capitalized. See below for examples of appropriate names.
_special-file.yaml
.hidden-directory
01-file-with-numeric-prefix.rst
python_script.py
python-notebook.ipynb
r-markdown-file.Rmd
this-is-a-directory
this-is-a-file.md
Any images intended for inclusion in the source files should be placed in an img directory within the same directory as the corresponding source file. Images should be in PNG or SVG format and resized to the desired dimensions whenever possible. All references to images within source files should be relative. Do not include references to any externally hosted images.
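An image reference from a page in the same directory might look like the following (the file name is illustrative); note the relative path into the `img` directory and the required alternative text.

```markdown
![Diagram of the cluster login workflow](img/login-workflow.png)
```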
The build process generates the following git-ignored directories that should not be manually modified but are safe to remove.
- `build` -- all build artifacts
- `jupyter_execute` -- executed Jupyter Notebooks derived from source files
- `source/tags` -- automatically generated source files for the tags index
Utility scripts are available for various console environments and can be found in the corresponding directory within the utils directory.
- `utils/bash` -- shell scripts (`sh`) prefixed with `#!/bin/bash` and intended to be run via Bash
- `utils/cmd` -- Windows Command Prompt (`cmd`) scripts intended for Miniforge Prompt users
- `utils/pwsh` -- PowerShell (`ps1`) scripts intended for users of cross-platform PowerShell (version 7.X or higher)
All utility scripts are written such that they can be run from anywhere within the repository without errors. (Each script uses git commands to derive the repository root, and all paths within the script are relative to that root.) The following utility scripts are provided for all platforms.
- `autobuild` -- automatic self-updating preview build of the website hosted on a local server
- `build` -- one-time build of static HTML files
- `clean-autobuild` -- runs `clean` and then `autobuild`
- `clean-build` -- runs `clean` and then `build`
- `clean` -- removal of all build artifacts (needed to ensure a clean build)
New utility scripts should follow the example of existing scripts and be executable without errors from anywhere within the repository. Bash scripts should be developed first and analogous courtesy scripts for Command Prompt and PowerShell users provided when possible. Utility scripts should exit with zero for success and an appropriate positive exit code for failure.
The autobuild utility script uses sphinx-autobuild to display an automatically self-updating preview build of the website by running the following command. (The variable $root refers to the repository root.)
sphinx-autobuild --nitpicky --ignore "$root/source/tags" -- "$root/source" "$root/build"

The live preview automatically updates whenever changes to source files are detected and can be accessed via 127.0.0.1:8000 (localhost port number 8000). Note that changes to configuration and template files might not be detected and usually require a clean build to be properly displayed.
The build utility script uses the standard sphinx-build command to build static HTML files that make up the website by running the following command. (The variable $root refers to the repository root.)
sphinx-build --nitpicky "$root/source" "$root/build"

The generated HTML files can be previewed by directly opening the landing page (build/index.html) or any other desired page using a web browser. Note that some functionality that requires a web server might not work properly when previewing static HTML files.
The clean utility script attempts to delete all build artifacts. The usual build process first checks for build artifacts and then only rebuilds the pages where a change to the source file is detected. This means that changes that affect several pages like modifications to the configuration, table of contents, style sheets, or templates might not be accurately reflected when preexisting build artifacts are detected. Hence it is strongly recommended to run clean before autobuild or build whenever making modifications that are not confined to specific source files. Note that the clean-autobuild and clean-build scripts can be used instead of manually running clean before the desired build script.
Documentation structure is managed using the Sphinx External ToC (table of contents) extension with the _toc.yaml configuration file written such that the site map mimics the layout of the source directory. Content is grouped into primary sections with each section appearing in the top navigation bar and having an index file serving as the section root. Primary sections contain content pages which can be further divided into subtrees. Pages in each subtree are ordered using the natural sort order of the source file names. Content pages could also have child pages, in which case their structure resembles that of a primary section with an index file serving as the parent page.
Content pages can be added to preexisting sections, subtrees, and parent pages without having to modify the site map configuration file. Only when adding a new section, subtree, or parent page does the _toc.yaml file need to be updated. See the sample source directory tree below along with its corresponding site map configuration file for examples on how to define various structures. Note that the title field defines how the name of a primary section is displayed in the top navigation bar and the caption field defines how the name of a subtree is displayed in the ToC. Content page display names in the secondary sidebar and the ToC are equivalent to their first heading.
📂source
┣ 📄index
┗ 📂primary-section
┣ 📄index
┣ 📄01-content-page
┣ 📄02-content-page
┣ 📂10-page-with-children
┃ ┣ 📄index
┃ ┣ 📄01-child-page
┃ ┗ 📄02-child-page
┣ 📄21-content-page
┣ 📄22-content-page
┣ 📄31-subtree-page
┗ 📄32-subtree-page
root: index
subtrees:
- caption: Primary Section Display Name in ToC
  entries:
  - file: primary-section/index
    title: Primary Section Display Name in Navigation
    subtrees:
    - entries:
      - glob: primary-section/0*
      - file: primary-section/10-page-with-children/index
        entries:
        - glob: primary-section/10-page-with-children/*
      - glob: primary-section/2*
    - caption: Section Subtree Display Name
      entries:
      - glob: primary-section/3*

File extensions should be omitted when listing source files in the _toc.yaml file. This allows for the easy change of source file type without having to modify the structure configuration file. Use file prefixes instead of directories to create subtrees. This avoids the creation of dead URLs where an index file would usually be expected.
Content that is not code-heavy is considered narrative and can be written using either MyST Markdown or reStructuredText. Narrative content can contain code snippets but these are not executed and their output is not intended to be included. Content that is focused on code and intends to include both the code itself and its output is considered executable and can be written using a variety of notebook formats. Code snippets included in executable content are executed during the build and the outputs are automatically included in the built document. Efforts are underway to make the code included in the pages derived from executable content interactive using WebAssembly (WASM).
All content regardless of format must adhere to the following rules.
- Files must have a single title. Generally this means that files must begin with a single first-level heading and no other first-level headings should appear. (R Markdown is an exception. There the title is defined in the YAML header and hence multiple first-level headings can be used.)
- Headings must increase linearly. Subsections of sections with a first-level header should have a second-level header, the subsections of which should have a third-level header and so on. Skipping a header level will result in a build error.
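A well-formed source file therefore follows a heading pattern like the one below (titles are illustrative).

```markdown
# Page Title

## First Section

### Subsection of First Section

## Second Section
```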
Narrative content can be written using either MyST Markdown (stored in md files) or reStructuredText (stored in rst files). The use of MyST Markdown over reStructuredText is strongly encouraged due to its superior functionality and readability. Support for reStructuredText exists primarily to allow the inclusion of preexisting legacy materials.
MyST (Markedly Structured Text) Markdown is an extension of the popular CommonMark Markdown specification and is heavily inspired by R Markdown. It supports simple formatting, lists, images, tables, mathematical formulas, and code snippets like most other Markdown flavours and adds various roles and directives that allow for extra functionality like admonitions, footnotes, citations, and glossaries. Furthermore, the PyData Sphinx Theme and the Sphinx Design Extension add additional design elements and functionality like grids, cards, dropdowns, tabs, sidebars, and iconography. See below for relevant resources.
- CommonMark Markdown Reference
- CommonMark Markdown Tutorial
- MyST Markdown Cheat Sheet
- MyST Markdown Documentation
- Sphinx Design Extension Documentation
- Elements Specific to the PyData Sphinx Theme
- PyData Sphinx Theme Kitchen Sink (preview of almost all design elements)
MyST Markdown is extremely configurable and has various syntax extensions. The following have been enabled for this project.
- Typography
- Strikethrough
- Math Shortcuts
- Linkify (with fuzzy matching disabled)
- Substitutions
- Auto-Generated Header Anchors
- Definition Lists
- Task Lists
- Field Lists
- Attributes
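A few of these extensions in use (definition lists, task lists, and linkify), using standard MyST syntax:

```markdown
Term
: Definition of the term (definition list syntax).

- [x] completed task
- [ ] open task

Plain URLs like https://www.tufts.edu are linked automatically (linkify).
```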
Markdown documents may start with an optional YAML metadata header with any of the following configured.
---
tocdepth: 2 # maximum depth for the page table of contents on the right sidebar
orphan: true # must be set if page is not included in the structure configuration
no-search: true # omit the page from text-based search
html_theme.sidebar_secondary.remove: true # remove the right sidebar for the page
---

Although possible, the inclusion of raw HTML within Markdown documents is strongly discouraged and should only be done to implement advanced functionality or accessibility improvements that otherwise would not be possible. All styling should be defined via CSS (cascading style sheets) and HTML should not be used to define any styling within the content file. Use the attributes extension to add HTML attributes like class or id values when applicable.
Note
README documents should be written using GitHub Flavored Markdown (GFM) instead of MyST Markdown.
reStructuredText (RST) is a plaintext markup language used primarily for technical documentation within the Python community. It is the default markup language in Sphinx and hence also supported by this project. However, the use of reStructuredText is discouraged and any new material should be written using MyST Markdown whenever possible. RST support exists primarily to allow the inclusion of preexisting legacy materials. Here are some resources to assist in understanding RST syntax and converting existing RST documents to MyST Markdown.
Executable content can be written using a variety of notebook formats described below. Executable code snippets or cells included in the notebooks are executed during the build process and any outputs are automatically included into the built document. Functionality to make the code snippets interactive and executable by the end user within the browser might be included in the future. All executable content regardless of format must adhere to the following guidelines.
- All dependencies must be included in the `guides` environment. Code included in the notebooks may not attempt to install any packages and all dependencies must be listed in `environment.yaml` with pinned versions. Ensure the functionality of the environment and the success of the build process after adding or updating any dependencies. Ensure added dependencies are easily removable by adding them in a single descriptive commit or using comments within `environment.yaml` to denote which dependencies are specific to a given notebook.
- All data should be pre-downloaded and included within the `source` directory. Unless specifically demonstrating the downloading of data, any data used within the notebooks should be pre-downloaded and included either within the same directory as the notebook or within a designated `data` directory that is a child of the notebook directory. Direct download links to any included data files should be included in the notebook to allow the end user to manually download and explore the data. Data files should not exceed 50 megabytes.
- Computation should be relatively fast and lightweight. Any notebook run should not exceed two minutes and use no more than two gigabytes of RAM. Note that these are upper limits -- faster and less-intensive computation is strongly encouraged. Remember that all notebooks get executed during every build and the functionality to run the code within the browser is being added. Hence this platform should be used for lightweight examples; demonstrations requiring significant computation should be hosted elsewhere.
Jupyter Notebooks (stored in ipynb files) using either the interactive Python kernel (python3) or the interactive R kernel (ir) are supported. Cell magic commands are allowed but discouraged as functionality is not guaranteed. Markdown cells can include MyST Markdown syntax and the JupyterLab installation included in the project environment is equipped with an extension that adds MyST Markdown rendering support. Hence it is recommended to use the JupyterLab installation included in the project environment to develop notebooks. JupyterLab can be launched as follows.
jupyter lab
MyST Markdown includes the functionality for text-based Jupyter notebooks via the MyST NB extension. These are written entirely in Markdown (stored in md files) and include special YAML metadata that define the computation kernel and special code cell directives that specify which code should be executed during runtime. MyST Markdown notebooks are very similar to R Markdown notebooks but support all MyST Markdown syntax. Both the interactive Python kernel (python3) and the interactive R kernel (ir) can be used. See below for resources.
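A minimal text-based notebook might look like the following; the kernel metadata follows the standard MyST-NB layout, while the title and cell contents are illustrative.

````markdown
---
kernelspec:
  name: python3
  display_name: Python 3
---

# Example Notebook

Narrative text, followed by a code cell executed during the build.

```{code-cell} ipython3
print("this output is captured and included in the built page")
```
````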
R Markdown notebooks (stored in Rmd files) are supported with certain limitations. Package installations via CRAN are not allowed (all dependencies must be installed via environment.yaml) and Python is not supported. Syntax specific to R Markdown can be used but note that some functionality might be lost during the build process. R Markdown is not directly supported and thus all R Markdown documents are converted to Jupyter Notebooks using Jupytext and then executed using the interactive R kernel (ir). Hence certain elements might not be rendered as expected. Also note that any functionality specific to RStudio and the knitr package is not supported as the R Markdown documents are never actually knit. See below for resources.
Caution
The properly capitalized Rmd extension must be used for R Markdown files to be correctly identified.
Style guidelines for prose are in active development and subject to change. The following are current recommendations that are not enforced.
- Use title case for all headings.
- Do not use "you" or "we" and avoid addressing the reader directly.
- Use active voice for specific instructions and passive voice otherwise.
- Define any acronyms the first time they are mentioned.
- Numbers up to ten should be written using words instead of numerals.
- Avoid duplication of effort -- link out to existing internal or external materials whenever possible.
- Prefer official resources and only link out to non-commercial publicly accessible content.
- Use reference-style links for URLs that appear multiple times within the same document.
- Use substitutions for frequently included text subject to change like emails.
- Use substitutions over reference-style links for URLs that are linked to in several different sections of the website.
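Substitutions are defined centrally via the myst_substitutions dictionary in conf.py and referenced from content pages; the key name below is a hypothetical example.

```python
# In source/conf.py: define change-prone text once so that a future
# update only touches this file. (The "contact_email" key is a
# hypothetical example, not necessarily present in the real conf.py.)
myst_substitutions = {
    "contact_email": "tts-research@tufts.edu",
}
```

A page would then include the value as `{{ contact_email }}` wherever the address is needed.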
Warning
Do not include any copyrighted material without permission. Material under copyright protections intended to be used for educational content under fair use should be limited to Tufts affiliates only and not be included in any publicly accessible resource.
Python code should conform to the Black style and R code should conform to the Tidyverse style. Style guidelines are enforced and automatically applied via pre-commit hooks during commits (if configured) and pull requests. Manual effort to conform to style guidelines is not needed. Style guides for other languages might be added in the future. Bash utility scripts should be committed with execute permissions.
Note
Execute permissions can be added on Windows systems via Git as follows.
git add --chmod=+x -- <file>
All content should be mindful of screen readers and color contrast guidelines. All images should have alternative text and any decorative elements should be marked accordingly to be ignored by screen readers. The colors used by the PyData Sphinx Theme have been carefully selected to meet accessibility guidelines and hence should not be manually modified. Further accessibility guidelines are in active development and to be added in the future.
Tags can be defined using the tags field in the file-wide metadata. The field content must be a single string representing a space-delimited list of tags. Tags can only contain lowercase letters, numbers, and hyphens (-) with no more than one consecutive hyphen. This is enforced and improperly formatted tags will result in an extension error during the build process.
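The format rules can be summarized with a small validation sketch; the actual extension's implementation may differ.

```python
import re

# Tags: runs of lowercase letters and digits, joined by single hyphens.
TAG_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

def valid_tags(field: str) -> bool:
    """Check a space-delimited tag list against the format described above."""
    tags = field.split()
    return bool(tags) and all(TAG_RE.fullmatch(t) for t in tags)
```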
Tags can be defined in the YAML metadata header of the file as follows.
---
tags: tag tag2 another-tag
---

Tags can be specified in the metadata field list at the top of the file as follows.
:tags: tag tag2 another-tag

Tags can be added to the notebook metadata JSON as follows. The metadata JSON can be accessed via the Property Inspector in the top-right of the JupyterLab interface (gear icon) or by opening the notebook as a text document and locating the "metadata" field (usually located after the "cells" field) in the notebook JSON.
{
"tags": "tag tag2 another-tag"
}

The following contribution workflow allows various development efforts to take place simultaneously and ensures changes and new content are incorporated into the develop branch in a safe and controlled manner that reduces errors and merge conflicts.
Switch to the develop branch and ensure it is up to date.
git switch develop
git pull
Create a new feature branch from the latest development state.
git switch --create <name>
Push the new branch to the remote repository and set the upstream tracking branch.
git push --set-upstream origin <name>
If there have been any updates to the environment.yaml environment specification file, the environment should be updated as follows.
mamba env update --file environment.yaml
In some cases a full rebuild of the environment is needed. The environment can be removed and recreated as follows.
mamba env remove --name guides --yes
mamba env create --file environment.yaml --yes
Frequent commits are encouraged. The subject line of the commit message should be informative and not exceed 50 characters. Commit message bodies are not required but encouraged to provide additional detail where needed. Avoid using sentence-style capitalization and punctuation in the commit message subject line. When using pre-commit hooks, avoid passing the commit message directly via the -m flag in case checks fail and automatic fixes are applied.
Create a pull request (PR) to merge your feature branch into the develop branch. Follow the prompts to add a title and description for your PR.
gh pr create --base develop
View the status and details of your PR.
gh pr view
You can also open the PR on the GitHub website.
gh pr view --web
A manual review is not required when merging to the develop branch, and the pull request can be merged once all automated status checks pass. Run the following to merge the PR automatically as soon as checks pass. Manual intervention is needed when status checks fail and automatic fixes are not possible. Ask for help if needed.
gh pr merge --auto
The merging of the PR triggers a build and deployment of the development version of the website. Note that this could take several minutes. Once the updated website has been deployed, make sure to navigate to the development version of the website and ensure all changes are reflected as expected.
Once the PR has been successfully merged, the branch should be deleted both locally and on GitHub. This can be done as follows.
git push -d origin <name>
git branch -d <name>
The following workflow incorporates updates from the develop branch into the published website. This is automatically triggered every week (if updates are detected) but can also be run manually whenever updates to the published site are needed.
Switch to the staging branch to prepare for publishing.
git switch staging
git pull
Merge the latest version of the develop branch into staging to incorporate the changes.
git fetch origin
git merge origin/develop
Thoroughly review all changes and fix any uncovered issues. Request fixes or explanations from original committers when needed.
Create a pull request (PR) to merge the staging branch into the main branch.
gh pr create --base main
Merges to main require approval and a manual review. Adding a comment tagging a reviewer to grab their attention is encouraged. Comments can be added via the GitHub CLI as follows.
gh pr comment
Comments can also be added using the web interface which can be easily accessed as follows.
gh pr view --web
The PR can be merged once it is manually approved and all checks pass. Run the following to merge the PR automatically once these conditions are met. Merging of the PR will trigger the publishing workflow and result in a new build of the public website.
gh pr merge --auto