This project implements an NLP pipeline for the disaster tweets Kaggle competition using the Brane framework. The implementation is divided into two Brane packages, compute and visualization, each of which can be imported individually and used in other workflows.
compute exposes utilities for preprocessing data, training a classifier, and generating a valid submission file for the challenge.
visualization provides functions to generate plots and charts based on the dataset.
We also include a github.yml specification which defines an OpenAPI container that exposes a function to download arbitrary files from GitHub repositories.
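To make the idea of downloading a file from a GitHub repository concrete, here is a minimal sketch that builds the raw-content URL for a file in a public repository. It is not the content of github.yml and may not match how that container is implemented; the helper name `github_raw_url` and the example arguments (including the `main` branch and `README.md` path) are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository: prints the
# raw.githubusercontent.com URL for a file in a public GitHub repository.
github_raw_url() {
    owner="$1"   # repository owner or organisation
    repo="$2"    # repository name
    ref="$3"     # branch, tag, or commit (assumed by the caller)
    path="$4"    # path of the file inside the repository
    printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' \
        "$owner" "$repo" "$ref" "$path"
}

# Example call; the branch name "main" is an assumption.
github_raw_url epi-project brane-disaster-tweets-example main README.md
```

The printed URL can then be passed to a download tool, for example `curl -fsSL "$(github_raw_url ...)" -o <OUTPUT_FILE>`.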
Each package can be individually imported with the following command:
brane package import -c epi-project/brane-disaster-tweets-example packages/<PACKAGE_NAME>/container.yml

For convenience, we also provide a shell script: clone the repository and run ./build-package.sh to build all of our packages. You can also run the following commands to build a specific package.
# build the computation package
./build-package.sh compute
# build the visualization package
./build-package.sh visualization

Alternatively, you can navigate to the package directory and run the following command to build the Brane package.
brane package build container.yml

Besides the packages, we also need to build the datasets used by the workflow. This can be done using the included ./build-data.sh script, which builds the training and testing datasets.
# For the training dataset
brane data build ./data/train/data.yml
# For the testing dataset
brane data build ./data/test/data.yml

Our pipeline implementation can be executed locally by running the following command in the root folder of the project:
brane workflow run pipeline.bs

The following picture shows an example of running the whole pipeline through pipeline.bs on a Kubernetes cluster.

This repository is an up-to-date version of the work of Andrea Marino and Jingye Wang. It aims to reproduce their implementation exactly on a newer version of the framework. Their original repository can be found here.
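As a recap, the build-and-run steps described in this README can be chained in a small script. The following is only a sketch: the `run` and `build_and_run` helpers and the `DRY_RUN` variable are hypothetical and not part of the repository, and the script assumes it is started from the project root. By default it only prints the commands, so it can be inspected without Brane installed.

```shell
#!/bin/sh
# Sketch: chain the package build, data build, and workflow run steps.
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_and_run() {
    # Build both packages with the provided convenience script.
    for pkg in compute visualization; do
        run ./build-package.sh "$pkg" || return 1
    done
    # Build the training and testing datasets.
    for split in train test; do
        run brane data build "./data/$split/data.yml" || return 1
    done
    # Execute the workflow from the repository root.
    run brane workflow run pipeline.bs
}

build_and_run
```

Running it with `DRY_RUN=0` executes the same five commands in order and stops at the first build step that fails.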