User Guide

WARP stands for Workspace Allowing Reproducible Pipelines. This page is a practical reference for using WARP and includes a tutorial.


Introductory Tutorial

In this tutorial, we will create a minimal pipeline that walks you through the functionality of WARP. The pipeline itself is fairly abstract: it is intended to reflect the structural details one might encounter in a machine learning workflow rather than implementation minutiae.

The basic idea of WARP is to consider your pipeline as a Directed Acyclic Graph (DAG) in which edges represent pieces of data and vertices represent functions that operate on those pieces of data.
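To make the DAG idea concrete, here is a minimal sketch in plain Python that does not use WARP at all. The step names, the data names, and the build_graph helper are all hypothetical and chosen only for illustration. Each step lists the data it reads and writes, the graph is derived from those declarations, and the standard library's graphlib module produces a valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps (not WARP objects): each vertex is a function name,
# annotated with the pieces of data it reads and the pieces it writes.
steps = {
    "load": {"reads": [], "writes": ["raw"]},
    "preprocess": {"reads": ["raw"], "writes": ["clean"]},
    "train": {"reads": ["clean"], "writes": ["model"]},
    "evaluate": {"reads": ["clean", "model"], "writes": ["report"]},
}


def build_graph(steps):
    """Map each step to the set of steps that produce the data it reads."""
    producer = {}
    for name, io in steps.items():
        for item in io["writes"]:
            producer[item] = name
    return {
        name: {producer[item] for item in io["reads"]}
        for name, io in steps.items()
    }


graph = build_graph(steps)

# A topological order runs every step after the steps it depends on.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Nothing here connects the steps by hand: the edges follow entirely from which data each step reads and writes, which is the same sense in which the paragraph above calls data the edges of the graph.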

There are three core classes you need to understand in order to use WARP:

  1. Pipe: The core functional unit of WARP -- a chunk of code that takes in pieces of data and outputs (new) pieces of data. Pipes behave like relatively self-contained functions that explicitly declare what data they depend on and what data they produce, thereby implicitly defining a DAG. The user will wrap their pipeline functionality with these pipe classes.

  2. Graph: The data structure that formally connects pipes together into a DAG. The user will rarely interact directly with this object beyond instantiating it and passing it to Workspace.

  3. Workspace: An API for the user to interact with the graph that keeps track of metadata and data provenance.

Using Pipe

Abstract

This section demonstrates the following concepts:

  • Source pipes -- how to ingest external data artifacts.
  • Pipe subclasses -- how to define data processing functionality.
  • The @dependencies decorator -- how to declare which pieces of data the pipe ingests.
  • The @produces decorator -- how to declare which pieces of data the pipe outputs for ingestion by other pipes.
  • TODO
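As a rough illustration of the decorator concept listed above, the sketch below shows one way class decorators can record which data a class reads and writes. It is a toy stand-in written in plain Python, not WARP's implementation: the names toy_dependencies, toy_produces, Preprocess, and run are hypothetical, and WARP's actual decorators may take different arguments or attach to something other than a class.

```python
def toy_dependencies(*names):
    """Hypothetical class decorator: record the data names a class reads."""
    def wrap(cls):
        cls.reads = tuple(names)
        return cls
    return wrap


def toy_produces(*names):
    """Hypothetical class decorator: record the data names a class writes."""
    def wrap(cls):
        cls.writes = tuple(names)
        return cls
    return wrap


@toy_dependencies("raw_dataset")
@toy_produces("clean_dataset")
class Preprocess:
    """Toy processing step; not a WARP Pipe subclass."""

    def run(self, raw_dataset):
        # Drop missing entries and return the result under its declared name.
        return {"clean_dataset": [x for x in raw_dataset if x is not None]}


step = Preprocess()
result = step.run([1, None, 2])
print(Preprocess.reads, Preprocess.writes, result)
```

Because the declarations live on the class itself, a framework could inspect them without running the step, which is what makes it possible to assemble a graph before any data is processed.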

In this section, we will create two pipes. The full pipeline used in this tutorial contains additional pipes, which you can find [TODO].

First, encapsulate your pipeline's data processing code in a Pipe subclass.

Important

You can only declare one Pipe subclass per file.

Example

In this example, we create a Pipe subclass that will ingest a dataset and output a new preprocessed dataset. The next example creates a downstream pipe that relies on this one.

TODO

Advanced Tutorial