Setting Up DVC

Data Version Control (DVC) serves two purposes in our framework:

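  1. It lets us track large data files that are too big to push to GitHub. Git stores only small metadata files (.dvc pointer files and the dvc.lock file) that record each file's hash, while the data itself lives in the DVC cache or a remote.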
  2. It gives our project structure by defining a pipeline in dvc.yaml, whose stages explicitly declare their inputs, outputs, parameters, metrics and visualizations.

Installation

Install DVC as a project dependency, like any other package, by running:

pdm add dvc

Initialization

Once installed, initialize DVC from the project root, which must already be a Git repository:

pdm run dvc init

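This creates a .dvc directory containing DVC's internal files (including its config), plus a .dvcignore file in the project root. Commit these to Git so the DVC setup is versioned with the rest of the project.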