Setting Up DVC
Data Version Control serves two purposes for our framework.
- It lets us track large data files that cannot be uploaded to GitHub, by recording them in the dvc.lock file.
- It gives structure to our project by properly defining a pipeline with inputs, outputs, parameters, metrics and visualizations.
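As a rough illustration of the second point, the sketch below writes a minimal dvc.yaml with a single stage. The stage name (train), the script, and every path and parameter key are hypothetical placeholders, not files from this project. Run it in an empty scratch directory, since it overwrites any existing dvc.yaml.

```shell
# Write a minimal, illustrative dvc.yaml (all names below are hypothetical).
cat > dvc.yaml <<'EOF'
stages:
  train:
    cmd: python train.py        # command DVC runs for this stage
    deps:                       # inputs: the stage reruns when these change
      - data/train.csv
      - train.py
    params:                     # keys read from params.yaml
      - train.epochs
    outs:                       # outputs cached and tracked by DVC
      - models/model.pkl
    metrics:                    # small metrics file, kept in Git
      - metrics.json:
          cache: false
    plots:                      # data file used for visualizations
      - plots/loss.csv:
          cache: false
EOF

# Show the file that was written.
cat dvc.yaml
```

Once a stage like this has been run, DVC records the hashes of its dependencies and outputs in dvc.lock, which is the file that gets committed to Git in place of the large data files.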
Installation
DVC should be installed like any other package by running:
pdm add dvc
Initialization
From here, DVC can be initialized by running:
pdm run dvc init
This creates a .dvc directory with a few internal files. Add these changes to Git with a commit.