Microsoft SBOMTool Overview

This page gives an overview of microsoft/sbom-tool and highlights our findings from the knowledge graph analysis on the previous pages.

Let’s break this up by section of the generated SBOM to highlight what it’s good at and what it isn’t.

We transformed the SBOM files generated by this tool into RDF format, enabling us to conduct analyses from a knowledge graph perspective for the following case studies:

Overview

This tool generates SBOMs that follow the SPDX 2.2 specification and saves them in JSON format. Each SBOM identifies files, packages, relationships, and project metadata. The project metadata includes the project name, organization name, and project version, all of which are provided as command line arguments at execution.

The sbom-tool is a command line tool that can either generate or validate SBOMs. In our experiments we’ve focused only on generation. We run the tool using this command:
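As a rough illustration of that structure, the sketch below loads a small hand-written document and counts its sections. The key names (spdxVersion, name, files, packages, relationships) are those of the SPDX 2.2 JSON format; the sample values and the summarize_sbom helper are our own illustration, not output or code from the tool.

```python
import json


def summarize_sbom(document: dict) -> dict:
    """Count the top-level sections of an SPDX 2.2 JSON document."""
    return {
        "name": document.get("name"),
        "spdx_version": document.get("spdxVersion"),
        "files": len(document.get("files", [])),
        "packages": len(document.get("packages", [])),
        "relationships": len(document.get("relationships", [])),
    }


# Hand-written stand-in for a generated SBOM (illustrative values only).
sample = json.loads("""
{
  "spdxVersion": "SPDX-2.2",
  "name": "example-project 1.0.0",
  "files": [
    {"SPDXID": "SPDXRef-File-main", "fileName": "./src/main.py"}
  ],
  "packages": [
    {"SPDXID": "SPDXRef-Package-example", "name": "example-dependency", "versionInfo": "1.2.3"}
  ],
  "relationships": [
    {
      "spdxElementId": "SPDXRef-RootPackage",
      "relationshipType": "DEPENDS_ON",
      "relatedSpdxElement": "SPDXRef-Package-example"
    }
  ]
}
""")

print(summarize_sbom(sample))
```

To inspect a real SBOM, replace the sample with json.load() on the JSON file the tool wrote.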

sbom-tool generate -b <drop path> -bc <build components path> -pn <package name> -pv <package version> -ps <package supplier>

The drop path is the folder containing all the files to be shipped, and the build components path is the source folder. Generation also requires the package name, version, and supplier organization as command line arguments.
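For scripted runs, the same command can be assembled from Python. This is a minimal sketch that uses only the flags shown in the command above; the function names and the example values are hypothetical, and run_generate assumes that sbom-tool is installed and available on the PATH.

```python
import subprocess


def build_generate_command(drop_path, build_components_path, name, version, supplier):
    """Assemble the argument list for `sbom-tool generate` (flags as in the command above)."""
    return [
        "sbom-tool", "generate",
        "-b", drop_path,               # folder with the files to be shipped
        "-bc", build_components_path,  # source folder scanned for packages
        "-pn", name,
        "-pv", version,
        "-ps", supplier,
    ]


def run_generate(*args):
    """Run the tool and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_generate_command(*args), check=True)


# Hypothetical example values, printed rather than executed.
print(" ".join(build_generate_command("./drop", "./src", "example-project", "1.0.0", "ExampleOrg")))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.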

The sbom-tool is capable of detecting packages within a source folder. However, it is unable to scan packages within container images.

Files

The sbom-tool generates a list of all files within a parent directory recursively. For each file it identifies the path (from the parent directory) and calculates the SHA256 and SHA1 checksums as a unique file identifier.
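The behaviour described above can be approximated in a few lines of Python. This is our own sketch of the idea, not the tool's implementation: the function names and the shape of the returned entries are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def file_checksums(path: Path, chunk_size: int = 65536) -> dict:
    """Compute SHA256 and SHA1 digests of a file, reading it in chunks."""
    sha256 = hashlib.sha256()
    sha1 = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            sha1.update(chunk)
    return {"SHA256": sha256.hexdigest(), "SHA1": sha1.hexdigest()}


def list_files(root: Path) -> list:
    """Walk root recursively and describe every file by relative path and checksums."""
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            entries.append({
                "fileName": "./" + path.relative_to(root).as_posix(),
                "checksums": file_checksums(path),
            })
    return entries
```

Like the tool, this walk does not consult .gitignore, so every file under the root is listed, including hidden files.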

The sbom-tool does not identify individual file licenses or file authors, even though the SPDX schema provides identifiers for these.

The generated list covers every file within the parent directory, including files listed in .gitignore. This means the sbom-tool will identify all files present in a repository, including files that are not part of a package distribution and files belonging to local Python virtual environments.

Packages

The sbom-tool uses microsoft/component-detection to detect the components and dependencies of a repository. It supports many open-source package ecosystems, such as CocoaPods, Linux, Gradle, Go, Maven, NPM, NuGet, Pip, Poetry, Ruby, and Rust. Since most of our experiments have been with Python (Pip) based projects, the descriptions below refer to that ecosystem.

The sbom-tool requires either a requirements.txt or a setup.py file to identify project dependencies. It can detect packages from PyPI as well as packages from GitHub (using the git: identifier). The dependencies of each package are looked up on PyPI, and a depth-first search is used to find all transitive dependencies.
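The depth-first search can be sketched as follows. This is a minimal illustration of the traversal only: the direct_deps mapping is a hypothetical stand-in for the dependency metadata that would come from PyPI, and the package names are made up.

```python
def transitive_dependencies(roots, direct_deps):
    """Return every package reachable from roots, in depth-first visiting order."""
    visited = set()
    order = []

    def visit(name):
        if name in visited:
            return  # already handled; also guards against dependency cycles
        visited.add(name)
        order.append(name)
        for child in direct_deps.get(name, []):
            visit(child)

    for root in roots:
        visit(root)
    return order


# Hypothetical dependency metadata (not real PyPI data).
direct_deps = {
    "pkg-a": ["pkg-b", "pkg-c"],
    "pkg-b": ["pkg-d"],
    "pkg-c": ["pkg-d"],
    "pkg-d": [],
}

# The roots correspond to the packages listed in requirements.txt.
print(transitive_dependencies(["pkg-a"], direct_deps))
```

The visited set ensures that a package shared by several parents, such as pkg-d here, is reported only once.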

Our studies show that the sbom-tool skips all Python standard library packages and does not specify the Python version used in the project.

Furthermore, the sbom-tool fails to identify non-Python subdependencies. For example, numpy uses C/C++ based linear algebra libraries that are not detected by the sbom-tool.

Finally, the sbom-tool does not identify license and distribution information for each package, even though this information is readily available on PyPI.

Relationships

The sbom-tool also identifies some relationships (as specified by the SPDX standard) in the project. However, these relationships are limited to the type DEPENDS_ON, in the form RootPackage DEPENDS_ON some-package.
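One way to check this on a generated SBOM is to tally the relationship types and list what the root package depends on. The sketch below assumes the SPDX 2.2 JSON field names (spdxElementId, relationshipType, relatedSpdxElement); the helper names and the sample identifiers are hypothetical.

```python
from collections import Counter


def relationship_types(document: dict) -> Counter:
    """Tally how often each relationship type occurs in an SPDX JSON document."""
    return Counter(rel.get("relationshipType") for rel in document.get("relationships", []))


def dependencies_of(document: dict, element_id: str) -> list:
    """List the SPDX IDs that element_id is declared to DEPENDS_ON."""
    return [
        rel["relatedSpdxElement"]
        for rel in document.get("relationships", [])
        if rel.get("relationshipType") == "DEPENDS_ON"
        and rel.get("spdxElementId") == element_id
    ]


# Hand-written sample with made-up identifiers.
sample_document = {
    "relationships": [
        {"spdxElementId": "SPDXRef-RootPackage", "relationshipType": "DEPENDS_ON",
         "relatedSpdxElement": "SPDXRef-Package-A"},
        {"spdxElementId": "SPDXRef-RootPackage", "relationshipType": "DEPENDS_ON",
         "relatedSpdxElement": "SPDXRef-Package-B"},
    ]
}

print(relationship_types(sample_document))
print(dependencies_of(sample_document, "SPDXRef-RootPackage"))
```

Because every package hangs directly off the root in this form, the output alone cannot distinguish direct dependencies from transitive ones.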

There is an opportunity to identify many other types of relationships under the SPDX standard, including but not limited to the dependency tree (direct vs. transitive dependencies), which files use which packages, and which authors wrote which files.