Let’s look at numpy, an extremely common and well-developed Python data science package, to analyze how accurate and complete our SBOM generation tools are.
Known Dependencies
According to numpy’s installation website: “NumPy doesn’t depend on any other Python packages, however, it does depend on an accelerated linear algebra library - typically Intel MKL or OpenBLAS. Users don’t have to worry about installing those (they’re automatically included in all NumPy install methods).”
To test this, let’s pretend we are creating an empty conda environment with just numpy to see what packages would be installed.
Looking at the output above, we see a number of dependencies. Most of these are required for Python (which is required for numpy), but some are the C/C++ libraries needed for linear algebra. So, how will a generated SBOM reflect these?
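One way to run such a "pretend" install is conda's `--dry-run` flag, which reports the planned packages without creating anything. The following is a minimal sketch, assuming conda is available on the PATH. The helper names and the environment name `numpy-only` are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def conda_dry_run_command(env_name, packages):
    """Build a `conda create` command that only reports what would be installed."""
    return ["conda", "create", "--name", env_name, "--dry-run", *packages]


def run_dry_run(env_name, packages):
    """Run the dry run and return conda's text output, or None if conda is missing."""
    if shutil.which("conda") is None:
        return None
    result = subprocess.run(
        conda_dry_run_command(env_name, packages),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout
```

Calling `run_dry_run("numpy-only", ["numpy"])` should return the list of packages conda would install, which can then be compared against what an SBOM reports.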
microsoft/sbom-tool
The sbom-tool queries PyPI for all package dependencies. Since numpy is uploaded to PyPI, we can query it directly before running the tool on the repo:
PyPI shows no dependencies for numpy, which makes sense as it has no Python dependencies. This means that when the sbom-tool is used on a package that depends on numpy, no child dependencies will be found for numpy.
Running the tool against the repo, cloned as is, produced 0 packages. As established, the sbom-tool uses microsoft/component-detection to detect all packages used in the repository. This tool looks for either setup.py or requirements.txt.
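The PyPI query described above can be sketched with the public PyPI JSON API. This is a rough illustration, not necessarily the request the sbom-tool itself makes. The function names are hypothetical, and the code assumes declared dependencies appear under `info.requires_dist` in the response.

```python
import json
import urllib.request


def fetch_pypi_metadata(package):
    """Download a package's metadata from the PyPI JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def declared_dependencies(metadata):
    """Return the `requires_dist` entries, or an empty list when none are declared."""
    return metadata.get("info", {}).get("requires_dist") or []
```

With network access, `declared_dependencies(fetch_pypi_metadata("numpy"))` gives the Python-level dependencies PyPI knows about. Native libraries bundled in the wheels would not show up in this field.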
In the numpy repo there are:
setup.py
build_requirements.txt
doc_requirements.txt
linter_requirements.txt
release_requirements.txt
test_requirements.txt
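To check which of these files a detector might pick up, the sketch below separates exact name matches from near matches such as `build_requirements.txt`. Exact-name matching is an assumption made here for illustration, based only on the description above. The constant and function names are hypothetical and do not come from component-detection.

```python
from pathlib import Path

# Assumption: the detector matches these file names exactly.
EXACT_MANIFEST_NAMES = {"setup.py", "requirements.txt"}


def classify_manifests(repo_root):
    """Split top-level manifest-like files into exact matches and near matches."""
    exact, near = [], []
    for path in sorted(Path(repo_root).iterdir()):
        if not path.is_file():
            continue
        if path.name in EXACT_MANIFEST_NAMES:
            exact.append(path.name)
        elif path.name.endswith("requirements.txt"):
            # e.g. build_requirements.txt: similar name, but not an exact match
            near.append(path.name)
    return exact, near
```

Running `classify_manifests` on a local clone of the repo returns two lists, which makes it easy to see how many requirement files fall outside the exact names.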
GitHub’s Dependency Graph
Now let’s examine the dependency graph from GitHub.
This can be obtained by clicking Insights > Dependency graph > Export SBOM.
kg = kglab.KnowledgeGraph()
kg.load_rdf("../../sboms/rdf/numpy.rdf.xml", format="xml")
<kglab.kglab.KnowledgeGraph>
packages = get_package_data(kg)
packages
| | package | annotations | attributionTexts | checksums | downloadLocation | externalRefs | hasFiles | licenseConcluded | licenseDeclared | licenseInfoFromFiles | name | supplier | versionInfo | relationships |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N726dcd71a340470685ef364943e7b132 | | spdx:noassertion | spdx:noassertion | | actions:actions/setup-python | NOASSERTION | 57ded4d7d5e986d7296eab16560982c6dd7c923b | |
| 1 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | | | spdx:noassertion | spdx:noassertion | | pip:nose | NOASSERTION | | |
| 2 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N0a62d1ba8e694d50af92ec1f9089922c | | spdx:noassertion | spdx:noassertion | | actions:github/codeql-action/init | NOASSERTION | 83f0fe6c4988d98a455712a27f0255212bba9bd4 | |
| 3 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | | | spdx:noassertion | spdx:noassertion | | pip:sphinx | NOASSERTION | >= 4.5.0 | |
| 4 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N1e74fb5265634c60913b6f514f3f46df | | <http://spdx.org/licenses/MIT> | spdx:noassertion | | pip:wheel | NOASSERTION | 0.38.1 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 163 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N8401a35c750842c0b8efe7a106016764 | | spdx:noassertion | spdx:noassertion | | actions:actions/checkout | NOASSERTION | 3 | |
| 164 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N26c1f9259f6049029979af6cbb6f9439 | | spdx:noassertion | spdx:noassertion | | actions:pypa/cibuildwheel | NOASSERTION | 5e15bb25b428e1bf2daf2215f173d2b40135f56f | |
| 165 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N22049996a4bb4a9487f698619f82f9f5 | | spdx:noassertion | spdx:noassertion | | actions:github/codeql-action/analyze | NOASSERTION | 0225834cc549ee0ca93cb085b92954821a145866 | |
| 166 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N295ae896197c49bb80b44e71f7055547 | | spdx:noassertion | spdx:noassertion | | actions:pypa/cibuildwheel | NOASSERTION | 2.3.1 | |
| 167 | <https://github.com/numpy/numpy/dependency_gra... | | | | spdx:noassertion | N3a63a0c04d8b4ee9a34437a084b3b03f | | spdx:noassertion | spdx:noassertion | | actions:ossf/scorecard-action | NOASSERTION | e38b1902ae4f44df626f11ba0734b14fb91f8f86 | |
168 rows × 14 columns
There are many packages identified in this SBOM despite numpy having 0 dependencies. Looking through the numpy repo, we see they come from various sources that are not identified in this knowledge graph:
the various *_requirements.txt files, which cover different functions of the source code (building, docs, linting, releases, testing)
GitHub Actions
It would be useful for the knowledge graph to identify what each dependency is for (distribution, testing, CI, etc.).
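As a first step in that direction, the package names in the export already carry a prefix (`actions:` or `pip:`) that can be used to separate CI dependencies from Python ones. The sketch below is a rough illustration using a handful of names from the table above. The function name is hypothetical, and the prefix convention is assumed from the names shown rather than from any specification.

```python
from collections import defaultdict


def group_by_ecosystem(package_names):
    """Group SBOM package names by the prefix before the first colon."""
    groups = defaultdict(list)
    for name in package_names:
        prefix, sep, rest = name.partition(":")
        # Names without a colon are filed under "unknown".
        key = prefix if sep else "unknown"
        groups[key].append(rest if sep else name)
    return dict(groups)


# A few package names taken from the table above.
names = [
    "actions:actions/setup-python",
    "pip:nose",
    "actions:github/codeql-action/init",
    "pip:sphinx",
    "pip:wheel",
]
groups = group_by_ecosystem(names)
```

This only separates GitHub Actions from pip packages. Telling testing, documentation, and build requirements apart would still need to know which file each pip entry came from, which the export does not record.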