numpy

Let’s look at numpy, an incredibly common and well-developed Python data science package, to analyze how accurate and complete our SBOM generation tools are.

Known Dependencies

According to numpy’s installation page:

> “NumPy doesn’t depend on any other Python packages, however, it does depend on an accelerated linear algebra library - typically Intel MKL or OpenBLAS. Users don’t have to worry about installing those (they’re automatically included in all NumPy install methods).”

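To test this, let’s pretend we are creating an empty conda environment containing only numpy and see which packages would be installed. Piping 'n' into the command answers the confirmation prompt, so nothing is actually installed.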

!echo 'n' | conda create -n numpy numpy
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 22.11.1
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

     conda install conda=23.5.0



## Package Plan ##

  environment location: /afs/crc.nd.edu/user/p/painswor/.conda/envs/numpy

  added / updated specs:
    - numpy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2023.01.10 |       h06a4308_0         120 KB
    intel-openmp-2023.1.0      |   hdb19cb5_46305        17.1 MB
    libffi-3.4.4               |       h6a678d5_0         142 KB
    libuuid-1.41.5             |       h5eee18b_0          27 KB
    mkl-2023.1.0               |   h6d00ec8_46342       171.5 MB
    mkl-service-2.4.0          |  py311h5eee18b_1          54 KB
    mkl_fft-1.3.6              |  py311ha02d727_1         217 KB
    mkl_random-1.2.2           |  py311ha02d727_1         291 KB
    ncurses-6.4                |       h6a678d5_0         914 KB
    numpy-1.24.3               |  py311h08b1b3b_1          11 KB
    numpy-base-1.24.3          |  py311hf175353_1         7.2 MB
    openssl-1.1.1t             |       h7f8727e_0         3.7 MB
    pip-23.0.1                 |  py311h06a4308_0         2.8 MB
    python-3.11.3              |       h7a1cb2a_0        32.6 MB
    readline-8.2               |       h5eee18b_0         357 KB
    setuptools-67.8.0          |  py311h06a4308_0         1.4 MB
    sqlite-3.41.2              |       h5eee18b_0         1.2 MB
    tbb-2021.8.0               |       hdb19cb5_0         1.6 MB
    tzdata-2023c               |       h04d1e81_0         116 KB
    wheel-0.38.4               |  py311h06a4308_0          79 KB
    xz-5.4.2                   |       h5eee18b_0         642 KB
    ------------------------------------------------------------
                                           Total:       242.1 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main 
  _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu 
  blas               pkgs/main/linux-64::blas-1.0-mkl 
  bzip2              pkgs/main/linux-64::bzip2-1.0.8-h7b6447c_0 
  ca-certificates    pkgs/main/linux-64::ca-certificates-2023.01.10-h06a4308_0 
  intel-openmp       pkgs/main/linux-64::intel-openmp-2023.1.0-hdb19cb5_46305 
  ld_impl_linux-64   pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1 
  libffi             pkgs/main/linux-64::libffi-3.4.4-h6a678d5_0 
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1 
  libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1 
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1 
  libuuid            pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0 
  mkl                pkgs/main/linux-64::mkl-2023.1.0-h6d00ec8_46342 
  mkl-service        pkgs/main/linux-64::mkl-service-2.4.0-py311h5eee18b_1 
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.3.6-py311ha02d727_1 
  mkl_random         pkgs/main/linux-64::mkl_random-1.2.2-py311ha02d727_1 
  ncurses            pkgs/main/linux-64::ncurses-6.4-h6a678d5_0 
  numpy              pkgs/main/linux-64::numpy-1.24.3-py311h08b1b3b_1 
  numpy-base         pkgs/main/linux-64::numpy-base-1.24.3-py311hf175353_1 
  openssl            pkgs/main/linux-64::openssl-1.1.1t-h7f8727e_0 
  pip                pkgs/main/linux-64::pip-23.0.1-py311h06a4308_0 
  python             pkgs/main/linux-64::python-3.11.3-h7a1cb2a_0 
  readline           pkgs/main/linux-64::readline-8.2-h5eee18b_0 
  setuptools         pkgs/main/linux-64::setuptools-67.8.0-py311h06a4308_0 
  sqlite             pkgs/main/linux-64::sqlite-3.41.2-h5eee18b_0 
  tbb                pkgs/main/linux-64::tbb-2021.8.0-hdb19cb5_0 
  tk                 pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0 
  tzdata             pkgs/main/noarch::tzdata-2023c-h04d1e81_0 
  wheel              pkgs/main/linux-64::wheel-0.38.4-py311h06a4308_0 
  xz                 pkgs/main/linux-64::xz-5.4.2-h5eee18b_0 
  zlib               pkgs/main/linux-64::zlib-1.2.13-h5eee18b_0 


Proceed ([y]/n)? 

CondaSystemExit: Exiting.

Looking at the output above, we see a number of dependencies. Most of these are required for Python (which numpy itself requires), but some, such as mkl and blas, are the C/C++ libraries needed for linear algebra. So, how will a generated SBOM reflect these?

microsoft/sbom-tool

The sbom-tool queries PyPI for all package dependencies. Since numpy is published on PyPI, we can query it directly before running the tool on the repo:

import requests
response = requests.get('https://pypi.org/pypi/numpy/json')
data = response.json()
print(data['info']['requires_dist'])
None

PyPI shows no dependencies for numpy, which makes sense as it has no Python dependencies. This means that when the sbom-tool is run on a package that depends on numpy, no child dependencies will be found for numpy, so the linear algebra libraries seen in the conda output never appear.
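As a rough illustration of what that lookup yields, here is a minimal sketch that pulls the direct dependency names out of a PyPI JSON response. The function name `direct_dependency_names` and the two sample dictionaries are hypothetical stand-ins written by hand for this example; they are not real PyPI responses and this is not how the sbom-tool itself is implemented.

```python
import re


def direct_dependency_names(pypi_metadata):
    """Return the sorted, de-duplicated names listed in 'requires_dist'.

    PyPI reports 'requires_dist' as None when a package declares no
    Python dependencies, so None is treated as an empty list.
    """
    requires_dist = pypi_metadata.get("info", {}).get("requires_dist") or []
    names = set()
    for requirement in requires_dist:
        # Keep only the leading project name; drop version specifiers,
        # parenthesized constraints, and environment markers.
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
        if match:
            names.add(match.group(1).lower())
    return sorted(names)


# Hand-written samples shaped like PyPI JSON responses (assumptions).
numpy_like = {"info": {"requires_dist": None}}
dependent_like = {
    "info": {
        "requires_dist": [
            "numpy>=1.21",
            "python-dateutil (>=2.8.2)",
            "pytest; extra == 'test'",
        ]
    }
}

print(direct_dependency_names(numpy_like))
print(direct_dependency_names(dependent_like))
```

For the numpy-like sample the list is empty, which is exactly where a PyPI-based dependency walk stops.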

Running the tool against the repo, cloned as is, produced 0 packages. As established, the sbom-tool uses microsoft/component-detection to detect all packages used in the repository. This tool looks for either setup.py or requirements.txt.

In the numpy repo there are:

  • setup.py
  • build_requirements.txt
  • doc_requirements.txt
  • linter_requirements.txt
  • release_requirements.txt
  • test_requirements.txt
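To make the mismatch concrete, here is a minimal sketch that sorts the files at a repo root into those whose names match the two manifests mentioned above and those that merely end in requirements.txt. It assumes the detector matches on exact file names, which is an assumption taken from the description above rather than from the detector’s source. The function `classify_manifests` and the temporary directory are hypothetical; the file names are the ones listed above.

```python
from pathlib import Path
import tempfile

# Assumption: the detector only recognizes these exact file names.
RECOGNIZED = {"setup.py", "requirements.txt"}


def classify_manifests(repo_root):
    """Split dependency files at the repo root into recognized and overlooked."""
    recognized, overlooked = [], []
    for path in sorted(Path(repo_root).iterdir()):
        if not path.is_file():
            continue
        if path.name in RECOGNIZED:
            recognized.append(path.name)
        elif path.name.endswith("requirements.txt"):
            # Prefixed variants such as build_requirements.txt land here.
            overlooked.append(path.name)
    return recognized, overlooked


# Recreate the file names listed above in an empty temporary directory.
listed_files = [
    "setup.py",
    "build_requirements.txt",
    "doc_requirements.txt",
    "linter_requirements.txt",
    "release_requirements.txt",
    "test_requirements.txt",
]
with tempfile.TemporaryDirectory() as tmp:
    for name in listed_files:
        (Path(tmp) / name).touch()
    recognized, overlooked = classify_manifests(tmp)

print("recognized:", recognized)
print("overlooked:", overlooked)
```

Under that assumption, only setup.py is a candidate for the detector, and all five prefixed requirements files are passed over.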

GitHub’s Dependency Graph

Now let’s examine the dependency graph from GitHub.

It can be exported from the repository page by clicking Insights > Dependency graph > Export SBOM.

kg = kglab.KnowledgeGraph()
kg.load_rdf("../../sboms/rdf/numpy.rdf.xml", format="xml")
<kglab.kglab.KnowledgeGraph>
packages = get_package_data(kg)
packages
package annotations attributionTexts checksums downloadLocation externalRefs hasFiles licenseConcluded licenseDeclared licenseInfoFromFiles name supplier versionInfo relationships
0 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N726dcd71a340470685ef364943e7b132 spdx:noassertion spdx:noassertion actions:actions/setup-python NOASSERTION 57ded4d7d5e986d7296eab16560982c6dd7c923b
1 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion spdx:noassertion spdx:noassertion pip:nose NOASSERTION
2 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N0a62d1ba8e694d50af92ec1f9089922c spdx:noassertion spdx:noassertion actions:github/codeql-action/init NOASSERTION 83f0fe6c4988d98a455712a27f0255212bba9bd4
3 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion spdx:noassertion spdx:noassertion pip:sphinx NOASSERTION >= 4.5.0
4 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N1e74fb5265634c60913b6f514f3f46df <http://spdx.org/licenses/MIT> spdx:noassertion pip:wheel NOASSERTION 0.38.1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
163 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N8401a35c750842c0b8efe7a106016764 spdx:noassertion spdx:noassertion actions:actions/checkout NOASSERTION 3
164 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N26c1f9259f6049029979af6cbb6f9439 spdx:noassertion spdx:noassertion actions:pypa/cibuildwheel NOASSERTION 5e15bb25b428e1bf2daf2215f173d2b40135f56f
165 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N22049996a4bb4a9487f698619f82f9f5 spdx:noassertion spdx:noassertion actions:github/codeql-action/analyze NOASSERTION 0225834cc549ee0ca93cb085b92954821a145866
166 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N295ae896197c49bb80b44e71f7055547 spdx:noassertion spdx:noassertion actions:pypa/cibuildwheel NOASSERTION 2.3.1
167 <https://github.com/numpy/numpy/dependency_gra... spdx:noassertion N3a63a0c04d8b4ee9a34437a084b3b03f spdx:noassertion spdx:noassertion actions:ossf/scorecard-action NOASSERTION e38b1902ae4f44df626f11ba0734b14fb91f8f86

168 rows × 14 columns

This SBOM identifies 168 packages even though numpy has no Python dependencies of its own. Looking through the numpy repo, we see they come from two kinds of sources, neither of which is recorded in the knowledge graph:

  1. the various *_requirements.txt files, which cover building, documentation, linting, releasing, and testing
  2. GitHub Actions workflows

It would be useful for the knowledge graph to record what each dependency is for (distribution, testing, CI, etc.).
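A first, coarse step in that direction can be sketched from the name column alone, since each name carries an ecosystem prefix such as actions: or pip:. The mapping `PURPOSE_BY_ECOSYSTEM` and the function `group_by_purpose` below are hypothetical, and the sample names are copied from the rows shown above. The prefix can only separate CI actions from pip packages; telling build, docs, and test requirements apart would presumably need the originating manifest file, which this SBOM does not appear to carry.

```python
from collections import defaultdict

# Assumed mapping from the ecosystem prefix in the SBOM name to a purpose.
PURPOSE_BY_ECOSYSTEM = {
    "actions": "CI",
    "pip": "Python package",
}


def group_by_purpose(package_names):
    """Group SBOM package names by the purpose implied by their prefix."""
    groups = defaultdict(list)
    for name in package_names:
        ecosystem, separator, rest = name.partition(":")
        if not separator:
            # No prefix at all: keep the full name under 'unknown'.
            groups["unknown"].append(name)
            continue
        purpose = PURPOSE_BY_ECOSYSTEM.get(ecosystem, "unknown")
        groups[purpose].append(rest)
    return dict(groups)


# Names taken from the first rows of the table above.
sample_names = [
    "actions:actions/setup-python",
    "pip:nose",
    "actions:github/codeql-action/init",
    "pip:sphinx",
    "pip:wheel",
]

for purpose, names in group_by_purpose(sample_names).items():
    print(purpose, names)
```

The same grouping could be applied to the name column of the packages table to see how many of the 168 entries are CI actions rather than Python packages.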