Frameworks-Getting-Started SBOM

SBOM Source: ndcrane/frameworks-getting-started generated using microsoft/sbom-tool

RDF Source: Generated using pyspdxtools

NOTICE: For ease of viewing, some cell inputs are hidden. Please view the inputs here for further explanations.

This SBOM was generated with microsoft/sbom-tool from the ndcrane/frameworks-getting-started repo. This page analyzes how sbom-tool performs on an extremely simple ML workflow.

Importing Graph

Here we use kglab to load the graph to be analyzed, serialized as RDF/XML, into the variable kg. This is the main graph for the rest of the notebook and is always referred to as kg. From it we query data and create subgraphs for analysis.

import kglab

kg = kglab.KnowledgeGraph()
kg.load_rdf("../../sboms/rdf/frameworks-getting-started.rdf.xml", format="xml")
<kglab.kglab.KnowledgeGraph>

Graph Overview

First let’s get a general overview of the graph we are working with. Let’s visualize it as a whole and look at some metadata.

Under default settings, orange represents spdx: elements, red represents ptr: elements, and blue represents all others. These colors can be changed as desired.

Let’s also take a look at basic graph metadata:

show_metadata(kg)
Total Triples: 48835
Distinct Entities: 11235
Distinct Properties: 29
show_measures(kg)
edges 48835
nodes 11252
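
The inputs of show_metadata are hidden, so the following is only a minimal sketch of how such counts could be derived, not the notebook's implementation. It assumes the graph is available as an iterable of (subject, predicate, object) triples (iterating an rdflib Graph yields triples in this form), and it assumes that "distinct entities" means subjects that carry an rdf:type. That reading is consistent with the output above, where the entity count equals the number of rdf:type triples, but it remains an assumption. The function name graph_summary and the sample triples are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def graph_summary(triples):
    """Return basic counts for an iterable of (subject, predicate, object) triples."""
    # Convert every term to str so plain strings and rdflib terms compare alike.
    triples = [(str(s), str(p), str(o)) for s, p, o in triples]
    # Assumption: an "entity" is any subject that has an rdf:type.
    typed_subjects = {s for s, p, _ in triples if p == RDF_TYPE}
    predicates = {p for _, p, _ in triples}
    return {
        "total_triples": len(triples),
        "distinct_entities": len(typed_subjects),
        "distinct_properties": len(predicates),
    }


# Made-up sample triples, for illustration only.
sample = [
    ("doc", RDF_TYPE, "spdx:SpdxDocument"),
    ("pkg", RDF_TYPE, "spdx:Package"),
    ("pkg", "spdx:name", "test"),
]
print(graph_summary(sample))
```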

Here’s some more advanced metadata:

First, let’s look at a count of each entity type to get a general idea of what our graph represents:

show_entity_types(kg)
http://spdx.org/rdf/terms#Checksum : 5306
http://spdx.org/rdf/terms#Relationship : 2860
http://spdx.org/rdf/terms#File : 2653
http://spdx.org/rdf/terms#Package : 207
http://spdx.org/rdf/terms#ExternalRef : 206
http://spdx.org/rdf/terms#SpdxDocument : 1
http://spdx.org/rdf/terms#CreationInfo : 1
http://spdx.org/rdf/terms#PackageVerificationCode : 1

We can also view the top 10 properties of all elements:

show_top_n_props(kg)
http://www.w3.org/1999/02/22-rdf-syntax-ns#type : 11235
http://spdx.org/rdf/terms#checksumValue : 5306
http://spdx.org/rdf/terms#checksum : 5306
http://spdx.org/rdf/terms#algorithm : 5306
http://spdx.org/rdf/terms#copyrightText : 2860
http://spdx.org/rdf/terms#relationshipType : 2860
http://spdx.org/rdf/terms#licenseConcluded : 2860
http://spdx.org/rdf/terms#relatedSpdxElement : 2860
http://spdx.org/rdf/terms#relationship : 2860
http://spdx.org/rdf/terms#licenseInfoInFile : 2653
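
The two tallies above (entity types and most frequent properties) can be approximated with a simple counter over the triples. This is a hedged sketch rather than the hidden show_entity_types or show_top_n_props code: the function names count_entity_types and top_n_properties are hypothetical, and the input is assumed to be a list of (subject, predicate, object) triples.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPDX = "http://spdx.org/rdf/terms#"


def count_entity_types(triples):
    """Count how many rdf:type statements point at each type."""
    return Counter(str(o) for _, p, o in triples if str(p) == RDF_TYPE)


def top_n_properties(triples, n=10):
    """Return the n most frequently used predicates with their counts."""
    return Counter(str(p) for _, p, _ in triples).most_common(n)


# Made-up sample triples, for illustration only.
sample = [
    ("f1", RDF_TYPE, SPDX + "File"),
    ("f2", RDF_TYPE, SPDX + "File"),
    ("p1", RDF_TYPE, SPDX + "Package"),
    ("f1", SPDX + "fileName", "./a.py"),
]

for type_iri, count in count_entity_types(sample).most_common():
    print(type_iri, ":", count)
for prop, count in top_n_properties(sample, n=2):
    print(prop, ":", count)
```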

SPDX schemas generally represent three main items (in addition to project metadata):

  1. Files in the project
  2. Dependencies (or packages) used in the project
  3. Relationships between everything

Let’s start by examining how files are represented in this KG

Files

From the graph let’s look at all properties that are present for files

file_schema(kg)
property
0 spdx:checksum
1 spdx:copyrightText
2 spdx:fileName
3 spdx:licenseConcluded
4 spdx:licenseInfoInFile
5 rdf:type
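
A property listing like the one above can be thought of as "every predicate used by at least one subject of a given type". The sketch below illustrates that idea on plain (subject, predicate, object) triples. It is not the hidden file_schema implementation, and the name schema_for_type is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPDX = "http://spdx.org/rdf/terms#"


def schema_for_type(triples, type_iri):
    """Return the sorted predicates used by subjects whose rdf:type is type_iri."""
    triples = [(str(s), str(p), str(o)) for s, p, o in triples]
    # First pass: collect every subject declared with the requested type.
    members = {s for s, p, o in triples if p == RDF_TYPE and o == type_iri}
    # Second pass: collect every predicate those subjects use.
    return sorted({p for s, p, _ in triples if s in members})


# Made-up sample triples, for illustration only.
sample = [
    ("f1", RDF_TYPE, SPDX + "File"),
    ("f1", SPDX + "fileName", "./a.py"),
    ("f1", SPDX + "copyrightText", "NOASSERTION"),
    ("p1", RDF_TYPE, SPDX + "Package"),
    ("p1", SPDX + "name", "test"),
]
print(schema_for_type(sample, SPDX + "File"))
```

Passing SPDX + "Package" instead would list the package properties in the same way.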

Already we see that this generated file includes less information than the SPDX example SBOM.

Here is a dataframe of what is present for files:

files = get_files_data(kg)
files.head(5)
fileID fileName licenseInFile contributors licenseConcluded checksum
0 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... ./quarto/quarto-1.2.335/share/deno_std/cache/g... spdx:noassertion http://spdx.org/rdf/terms#noassertion, http://... _:Nc63269a9d9f940f09e60dcb0962d9c1a
1 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... ./.git/objects/cf/e2ee6b560afa0ca0a8866a64afa6... spdx:noassertion http://spdx.org/rdf/terms#noassertion, http://... _:Ne386f3ee788f4538bbc99dfdc805636a
2 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... ./quarto/quarto-1.2.335/share/formats/pdf/pdfj... spdx:noassertion http://spdx.org/rdf/terms#noassertion, http://... _:N0f1bf711e52d4037a91b8624e939cce6
3 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... ./quarto/quarto-1.2.335/share/deno_std/cache/d... spdx:noassertion http://spdx.org/rdf/terms#noassertion, http://... _:N8321128f77034949bdd12f89ebb12adf
4 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... ./quarto/quarto-1.2.335/share/deno_std/cache/g... spdx:noassertion http://spdx.org/rdf/terms#noassertion, http://... _:N8a5f86d58e344d63b6ddf23c067b4e05
files.describe()
fileID fileName licenseInFile contributors licenseConcluded checksum
count 2653 2653 2653 2653 2653 2653
unique 2653 2653 1 1 1 2653
top <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... ./quarto/quarto-1.2.335/share/deno_std/cache/g... spdx:noassertion http://spdx.org/rdf/terms#noassertion, http://... _:Nc63269a9d9f940f09e60dcb0962d9c1a
freq 1 1 2653 2653 2653 1

Looking at a basic description of the files dataframe there are a few important items:

  1. All fileIDs and names are unique (this is good)
  2. All checksums are unique (this is good)
    • The checksums point to another node in the KG
  3. There is no file license information or contributor information
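
Because each checksum cell holds a pointer to another node rather than the hash itself, reading the actual value takes one extra hop. A minimal sketch of that hop follows, using the spdx:checksum, spdx:algorithm and spdx:checksumValue properties listed earlier. The function name resolve_checksums, the node identifiers and the hash values are made up for illustration.

```python
from collections import defaultdict

SPDX = "http://spdx.org/rdf/terms#"


def resolve_checksums(triples):
    """Map each subject that has spdx:checksum to a list of (algorithm, value) pairs."""
    triples = [(str(s), str(p), str(o)) for s, p, o in triples]
    # Index node properties so checksum nodes can be looked up by identifier.
    # Multi-valued properties keep only the last value here, which is enough
    # for checksum nodes (assumed to have one algorithm and one value each).
    props = defaultdict(dict)
    for s, p, o in triples:
        props[s][p] = o
    resolved = defaultdict(list)
    for s, p, o in triples:
        if p == SPDX + "checksum":
            node = props.get(o, {})
            resolved[s].append(
                (node.get(SPDX + "algorithm"), node.get(SPDX + "checksumValue"))
            )
    return dict(resolved)


# Made-up sample: one file pointing at two checksum nodes.
sample = [
    ("f1", SPDX + "checksum", "c1"),
    ("f1", SPDX + "checksum", "c2"),
    ("c1", SPDX + "algorithm", SPDX + "checksumAlgorithm_sha1"),
    ("c1", SPDX + "checksumValue", "abc"),
    ("c2", SPDX + "algorithm", SPDX + "checksumAlgorithm_sha256"),
    ("c2", SPDX + "checksumValue", "def"),
]
print(resolve_checksums(sample))
```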

Here’s a representation of all files in a graph form:

subgraph = get_files_graph(kg)
show_measures(subgraph)
edges 7959
nodes 2655

Packages

package_schema(kg)
property
0 spdx:copyrightText
1 spdx:downloadLocation
2 spdx:externalRef
3 spdx:filesAnalyzed
4 spdx:licenseConcluded
5 spdx:licenseDeclared
6 spdx:licenseInfoFromFiles
7 spdx:name
8 spdx:packageVerificationCode
9 spdx:relationship
10 spdx:supplier
11 spdx:versionInfo
12 rdf:type
packages = get_package_data(kg)
packages
package annotations attributionTexts checksums copyrightText downloadLocation externalRefs hasFiles licenseConcluded licenseDeclared licenseInfoFromFiles name packageVerificationCode supplier versionInfo relationships
0 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion spdx:noassertion spdx:noassertion test _:Nbdae0320e9a543f9abbcd2d0375575ed Organization: NDCRC 1.0.0 N16e091841fce4e519fd269480cb271f9, N8c855a366f...
1 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N62856b0f717a48f992594b918de26914 spdx:noassertion spdx:noassertion argon2-cffi NaN NOASSERTION 21.3.0
2 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion Nef311854f4214014a59074bd404d2d0b spdx:noassertion spdx:noassertion click NaN NOASSERTION 8.1.3
3 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N3936863eac864a87968cdd60a3311a20 spdx:noassertion spdx:noassertion jupyterlab-widgets NaN NOASSERTION 3.0.7
4 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N0f6420517f534c3486e00f9989ae6e19 spdx:noassertion spdx:noassertion mpmath NaN NOASSERTION 1.3.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
202 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N88c7496d1ab34af5ae1fbb3c8cc73b1c spdx:noassertion spdx:noassertion dvc-objects NaN NOASSERTION 0.22.0
203 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N5fcb69ed42b644e8a52dbfc94eb6b08d spdx:noassertion spdx:noassertion requests NaN NOASSERTION 2.30.0
204 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion Ncd07f9a6fc6e43f292c36aea077a00b6 spdx:noassertion spdx:noassertion entrypoints NaN NOASSERTION 0.4
205 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N4d00915d97324d12846ed70e4860f304 spdx:noassertion spdx:noassertion tornado NaN NOASSERTION 6.3.2
206 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N8dd4504ce3d64a5b8f7fb5b069006224 spdx:noassertion spdx:noassertion python-json-logger NaN NOASSERTION 2.0.7

207 rows × 16 columns

packages[packages['name'].str.contains('python')]
package annotations attributionTexts checksums copyrightText downloadLocation externalRefs hasFiles licenseConcluded licenseDeclared licenseInfoFromFiles name packageVerificationCode supplier versionInfo relationships
28 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion Nb8b58c02aec8464e889e5d288274ceea spdx:noassertion spdx:noassertion ipython NaN NOASSERTION 8.13.2
42 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion Nbf6fef54fe684ecb816361be6f1db245 spdx:noassertion spdx:noassertion gitpython NaN NOASSERTION 3.1.31
58 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N0677aaed34ef45d6ac08b3dc62630642 spdx:noassertion spdx:noassertion python-dateutil NaN NOASSERTION 2.8.2
99 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion Nfacdd7bb5da34df0b41c24fc32e16144 spdx:noassertion spdx:noassertion ipython-genutils NaN NOASSERTION 0.2.0
179 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N8a33454ef54044eca6fa50dc0d527c59 spdx:noassertion spdx:noassertion antlr4-python3-runtime NaN NOASSERTION 4.9.3
206 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion N8dd4504ce3d64a5b8f7fb5b069006224 spdx:noassertion spdx:noassertion python-json-logger NaN NOASSERTION 2.0.7
packages.describe()
package annotations attributionTexts checksums copyrightText downloadLocation externalRefs hasFiles licenseConcluded licenseDeclared licenseInfoFromFiles name packageVerificationCode supplier versionInfo relationships
count 207 207 207 207 207 207 207 207 207 207 207 207 1 207 207 207
unique 207 1 1 1 1 1 207 1 1 1 1 207 1 2 186 2
top <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... NOASSERTION spdx:noassertion spdx:noassertion spdx:noassertion test _:Nbdae0320e9a543f9abbcd2d0375575ed NOASSERTION 1.0.0
freq 1 207 207 207 207 207 1 207 207 207 207 1 1 206 3 206

Here we see that even more information is missing than in the files section.
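
One way to put a number on "missing information" is to compute, per field, the share of rows that hold only a placeholder. The sketch below assumes the rows are available as a list of dictionaries (for a pandas DataFrame, `packages.to_dict("records")` produces that shape) and treats empty strings, NaN and the various spellings of NOASSERTION as placeholders. The names placeholder_share and is_placeholder are hypothetical, and the sample rows are abbreviated.

```python
# Assumption: these spellings all mean "no real value was recorded".
PLACEHOLDERS = {
    "",
    "nan",
    "noassertion",
    "spdx:noassertion",
    "http://spdx.org/rdf/terms#noassertion",
}


def is_placeholder(value):
    """Return True if the value is missing or a NOASSERTION-style placeholder."""
    return value is None or str(value).strip().lower() in PLACEHOLDERS


def placeholder_share(rows):
    """For each field, return the fraction of rows holding only a placeholder."""
    if not rows:
        return {}
    fields = sorted({key for row in rows for key in row})
    return {
        field: sum(is_placeholder(row.get(field)) for row in rows) / len(rows)
        for field in fields
    }


# Abbreviated sample rows, for illustration only.
rows = [
    {"name": "click", "supplier": "NOASSERTION", "versionInfo": "8.1.3"},
    {"name": "test", "supplier": "Organization: NDCRC", "versionInfo": "1.0.0"},
]
print(placeholder_share(rows))
```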

Relationships

relationship_schema(kg)
property
0 rdf:type
1 spdx:relationshipType
2 spdx:relatedSpdxElement
rels = get_relationship_data(kg)
rels
element elementType relationshipType relatedElement relatedElementType
0 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
1 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
2 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
3 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
4 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
... ... ... ... ... ...
2855 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
2856 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
2857 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
2858 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
2859 <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:SpdxDocument spdx:relationshipType_describes <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package

2860 rows × 5 columns

rels.describe()
element elementType relationshipType relatedElement relatedElementType
count 2860 2860 2860 2860 2860
unique 2 2 3 2860 2
top <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:Package spdx:relationshipType_contains <https://spdx.org/spdxdocs/sbom-tool-1.1.1-098... spdx:File
freq 2859 2859 2653 1 2653

Lastly, our relationships are limited to three types and are mostly between Packages and Files.
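
To see how the relationships break down, each Relationship node can be resolved into a (source type, relationship type, target type) triple and tallied. The following is a sketch under the assumption that elements link to Relationship nodes through spdx:relationship, and that those nodes carry spdx:relationshipType and spdx:relatedSpdxElement, as the schemas above suggest. The function name tally_relationships and the sample identifiers are hypothetical.

```python
from collections import Counter, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPDX = "http://spdx.org/rdf/terms#"


def tally_relationships(triples):
    """Count relationships by (source type, relationship type, target type)."""
    triples = [(str(s), str(p), str(o)) for s, p, o in triples]
    # Index node properties; the last value wins for multi-valued properties.
    props = defaultdict(dict)
    for s, p, o in triples:
        props[s][p] = o
    tally = Counter()
    for s, p, o in triples:
        if p != SPDX + "relationship":
            continue
        rel = props.get(o, {})
        related = rel.get(SPDX + "relatedSpdxElement")
        key = (
            props.get(s, {}).get(RDF_TYPE),
            rel.get(SPDX + "relationshipType"),
            props.get(related, {}).get(RDF_TYPE),
        )
        tally[key] += 1
    return tally


# Made-up sample: a document describing a package that contains one file.
sample = [
    ("doc", RDF_TYPE, SPDX + "SpdxDocument"),
    ("pkg", RDF_TYPE, SPDX + "Package"),
    ("f1", RDF_TYPE, SPDX + "File"),
    ("doc", SPDX + "relationship", "r1"),
    ("r1", SPDX + "relationshipType", SPDX + "relationshipType_describes"),
    ("r1", SPDX + "relatedSpdxElement", "pkg"),
    ("pkg", SPDX + "relationship", "r2"),
    ("r2", SPDX + "relationshipType", SPDX + "relationshipType_contains"),
    ("r2", SPDX + "relatedSpdxElement", "f1"),
]
for key, count in tally_relationships(sample).items():
    print(key, count)
```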

subgraph = get_relationship_graph(kg)
show_measures(subgraph)
edges 11441
nodes 5727

Relationship graph visualization

The relationship graph has a large number of nodes, which makes the visualization difficult to load. To make it practical, we can exclude nodes of type spdx:File by passing the hideTypeFile=True flag to the visualize_relationship_graph() function, as shown below:

# get the relationship graph to be visualized
graph = visualize_relationship_graph(kg, hideTypeFile=True)

# optional: set the physics layout of the network
graph.force_atlas_2based()
graph.set_edge_smooth('dynamic')

# show graph
graph.show("../figs/fig02.relationship.html")
../figs/fig02.relationship.html
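
Conceptually, hiding one node type amounts to dropping every edge that touches a node of that type before drawing. The sketch below illustrates that filter on a plain edge list. It is not the implementation behind hideTypeFile, whose input is hidden, and the name drop_nodes_of_type and the sample data are hypothetical.

```python
def drop_nodes_of_type(edges, node_types, hidden_type):
    """Keep only edges whose two endpoints are both not of hidden_type."""
    return [
        (src, label, dst)
        for src, label, dst in edges
        if node_types.get(src) != hidden_type and node_types.get(dst) != hidden_type
    ]


# Made-up sample: a document, one package, and one file.
node_types = {"doc": "SpdxDocument", "pkg": "Package", "f1": "File"}
edges = [
    ("doc", "describes", "pkg"),
    ("pkg", "contains", "f1"),
]
print(drop_nodes_of_type(edges, node_types, hidden_type="File"))
```

Since most relationships in this SBOM are Package-to-File, a filter of this kind removes the bulk of the edges and leaves a much smaller graph to render.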

The color of each node in the graph indicates its element type in the SPDX specification:

display_relationship_graph_legend()
SPDX Type Node Color
0 File Yellow
1 Package Blue
2 SPDXDocument Red