Analyzing SPDX Example SBOM

SBOM Source: spdx/spdx-spec/examples

RDF Source: Generated using pyspdxtools

NOTICE: For ease of viewing, some cell inputs are hidden. Please view the inputs here for further explanations.

Importing Graph

Here we use kglab to load the graph to be analyzed from an RDF/XML file into the variable kg. This is the main graph throughout this notebook and is always referred to as kg. From it we will query data and create subgraphs for analysis.

import kglab
kg = kglab.KnowledgeGraph()
kg.load_rdf("../../sboms/rdf/model.rdf.xml", format="xml")
<kglab.kglab.KnowledgeGraph>

Graph Overview

First let’s get a general overview of the graph we are working with. Let’s visualize it as a whole and look at some metadata.

The filename below specifies the path which the graph is saved to. This can also be viewed in GitHub.

Under default settings, orange represents spdx: elements, red represents ptr: elements, and blue represents all others. These colors can be changed as desired.

Let’s also take a look at basic graph metadata:

show_metadata(kg)
Total Triples: 306
Distinct Entities: 56
Distinct Properties: 62
show_measures(kg)
edges 306
nodes 99
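The show_metadata helper lives in a hidden cell, so its exact implementation is not shown here. As a minimal sketch of how such counts could be derived, the function below works on any iterable of (subject, predicate, object) triples. The name summarize_triples and the choice to count an "entity" as a subject with an rdf:type statement are assumptions made for illustration, not the notebook's actual code.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def summarize_triples(triples):
    """Return basic counts for an iterable of (subject, predicate, object) triples.

    Hypothetical helper: an "entity" is assumed to be any subject that
    carries at least one rdf:type statement.
    """
    triples = list(triples)
    typed_subjects = {str(s) for s, p, _ in triples if str(p) == RDF_TYPE}
    predicates = {str(p) for _, p, _ in triples}
    return {
        "total_triples": len(triples),
        "distinct_entities": len(typed_subjects),
        "distinct_properties": len(predicates),
    }


# Tiny hand-made example, not taken from the SBOM
example_triples = [
    ("ex:doc", RDF_TYPE, "spdx:SpdxDocument"),
    ("ex:doc", "spdx:name", "demo"),
    ("ex:file1", RDF_TYPE, "spdx:File"),
    ("ex:file1", "spdx:fileName", "./foo.c"),
]
summary = summarize_triples(example_triples)
print(summary)
```

Because iterating an rdflib graph yields triples, the same function should also accept the graph behind kg (assuming kg.rdf_graph() returns that underlying rdflib graph).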

Here’s some more advanced metadata:

First, let’s look at a count of each entity type to get a general idea of what our graph represents:

show_entity_types(kg)
http://spdx.org/rdf/terms#Relationship : 11
http://spdx.org/rdf/terms#Checksum : 10
http://spdx.org/rdf/terms#ExtractedLicensingInfo : 5
http://spdx.org/rdf/terms#Annotation : 5
http://spdx.org/rdf/terms#Package : 4
http://spdx.org/rdf/terms#File : 4
http://spdx.org/rdf/terms#ExternalRef : 3
http://spdx.org/rdf/terms#DisjunctiveLicenseSet : 2
http://www.w3.org/2009/pointers#StartEndPointer : 2
http://www.w3.org/2009/pointers#ByteOffsetPointer : 2
http://www.w3.org/2009/pointers#LineCharPointer : 2
http://spdx.org/rdf/terms#SpdxDocument : 1
http://spdx.org/rdf/terms#PackageVerificationCode : 1
http://spdx.org/rdf/terms#ConjunctiveLicenseSet : 1
http://spdx.org/rdf/terms#CreationInfo : 1
http://spdx.org/rdf/terms#ExternalDocumentRef : 1
http://spdx.org/rdf/terms#Snippet : 1

We can also view the top 10 properties of all elements:

show_top_n_props(kg)
http://www.w3.org/1999/02/22-rdf-syntax-ns#type : 56
http://www.w3.org/2000/01/rdf-schema#comment : 14
http://spdx.org/rdf/terms#fileContributor : 11
http://spdx.org/rdf/terms#relatedSpdxElement : 11
http://spdx.org/rdf/terms#relationship : 11
http://spdx.org/rdf/terms#relationshipType : 11
http://spdx.org/rdf/terms#algorithm : 10
http://spdx.org/rdf/terms#checksumValue : 10
http://spdx.org/rdf/terms#checksum : 10
http://spdx.org/rdf/terms#copyrightText : 9
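Both listings above are frequency counts over the triples: one over the objects of rdf:type statements, the other over predicates. The helpers that produced them are in hidden cells, so the following is only a sketch under that assumption. The function names count_entity_types and top_n_properties are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def count_entity_types(triples):
    """Count rdf:type statements per type URI (hypothetical helper)."""
    return Counter(str(o) for _, p, o in triples if str(p) == RDF_TYPE)


def top_n_properties(triples, n=10):
    """Return the n most frequently used predicates with their counts."""
    return Counter(str(p) for _, p, _ in triples).most_common(n)


# Tiny hand-made example, not taken from the SBOM
example_triples = [
    ("ex:file1", RDF_TYPE, "spdx:File"),
    ("ex:file1", "spdx:fileName", "./foo.c"),
    ("ex:file2", RDF_TYPE, "spdx:File"),
    ("ex:file2", "spdx:fileName", "./bar.c"),
    ("ex:pkg", RDF_TYPE, "spdx:Package"),
]
type_counts = count_entity_types(example_triples)
top_props = top_n_properties(example_triples, n=1)

for type_uri, count in type_counts.most_common():
    print(f"{type_uri} : {count}")
```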

SPDX documents generally describe three main items (in addition to project metadata):

  1. Files in the project
  2. Dependencies (or packages) used in the project
  3. Relationships between everything

Let’s start by examining how files are represented in this KG.

Files

From the graph, let’s look at all the properties that are present for files:

file_schema(kg)
property
0 spdx:annotation
1 spdx:checksum
2 spdx:copyrightText
3 spdx:fileContributor
4 spdx:fileName
5 spdx:fileType
6 spdx:licenseComments
7 spdx:licenseConcluded
8 spdx:licenseInfoInFile
9 spdx:noticeText
10 spdx:relationship
11 rdf:type
12 rdfs:comment
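A property list like the one above can be obtained by collecting every predicate used on subjects of a given type. The sketch below shows that idea on plain triples. The function name properties_of_type is hypothetical, and the hidden file_schema helper may well use a SPARQL query instead.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def properties_of_type(triples, type_uri):
    """List the distinct predicates used on subjects typed as type_uri."""
    triples = list(triples)
    # First pass: find every subject declared with the requested type
    members = {
        str(s) for s, p, o in triples
        if str(p) == RDF_TYPE and str(o) == type_uri
    }
    # Second pass: collect the predicates those subjects use
    return sorted({str(p) for s, p, _ in triples if str(s) in members})


# Tiny hand-made example, not taken from the SBOM
example_triples = [
    ("ex:file1", RDF_TYPE, "spdx:File"),
    ("ex:file1", "spdx:fileName", "./foo.c"),
    ("ex:file1", "spdx:checksum", "_:c1"),
    ("ex:pkg", RDF_TYPE, "spdx:Package"),
    ("ex:pkg", "spdx:name", "demo"),
]
file_props = properties_of_type(example_triples, "spdx:File")
print(file_props)
```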

And here is a dataframe of the values present for each file:

df = get_files_data(kg)
df
  | fileID | fileName | fileType | licenseInFile | contributors | licenseConcluded | checksum | relationship | comment | licenseComments | noticeText | annotation
0 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./lib-source/jena-2.6.3-sources.jar | spdx:fileType_archive | <http://spdx.org/spdxdocs/spdx-example-444504E... | Apache Software Foundation, Hewlett Packard Inc. | http://spdx.org/spdxdocs/spdx-example-444504E0... | _:N7d6e508df7f240f7b8e0f250d70d75ae | _:N94b74a1297954445943a04e2fd998218 | This file belongs to Jena | This license is used by Jena | NaN | NaN
1 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./lib-source/commons-lang3-3.1-sources.jar | spdx:fileType_archive | <http://spdx.org/licenses/Apache-2.0> | Apache Software Foundation | http://spdx.org/licenses/Apache-2.0 | _:N103471638ff84c5d9d1c78f27855f653 | _:N0c54e685f8e344dea045daf638265480 | This file is used by Jena | NaN | Apache Commons Lang\nCopyright 2001-2011 The A... | NaN
2 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./src/org/spdx/parser/DOAPProject.java | spdx:fileType_source | <http://spdx.org/licenses/Apache-2.0> | Source Auditor Inc., Black Duck Software In.c,... | http://spdx.org/licenses/Apache-2.0, http://sp... | _:Nc8a3be78c9bc4cafbbf9269b928d4582 | NaN | NaN | NaN | NaN | NaN
3 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./package/foo.c | spdx:fileType_source | <http://spdx.org/spdxdocs/spdx-example-444504E... | IBM Corporation, IBM Corporation, IBM Corporat... | N7b52bfe85cd34337af6baf9e87250d07, N7b52bfe85c... | _:Na72d642b09804fc995174f1c25167e8c | _:N42d1b1ab56344cda89a26dbb4f593871 | The concluded license was taken from the packa... | The concluded license was taken from the packa... | Copyright (c) 2001 Aaron Lehmann aaroni@vitelu... | _:Nd3e4b45de96b4f3fbaaf8db45a164a2d
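A table with one row per file and one column per property amounts to a pivot of the file triples, with multi-valued properties (such as several contributors) joined into one cell. The sketch below illustrates that pivot with plain dictionaries. The name rows_for_type is hypothetical, and the hidden get_files_data helper may be implemented quite differently.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_for_type(triples, type_uri):
    """Pivot triples into {subject: {predicate: "value1, value2"}} rows."""
    triples = list(triples)
    members = {
        str(s) for s, p, o in triples
        if str(p) == RDF_TYPE and str(o) == type_uri
    }
    collected = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if str(s) in members and str(p) != RDF_TYPE:
            collected[str(s)][str(p)].append(str(o))
    # Join repeated values so each cell holds a single string
    return {
        subject: {pred: ", ".join(values) for pred, values in props.items()}
        for subject, props in collected.items()
    }


# Tiny hand-made example, not taken from the SBOM
example_triples = [
    ("ex:file1", RDF_TYPE, "spdx:File"),
    ("ex:file1", "spdx:fileName", "./foo.c"),
    ("ex:file1", "spdx:fileContributor", "Alice"),
    ("ex:file1", "spdx:fileContributor", "Bob"),
    ("ex:file2", RDF_TYPE, "spdx:File"),
    ("ex:file2", "spdx:fileName", "./bar.c"),
]
rows = rows_for_type(example_triples, "spdx:File")
print(rows)
```

If pandas is available, pandas.DataFrame.from_dict(rows, orient="index") should turn these rows into a dataframe, with NaN in cells where a file lacks a property, much like the NaN entries in the output above.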

Relationship

# get the relationship graph to be visualized
graph = visualize_relationship_graph(kg)

# optional: set the physics layout of the network
graph.force_atlas_2based()
graph.set_edge_smooth('dynamic')

# show graph
graph.show("../figs/fig01.relationship_full.html")
../figs/fig01.relationship_full.html
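In SPDX 2.x RDF, an element points to a Relationship node through spdx:relationship, and that node carries spdx:relationshipType and spdx:relatedSpdxElement. A relationship graph therefore needs those intermediate nodes collapsed into direct, labeled edges. The sketch below shows one way to do that on plain triples. The name relationship_edges is hypothetical, and the hidden visualize_relationship_graph helper may work differently.

```python
SPDX = "http://spdx.org/rdf/terms#"


def relationship_edges(triples):
    """Collapse Relationship nodes into (source, relationship_type, target) edges."""
    triples = [(str(s), str(p), str(o)) for s, p, o in triples]
    rel_type = {}
    rel_target = {}
    # Record the type and target stored on each Relationship node
    for s, p, o in triples:
        if p == SPDX + "relationshipType":
            rel_type[s] = o
        elif p == SPDX + "relatedSpdxElement":
            rel_target[s] = o
    # Connect each source element directly to the related element
    edges = []
    for s, p, o in triples:
        if p == SPDX + "relationship" and o in rel_target:
            edges.append((s, rel_type.get(o, "unknown"), rel_target[o]))
    return edges


# Tiny hand-made example, not taken from the SBOM
example_triples = [
    ("ex:pkg", SPDX + "relationship", "_:r1"),
    ("_:r1", SPDX + "relationshipType", SPDX + "relationshipType_contains"),
    ("_:r1", SPDX + "relatedSpdxElement", "ex:file1"),
    ("ex:file1", SPDX + "fileName", "./foo.c"),
]
edges = relationship_edges(example_triples)
print(edges)
```

Each resulting tuple could then be added to a pyvis Network with add_node and add_edge, using the relationship type as the edge label.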

The color of each node in the graph indicates its element type in the SPDX specification:

display_relationship_graph_legend()
SPDX Type Node Color
0 File Yellow
1 Package Blue
2 SPDXDocument Red