import kglab
kg = kglab.KnowledgeGraph()
kg.load_rdf("../../sboms/rdf/model.rdf.xml", format="xml")
<kglab.kglab.KnowledgeGraph>
SBOM Source: spdx/spdx-spec/examples
RDF Source: Generated using pyspdxtools
NOTICE: For ease of viewing, some cell inputs are hidden. Please view inputs here for further explanations.
Here we use kglab to import the graph to be analyzed, stored as RDF/XML, into the variable kg. This will be our main graph throughout this notebook and will always be referred to as kg. From it we will query data and create subgraphs for analysis.
First let’s get a general overview of the graph we are working with. Let’s visualize it as a whole and look at some metadata.
The filename below specifies the path to which the graph is saved. This can also be viewed in GitHub.
Under the default settings, orange represents spdx: elements, red represents ptr: elements, and blue represents all others. These colors can be changed as desired.
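The cell that applies these colors is hidden, so the following is only a minimal sketch of how such a default scheme could be expressed. The names NAMESPACE_COLORS, DEFAULT_COLOR, and color_for are hypothetical, and mapping the spdx: and ptr: prefixes to the namespace URIs shown is an assumption based on the URIs that appear in this graph.

```python
# Hypothetical namespace-to-color table mirroring the defaults described above.
SPDX_NS = "http://spdx.org/rdf/terms#"       # assumed expansion of the spdx: prefix
PTR_NS = "http://www.w3.org/2009/pointers#"  # assumed expansion of the ptr: prefix

NAMESPACE_COLORS = {
    SPDX_NS: "orange",
    PTR_NS: "red",
}
DEFAULT_COLOR = "blue"  # everything outside the two namespaces above


def color_for(uri: str) -> str:
    """Return the display color for a node, chosen by its namespace."""
    for namespace, color in NAMESPACE_COLORS.items():
        if uri.startswith(namespace):
            return color
    return DEFAULT_COLOR


for example in (
    "http://spdx.org/rdf/terms#Package",
    "http://www.w3.org/2009/pointers#StartEndPointer",
    "http://www.w3.org/2000/01/rdf-schema#comment",
):
    print(example, "->", color_for(example))
```

Changing a color then amounts to editing one entry of the table, and adding a namespace means adding one key.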
Let’s also take a look at basic graph metadata:
Here’s some more advanced metadata:
First, let’s look at a count of each entity type to get a general idea of what our graph represents:
http://spdx.org/rdf/terms#Relationship : 11
http://spdx.org/rdf/terms#Checksum : 10
http://spdx.org/rdf/terms#ExtractedLicensingInfo : 5
http://spdx.org/rdf/terms#Annotation : 5
http://spdx.org/rdf/terms#Package : 4
http://spdx.org/rdf/terms#File : 4
http://spdx.org/rdf/terms#ExternalRef : 3
http://spdx.org/rdf/terms#DisjunctiveLicenseSet : 2
http://www.w3.org/2009/pointers#StartEndPointer : 2
http://www.w3.org/2009/pointers#ByteOffsetPointer : 2
http://www.w3.org/2009/pointers#LineCharPointer : 2
http://spdx.org/rdf/terms#SpdxDocument : 1
http://spdx.org/rdf/terms#PackageVerificationCode : 1
http://spdx.org/rdf/terms#ConjunctiveLicenseSet : 1
http://spdx.org/rdf/terms#CreationInfo : 1
http://spdx.org/rdf/terms#ExternalDocumentRef : 1
http://spdx.org/rdf/terms#Snippet : 1
We can also view the top 10 properties of all elements:
http://www.w3.org/1999/02/22-rdf-syntax-ns#type : 56
http://www.w3.org/2000/01/rdf-schema#comment : 14
http://spdx.org/rdf/terms#fileContributor : 11
http://spdx.org/rdf/terms#relatedSpdxElement : 11
http://spdx.org/rdf/terms#relationship : 11
http://spdx.org/rdf/terms#relationshipType : 11
http://spdx.org/rdf/terms#algorithm : 10
http://spdx.org/rdf/terms#checksumValue : 10
http://spdx.org/rdf/terms#checksum : 10
http://spdx.org/rdf/terms#copyrightText : 9
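Both tallies above come down to counting over triples: entity types are the objects of rdf:type statements, and property usage is the predicate of every statement. The cells that produced them are hidden, so the sketch below only illustrates the idea on a handful of made-up triples held in plain Python tuples. The toy data and the variable names are assumptions, not the notebook's own code.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPDX = "http://spdx.org/rdf/terms#"

# Toy (subject, predicate, object) triples; illustrative only, not read from the SBOM.
triples = [
    ("ex:pkg1", RDF_TYPE, SPDX + "Package"),
    ("ex:pkg1", SPDX + "checksum", "_:c1"),
    ("_:c1", RDF_TYPE, SPDX + "Checksum"),
    ("_:c1", SPDX + "algorithm", SPDX + "checksumAlgorithm_sha1"),
    ("ex:file1", RDF_TYPE, SPDX + "File"),
    ("ex:file1", SPDX + "checksum", "_:c2"),
    ("_:c2", RDF_TYPE, SPDX + "Checksum"),
    ("_:c2", SPDX + "algorithm", SPDX + "checksumAlgorithm_sha1"),
]

# Entity types: count the objects of rdf:type statements.
type_counts = Counter(o for _, p, o in triples if p == RDF_TYPE)

# Property usage: count every predicate, then keep the ten most frequent.
property_counts = Counter(p for _, p, _ in triples)

for uri, count in type_counts.most_common():
    print(uri, ":", count)

for uri, count in property_counts.most_common(10):
    print(uri, ":", count)
```

Because every typed node contributes one rdf:type statement, rdf:type will normally top the property list, as it does in the output above.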
SPDX schemas generally represent three main items (in addition to project metadata)
Let’s start by examining how files are represented in this KG.
From the graph, let’s look at all properties that are present for files:
 | property |
---|---|
0 | spdx:annotation |
1 | spdx:checksum |
2 | spdx:copyrightText |
3 | spdx:fileContributor |
4 | spdx:fileName |
5 | spdx:fileType |
6 | spdx:licenseComments |
7 | spdx:licenseConcluded |
8 | spdx:licenseInfoInFile |
9 | spdx:noticeText |
10 | spdx:relationship |
11 | rdf:type |
12 | rdfs:comment |
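A property list like the one above can be obtained in two steps: find every subject typed as spdx:File, then collect the distinct predicates used on those subjects. The query behind the table is hidden, so this is a minimal sketch on made-up triples with prefixed names. The helper properties_of_type is hypothetical.

```python
# Toy (subject, predicate, object) triples with prefixed names; illustrative only.
triples = [
    ("ex:file1", "rdf:type", "spdx:File"),
    ("ex:file1", "spdx:fileName", "./package/foo.c"),
    ("ex:file1", "spdx:fileType", "spdx:fileType_source"),
    ("ex:file2", "rdf:type", "spdx:File"),
    ("ex:file2", "spdx:fileName", "./lib-source/jena-2.6.3-sources.jar"),
    ("ex:file2", "rdfs:comment", "This file belongs to Jena"),
    ("ex:pkg1", "rdf:type", "spdx:Package"),
    ("ex:pkg1", "spdx:name", "example-package"),
]


def properties_of_type(triples, type_name):
    """Return the sorted, distinct predicates used on subjects of the given type."""
    subjects = {s for s, p, o in triples if p == "rdf:type" and o == type_name}
    return sorted({p for s, p, _ in triples if s in subjects})


file_properties = properties_of_type(triples, "spdx:File")
for index, prop in enumerate(file_properties):
    print(index, "|", prop)
```

A property appears in the result if at least one file uses it, which is why the list can be longer than the set of properties on any single file.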
Here is also a dataframe of the values present for each file:
 | fileID | fileName | fileType | licenseInFile | contributors | licenseConcluded | checksum | relationship | comment | licenseComments | noticeText | annotation |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./lib-source/jena-2.6.3-sources.jar | spdx:fileType_archive | <http://spdx.org/spdxdocs/spdx-example-444504E... | Apache Software Foundation, Hewlett Packard Inc. | http://spdx.org/spdxdocs/spdx-example-444504E0... | _:N7d6e508df7f240f7b8e0f250d70d75ae | _:N94b74a1297954445943a04e2fd998218 | This file belongs to Jena | This license is used by Jena | NaN | NaN |
1 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./lib-source/commons-lang3-3.1-sources.jar | spdx:fileType_archive | <http://spdx.org/licenses/Apache-2.0> | Apache Software Foundation | http://spdx.org/licenses/Apache-2.0 | _:N103471638ff84c5d9d1c78f27855f653 | _:N0c54e685f8e344dea045daf638265480 | This file is used by Jena | NaN | Apache Commons Lang\nCopyright 2001-2011 The A... | NaN |
2 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./src/org/spdx/parser/DOAPProject.java | spdx:fileType_source | <http://spdx.org/licenses/Apache-2.0> | Source Auditor Inc., Black Duck Software In.c,... | http://spdx.org/licenses/Apache-2.0, http://sp... | _:Nc8a3be78c9bc4cafbbf9269b928d4582 | NaN | NaN | NaN | NaN | NaN |
3 | <http://spdx.org/spdxdocs/spdx-example-444504E... | ./package/foo.c | spdx:fileType_source | <http://spdx.org/spdxdocs/spdx-example-444504E... | IBM Corporation, IBM Corporation, IBM Corporat... | N7b52bfe85cd34337af6baf9e87250d07, N7b52bfe85c... | _:Na72d642b09804fc995174f1c25167e8c | _:N42d1b1ab56344cda89a26dbb4f593871 | The concluded license was taken from the packa... | The concluded license was taken from the packa... | Copyright (c) 2001 Aaron Lehmann aaroni@vitelu... | _:Nd3e4b45de96b4f3fbaaf8db45a164a2d |
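In the table above, properties with several values (such as contributors) appear as one comma-separated cell, and properties a file lacks appear as NaN. The cell that built the dataframe is hidden. The sketch below shows one way to get that shape from triples using only the standard library. The toy data and the helper rows_for_type are assumptions, and a missing property is simply left out of the row here instead of being filled with NaN.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; illustrative only.
triples = [
    ("ex:file1", "rdf:type", "spdx:File"),
    ("ex:file1", "spdx:fileName", "./lib-source/jena-2.6.3-sources.jar"),
    ("ex:file1", "spdx:fileContributor", "Apache Software Foundation"),
    ("ex:file1", "spdx:fileContributor", "Hewlett Packard Inc."),
    ("ex:file2", "rdf:type", "spdx:File"),
    ("ex:file2", "spdx:fileName", "./package/foo.c"),
    ("ex:file2", "spdx:noticeText", "Copyright (c) 2001"),
]


def rows_for_type(triples, type_name):
    """Build one dict per subject of the given type, joining repeated values."""
    subjects = [s for s, p, o in triples if p == "rdf:type" and o == type_name]
    rows = []
    for subject in subjects:
        values = defaultdict(list)
        for s, p, o in triples:
            if s == subject and p != "rdf:type":
                values[p].append(o)
        row = {"fileID": subject}
        # Multi-valued properties collapse into a single comma-separated cell.
        row.update({p: ", ".join(objs) for p, objs in values.items()})
        rows.append(row)
    return rows


rows = rows_for_type(triples, "spdx:File")
for row in rows:
    print(row)
```

If pandas is available, passing a list of dicts like this to pandas.DataFrame fills the absent keys with NaN, matching the look of the table above.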
# get the relationship graph to be visualized
graph = visualize_relationship_graph(kg)
# optional: set the physics layout of the network
graph.force_atlas_2based()
graph.set_edge_smooth('dynamic')
# show graph
graph.show("../figs/fig01.relationship_full.html")
../figs/fig01.relationship_full.html
The color of each node in the graph refers to its element type in the SPDX specification: