SPDX 3.0 AI Package

Brief Introduction

AIPackage provides information about the fields in the AI package profile.

It refers to metadata that can be added to a package to describe an AI application or a trained AI model.

AIPackage is an instantiable class and a subclass of /Software/Package.

An example of the Package class in JSON-LD serialization for SPDX 3.0 is available at https://github.com/spdx/spdx-3-model/blob/main/serialization/json_ld/examples/package.jsonld

Properties

  • energyConsumption
    • type: xsd:string
    • minCount: 0
    • maxCount: 1
  • standardCompliance
    • type: xsd:string
    • minCount: 0
  • limitation
    • type: xsd:string
    • minCount: 0
    • maxCount: 1
  • typeOfModel
    • type: xsd:string
    • minCount: 0
  • informationAboutTraining
    • type: xsd:string
    • minCount: 0
    • maxCount: 1
  • informationAboutApplication
    • type: xsd:string
    • minCount: 0
    • maxCount: 1
  • hyperparameter
    • type: /Core/DictionaryEntry
    • minCount: 0
  • modelDataPreprocessing
    • type: xsd:string
    • minCount: 0
  • modelExplainability
    • type: xsd:string
    • minCount: 0
  • sensitivePersonalInformation
    • type: PresenceType
    • minCount: 0
    • maxCount: 1
  • metricDecisionThreshold
    • type: /Core/DictionaryEntry
    • minCount: 0
  • metric
    • type: /Core/DictionaryEntry
    • minCount: 0
  • domain
    • type: xsd:string
    • minCount: 0
  • autonomyType
    • type: PresenceType
    • minCount: 0
    • maxCount: 1
  • safetyRiskAssessment
    • type: SafetyRiskAssessmentType
    • minCount: 0
    • maxCount: 1
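The cardinalities above can be checked mechanically. The following is a minimal sketch, not part of any SPDX tool: the names CARDINALITY and check_cardinality are hypothetical, and a missing maxCount is assumed to mean "no upper bound".

```python
from typing import Any, Dict, List, Optional, Tuple

# (minCount, maxCount) per property, copied from the list above.
# Assumption: a property without a maxCount has no upper bound (None).
CARDINALITY: Dict[str, Tuple[int, Optional[int]]] = {
    "energyConsumption": (0, 1),
    "standardCompliance": (0, None),
    "limitation": (0, 1),
    "typeOfModel": (0, None),
    "informationAboutTraining": (0, 1),
    "informationAboutApplication": (0, 1),
    "hyperparameter": (0, None),
    "modelDataPreprocessing": (0, None),
    "modelExplainability": (0, None),
    "sensitivePersonalInformation": (0, 1),
    "metricDecisionThreshold": (0, None),
    "metric": (0, None),
    "domain": (0, None),
    "autonomyType": (0, 1),
    "safetyRiskAssessment": (0, 1),
}


def check_cardinality(values: Dict[str, List[Any]]) -> List[str]:
    """Return one message per violation; an empty list means the counts fit."""
    # Flag names that are not AIPackage properties at all.
    problems = [
        f"{name}: not an AIPackage property"
        for name in values
        if name not in CARDINALITY
    ]
    # Compare the number of supplied values with the allowed range.
    for name, (low, high) in CARDINALITY.items():
        count = len(values.get(name, []))
        if count < low:
            problems.append(f"{name}: expected at least {low}, got {count}")
        if high is not None and count > high:
            problems.append(f"{name}: expected at most {high}, got {count}")
    return problems
```

For example, check_cardinality({"limitation": ["a", "b"]}) reports that limitation allows at most one value, while two domain values pass.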

External property restrictions

  • /Core/Artifact/suppliedBy
    • minCount: 1
  • /Software/Package/downloadLocation
    • minCount: 1
  • /Software/Package/packageVersion
    • minCount: 1
  • /Software/SoftwareArtifact/purpose
    • minCount: 1
  • /Core/Artifact/releaseTime
    • minCount: 1
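Because these inherited properties become mandatory for an AI package, a quick completeness check is useful before serializing. The sketch below assumes the package is held as a plain mapping keyed by the short property names; REQUIRED_INHERITED and missing_required are hypothetical names, and the sample values are made up for illustration.

```python
from typing import Any, List, Mapping

# Inherited properties that the restrictions above make mandatory (minCount: 1).
REQUIRED_INHERITED = (
    "suppliedBy",
    "downloadLocation",
    "packageVersion",
    "purpose",
    "releaseTime",
)


def missing_required(package: Mapping[str, Any]) -> List[str]:
    """Return the required property names that are absent or empty."""
    return [
        name
        for name in REQUIRED_INHERITED
        if package.get(name) in (None, "", [])
    ]


# Illustrative values only; they do not describe a real package.
example = {
    "suppliedBy": "Organization: Example",
    "downloadLocation": "https://example.org/model.tar.gz",
    "packageVersion": "1.0",
}
print(missing_required(example))
```

The function lists the missing names in the order given above, so the output can be shown directly to the person filling in the metadata.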

The SPDX tools-python repository contains code for creating an object of the AIPackage class: https://github.com/spdx/tools-python/blob/main/src/spdx_tools/spdx3/model/ai/ai_package.py

SPDX 3.0 AI Package and Hugging Face Model Cards (HFMC)

The idea is to compare the information in the SPDX 3.0 AI package with the information in the Hugging Face Model Cards to verify their consistency or alignment.

energyConsumption

  • SPDX3: energyConsumption captures the amount of energy needed to train and operate the AI model. This value is also known as training energy consumption or inference energy consumption.
  • HFMC: The Environmental Impact section summarizes the information necessary to calculate environmental impacts such as electricity usage and carbon emissions.

standardCompliance

  • SPDX3: standardCompliance captures a standard that the AI software complies with. This includes both published and unpublished standards, for example ISO, IEEE, ETSI etc. The standard could, but does not necessarily have to, be used to satisfy a legal or regulatory requirement.

limitation

  • SPDX3: limitation captures a limitation of the AI Package (or of the AI models present in the AI package), expressed as free form text. Note that this is not guaranteed to be exhaustive. For instance, a limitation might be that the AI package cannot be used on datasets from a certain demography.
  • HFMC: bias_risks_limitations. This section identifies foreseeable harms, misunderstandings, and technical and sociotechnical limitations. It also provides information on warnings and potential mitigations.

typeOfModel

  • SPDX3: typeOfModel records the type of the AI model(s) used in the software, for instance whether it is a supervised model, an unsupervised model, a reinforcement learning model or a combination of those.
  • HFMC: model_description:model_type. The “type” can be given as the Supervision/Learning Method, the Machine Learning Type, and the Modality.

informationAboutTraining

  • SPDX3: informationAboutTraining describes the specific steps involved in the training of the AI model. For example, it can be specified whether supervised fine-tuning or active learning is used as part of training the model.

  • HFMC: finetuned_from. If this model has another model as its base, link to that model here.

  • HFMC: Training Details is a section that provides information to describe and replicate training, including the training data, the speed and size of training elements, and the environmental impact of training. This relates heavily to the Technical Specifications as well, and content here should link to that section when it is relevant to the training procedure. It is useful for people who want to learn more about the model inputs and training footprint. It is relevant for anyone who wants to know the basics of what the model is learning. Some keys related to training can be used in the metadata YAML section:

    • training_data. Write 1-2 sentences on what the training data is. Ideally this links to a Dataset Card for further information. Links to documentation related to data pre-processing or additional filtering may go here as well as in More Information.
    • preprocessing. Detail tokenization, resizing/rewriting (depending on the modality), etc.
    • speeds_sizes_times. Detail throughput, start/end time, checkpoint sizes, etc.

informationAboutApplication

  • SPDX3: informationAboutApplication describes any relevant information in free form text about how the AI model is used inside the software, as well as any relevant pre-processing steps, third party APIs etc.
  • HFMC: Technical Specifications. This section includes details about the model objective and architecture, and the compute infrastructure. It is useful for people interested in model development. Writing this section usually requires the model developer to be directly involved.
  • HFMC: software is a key inside the Technical Specifications section that may be used to describe the application.

hyperparameter

  • SPDX3: hyperparameter records a hyperparameter used to build the AI model contained in the AI package.
  • HFMC: Training Details is a section that provides information to describe and replicate training, including the training data, the speed and size of training elements, and the environmental impact of training. This relates heavily to the Technical Specifications as well, and content here should link to that section when it is relevant to the training procedure. It is useful for people who want to learn more about the model inputs and training footprint. It is relevant for anyone who wants to know the basics of what the model is learning.

modelDataPreprocessing

  • SPDX3: modelDataPreprocessing describes all the preprocessing steps applied to the training data before the model training.
  • HFMC: preprocessing. Detail tokenization, resizing/rewriting (depending on the modality), etc.

modelExplainability

  • SPDX3: modelExplainability is a free form text that lists the different explainability mechanisms (such as SHAP, or other model specific explainability mechanisms) that can be used to explain the model.
  • HFMC: model_examination. This is an experimental section some developers are beginning to add, where work on explainability/interpretability may go.

sensitivePersonalInformation

  • SPDX3: sensitivePersonalInformation notes if sensitive personal information is used in the training or inference of the AI models. This might include biometric data, addresses or other data that can be used to infer a person’s identity.
  • HFMC: None

metricDecisionThreshold

  • SPDX3: metricDecisionThreshold. Each metric might be computed based on a decision threshold. For instance, precision or recall is typically computed by checking whether the probability of the outcome is larger than 0.5. Each decision threshold should match a metric field defined in the AI Package.
  • HFMC: None
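The rule that each decision threshold should match a metric can be sketched as a small consistency check. Both properties are of type /Core/DictionaryEntry, so the sketch represents them as plain dictionaries and assumes that "match" means sharing the same key; that interpretation, the function name unmatched_thresholds, and the sample values are assumptions made for illustration.

```python
from typing import Dict, List


def unmatched_thresholds(
    metric: Dict[str, str],
    metric_decision_threshold: Dict[str, str],
) -> List[str]:
    """Return threshold keys that have no metric entry with the same key."""
    return sorted(key for key in metric_decision_threshold if key not in metric)


# Illustrative values only.
metric = {"precision": "0.91", "recall": "0.87"}
thresholds = {"precision": "0.5", "f1": "0.5"}
print(unmatched_thresholds(metric, thresholds))
```

Here the threshold for f1 has no corresponding metric entry, so it is the only key reported.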

metric

  • SPDX3: metric records the measurement with which the AI model was evaluated. This makes statements about the prediction quality including uncertainty, accuracy, characteristics of the tested population, quality, fairness, explainability, robustness etc.
  • HFMC: testing_metrics. What metrics will be used for evaluation in light of tradeoffs between different errors?

domain

  • SPDX3: domain describes the domain in which the AI model contained in the AI software can be expected to operate successfully. Examples include computer vision, natural language etc.
  • HFMC: testing_factors (tentative match). What are the foreseeable characteristics that will influence how the model behaves? This includes domain and context, as well as population subgroups. Evaluation should ideally be disaggregated across factors in order to uncover disparities in performance.

autonomyType

  • SPDX3: autonomyType indicates if a human is involved in any of the decisions of the AI software or if that software is fully automatic.
  • HFMC: NOASSERTION

safetyRiskAssessment

  • SPDX3: safetyRiskAssessment categorizes the safety risk impact of the AI software in accordance with Article 20 of EC Regulation No 765/2008.
  • HFMC: bias_risks_limitations. This section identifies foreseeable harms, misunderstandings, and technical and sociotechnical limitations. It also provides information on warnings and potential mitigations.
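The comparison above can be condensed into a lookup table, which makes the gaps easy to list. This is a sketch of one possible representation; SPDX_TO_HFMC and without_counterpart are hypothetical names, and an empty list is assumed to cover every case where the comparison names no counterpart (no HFMC entry, None, or NOASSERTION). The testing_factors entry for domain is the tentative match noted in the comparison.

```python
from typing import Dict, List

# SPDX 3.0 AIPackage property -> Hugging Face Model Card sections or keys,
# as paired in the comparison above. An empty list means no counterpart.
SPDX_TO_HFMC: Dict[str, List[str]] = {
    "energyConsumption": ["Environmental Impact"],
    "standardCompliance": [],
    "limitation": ["bias_risks_limitations"],
    "typeOfModel": ["model_description:model_type"],
    "informationAboutTraining": [
        "finetuned_from",
        "Training Details",
        "training_data",
        "preprocessing",
        "speeds_sizes_times",
    ],
    "informationAboutApplication": ["Technical Specifications", "software"],
    "hyperparameter": ["Training Details"],
    "modelDataPreprocessing": ["preprocessing"],
    "modelExplainability": ["model_examination"],
    "sensitivePersonalInformation": [],
    "metricDecisionThreshold": [],
    "metric": ["testing_metrics"],
    "domain": ["testing_factors"],  # tentative match
    "autonomyType": [],
    "safetyRiskAssessment": ["bias_risks_limitations"],
}


def without_counterpart(mapping: Dict[str, List[str]]) -> List[str]:
    """Return the SPDX properties for which no HFMC section or key is listed."""
    return [name for name, keys in mapping.items() if not keys]


print(without_counterpart(SPDX_TO_HFMC))
```

Keeping the table as data rather than prose makes it straightforward to update when either the AI profile or the model card template changes.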