Quality Assessment of SBOM Generation Tools and Standards on Open Source Projects
As modern software composition grows more complex and the software supply chain expands, with numerous resources sourced from open-source projects, the need to keep track of these resources has arisen. Incidents such as Log4j [29] have underscored this need. In the United States, Executive Order (EO) 14028 mandates a Software Bill of Materials (SBOM) for all software products sold to federal government agencies. Similarly, the European Union passed the Cyber Resilience Act (CRA), which requires an SBOM for all digital products in the European market. As a result, machine-readable SBOM formats have emerged in recent years, implemented by a wide variety of projects that produce and consume such SBOMs.
This thesis investigates a collection of projects that generate SBOMs at various stages of the software development lifecycle. Each generator is applied to open-source projects to produce SBOMs. This study examines and compares the features provided by these SBOMs. Additionally, it assesses the completeness of the enumerated packages/components by analyzing the overlap among the SBOMs generated for each project.
The thesis highlights the distinctions between the tools and phases and points out potential bugs in the implementations of the investigated tools. It explains how the SBOMs generated at different phases of the software development lifecycle differ. An analysis identifies which parts of the CycloneDX and SPDX schemas were populated with data during SBOM generation. The research further reveals that the examined tooling produces results of varying quality and depth. A metric introduced to quantify the overlap among the SBOMs yields mixed results. The thesis concludes that the quality and applicability of a produced SBOM can vary drastically depending on the use case. This variation is partly attributable to the different methodologies implemented by the investigated tools and partly to differences in the quality or depth of the generated SBOMs, where identifiers are produced in different ways or values are not sufficiently enriched.
The thesis aims to propose initial methods for validating the enrichment of an SBOM and assessing its completeness. To this end, the generators were tested on real-world projects.