From: Ludovic Courtès
Subject: branch master updated: doc: Add comment to CISA-2023-0026-0001 on software identification.
Date: Mon, 05 Feb 2024 11:48:07 -0500
This is an automated email from the git hooks/post-receive script.
civodul pushed a commit to branch master
in repository maintenance.
The following commit(s) were added to refs/heads/master by this push:
new 5274429 doc: Add comment to CISA-2023-0026-0001 on software
identification.
5274429 is described below
commit 5274429f940408926cc71f38ba81c248f1bd1aee
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Mon Feb 5 16:21:14 2024 +0100
doc: Add comment to CISA-2023-0026-0001 on software identification.
* doc/cisa-2023-0026-0001: New directory.
---
doc/cisa-2023-0026-0001/channels.scm | 11 +
doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org | 268 ++++++++++++++++++++++++
doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf | Bin 0 -> 278782 bytes
doc/cisa-2023-0026-0001/manifest.scm | 17 ++
4 files changed, 296 insertions(+)
diff --git a/doc/cisa-2023-0026-0001/channels.scm b/doc/cisa-2023-0026-0001/channels.scm
new file mode 100644
index 0000000..003e1e0
--- /dev/null
+++ b/doc/cisa-2023-0026-0001/channels.scm
@@ -0,0 +1,11 @@
+(list (channel
+        (name 'guix)
+        (url "https://git.savannah.gnu.org/git/guix.git")
+        (branch #f)
+        (commit
+          "65dc2d40cb113382fb98796f1d04099f28cab355")
+        (introduction
+          (make-channel-introduction
+            "9edb3f66fd807b096b48283debdcddccfea34bad"
+            (openpgp-fingerprint
+              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
diff --git a/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org
new file mode 100644
index 0000000..d6b51c9
--- /dev/null
+++ b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org
@@ -0,0 +1,268 @@
+#+TITLE: Public Comment to CISA-2023-0026-0001
+#+AUTHOR: Maxim Cournoyer, Ludovic Courtès, Jan Nieuwenhuizen, Simon Tournier
+#+DATE: January 2024
+#+SUBTITLE: Perspective from Developers of GNU Guix
+#+STARTUP: content hidestars
+#+LANGUAGE: en
+#+LATEX_CLASS: article
+#+LATEX_CLASS_OPTIONS: [letterpaper]
+#+LATEX_HEADER: \usepackage{xcolor}
+#+LATEX_HEADER: \usepackage[T1]{fontenc}
+#+LATEX_HEADER: \definecolor{darkblue}{rgb}{0.0, 0.0, 0.55}
+#+LATEX_HEADER: \definecolor{cobalt}{rgb}{0.0, 0.28, 0.67}
+#+LATEX_HEADER: \definecolor{coolblack}{rgb}{0.0, 0.18, 0.39}
+#+LATEX_HEADER: \usepackage{libertine}
+#+LATEX_HEADER: \usepackage{inconsolata}
+#+OPTIONS: toc:nil
+
+#+begin_quote
+This is the answer to a request for information from the Cybersecurity
+and Infrastructure Security Agency (CISA) identified as [[https://www.regulations.gov/document/CISA-2023-0026-0001][CISA–2023–0026]].
+#+end_quote
+
+#+latex: \vspace{15mm}
+
+#+latex: \noindent
+Dear CISA team,
+
+#+latex: \vspace{6mm}
+#+latex: \noindent
+Please find below our contribution to the work of CISA regarding the
+merits and challenges of the software identifier ecosystems as
+discussed in CISA’s [[https://www.cisa.gov/resources-tools/resources/software-identification-ecosystem-option-analysis][October 2023 white paper]].
+
+* About the Authors
+
+This document was written by core developers of [[https://guix.gnu.org][GNU Guix]][fn:1:https://guix.gnu.org], a software
+project we believe provides useful insight for the software
+identification goals defined by CISA.
+
+Maxim Cournoyer (Canada) is currently a co-maintainer of Guix and a long-time
+Guix developer, with years of experience developing free and open source
+software.
+
+Ludovic Courtès (France) is the founder of Guix, a Guix contributor, and a former
+[[https://nixos.org][Nix]] contributor; he works as a research software engineer at Inria, the
+French research institute in computer science.
+
+Jan Nieuwenhuizen (The Netherlands) is the founder of
+[[https://www.gnu.org/software/mes][GNU Mes]][fn:2:https://www.gnu.org/software/mes] and leader of the
+full-source bootstrap effort discussed below; he is recognized for his
+many contributions to free software over more than twenty years.
+
+Simon Tournier (France) is a long-time contributor to Guix, leading
+integration with [[https://www.softwareheritage.org][Software
+Heritage]][fn:3:https://www.softwareheritage.org], working as a research
+software engineer at Université Paris-Cité.
+
+* About GNU Guix
+
+The authors draw their experience from the design and development of
+[[https://guix.gnu.org][GNU Guix]], a package manager, software deployment tool, and GNU/Linux
+distribution. Guix today is the fifth largest Linux distribution
+according to [[https://repology.org][Repology]][fn:4:https://repology.org]. Since its inception in 2012, it has received
+source code contributions from almost 1,000 people.
+
+* On Software Identification
+
+The /Software Identification Ecosystem Option Analysis/ white paper
+released by CISA in October 2023 studies options towards the definition
+of /a software identification ecosystem that can be used across the
+complete, global software space for all key cybersecurity use cases/.
+
+Our experience lies in the design and development of [[https://guix.gnu.org][GNU Guix]], a package
+manager, software deployment tool, and GNU/Linux distribution, which
+emphasizes three key elements: *reproducibility, provenance tracking,
+and auditability*. We explain in the following sections our approach
+and how it relates to the goal stated in the aforementioned white paper.
+
+Guix produces binary artifacts of varying complexity from source code:
+package binaries, application bundles (container images to be consumed
+by Docker and related tools), system installations, system bundles
+(container and virtual machine images).
+
+All these artifacts qualify as “software” and so does source code. Some
+of this “software” comes from well-identified upstream packages,
+sometimes with modifications added downstream by packagers (patches); binary
+artifacts themselves are the byproduct of a build process where the
+package manager uses /other/ binary artifacts it previously built
+(compilers, libraries, etc.) along with more source code (the package
+definition) to build them. How can one identify “software” in that
+sense?
+
+Software is dual: it exists in /source/ form and in /binary/,
+machine-executable form. The latter is the outcome of a complex
+computational process taking source code and intermediary binaries as
+input.
+
+Our thesis can be summarized as follows:
+
+#+begin_quote
+*We consider that the requirements for source code identifiers differ
+ from the requirements to identify binary artifacts.*
+
+Our view, embodied in GNU Guix, is that:
+
+ 1. *Source code* can be identified in an unambiguous and distributed
+ fashion through /inherent identifiers/ such as cryptographic
+ hashes.
+
+ 2. *Binary artifacts*, instead, need to be the byproduct of a
+ /comprehensive and verifiable build process itself available as
+ source code/.
+#+end_quote
+
+In the next sections, to clarify the context of this statement, we show
+how Guix identifies source code, how it defines the /source-to-binary/
+path and ensures its verifiability, and how it provides provenance
+tracking.
+
+* Source Code Identification
+
+Guix includes [[https://guix.gnu.org/manual/en/html_node/Defining-Packages.html][package definitions]][fn:5:https://guix.gnu.org/manual/en/html_node/Defining-Packages.html] for almost 30,000 packages. Each
+package definition identifies its [[https://guix.gnu.org/manual/en/html_node/origin-Reference.html][origin]][fn:6:https://guix.gnu.org/manual/en/html_node/origin-Reference.html], that is, its “main” source code as well
+as patches. The origin is *content-addressed*: it includes a SHA256
+cryptographic hash of the code (an /inherent identifier/), along with a
+primary URL to download it.
+
+Since source is content-addressed, the URL can be thought of as a hint.
+Indeed, *we connected Guix to the [[https://www.softwareheritage.org][Software Heritage]] source code
+archive*: when source code vanishes from its original URL, Guix falls
+back to downloading it from the archive. This is made possible thanks
+to the use of inherent (or intrinsic) identifiers both by Guix and
+Software Heritage.
+
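As a rough illustration of the fallback described above, the sketch below treats download locations as interchangeable hints and accepts content only when its SHA256 digest matches the expected one. It is a minimal sketch, not Guix's or Software Heritage's implementation: `fetch_by_hash`, the in-memory `upstream` and `archive` dictionaries, and the `example.org` URL are all hypothetical stand-ins, and indexing the archive directly by SHA256 is an assumption made to keep the example self-contained.

```python
import hashlib
from typing import Callable, Iterable, Optional


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA256 digest of DATA (its inherent identifier)."""
    return hashlib.sha256(data).hexdigest()


def fetch_by_hash(expected_sha256: str,
                  fetchers: Iterable[Callable[[], Optional[bytes]]]) -> bytes:
    """Try each location in order; accept only content matching the hash."""
    for fetch in fetchers:
        data = fetch()
        if data is not None and sha256_hex(data) == expected_sha256:
            return data
    raise LookupError("no location provided content matching " + expected_sha256)


# Toy data: the upstream URL has vanished, but the archive still has the content.
source = b"int main(void) { return 0; }\n"
expected = sha256_hex(source)
upstream = {}                  # hypothetical URL -> bytes mapping, now empty
archive = {expected: source}   # hypothetical archive indexed by content hash

recovered = fetch_by_hash(expected, [
    lambda: upstream.get("https://example.org/hello-1.0.tar.gz"),
    lambda: archive.get(expected),
])
```

Because the hash alone decides whether content is accepted, it does not matter which location answered; a location returning altered bytes is simply skipped.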
+More information can be found in this [[https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/][2019 blog post]][fn:7:https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/] and in the documents of the
+[[https://www.swhid.org/][Software Hash Identifiers (SWHID)]][fn:8:https://www.swhid.org/] working group.
+
+* Reproducible Builds
+
+Guix provides a *verifiable path from source code to binaries* by
+ensuring [[https://reproducible-builds.org][reproducible builds]]. To achieve that, Guix builds upon the
+pioneering research work of Eelco Dolstra that led to the design of the
+[[https://nixos.org][Nix package manager]], with which it shares the same conceptual
+foundation.
+
+Namely, Guix relies on /hermetic builds/: builds are performed in
+isolated environments that contain nothing but explicitly-declared
+dependencies—where a “dependency” can be the output of another build
+process or source code, including build scripts and patches.
+
+An implication is that *builds can be verified independently*. For
+instance, for a given version of Guix, =guix build gcc= should produce
+the exact same binary, bit-for-bit. To facilitate independent
+verification, =guix challenge gcc= compares the binary artifacts of the
+GNU Compiler Collection (GCC) as built and published by different
+parties. Users can also compare to a local build with =guix build gcc
+--check=.
+
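The idea behind independent verification can be sketched as follows: several parties publish the artifact they built, and the builds are considered reproducible when every party obtained the same bits. This is only a conceptual sketch and not how `guix challenge` is implemented; the builder names and byte strings are made up for illustration.

```python
import hashlib
from typing import Dict, List


def group_by_digest(published: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group builders by the SHA256 digest of the artifact they published."""
    groups: Dict[str, List[str]] = {}
    for builder, artifact in published.items():
        digest = hashlib.sha256(artifact).hexdigest()
        groups.setdefault(digest, []).append(builder)
    return groups


def builds_agree(published: Dict[str, bytes]) -> bool:
    """True when all builders produced bit-for-bit identical artifacts."""
    return len(group_by_digest(published)) == 1


# Hypothetical builders and artifacts.
matching = {
    "build-farm-a": b"\x7fELF gcc",
    "build-farm-b": b"\x7fELF gcc",
    "local": b"\x7fELF gcc",
}
# One party embeds a timestamp, so its output differs from the others.
diverging = dict(matching, local=b"\x7fELF gcc built at 2024-02-05")
```

Here `builds_agree(matching)` holds while `builds_agree(diverging)` does not, and `group_by_digest(diverging)` shows which parties disagree.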
+As with Nix, build processes are identified by /derivations/, which are
+low-level, content-addressed build instructions; derivations may refer
+to other derivations and to source code. For instance,
+=/gnu/store/c9fqrmabz5nrm2arqqg4ha8jzmv0kc2f-gcc-11.3.0.drv= uniquely
+identifies the derivation to build a specific variant of version 11.3.0
+of the GNU Compiler Collection (GCC). Changing the package
+definition (patches being applied, build flags, set of dependencies), or
+similarly changing one of the packages it depends on, leads to a
+different derivation (more information can be found in [[https://edolstra.github.io/pubs/phd-thesis.pdf][Eelco Dolstra’s
+PhD thesis]]).
+
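To make the propagation property above concrete, here is a toy model in which a derivation's identifier is a hash over its own build instructions and the identifiers of its inputs. It deliberately does not reproduce the actual derivation format or hashing scheme used by Guix and Nix; the `Derivation` class, its fields, the placeholder hashes, and the `glibc` version are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Derivation:
    """Toy build instructions: source, patches, flags, and input derivations."""
    name: str
    version: str
    source_sha256: str
    patches: Tuple[str, ...] = ()        # hashes of the patches applied
    build_flags: Tuple[str, ...] = ()
    inputs: Tuple["Derivation", ...] = ()

    def identifier(self) -> str:
        """Hash everything that influences the build, inputs included."""
        payload = json.dumps({
            "name": self.name,
            "version": self.version,
            "source": self.source_sha256,
            "patches": list(self.patches),
            "flags": list(self.build_flags),
            "inputs": sorted(dep.identifier() for dep in self.inputs),
        }, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
        return f"{digest}-{self.name}-{self.version}.drv"


# Placeholder hashes; real ones would be digests of actual source code.
libc = Derivation("glibc", "2.35", source_sha256="aa" * 32)
gcc = Derivation("gcc", "11.3.0", source_sha256="bb" * 32, inputs=(libc,))

# Patching a dependency changes the identifier of everything built on it.
patched_libc = replace(libc, patches=("cc" * 32,))
gcc_on_patched_libc = replace(gcc, inputs=(patched_libc,))
```

Both `gcc` and `gcc_on_patched_libc` carry the name/version pair `gcc 11.3.0`, yet their identifiers differ, which is the distinction a name/version pair alone cannot express.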
+Derivations form a graph that *captures the entirety of the build
+processes leading to a binary artifact*. In contrast, mere package
+name/version pairs such as =gcc 11.3.0= fail to capture the breadth and
+depth of the elements that lead to a binary artifact. This is a shortcoming of
+systems such as the *Common Platform Enumeration* (CPE) standard: it
+fails to express whether a vulnerability that applies to =gcc 11.3.0=
+applies to it regardless of how it was built, patched, and configured,
+or whether certain conditions are required.
+
+* Full-Source Bootstrap
+
+Reproducible builds alone cannot ensure the source-to-binary
+correspondence: the compiler could contain a backdoor, as demonstrated
+by Ken Thompson in /Reflections on Trusting Trust/. To address that,
+Guix goes further by implementing a so-called *full-source bootstrap*: for
+the first time, literally every package in the distribution is built
+from source code, [[https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/][starting from a very small binary
+seed]][fn:9:https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/].
+This gives an unprecedented level of transparency, allowing code to be
+audited at all levels, and improving robustness against the
+“trusting-trust attack” described by Ken Thompson.
+
+The European Union recognized the importance of this work through an
+[[https://nlnet.nl/project/GNUMes-fullsource/][NLnet Privacy & Trust Enhancing Technologies (NGI0 PET)
+grant]][fn:13:https://nlnet.nl/project/GNUMes-fullsource/] allocated in
+2021 to Jan Nieuwenhuizen to further work on full-source bootstrap in
+GNU Guix, GNU Mes, and related projects, followed by [[https://nlnet.nl/project/GNUMes-ARM_RISC-V/][another grant]] in
+2022 to expand support to the Arm and RISC-V CPU architectures.
+
+* Provenance Tracking
+
+We define provenance tracking as the ability *to map a binary artifact
+back to its complete corresponding source*. Provenance tracking is
+necessary to allow the recipient of a binary artifact to access the
+corresponding source code and to verify the source/binary correspondence
+if they wish to do so.
+
+The [[https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html][=guix pack=]] command can be used to build, for instance, container
+images. Running =guix pack -f docker python --save-provenance= produces
+a /self-describing Docker image/ containing the binaries of Python and
+its run-time dependencies. The image is self-describing because the
+=--save-provenance= flag leads to the inclusion of a /manifest/ that
+describes which revision of Guix was used to produce this binary. A
+third party can retrieve this revision of Guix and from there view the
+entire build dependency graph of Python, view its source code and any patches
+that were applied, and do the same recursively for its dependencies.
+
+To summarize, capturing the revision of Guix that was used is all it
+takes to /reproduce/ a specific binary artifact. This is illustrated by
+[[https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html][the
+=time-machine=
+command]][fn:11:https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html].
+The example below deploys, /at any time on any machine/, the specific
+build artifact of the =python= package as it was defined in this Guix
+commit:
+
+#+begin_example
+guix time-machine -q --commit=d3c3922a8f5d50855165941e19a204d32469006f \
+ -- install python
+#+end_example
+
+#+latex: \noindent
+In other words, because Guix itself defines how artifacts are built,
+*the revision of the Guix source coupled with the package name
+unambiguously identifies the package's binary artifact*. As scientists,
+we build on this property to achieve reproducible research workflows, as
+explained in this [[https://doi.org/10.1038/s41597-022-01720-9][2022 article in /Nature Scientific Data/]][fn:12:/Toward practical
+transparent verifiable and long-term reproducible research using Guix/,
+https://doi.org/10.1038/s41597-022-01720-9]; as engineers, we value this
+property to analyze the systems we are running and determine which known
+vulnerabilities and bugs apply.
+
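The identification scheme stated above can be sketched as a small record holding a Guix revision and a package name, from which the =time-machine= invocation shown earlier is regenerated. This is an illustrative sketch only: the `Provenance` class and its field names are hypothetical and do not reflect the format of the manifest written by =--save-provenance=; the command-line flags are those of the example above.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Provenance:
    """A (Guix revision, package name) pair identifying a binary artifact."""
    guix_commit: str
    package: str

    def reproduce_command(self) -> List[str]:
        """Command that redeploys the artifact as defined at that revision."""
        return ["guix", "time-machine", "-q",
                f"--commit={self.guix_commit}",
                "--", "install", self.package]


record = Provenance("d3c3922a8f5d50855165941e19a204d32469006f", "python")
command_line = shlex.join(record.reproduce_command())
```

Two records compare equal exactly when they name the same revision and the same package, so the pair can serve as a key when recording which artifacts are deployed where.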
+Again, a software bill of materials (SBOM) written as a mere list of
+package name/version pairs would fail to capture as much information.
+The *Artifact Dependency Graph (ADG) of OmniBOR*, while less ambiguous,
+falls short in two ways: it is too fine-grained for typical cybersecurity
+applications (at the level of individual source files), and it only
+captures the alleged source/binary correspondence of individual files
+but not the process to go from source to binary.
+
+* Conclusions
+
+Inherent identifiers lend themselves well to unambiguous source code
+identification, as demonstrated by Software Heritage, Guix, and Nix.
+
+However, we believe binary artifacts should instead be treated as the
+result of a computational process; it is that process that needs to be
+fully captured to support *independent verification of the source/binary
+correspondence*. For cybersecurity purposes, recipients of a binary
+artifact must be able to map it back to its source code (/provenance
+tracking/), with the additional guarantee that they must be able to
+reproduce the entire build process to verify the source/binary
+correspondence (/reproducible builds and full-source bootstrap/). As
+long as binary artifacts result from a reproducible build process,
+itself described as source code, *identifying binary artifacts boils
+down to identifying the source code of their build process*.
+
+These ideas are developed in the 2022 scientific paper [[https://doi.org/10.22152/programming-journal.org/2023/7/1][/Building a
+Secure Software Supply Chain with GNU Guix/]][fn:10:https://doi.org/10.22152/programming-journal.org/2023/7/1].
diff --git a/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf
new file mode 100644
index 0000000..2eef0da
Binary files /dev/null and b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf differ
diff --git a/doc/cisa-2023-0026-0001/manifest.scm b/doc/cisa-2023-0026-0001/manifest.scm
new file mode 100644
index 0000000..b672d7d
--- /dev/null
+++ b/doc/cisa-2023-0026-0001/manifest.scm
@@ -0,0 +1,17 @@
+;; Manifest for Org-generated LaTeX.
+
+(specifications->manifest
+ '("rubber"
+
+   "texlive-scheme-basic"
+   "texlive-collection-latexrecommended"
+   "texlive-collection-fontsrecommended"
+
+   "texlive-libertine"
+   "texlive-inconsolata"
+
+   "texlive-wrapfig"
+   "texlive-ulem"
+   "texlive-capt-of"
+   "texlive-hyperref"
+   "texlive-upquote"))