author Michael Opdenacker <michael.opdenacker@bootlin.com> 2021-05-21 19:07:35 +0200
committer Richard Purdie <richard.purdie@linuxfoundation.org> 2022-01-10 23:12:43 +0000
commit 76ab2eab312371294e1a2f550504f3a9d5cef3b1
tree b6634cc973d8ed9d7e2954735bd4e26ac5301ff0 /documentation
parent 35a14725d67e497a5f68baa7cf860a69a884c642
manuals: document hash equivalence

(From yocto-docs rev: 7fad0873207980a747f79b2ce29ec0dc6c6c3cdf)

Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation')
-rw-r--r-- documentation/overview-manual/concepts.rst | 132
-rw-r--r-- documentation/test-manual/reproducible-builds.rst | 8
2 files changed, 136 insertions, 4 deletions
diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
index 6f8a3def69..5698d93759 100644
--- a/documentation/overview-manual/concepts.rst
+++ b/documentation/overview-manual/concepts.rst
@@ -1938,6 +1938,138 @@ another reason why a task-based approach is preferred over a
 recipe-based approach, which would have to install the output from every
 task.

+Hash Equivalence
+----------------
+
+The above section explained how BitBake skips the execution of tasks
+whose output can already be found in the Shared State cache.
+
+During a build, the output of a task often remains unchanged despite
+changes in the task's input values. An example is a whitespace-only
+change in some input C code. In project terms, this is what we define
+as "equivalence".
+
+To keep track of such equivalence, BitBake has to manage three hashes
+for each task:
+
+- The *task hash* explained earlier: computed from the recipe metadata,
+  the task code and the task hash values from its dependencies.
+  When changes are made, these task hashes are therefore modified,
+  causing the task to re-execute. The task hashes of tasks depending on this
+  task are therefore modified too, causing the whole dependency
+  chain to re-execute.
+
+- The *output hash*, a new hash computed from the output of Shared State tasks,
+  that is, tasks that save their resulting output to a Shared State tarball.
+  The mapping between the task hash and its output hash is reported
+  to a new *Hash Equivalence* server. This mapping is stored in a database
+  by the server for future reference.
+
+- The *unihash*, a new hash, initially set to the task hash for the task.
+  This is used to track the *unicity* of task output, and the following
+  paragraphs explain how its value is maintained.
+
+When Hash Equivalence is enabled, BitBake computes the task hash
+for each task by using the unihash of its dependencies, instead
+of their task hash.
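As a rough illustration of the rule above, the following Python sketch derives a task hash from a task's own inputs and from the unihashes of its dependencies. The ``toy_task_hash`` function, its parameters and the choice of SHA-256 over NUL-separated fields are assumptions made for this example; this is not BitBake's actual signature generator.

```python
import hashlib


def toy_task_hash(metadata: str, task_code: str, dep_unihashes: list[str]) -> str:
    """Toy task hash: digest of the task's own inputs and its dependencies' unihashes.

    Hypothetical helper for illustration only, not BitBake's real algorithm.
    """
    digest = hashlib.sha256()
    # Sort the dependency unihashes so their order does not affect the result,
    # and end each field with a NUL byte so adjacent fields cannot run together.
    for field in [metadata, task_code, *sorted(dep_unihashes)]:
        digest.update(field.encode("utf-8") + b"\0")
    return digest.hexdigest()


# Made-up inputs: a dependency whose unihash is unchanged leaves the
# dependent task hash unchanged, while a new unihash changes it.
before = toy_task_hash("PV = 1.0", "do_compile", ["unihash-of-libfoo"])
after = toy_task_hash("PV = 1.0", "do_compile", ["unihash-of-libfoo"])
changed = toy_task_hash("PV = 1.0", "do_compile", ["new-unihash-of-libfoo"])
```

In this sketch, ``before`` and ``after`` are equal and ``changed`` differs. That is the property the rest of the section relies on: as long as a dependency keeps its unihash, the tasks depending on it keep their task hash.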
+
+Now, imagine that a Shared State task is modified because of a change in
+its code or metadata, or because of a change in its dependencies.
+Since this modifies its task hash, this task will need re-executing.
+Its output hash will therefore be computed again.
+
+Then, the new mapping between the new task hash and its output hash
+will be reported to the Hash Equivalence server. The server will
+let BitBake know whether this output hash is the same as a previously
+reported output hash, for a different task hash.
+
+If the output hash is already known, BitBake will update the task's
+unihash to match the original task hash that generated that output.
+Thanks to this, the depending tasks will keep a previously recorded
+task hash, and BitBake will be able to retrieve their output from
+the Shared State cache, instead of re-executing them. Similarly, the
+output of further downstream tasks can also be retrieved from Shared
+State.
+
+If the output hash is unknown, a new entry will be created on the Hash
+Equivalence server, matching the task hash to that output.
+The depending tasks, still having a new task hash because of the
+change, will need to re-execute as expected. The change propagates
+to the depending tasks.
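The two cases described above (output hash already known, output hash unknown) can be sketched with a small in-memory stand-in for the server. The ``ToyHashEquivServer`` class and its ``report`` method are hypothetical names chosen for this example. They do not reflect the real server's protocol or database schema, and they leave out any details the real server tracks beyond the two hashes.

```python
class ToyHashEquivServer:
    """In-memory stand-in for a Hash Equivalence server (illustration only)."""

    def __init__(self) -> None:
        # Every reported task hash -> output hash mapping is kept for reference.
        self.outhash_by_taskhash: dict[str, str] = {}
        # For each output hash, the first task hash that produced it.
        self.unihash_by_outhash: dict[str, str] = {}

    def report(self, taskhash: str, outhash: str) -> str:
        """Store the task hash -> output hash mapping and return the unihash to use."""
        self.outhash_by_taskhash[taskhash] = outhash
        # Known output: return the original task hash recorded for it.
        # Unknown output: create a new entry, so the unihash is this task hash.
        return self.unihash_by_outhash.setdefault(outhash, taskhash)


server = ToyHashEquivServer()
first = server.report("taskhash-1", "outhash-A")   # unknown output: new entry
second = server.report("taskhash-2", "outhash-A")  # known output: original task hash
third = server.report("taskhash-3", "outhash-B")   # unknown output: new entry
```

Here ``second`` comes back as ``"taskhash-1"``, so tasks depending on this one would still see the unihash they saw before the change, while ``third`` gets a fresh unihash and the change propagates.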
+
+To summarize, when Hash Equivalence is enabled, a change in one of the
+tasks in BitBake's run queue doesn't have to propagate to all the
+downstream tasks that depend on the output of this task, causing a
+full rebuild of such tasks, and so on with the next depending tasks.
+Instead, when the output of this task remains identical to previously
+recorded output, BitBake can safely retrieve all the downstream
+task output from the Shared State cache.
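To make the summary concrete, here is a minimal end-to-end simulation of a two-task chain, in which an ``image`` task depends on an ``app`` task. Everything in it is an assumption made for illustration: the task names, the ``run_build`` and ``digest`` helpers, the dictionaries standing in for the Shared State cache and the Hash Equivalence server, and the toy "compiler" that simply drops whitespace. It is not how BitBake's run queue is implemented.

```python
import hashlib


def digest(*fields: str) -> str:
    """SHA-256 over NUL-joined fields (toy hash function for this sketch)."""
    return hashlib.sha256("\0".join(fields).encode("utf-8")).hexdigest()


def run_build(app_source: str, sstate: dict, equiv: dict) -> list:
    """Build the chain app -> image and return the names of the tasks that executed."""
    executed = []

    # Task "app": its task hash depends on its source text.
    app_taskhash = digest("app", app_source)
    if app_taskhash not in sstate:
        executed.append("app")
        # Toy "compiler": whitespace in the source does not affect the output.
        sstate[app_taskhash] = "".join(app_source.split())
    app_outhash = digest(sstate[app_taskhash])
    # The first task hash reported for an output becomes the unihash for it.
    app_unihash = equiv.setdefault(app_outhash, app_taskhash)

    # Task "image": its task hash uses the unihash of "app", not its task hash.
    image_taskhash = digest("image", app_unihash)
    if image_taskhash not in sstate:
        executed.append("image")
        sstate[image_taskhash] = "image containing " + sstate[app_unihash]
    return executed


sstate, equiv = {}, {}
first = run_build("int main(){return 0;}", sstate, equiv)      # ['app', 'image']
second = run_build("int main() { return 0; }", sstate, equiv)  # ['app']
third = run_build("int main(){return 1;}", sstate, equiv)      # ['app', 'image']
```

The second build only changes whitespace: ``app`` re-executes because its task hash changed, but its output hash is already known, so ``image`` keeps its task hash and is taken from the cache. The third build changes the output, so the change propagates to ``image``.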
+
+.. note::
+
+   Having :doc:`/test-manual/reproducible-builds` is a key ingredient for
+   the stability of the task's output hash. Therefore, the effectiveness
+   of Hash Equivalence strongly depends on it.
+
+This applies to multiple scenarios:
+
+- A "trivial" change to a recipe that doesn't impact its generated output,
+  such as whitespace changes, modifications to unused code paths or
+  changes in the ordering of variables.
+
+- Shared library updates, for example to fix a security vulnerability.
+  The programs using such a library do need to be rebuilt, but
+  their new binaries should remain identical. The corresponding tasks should
+  have a different task hash because of the change in the hash of their
+  library dependency, but thanks to their output being identical, Hash
+  Equivalence will stop the propagation down the dependency chain.
+
+- Native tool updates. Though the depending tasks should be rebuilt,
+  it's likely that they will generate the same output and be marked
+  as equivalent.
+
+This mechanism is enabled by default in Poky, and is controlled by three
+variables:
+
+- :term:`bitbake:BB_HASHSERVE`, specifying a local or remote Hash
+  Equivalence server to use.
+
+- :term:`BB_HASHSERVE_UPSTREAM`, when ``BB_HASHSERVE = "auto"``,
+  which allows the local server to connect to an upstream one.
+
+- :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``.
+
+Therefore, the default configuration in Poky corresponds to the
+settings below::
+
+   BB_HASHSERVE = "auto"
+   BB_SIGNATURE_HANDLER = "OEEquivHash"
+
+Rather than starting a local server, another possibility is to rely
+on a Hash Equivalence server on a network, by setting::
+
+   BB_HASHSERVE = "<HOSTNAME>:<PORT>"
+
+.. note::
+
+   The shared Hash Equivalence server needs to be maintained together with the
+   Shared State cache. Otherwise, the server could report Shared State hashes
+   that only exist on specific clients.
+
+   We therefore recommend that one Hash Equivalence server be set up to
+   correspond with a given Shared State cache, and that this server be
+   started in *read-only mode*, so that it doesn't store equivalences for
+   Shared State caches that are local to clients.
+
+   See the :term:`BB_HASHSERVE` reference for details about starting
+   a Hash Equivalence server.
+
+See the `video <https://www.youtube.com/watch?v=zXEdqGS62Wc>`__
+of Joshua Watt's `Hash Equivalence and Reproducible Builds
+<https://elinux.org/images/3/37/Hash_Equivalence_and_Reproducible_Builds.pdf>`__
+presentation at ELC 2020 for a concise introduction to the
+Hash Equivalence implementation in the Yocto Project.
+
 Automatically Added Runtime Dependencies
 ========================================

diff --git a/documentation/test-manual/reproducible-builds.rst b/documentation/test-manual/reproducible-builds.rst
index 349cd1953e..5977366c9e 100644
--- a/documentation/test-manual/reproducible-builds.rst
+++ b/documentation/test-manual/reproducible-builds.rst
@@ -33,10 +33,10 @@ need to rebuild to add a security fix. If this happens, only the components that
 have been modified should change at the binary level. This would lead to much
 easier and clearer bounds on where validation is needed.

-This also gives an additional benefit to the project builds themselves, our hash
-equivalence for :ref:`Shared State <overview-manual/concepts:Shared State>`
-object reuse works much more effectively when the binary output remains the
-same.
+This also gives an additional benefit to the project builds themselves, our
+:ref:`overview-manual/concepts:Hash Equivalence` for
+:ref:`overview-manual/concepts:Shared State` object reuse works much more
+effectively when the binary output remains the same.

 .. note::

42 42