author | Michael Opdenacker <michael.opdenacker@bootlin.com> | 2021-05-21 19:07:35 +0200 |
---|---|---|
committer | Richard Purdie <richard.purdie@linuxfoundation.org> | 2022-01-10 23:12:43 +0000 |
commit | 76ab2eab312371294e1a2f550504f3a9d5cef3b1 (patch) | |
tree | b6634cc973d8ed9d7e2954735bd4e26ac5301ff0 /documentation | |
parent | 35a14725d67e497a5f68baa7cf860a69a884c642 (diff) | |
download | poky-76ab2eab312371294e1a2f550504f3a9d5cef3b1.tar.gz |
manuals: document hash equivalence
(From yocto-docs rev: 7fad0873207980a747f79b2ce29ec0dc6c6c3cdf)
Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation')
-rw-r--r-- | documentation/overview-manual/concepts.rst | 132 | ||||
-rw-r--r-- | documentation/test-manual/reproducible-builds.rst | 8 |
2 files changed, 136 insertions, 4 deletions
diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst index 6f8a3def69..5698d93759 100644 --- a/documentation/overview-manual/concepts.rst +++ b/documentation/overview-manual/concepts.rst | |||
@@ -1938,6 +1938,138 @@ another reason why a task-based approach is preferred over a | |||
1938 | recipe-based approach, which would have to install the output from every | 1938 | recipe-based approach, which would have to install the output from every |
1939 | task. | 1939 | task. |
1940 | 1940 | ||
1941 | Hash Equivalence | ||
1942 | ---------------- | ||
1943 | |||
1944 | The previous section explained how BitBake skips the execution of tasks | |
1945 | whose output can already be found in the Shared State cache. | ||
1946 | |||
1947 | During a build, the output of a task often remains unchanged despite | |
1948 | changes in the task's input values. An example might be whitespace | |
1949 | changes in some input C code. In project terms, this is what we define | |
1950 | as "equivalence". | |
1951 | |||
1952 | To keep track of such equivalence, BitBake has to manage three hashes | ||
1953 | for each task: | ||
1954 | |||
1955 | - The *task hash* explained earlier: computed from the recipe metadata, | ||
1956 | the task code and the task hash values from its dependencies. | ||
1957 | When any of these inputs change, the task hash changes too, | |
1958 | causing the task to re-execute. The task hashes of tasks depending on this | |
1959 | task change as well, causing the whole dependency | |
1960 | chain to re-execute. | ||
1961 | |||
1962 | - The *output hash*, a new hash computed from the output of Shared State tasks, | |
1963 | that is, tasks that save their resulting output to a Shared State tarball. | |
1964 | The mapping between the task hash and its output hash is reported | ||
1965 | to a new *Hash Equivalence* server. This mapping is stored in a database | ||
1966 | by the server for future reference. | ||
1967 | |||
1968 | - The *unihash*, a new hash, initially set to the task hash for the task. | ||
1969 | This is used to track the *uniqueness* of task output; how its value is | |
1970 | maintained is explained below. | |
1971 | |||
1972 | When Hash Equivalence is enabled, BitBake computes the task hash | ||
1973 | for each task by using the unihash of its dependencies, instead | ||
1974 | of their task hash. | ||
1975 | |||
1976 | Now, imagine that a Shared State task is modified because of a change in | ||
1977 | its code or metadata, or because of a change in its dependencies. | ||
1978 | Since this modifies its task hash, this task will need re-executing. | ||
1979 | Its output hash will therefore be computed again. | ||
1980 | |||
1981 | Then, the new mapping between the new task hash and its output hash | ||
1982 | will be reported to the Hash Equivalence server. The server will | ||
1983 | let BitBake know whether this output hash is the same as a previously | ||
1984 | reported output hash, for a different task hash. | ||
1985 | |||
1986 | If the output hash is already known, BitBake will update the task's | ||
1987 | unihash to match the original task hash that generated that output. | ||
1988 | Thanks to this, the depending tasks will keep a previously recorded | ||
1989 | task hash, and BitBake will be able to retrieve their output from | ||
1990 | the Shared State cache, instead of re-executing them. Similarly, the | ||
1991 | output of further downstream tasks can also be retrieved from Shared | ||
1992 | State. | |
1993 | |||
1994 | If the output hash is unknown, a new entry will be created on the Hash | ||
1995 | Equivalence server, matching the task hash to that output. | ||
1996 | The depending tasks, still having a new task hash because of the | ||
1997 | change, will need to re-execute as expected. The change propagates | ||
1998 | to the depending tasks. | ||
1999 | |||
2000 | To summarize, when Hash Equivalence is enabled, a change in one of the | ||
2001 | tasks in BitBake's run queue doesn't have to propagate to all the | ||
2002 | downstream tasks that depend on the output of this task, which would | |
2003 | otherwise force a rebuild of the whole chain of dependent tasks. | |
2004 | Instead, when the output of this task remains identical to previously | ||
2005 | recorded output, BitBake can safely retrieve all the downstream | ||
2006 | task output from the Shared State cache. | ||
2007 | |||
2008 | .. note:: | ||
2009 | |||
2010 | Having :doc:`/test-manual/reproducible-builds` is a key ingredient for | ||
2011 | the stability of the task's output hash. Therefore, the effectiveness | ||
2012 | of Hash Equivalence strongly depends on it. | ||
2013 | |||
2014 | This applies to multiple scenarios: | ||
2015 | |||
2016 | - A "trivial" change to a recipe that doesn't impact its generated output, | ||
2017 | such as whitespace changes, modifications to unused code paths or | |
2018 | changes in the ordering of variables. | |
2019 | |||
2020 | - Shared library updates, for example to fix a security vulnerability. | ||
2021 | Of course, the programs using such a library should be rebuilt, but | |
2022 | their new binaries should remain identical. The corresponding tasks should | |
2023 | have a different task hash because of the change in the hash of their | |
2024 | library dependency, but thanks to their output being identical, Hash | ||
2025 | Equivalence will stop the propagation down the dependency chain. | ||
2026 | |||
2027 | - Native tool updates. Though the depending tasks should be rebuilt, | ||
2028 | it's likely that they will generate the same output and be marked | ||
2029 | as equivalent. | ||
2030 | |||
2031 | This mechanism is enabled by default in Poky, and is controlled by three | ||
2032 | variables: | ||
2033 | |||
2034 | - :term:`bitbake:BB_HASHSERVE`, specifying a local or remote Hash | ||
2035 | Equivalence server to use. | ||
2036 | |||
2037 | - :term:`BB_HASHSERVE_UPSTREAM`, which, when ``BB_HASHSERVE = "auto"``, | |
2038 | allows the local server to connect to an upstream one. | |
2039 | |||
2040 | - :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``. | ||
2041 | |||
2042 | Therefore, the default configuration in Poky corresponds to the | |
2043 | following settings:: | |
2044 | |||
2045 | BB_HASHSERVE = "auto" | ||
2046 | BB_SIGNATURE_HANDLER = "OEEquivHash" | ||
2047 | |||
2048 | Rather than starting a local server, another possibility is to rely | ||
2049 | on a Hash Equivalence server on a network, by setting:: | ||
2050 | |||
2051 | BB_HASHSERVE = "<HOSTNAME>:<PORT>" | ||
2052 | |||
2053 | .. note:: | ||
2054 | |||
2055 | The shared Hash Equivalence server needs to be maintained together with the | ||
2056 | Shared State cache. Otherwise, the server could report Shared State hashes | ||
2057 | that only exist on specific clients. | ||
2058 | |||
2059 | We therefore recommend that one Hash Equivalence server be set up to | ||
2060 | correspond with a given Shared State cache, and that this server be started | |
2061 | in *read-only mode*, so that it doesn't store equivalences for | ||
2062 | Shared State caches that are local to clients. | ||
2063 | |||
2064 | See the :term:`BB_HASHSERVE` reference for details about starting | ||
2065 | a Hash Equivalence server. | ||
2066 | |||
2067 | See the `video <https://www.youtube.com/watch?v=zXEdqGS62Wc>`__ | ||
2068 | of Joshua Watt's `Hash Equivalence and Reproducible Builds | ||
2069 | <https://elinux.org/images/3/37/Hash_Equivalence_and_Reproducible_Builds.pdf>`__ | ||
2070 | presentation at ELC 2020 for a concise introduction to the | |
2071 | Hash Equivalence implementation in the Yocto Project. | ||
2072 | |||
1941 | Automatically Added Runtime Dependencies | 2073 | Automatically Added Runtime Dependencies |
1942 | ======================================== | 2074 | ======================================== |
1943 | 2075 | ||
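To make the unihash mechanism documented in the ``concepts.rst`` hunk above easier to follow, here is a minimal Python sketch. It is a toy model and not BitBake's implementation: the ``ToyHashEquivServer`` class, the ``sha256`` and ``task_hash`` helpers, and the sample recipe strings are all hypothetical names introduced for illustration, and the real server tracks more than a single output-hash table.

```python
import hashlib


def sha256(*parts: str) -> str:
    """Hash an ordered sequence of strings into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


class ToyHashEquivServer:
    """Hypothetical in-memory stand-in for a Hash Equivalence server."""

    def __init__(self) -> None:
        # output hash -> task hash of the first task that produced it
        self.unihash_by_output = {}

    def report(self, task_hash_value: str, output_hash: str) -> str:
        """Record the mapping and return the unihash the task should use."""
        return self.unihash_by_output.setdefault(output_hash, task_hash_value)


def task_hash(metadata: str, dep_unihashes: list) -> str:
    """Task hash from the task's own inputs and its dependencies' unihashes."""
    return sha256(metadata, *sorted(dep_unihashes))


server = ToyHashEquivServer()

# First build: the library task runs and reports its output hash.
lib_v1 = task_hash("int f(){return 1;}", [])
lib_unihash_v1 = server.report(lib_v1, sha256("lib.o contents"))
app_v1 = task_hash("app recipe", [lib_unihash_v1])

# Whitespace-only change: new task hash, identical output.
lib_v2 = task_hash("int f() { return 1; }", [])
lib_unihash_v2 = server.report(lib_v2, sha256("lib.o contents"))
app_v2 = task_hash("app recipe", [lib_unihash_v2])

# Real change: new task hash and new output.
lib_v3 = task_hash("int f(){return 2;}", [])
lib_unihash_v3 = server.report(lib_v3, sha256("different lib.o contents"))
app_v3 = task_hash("app recipe", [lib_unihash_v3])

print(lib_v2 != lib_v1)   # True: the library task itself must re-execute
print(app_v2 == app_v1)   # True: the dependent task keeps its old task hash
print(app_v3 != app_v1)   # True: a real output change still propagates
```

In this model, the dependent task's hash is stable across the whitespace-only change because the server hands back the original task hash as the unihash, which is the condition under which its output could be fetched from Shared State instead of being rebuilt.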
diff --git a/documentation/test-manual/reproducible-builds.rst b/documentation/test-manual/reproducible-builds.rst index 349cd1953e..5977366c9e 100644 --- a/documentation/test-manual/reproducible-builds.rst +++ b/documentation/test-manual/reproducible-builds.rst | |||
@@ -33,10 +33,10 @@ need to rebuild to add a security fix. If this happens, only the components that | |||
33 | have been modified should change at the binary level. This would lead to much | 33 | have been modified should change at the binary level. This would lead to much |
34 | easier and clearer bounds on where validation is needed. | 34 | easier and clearer bounds on where validation is needed. |
35 | 35 | ||
36 | This also gives an additional benefit to the project builds themselves, our hash | 36 | This also gives an additional benefit to the project builds themselves, our |
37 | equivalence for :ref:`Shared State <overview-manual/concepts:Shared State>` | 37 | :ref:`overview-manual/concepts:Hash Equivalence` for |
38 | object reuse works much more effectively when the binary output remains the | 38 | :ref:`overview-manual/concepts:Shared State` object reuse works much more |
39 | same. | 39 | effectively when the binary output remains the same. |
40 | 40 | ||
41 | .. note:: | 41 | .. note:: |
42 | 42 | ||