| author | Michael Opdenacker <michael.opdenacker@bootlin.com> | 2021-05-21 19:07:35 +0200 |
|---|---|---|
| committer | Richard Purdie <richard.purdie@linuxfoundation.org> | 2022-01-10 23:12:43 +0000 |
| commit | 76ab2eab312371294e1a2f550504f3a9d5cef3b1 (patch) | |
| tree | b6634cc973d8ed9d7e2954735bd4e26ac5301ff0 /documentation/overview-manual | |
| parent | 35a14725d67e497a5f68baa7cf860a69a884c642 (diff) | |
manuals: document hash equivalence
(From yocto-docs rev: 7fad0873207980a747f79b2ce29ec0dc6c6c3cdf)
Signed-off-by: Michael Opdenacker <michael.opdenacker@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation/overview-manual')
| -rw-r--r-- | documentation/overview-manual/concepts.rst | 132 |
1 files changed, 132 insertions, 0 deletions
diff --git a/documentation/overview-manual/concepts.rst b/documentation/overview-manual/concepts.rst
index 6f8a3def69..5698d93759 100644
--- a/documentation/overview-manual/concepts.rst
+++ b/documentation/overview-manual/concepts.rst
| @@ -1938,6 +1938,138 @@ another reason why a task-based approach is preferred over a | |||
| 1938 | recipe-based approach, which would have to install the output from every | 1938 | recipe-based approach, which would have to install the output from every |
| 1939 | task. | 1939 | task. |
| 1940 | 1940 | ||
| 1941 | Hash Equivalence | ||
| 1942 | ---------------- | ||
| 1943 | |||
| 1944 | The above section explained how BitBake skips the execution of tasks | ||
| 1945 | whose output can already be found in the Shared State cache. | ||
| 1946 | |||
| 1947 | During a build, the output of a task often remains unchanged despite | ||
| 1948 | changes in the task's input values. An example might be whitespace | ||
| 1949 | changes in some input C code. In project terms, this is what we define | ||
| 1950 | as "equivalence". | ||
| 1951 | |||
| 1952 | To keep track of such equivalence, BitBake has to manage three hashes | ||
| 1953 | for each task: | ||
| 1954 | |||
| 1955 | - The *task hash* explained earlier: computed from the recipe metadata, | ||
| 1956 | the task code and the task hash values from its dependencies. | ||
| 1957 | When any of these inputs change, the task hash changes too, | ||
| 1958 | causing the task to re-execute. The task hashes of tasks depending on this | ||
| 1959 | task then change as well, causing the whole dependency | ||
| 1960 | chain to re-execute. | ||
| 1961 | |||
| 1962 | - The *output hash*, a new hash computed from the output of Shared State tasks, | ||
| 1963 | that is, tasks that save their resulting output to a Shared State tarball. | ||
| 1964 | The mapping between the task hash and its output hash is reported | ||
| 1965 | to a new *Hash Equivalence* server. This mapping is stored in a database | ||
| 1966 | by the server for future reference. | ||
| 1967 | |||
| 1968 | - The *unihash*, a new hash, initially set to the task hash for the task. | ||
| 1969 | This is used to track the *uniqueness* of task output. The paragraphs | ||
| 1970 | below explain how its value is maintained. | ||
| 1971 | |||
| 1972 | When Hash Equivalence is enabled, BitBake computes the task hash | ||
| 1973 | for each task by using the unihash of its dependencies, instead | ||
| 1974 | of their task hash. | ||
| 1975 | |||
| 1976 | Now, imagine that a Shared State task is modified because of a change in | ||
| 1977 | its code or metadata, or because of a change in its dependencies. | ||
| 1978 | Since this modifies its task hash, this task will need to be re-executed. | ||
| 1979 | Its output hash will therefore be computed again. | ||
| 1980 | |||
| 1981 | Then, the new mapping between the new task hash and its output hash | ||
| 1982 | will be reported to the Hash Equivalence server. The server will | ||
| 1983 | let BitBake know whether this output hash is the same as a previously | ||
| 1984 | reported output hash, for a different task hash. | ||
| 1985 | |||
| 1986 | If the output hash is already known, BitBake will update the task's | ||
| 1987 | unihash to match the original task hash that generated that output. | ||
| 1988 | Thanks to this, the dependent tasks will keep a previously recorded | ||
| 1989 | task hash, and BitBake will be able to retrieve their output from | ||
| 1990 | the Shared State cache, instead of re-executing them. Similarly, the | ||
| 1991 | output of further downstream tasks can also be retrieved from Shared | ||
| 1992 | State. | ||
| 1993 | |||
| 1994 | If the output hash is unknown, a new entry will be created on the Hash | ||
| 1995 | Equivalence server, matching the task hash to that output. | ||
| 1996 | The dependent tasks, still having a new task hash because of the | ||
| 1997 | change, will need to re-execute as expected. The change thus propagates | ||
| 1998 | down the dependency chain. | ||
| 1999 | |||
| 2000 | To summarize, when Hash Equivalence is enabled, a change in one of the | ||
| 2001 | tasks in BitBake's run queue doesn't have to propagate to all the | ||
| 2002 | downstream tasks that depend on the output of this task, which would | ||
| 2003 | cause a full rebuild of those tasks and, in turn, of their own dependents. | ||
| 2004 | Instead, when the output of this task remains identical to previously | ||
| 2005 | recorded output, BitBake can safely retrieve all the downstream | ||
| 2006 | task output from the Shared State cache. | ||
| 2007 | |||
| 2008 | .. note:: | ||
| 2009 | |||
| 2010 | Having :doc:`/test-manual/reproducible-builds` is a key ingredient for | ||
| 2011 | the stability of the task's output hash. Therefore, the effectiveness | ||
| 2012 | of Hash Equivalence strongly depends on it. | ||
| 2013 | |||
| 2014 | This applies to multiple scenarios: | ||
| 2015 | |||
| 2016 | - A "trivial" change to a recipe that doesn't impact its generated output, | ||
| 2017 | such as whitespace changes, modifications to unused code paths or | ||
| 2018 | in the ordering of variables. | ||
| 2019 | |||
| 2020 | - Shared library updates, for example to fix a security vulnerability. | ||
| 2021 | Of course, the programs using such a library should be rebuilt, but | ||
| 2022 | their new binaries should remain identical. The corresponding tasks should | ||
| 2023 | have a different task hash because of the change in the hash of their | ||
| 2024 | library dependency, but thanks to their output being identical, Hash | ||
| 2025 | Equivalence will stop the propagation down the dependency chain. | ||
| 2026 | |||
| 2027 | - Native tool updates. Though the dependent tasks should be rebuilt, | ||
| 2028 | it's likely that they will generate the same output and be marked | ||
| 2029 | as equivalent. | ||
| 2030 | |||
| 2031 | This mechanism is enabled by default in Poky, and is controlled by three | ||
| 2032 | variables: | ||
| 2033 | |||
| 2034 | - :term:`bitbake:BB_HASHSERVE`, specifying a local or remote Hash | ||
| 2035 | Equivalence server to use. | ||
| 2036 | |||
| 2037 | - :term:`BB_HASHSERVE_UPSTREAM`, used when ``BB_HASHSERVE = "auto"`` | ||
| 2038 | to connect the local server to an upstream one. | ||
| 2039 | |||
| 2040 | - :term:`bitbake:BB_SIGNATURE_HANDLER`, which must be set to ``OEEquivHash``. | ||
| 2041 | |||
| 2042 | Therefore, the default configuration in Poky corresponds to the | ||
| 2043 | following settings:: | ||
| 2044 | |||
| 2045 | BB_HASHSERVE = "auto" | ||
| 2046 | BB_SIGNATURE_HANDLER = "OEEquivHash" | ||
| 2047 | |||
| 2048 | Rather than starting a local server, another possibility is to rely | ||
| 2049 | on a Hash Equivalence server on a network, by setting:: | ||
| 2050 | |||
| 2051 | BB_HASHSERVE = "<HOSTNAME>:<PORT>" | ||
| 2052 | |||
| 2053 | .. note:: | ||
| 2054 | |||
| 2055 | The shared Hash Equivalence server needs to be maintained together with the | ||
| 2056 | Shared State cache. Otherwise, the server could report Shared State hashes | ||
| 2057 | that only exist on specific clients. | ||
| 2058 | |||
| 2059 | We therefore recommend that one Hash Equivalence server be set up to | ||
| 2060 | correspond with a given Shared State cache, and that this server be started | ||
| 2061 | in *read-only mode*, so that it doesn't store equivalences for | ||
| 2062 | Shared State caches that are local to clients. | ||
| 2063 | |||
| 2064 | See the :term:`BB_HASHSERVE` reference for details about starting | ||
| 2065 | a Hash Equivalence server. | ||
| 2066 | |||
| 2067 | See the `video <https://www.youtube.com/watch?v=zXEdqGS62Wc>`__ | ||
| 2068 | of Joshua Watt's `Hash Equivalence and Reproducible Builds | ||
| 2069 | <https://elinux.org/images/3/37/Hash_Equivalence_and_Reproducible_Builds.pdf>`__ | ||
| 2070 | presentation at ELC 2020 for a concise introduction to the | ||
| 2071 | Hash Equivalence implementation in the Yocto Project. | ||
| 2072 | |||
| 1941 | Automatically Added Runtime Dependencies | 2073 | Automatically Added Runtime Dependencies |
| 1942 | ======================================== | 2074 | ======================================== |
| 1943 | 2075 | ||
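
The unihash mechanism described in the Hash Equivalence section added by this patch can be sketched with a toy model. This is a minimal illustration and not BitBake's implementation: the ``HashEquivalenceServer`` class, its ``report()`` method, and the ``digest``, ``task_hash`` and ``output_hash`` helpers are hypothetical names invented for this sketch, and an in-memory dictionary stands in for the server's database. It assumes the first task hash reported for a given output becomes the unihash for every later task producing the same output.

```python
import hashlib


def digest(*parts):
    """Hash a sequence of strings (illustrative helper, not a BitBake API)."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()


def task_hash(metadata, dep_unihashes):
    """Task hash: task inputs plus the *unihashes* of its dependencies."""
    return digest(metadata, *sorted(dep_unihashes))


def output_hash(output):
    """Output hash: computed from what the task produced."""
    return digest(output)


class HashEquivalenceServer:
    """Toy in-memory stand-in for a Hash Equivalence server (hypothetical)."""

    def __init__(self):
        # Maps an output hash to the unihash first reported for that output.
        self._unihash_by_output = {}

    def report(self, reported_task_hash, reported_output_hash):
        """Record the mapping and return the unihash the task should use."""
        return self._unihash_by_output.setdefault(
            reported_output_hash, reported_task_hash
        )


server = HashEquivalenceServer()

# First build: task A has no dependencies, task B depends on A.
a_hash_v1 = task_hash("int f(){return 1;}", [])
a_uni_v1 = server.report(a_hash_v1, output_hash("liba-binary"))
b_hash_v1 = task_hash("b-recipe", [a_uni_v1])

# Whitespace-only change in A: new task hash, identical output.
a_hash_v2 = task_hash("int f() { return 1; }", [])
a_uni_v2 = server.report(a_hash_v2, output_hash("liba-binary"))
b_hash_v2 = task_hash("b-recipe", [a_uni_v2])

# Functional change in A: new task hash and new output.
a_hash_v3 = task_hash("int f(){return 2;}", [])
a_uni_v3 = server.report(a_hash_v3, output_hash("liba-binary-changed"))
b_hash_v3 = task_hash("b-recipe", [a_uni_v3])
```

In the second build, A is re-executed because its task hash changed, but its unihash stays at the original value, so B's task hash is unchanged and B's output could be restored from the Shared State cache. In the third build, the output is new, so the unihash equals the new task hash and the change propagates to B.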
