The shared state code uses a checksum, which is a unique signature of a task's inputs, to determine if a task needs to be run again. Because it is a change in a task's inputs that triggers a rerun, the process needs to detect all the inputs to a given task. For shell tasks, this turns out to be fairly easy because the build process generates a "run" shell script for each task and it is possible to create a checksum that gives you a good idea of when the task's data changes.
To complicate the problem, there are things that should not be included in the checksum. First, there is the actual specific build path of a given task - the WORKDIR. It does not matter if the work directory changes because it should not affect the output for target packages. Also, the build process has the objective of making native or cross packages relocatable. The checksum therefore needs to exclude WORKDIR. The simplistic approach for excluding the work directory is to set WORKDIR to some fixed value and create the checksum for the "run" script.
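The simplistic approach described above can be illustrated with a minimal sketch. It is not BitBake's implementation: the function name script_checksum, the placeholder path, and the sample scripts are all hypothetical, and plain text substitution stands in for setting WORKDIR before the script is generated.

```python
import hashlib

def script_checksum(script_text, workdir, placeholder="/fixed/workdir"):
    """Hash a "run" script after replacing the real work directory
    with a fixed value (hypothetical helper, for illustration only)."""
    normalized = script_text.replace(workdir, placeholder)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two hypothetical "run" scripts that differ only in the work directory.
run_a = "cd /build/a/tmp/work/foo\nmake install DESTDIR=/build/a/tmp/work/foo/image\n"
run_b = "cd /home/u/tmp/work/foo\nmake install DESTDIR=/home/u/tmp/work/foo/image\n"
# A third script whose actual command differs.
run_c = "cd /build/a/tmp/work/foo\nmake install-strip DESTDIR=/build/a/tmp/work/foo/image\n"

sum_a = script_checksum(run_a, "/build/a/tmp/work/foo")
sum_b = script_checksum(run_b, "/home/u/tmp/work/foo")
sum_c = script_checksum(run_c, "/build/a/tmp/work/foo")
```

In this sketch, sum_a and sum_b are equal because only the work directory differs, while sum_c differs because the command itself changed.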
Another problem results from the "run" scripts containing functions that might or might not get called. The incremental build solution contains code that figures out dependencies between shell functions. This code is used to prune the "run" scripts down to the minimum set, thereby alleviating this problem and making the "run" scripts much more readable as a bonus.
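One way to picture this pruning is as a reachability walk over shell functions. The sketch below assumes the functions are already available as a mapping from name to body and treats any word in a body that matches a known function name as a call. That matching rule, the helper name reachable_functions, and the sample function bodies are assumptions made for illustration, not the parser the build system uses.

```python
import re

def reachable_functions(functions, entry):
    """Return the names of all shell functions reachable from entry.

    functions maps a function name to its body text. A word in a body
    counts as a call if it is also a key of functions (a deliberately
    crude approximation of real shell parsing).
    """
    needed = set()
    pending = [entry]
    while pending:
        name = pending.pop()
        if name in needed or name not in functions:
            continue
        needed.add(name)
        words = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", functions[name]))
        pending.extend(w for w in words if w in functions and w not in needed)
    return needed

# Hypothetical function bodies for illustration.
shell_functions = {
    "do_compile": "oe_runmake all",
    "oe_runmake": 'bbnote make "$@"; make "$@"',
    "bbnote": 'echo "NOTE: $*"',
    "do_unused": "rm -rf scratch",
}
kept = reachable_functions(shell_functions, "do_compile")
```

Only the functions in kept would need to appear in the pruned script, so a change to do_unused would not alter the checksum for do_compile.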
So far we have solutions for shell scripts. What about Python tasks? The same approach applies even though these tasks are more difficult. The process needs to figure out what variables a Python function accesses and what functions it calls. Again, the incremental build solution contains code that first figures out the variable and function dependencies, and then creates a checksum for the data used as the input to the task.
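A rough sketch of the idea for Python tasks follows, using the standard ast module. It assumes variables are read through calls of the form d.getVar("NAME") with a literal name and collects directly called function names. The helper python_task_deps and the sample task are hypothetical, and the real dependency code handles many more cases.

```python
import ast

def python_task_deps(source):
    """Return (variables, calls) found in a piece of Python task source.

    variables: literal names passed as first argument to *.getVar(...)
    calls:     names of functions called directly, such as helper(...)
    """
    variables, calls = set(), set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr == "getVar":
            first = node.args[0] if node.args else None
            if isinstance(first, ast.Constant) and isinstance(first.value, str):
                variables.add(first.value)
        elif isinstance(func, ast.Name):
            calls.add(func.id)
    return variables, calls

# Hypothetical task body for illustration.
task_source = '''
def do_example(d):
    archs = d.getVar("PACKAGE_ARCHS")
    machine = d.getVar("MACHINE")
    helper(archs, machine)
'''
task_vars, task_calls = python_task_deps(task_source)
```

A variable name that is computed at run time rather than written as a literal would be missed by this kind of static scan, which is the sort of gap that explicit dependency hints are meant to cover.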
Like the WORKDIR case, situations exist where dependencies should be ignored. For these cases, you can instruct the build process to ignore a dependency by using a line like the following:
PACKAGE_ARCHS[vardepsexclude] = "MACHINE"
This example ensures that the PACKAGE_ARCHS variable does not depend on the value of MACHINE, even if it does reference it.
Equally, there are cases where we need to add dependencies BitBake is not able to find. You can accomplish this by using a line like the following:
PACKAGE_ARCHS[vardeps] = "MACHINE"
This example explicitly adds the MACHINE variable as a dependency for PACKAGE_ARCHS.
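The combined effect of the two flags can be pictured as simple set arithmetic on the automatically detected dependencies. The sketch below is an assumption about how such hints could be applied (additions first, exclusions second). The helper effective_deps and the variable OTHER_VAR are hypothetical.

```python
def effective_deps(detected, vardeps="", vardepsexclude=""):
    """Apply vardeps / vardepsexclude style hints to a detected set.

    Both hints are space-separated strings, mirroring how the flags
    are written in metadata. Exclusions are applied last in this sketch.
    """
    deps = set(detected)
    deps |= set(vardeps.split())
    deps -= set(vardepsexclude.split())
    return deps

# Case 1: MACHINE was detected but should be ignored.
excluded = effective_deps({"OTHER_VAR", "MACHINE"}, vardepsexclude="MACHINE")

# Case 2: MACHINE was not detected but should be added.
added = effective_deps({"OTHER_VAR"}, vardeps="MACHINE")
```

Here excluded contains only OTHER_VAR, while added contains both OTHER_VAR and MACHINE.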
Consider a case with in-line Python, for example, where BitBake is not able to figure out dependencies. When running in debug mode (i.e. using -DDD), BitBake produces output when it discovers something for which it cannot figure out dependencies. The Yocto Project team has currently not managed to cover those dependencies in detail and is aware of the need to fix this situation.
Thus far, this section has limited discussion to the direct inputs into a task. Information based on direct inputs is referred to as the "basehash" in the code. However, there is still the question of a task's indirect inputs - the things that were already built and present in the Build Directory. The checksum (or signature) for a particular task needs to add the hashes of all the tasks on which the particular task depends. Choosing which dependencies to add is a policy decision. However, the effect is to generate a master checksum that combines the basehash and the hashes of the task's dependencies.
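The combination of a basehash with the hashes of dependent tasks can be sketched as follows. The function task_hash, the include_dep policy callback, and the sample hash strings are hypothetical. The sketch only shows the general shape of the calculation, not the exact data BitBake feeds into its hash.

```python
import hashlib

def task_hash(basehash, dep_hashes, include_dep=lambda dep: True):
    """Combine a task's basehash with the hashes of its dependencies.

    dep_hashes maps a dependency identifier to its hash string.
    include_dep is the policy hook deciding which dependencies count.
    """
    h = hashlib.sha256(basehash.encode("utf-8"))
    for dep in sorted(dep_hashes):  # sorted for a stable result
        if include_dep(dep):
            h.update(dep_hashes[dep].encode("utf-8"))
    return h.hexdigest()

# Hypothetical identifiers and hash values for illustration.
deps_v1 = {"libfoo:do_populate_sysroot": "aaa", "tool-native:do_populate_sysroot": "bbb"}
deps_v2 = dict(deps_v1, **{"tool-native:do_populate_sysroot": "ccc"})

def skip_native(dep):
    # Example policy: ignore dependencies on native tools.
    return "-native:" not in dep

all_v1 = task_hash("base", deps_v1)
all_v2 = task_hash("base", deps_v2)
policy_v1 = task_hash("base", deps_v1, skip_native)
policy_v2 = task_hash("base", deps_v2, skip_native)
```

With every dependency included, the change in the native tool's hash alters the result (all_v1 differs from all_v2). Under the example policy it does not (policy_v1 equals policy_v2), which illustrates why the choice of dependencies is a policy decision.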
At the code level, there are a variety of ways both the basehash and the dependent task hashes can be influenced. Within the BitBake configuration file, we can give BitBake some extra information to help it construct the basehash. The following statement effectively results in a list of global variable dependency excludes - variables never included in any checksum:
BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH DL_DIR \
    SSTATE_DIR THISDIR FILESEXTRAPATHS FILE_DIRNAME HOME LOGNAME SHELL TERM \
    USER FILESPATH STAGING_DIR_HOST STAGING_DIR_TARGET COREBASE PRSERV_HOST \
    PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
    CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_DISABLE LICENSE_PATH SDKPKGSUFFIX"
The previous example excludes WORKDIR since that variable is actually constructed as a path within TMPDIR, which is on the whitelist.
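To see why whitelisting TMPDIR is enough to neutralize WORKDIR, consider a sketch that hashes unexpanded variable values and skips whitelisted names. Hashing unexpanded values is an assumption of this sketch, as are the helper name basehash_of, the shortened whitelist, and the simplified WORKDIR definition.

```python
import hashlib

def basehash_of(unexpanded_vars, whitelist):
    """Hash name=value pairs, skipping any whitelisted variable name."""
    h = hashlib.sha256()
    for name in sorted(unexpanded_vars):
        if name in whitelist:
            continue
        h.update(f"{name}={unexpanded_vars[name]}\n".encode("utf-8"))
    return h.hexdigest()

whitelist = {"TMPDIR", "DL_DIR", "SSTATE_DIR"}  # shortened for illustration

# Two builds in different locations: WORKDIR is defined relative to
# TMPDIR, so its unexpanded value is identical in both.
build_a = {"TMPDIR": "/build/a/tmp", "WORKDIR": "${TMPDIR}/work/foo", "CFLAGS": "-O2"}
build_b = {"TMPDIR": "/home/user/tmp", "WORKDIR": "${TMPDIR}/work/foo", "CFLAGS": "-O2"}
# A third build with a genuinely different input.
build_c = dict(build_a, CFLAGS="-O0")

hash_a = basehash_of(build_a, whitelist)
hash_b = basehash_of(build_b, whitelist)
hash_c = basehash_of(build_c, whitelist)
```

In this sketch hash_a equals hash_b even though the builds live in different directories, while hash_c differs because CFLAGS changed.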
The rules for deciding which hashes of dependent tasks to include through dependency chains are more complex and are generally accomplished with a Python function. The code in meta/lib/oe/sstatesig.py shows two examples of this and also illustrates how you can insert your own policy into the system if so desired. This file defines the two basic signature generators OE-Core uses: "OEBasic" and "OEBasicHash". By default, there is a dummy "noop" signature handler enabled in BitBake. This means that behavior is unchanged from previous versions. OE-Core uses the "OEBasicHash" signature handler by default through this setting in the bitbake.conf file:
BB_SIGNATURE_HANDLER ?= "OEBasicHash"
The "OEBasicHash" BB_SIGNATURE_HANDLER is the same as the "OEBasic" version but adds the task hash to the stamp files. As a result, any Metadata change that changes the task hash automatically causes the task to be run again. This removes the need to bump PR values, and changes to Metadata automatically ripple across the build.
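The effect of putting the task hash into the stamp can be sketched with a purely illustrative naming scheme. The stamp name layout and the helpers stamp_name and needs_rerun are hypothetical and are not taken from the BitBake source.

```python
def stamp_name(stamp_base, task, taskhash):
    """Build a stamp file name that embeds the task hash (hypothetical layout)."""
    return f"{stamp_base}.{task}.{taskhash}"

def needs_rerun(existing_stamps, stamp_base, task, taskhash):
    """A task must run again if no stamp exists for its current hash."""
    return stamp_name(stamp_base, task, taskhash) not in existing_stamps

# Stamps left behind by an earlier build (hypothetical values).
stamps = {"stamps/foo-1.0.do_compile.abc123"}

unchanged = needs_rerun(stamps, "stamps/foo-1.0", "do_compile", "abc123")
changed = needs_rerun(stamps, "stamps/foo-1.0", "do_compile", "def456")
```

When the Metadata is unchanged, the hash matches the existing stamp and unchanged is False. Once a Metadata change produces a new hash, no matching stamp exists and changed is True, so the task runs again without any PR bump.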
It is also worth noting that the end result of these signature generators is to make some dependency and hash information available to the build. This information includes:
BB_BASEHASH_task-taskname: The base hashes for each task in the recipe.
BB_BASEHASH_filename:taskname: The base hashes for each dependent task.
BBHASHDEPS_filename:taskname: The task dependencies for each task.
BB_TASKHASH: The hash of the currently running task.