author Scott Rifenbark <scott.m.rifenbark@intel.com> 2011-12-13 08:53:45 -0800
committer Richard Purdie <richard.purdie@linuxfoundation.org> 2011-12-16 16:58:40 +0000
commit 2ce852ad7b068ac27e589a5204fc0dca036ebebe
tree 39d59a141d9ee2787ad34cda7cb940accd7d47f0 /documentation
parent 4378fd205c4544aa2eac02b4e294f154fd7116f8
documentation/poky-ref-manual/technical-details.xml: more on YOCTO #1500
More work on this bug for sstate. This commit represents the third pass through the new chapter four (Technical Details) that is dedicated to YP components and sstate at the moment. The material is unreviewed by Richard as of yet. (From yocto-docs rev: 3c0e5bac288c05ea3fd93b1d1d5866895c5c2d1e) Signed-off-by: Scott Rifenbark <scott.m.rifenbark@intel.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'documentation')
-rw-r--r-- documentation/poky-ref-manual/technical-details.xml 381
1 files changed, 378 insertions, 3 deletions
diff --git a/documentation/poky-ref-manual/technical-details.xml b/documentation/poky-ref-manual/technical-details.xml
index b34179533b..1657431495 100644
--- a/documentation/poky-ref-manual/technical-details.xml
+++ b/documentation/poky-ref-manual/technical-details.xml
@@ -151,12 +151,386 @@
151 151
152 <para> 152 <para>
153 By design, the Yocto Project builds everything from scratch unless it can determine that 153 By design, the Yocto Project builds everything from scratch unless it can determine that
154 a given task's inputs have not changed. 154 parts don't need to be rebuilt.
155 While building from scratch ensures that everything is current, it does also 155 Fundamentally, building from scratch is an attraction as it means all parts are
156 mean that a lot of time could be spent rebuiding things that don't necessarily need built. 156 built fresh and there is no possibility of stale data causing problems.
157 When developers hit problems, they typically default back to building from scratch
158 so they know the state of things from the start.
159 </para>
160
161 <para>
162 Building an image from scratch is both an advantage and a disadvantage to the process.
163 As mentioned in the previous paragraph, building from scratch ensures that
164 everything is current and starts from a known state.
165 However, building from scratch also takes much longer as it generally means
166 rebuilding things that don't necessarily need to be rebuilt.
167 </para>
168
169 <para>
170 The Yocto Project implements shared state code that supports incremental builds.
171 The implementation of the shared state code answers the following questions that
172 were fundamental roadblocks within the Yocto Project incremental build support system:
173 <itemizedlist>
174 <listitem>What pieces of the system have changed and what pieces have not changed?</listitem>
175 <listitem>How are changed pieces of software removed and replaced?</listitem>
176 <listitem>How are pre-built components that don't need to be rebuilt from scratch
177 used when they are available?</listitem>
178 </itemizedlist>
157 </para> 179 </para>
158 180
159 <para> 181 <para>
182 For the first question, the build system detects changes in the "inputs" to a given task by
183 creating a checksum (or signature) of the task's inputs.
184 If the checksum changes, the system assumes the inputs have changed and the task needs to be
185 rerun.
186 For the second question, the shared state (sstate) code tracks which tasks add which output
187 to the build process.
188 This means the output from a given task can be removed, upgraded or otherwise manipulated.
189 The third question is partly addressed by the solution for the second question
190 assuming the build system can fetch the sstate objects from remote locations and
191 install them if they are deemed to be valid.
192 </para>
193
194 <para>
195 The rest of this section goes into detail about the overall incremental build
196 architecture, the checksums (signatures), shared state, and some tips and tricks.
197 </para>
198
199 <section id='overall-architecture'>
200 <title>Overall Architecture</title>
201
202 <para>
203 When determining what parts of the system need to be built, the Yocto Project
204 works on a per-task basis rather than a per-recipe basis.
205 You might wonder why a per-task basis is preferred over a per-recipe basis.
206 To help explain, consider having the IPK packaging backend enabled and then switching to DEB.
207 In this case, <filename>do_install</filename> and <filename>do_package</filename>
208 output are still valid.
209 However, with a per-recipe approach, the build would not include the
210 <filename>.deb</filename> files.
211 Consequently, you would have to invalidate the whole build and rerun it.
212 Rerunning everything is not the best situation.
213 Also in this case, the core must be "taught" much about specific tasks.
214 This methodology does not scale well and does not allow users to easily add new tasks
215 in layers or as external recipes without touching the packaged-staging core.
216 </para>
217 </section>
218
219 <section id='checksums'>
220 <title>Checksums (Signatures)</title>
221
222 <para>
223 The Yocto Project uses a checksum, which is a unique signature of a task's
224 inputs, to determine if a task needs to be run again.
225 Because it is a change in a task's inputs that triggers a rerun, the process
226 needs to detect all the inputs to a given task.
227 For shell tasks, this turns out to be fairly easy because
228 the build process generates a "run" shell script for each task and
229 it is possible to create a checksum that gives you a good idea of when
230 the task's data changes.
231 </para>
232
233 <para>
234 To complicate the problem, there are things that should not be included in
235 the checksum.
236 First, there is the actual specific build path of a given task -
237 the <filename>WORKDIR</filename>.
238 It does not matter if the working directory changes because it should not
239 affect the output for target packages.
240 Also, the build process has the objective of making native/cross packages relocatable.
241 The checksum therefore needs to exclude <filename>WORKDIR</filename>.
242 The simplistic approach for excluding the working directory is to set
243 <filename>WORKDIR</filename> to some fixed value and create the checksum
244 for the "run" script.
245 </para>
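<para>
The fixed-value approach described above can be illustrated with a minimal Python sketch.
This is not BitBake's implementation: the function name, the placeholder path, and the
choice of SHA-256 are all assumptions made for illustration only.
</para>

```python
import hashlib

# Hypothetical placeholder path; any fixed value works for this sketch.
FIXED_WORKDIR = "/fixed/workdir"


def run_script_checksum(script_text, workdir):
    """Hash a task's "run" script after replacing the real build path
    with a fixed value, so the path itself never affects the checksum."""
    normalized = script_text.replace(workdir, FIXED_WORKDIR)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same task built in two different working directories.
script_a = "cd /home/alice/build/tmp/work/foo\nmake install\n"
script_b = "cd /srv/ci/tmp/work/foo\nmake install\n"

sum_a = run_script_checksum(script_a, "/home/alice/build/tmp/work/foo")
sum_b = run_script_checksum(script_b, "/srv/ci/tmp/work/foo")
```

<para>
In this sketch both scripts produce the same checksum because they differ only in the
working directory, while any change to the commands themselves yields a different value.
</para>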
246
247 <para>
248 Another problem results from the "run" scripts containing functions that
249 might or might not get called.
250 The Yocto Project contains code that figures out dependencies between shell
251 functions.
252 This code is used to prune the "run" scripts down to the minimum set,
253 thereby alleviating this problem and making the "run" scripts much more
254 readable as a bonus.
255 </para>
256
257 <para>
258 So far we have solutions for shell scripts.
259 What about python tasks?
260 Handling these tasks is more difficult, but the same approach
261 applies.
262 The process needs to figure out what variables a python function accesses
263 and what functions it calls.
264 Again, the Yocto Project contains code that first figures out the variable and function
265 dependencies, and then creates a checksum for the data used as the input to
266 the task.
267 </para>
268
269 <para>
270 Like the <filename>WORKDIR</filename> case, situations exist where dependencies
271 should be ignored.
272 For these cases, you can instruct the build process to ignore a dependency
273 by using a line like the following:
274 <literallayout class='monospaced'>
275 PACKAGE_ARCHS[vardepsexclude] = "MACHINE"
276 </literallayout>
277 This example ensures that the <filename>PACKAGE_ARCHS</filename> variable does not
278 depend on the value of <filename>MACHINE</filename>, even if it does reference it.
279 </para>
280
281 <para>
282 Equally, there are cases where we need to add in dependencies
283 BitBake is not able to find.
284 You can accomplish this by using a line like the following:
285 <literallayout class='monospaced'>
286 PACKAGE_ARCHS[vardeps] = "MACHINE"
287 </literallayout>
288 This example explicitly adds the <filename>MACHINE</filename> variable as a
289 dependency for <filename>PACKAGE_ARCHS</filename>.
290 </para>
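<para>
As a rough illustration of how <filename>vardeps</filename> and
<filename>vardepsexclude</filename> could shape a checksum, consider the following Python sketch.
The function, its parameters, and the way values are serialized are hypothetical and are
not taken from BitBake; only the two flag names come from the text above.
</para>

```python
import hashlib


def base_hash(values, found_deps, vardeps=(), vardepsexclude=()):
    """Hash the variables a task depends on.

    found_deps     - dependencies discovered automatically
    vardeps        - dependencies added by hand
    vardepsexclude - dependencies to ignore even if referenced
    """
    deps = (set(found_deps) | set(vardeps)) - set(vardepsexclude)
    digest = hashlib.sha256()
    for name in sorted(deps):
        line = "{}={}\n".format(name, values.get(name, ""))
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical variable values for two machines.
qemux86 = {"MACHINE": "qemux86", "TUNE": "i586"}
qemuarm = {"MACHINE": "qemuarm", "TUNE": "i586"}

# With MACHINE excluded, switching machines leaves the hash unchanged.
excluded_a = base_hash(qemux86, ["MACHINE", "TUNE"], vardepsexclude=["MACHINE"])
excluded_b = base_hash(qemuarm, ["MACHINE", "TUNE"], vardepsexclude=["MACHINE"])

# With MACHINE added explicitly, switching machines changes the hash.
added_a = base_hash(qemux86, ["TUNE"], vardeps=["MACHINE"])
added_b = base_hash(qemuarm, ["TUNE"], vardeps=["MACHINE"])
```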
291
292 <para>
293 Consider a case with inline python, for example, where BitBake is not
294 able to figure out dependencies.
295 When running in debug mode (i.e. using <filename>-DDD</filename>), BitBake
296 produces output when it discovers something for which it cannot figure out
297 dependencies.
298 The Yocto Project team has currently not managed to cover those dependencies
299 in detail and is aware of the need to fix this situation.
300 </para>
301
302 <para>
303 Thus far, this section has limited discussion to the direct inputs into a
304 task.
305 Information based on direct inputs is referred to as the "basehash" in the code.
306 However, there is still the question of a task's indirect inputs, the things that
307 were already built and present in the build directory.
308 The checksum (or signature) for a particular task needs to add the hashes of all the
309 tasks the particular task depends upon.
310 Choosing which dependencies to add is a policy decision.
311 However, the effect is to generate a master checksum that combines the
312 basehash and the hashes of the task's dependencies.
313 </para>
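<para>
The combination of the basehash with the hashes of dependent tasks might look
something like the following Python sketch.
The function name, the use of SHA-256, and the sorting of dependency hashes are
assumptions chosen to keep the example deterministic; the real policy lives in the
signature handler.
</para>

```python
import hashlib


def task_hash(basehash, dep_hashes):
    """Combine a task's basehash with the hashes of the tasks it depends on."""
    digest = hashlib.sha256(basehash.encode("utf-8"))
    # Sorting is an assumption of this sketch so that the order in which
    # dependencies are listed does not matter.
    for dep in sorted(dep_hashes):
        digest.update(dep.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical hashes for a task and its two dependencies.
before = task_hash("base-abc", ["dep-111", "dep-222"])
# One dependency was rebuilt with different inputs.
after = task_hash("base-abc", ["dep-111", "dep-333"])
```

<para>
Here the task's own inputs are unchanged, yet its combined checksum differs because one
of its dependencies changed, which is the "ripple" effect the text describes.
</para>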
314
315 <para>
316 While figuring out the dependencies and creating these checksums is good,
317 what does the Yocto Project build system do with the checksum information?
318 The build system uses a signature handler that is responsible for
319 processing the checksum information.
320 By default, there is a dummy "noop" signature handler enabled in BitBake.
321 This means that behavior is unchanged from previous versions.
322 OECore uses the "basic" signature handler through this setting in the
323 <filename>bitbake.conf</filename> file:
324 <literallayout class='monospaced'>
325 BB_SIGNATURE_HANDLER ?= "basic"
326 </literallayout>
327 Also within the BitBake configuration file, we can give BitBake
328 some extra information to help it handle this information.
329 The following statements effectively result in a global
330 list of variable dependency excludes - variables never included in
331 any checksum:
332 <literallayout class='monospaced'>
333 BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH"
334 BB_HASHBASE_WHITELIST += "DL_DIR SSTATE_DIR THISDIR FILESEXTRAPATHS"
335 BB_HASHBASE_WHITELIST += "FILE_DIRNAME HOME LOGNAME SHELL TERM USER"
336 BB_HASHBASE_WHITELIST += "FILESPATH USERNAME STAGING_DIR_HOST STAGING_DIR_TARGET"
337 BB_HASHTASK_WHITELIST += "(.*-cross$|.*-native$|.*-cross-initial$| \
338 .*-cross-intermediate$|^virtual:native:.*|^virtual:nativesdk:.*)"
339 </literallayout>
340 This example is actually where <filename>WORKDIR</filename>
341 is excluded since <filename>WORKDIR</filename> is constructed as a
342 path within <filename>TMPDIR</filename>, which is on the whitelist.
343 </para>
344
345 <para>
346 The <filename>BB_HASHTASK_WHITELIST</filename> covers dependent tasks and
347 excludes certain kinds of tasks from the dependency chains.
348 The effect of the previous example is to isolate the native, target,
349 and cross components.
350 So, for example, toolchain changes do not force a rebuild of the whole system.
351 </para>
352
353 <para>
354 The end result of the "basic" handler is to make some dependency and
355 hash information available to the build.
356 This includes:
357 <literallayout class='monospaced'>
358 BB_BASEHASH_task-&lt;taskname&gt; - the base hashes for each task in the recipe
359 BB_BASEHASH_&lt;filename:taskname&gt; - the base hashes for each dependent task
360 BBHASHDEPS_&lt;filename:taskname&gt; - The task dependencies for each task
361 BB_TASKHASH - the hash of the currently running task
362 </literallayout>
363 There is also a "basichash" <filename>BB_SIGNATURE_HANDLER</filename>,
364 which is the same as the basic version but adds the task hash to the stamp files.
365 Consequently, any metadata change that changes the task hash
366 automatically causes the task to be run again.
367 This removes the need to bump <filename>PR</filename>
368 values, and changes to metadata automatically ripple across the build.
369 Currently, this behavior is not the default behavior.
370 However, it is likely that the Yocto Project team will go forward with this
371 behavior in the future since all the functionality exists.
372 The reason for the delay is the potential impact on distribution feed
373 creation: feeds need increasing <filename>PR</filename> fields,
374 and the Yocto Project currently lacks a mechanism to automate incrementing
375 this field.
376 </para>
377 </section>
378
379 <section id='shared-state'>
380 <title>Shared State</title>
381
382 <para>
383 Checksums and dependencies, as discussed in the previous section, solve half the
384 problem.
385 The other part of the problem is being able to use checksum information during the build
386 and being able to reuse or rebuild specific components.
387 </para>
388
389 <para>
390 The shared state class (<filename>sstate.bbclass</filename>)
391 is a relatively generic implementation of how to
392 "capture" a snapshot of a given task.
393 The idea is that the build process does not care about the source of a
394 task's output.
395 Output could be freshly built or it could be downloaded and unpacked from
396 somewhere - the build process doesn't need to worry about its source.
397 </para>
398
399 <para>
400 There are two types of output. One is just about creating a directory
401 in <filename>WORKDIR</filename>.
402 A good example is the output of either <filename>do_install</filename> or
403 <filename>do_package</filename>.
404 The other type of output occurs when a set of data is merged into a shared directory
405 tree such as the sysroot.
406 </para>
407
408 <para>
409 The Yocto Project team has tried to keep the details of the implementation hidden in
410 <filename>sstate.bbclass</filename>.
411 From a user's perspective, adding shared state wrapping to a task
412 is as simple as this <filename>do_deploy</filename> example taken from
413 <filename>deploy.bbclass</filename>:
414 <literallayout class='monospaced'>
415 DEPLOYDIR = "${WORKDIR}/deploy-${PN}"
416 SSTATETASKS += "do_deploy"
417 do_deploy[sstate-name] = "deploy"
418 do_deploy[sstate-inputdirs] = "${DEPLOYDIR}"
419 do_deploy[sstate-outputdirs] = "${DEPLOY_DIR_IMAGE}"
420
421 python do_deploy_setscene () {
422 sstate_setscene(d)
423 }
424 addtask do_deploy_setscene
425 </literallayout>
426 In the example, we add some extra flags to the task, a name field ("deploy"), an
427 input directory where the task sends data, and the output
428 directory where the data from the task should eventually be copied.
429 We also add a <filename>_setscene</filename> variant of the task and add the task
430 name to the <filename>SSTATETASKS</filename> list.
431 </para>
432
433 <para>
434 If you have a directory whose contents you need to preserve,
435 you can do this with a line like the following:
436 <literallayout class='monospaced'>
437 do_package[sstate-plaindirs] = "${PKGD} ${PKGDEST}"
438 </literallayout>
439 This method, as well as the following example, also works for multiple directories.
440 <literallayout class='monospaced'>
441 do_package[sstate-inputdirs] = "${PKGDESTWORK} ${SHLIBSWORKDIR}"
442 do_package[sstate-outputdirs] = "${PKGDATA_DIR} ${SHLIBSDIR}"
443 do_package[sstate-lockfile] = "${PACKAGELOCK}"
444 </literallayout>
445 These methods also include the ability to take a lockfile when manipulating
446 shared state directory structures since some cases are sensitive to file
447 additions or removals.
448 </para>
449
450 <para>
451 Behind the scenes, the shared state code works by looking in
452 <filename>SSTATE_DIR</filename> and
453 <filename>SSTATE_MIRRORS</filename> for shared state files.
454 Here is an example:
455 <literallayout class='monospaced'>
456 SSTATE_MIRRORS ?= "\
457 file://.* http://someserver.tld/share/sstate/ \n \
458 file://.* file:///some/local/dir/sstate/"
459 </literallayout>
460 </para>
461
462 <para>
463 The shared state package validity can be detected just by looking at the
464 filename since the filename contains the task checksum (or signature) as
465 described earlier in this section.
466 If a valid shared state package is found, the build process downloads it
467 and uses it to accelerate the task.
468 </para>
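<para>
Because the checksum is part of the filename, finding a usable package reduces to
a filename lookup.
The following Python sketch shows the idea for local directories only.
The filename layout and function name are hypothetical, and fetching from remote
mirrors is left out.
</para>

```python
import os


def find_sstate_package(search_dirs, task_name, task_hash):
    """Return the first package whose name matches the task's checksum,
    or None if no search directory contains one."""
    # Hypothetical naming scheme; the real sstate filenames carry more fields.
    wanted = "sstate-{}-{}.tgz".format(task_name, task_hash)
    for directory in search_dirs:
        candidate = os.path.join(directory, wanted)
        if os.path.isfile(candidate):
            return candidate
    return None
```

<para>
A package built from different inputs has a different checksum in its name, so it is
simply never matched and the task runs normally instead.
</para>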
469
470 <para>
471 The build process uses the <filename>*_setscene</filename> tasks
472 for the task acceleration phase.
473 BitBake goes through this phase before the main execution code and tries
474 to accelerate any tasks for which it can find shared state packages.
475 If a shared state package for a task is available, the shared state
476 package is used.
477 This means the task and any tasks on which it is dependent are not
478 executed.
479 </para>
480
481 <para>
482 As a real-world example, the aim is that, when building an IPK-based image,
483 only the <filename>do_package_write_ipk</filename> tasks have their
484 shared state packages fetched and extracted.
485 Since the sysroot is not used, it would never get extracted.
486 This is another reason to prefer the task-based approach over a
487 recipe-based approach, which would have to install the output from every task.
488 </para>
489 </section>
490
491 <section id='tips-and-tricks'>
492 <title>Tips and Tricks</title>
493
494 <para>
495 The code in the Yocto Project that supports incremental builds is not
496 simple code.
497 Consequently, when things go wrong, debugging needs to be straightforward.
498 Because of this, the Yocto Project team included strong debugging
499 tools.
500 </para>
501
502 <para>
503 First, whenever a shared state package is written out, so is a
504 corresponding <filename>.siginfo</filename> file.
505 This practice results in a pickled python database of all
506 the metadata that went into creating the hash for a given shared state
507 package.
508 </para>
509
510 <para>
511 Second, if BitBake is run with the <filename>--dump-signatures</filename>
512 (or <filename>-S</filename>) option, BitBake dumps out
513 <filename>.siginfo</filename> files in
514 the stamp directory for every task it would have executed instead of
515 building the target package specified.
516 </para>
517
518 <para>
519 Finally, there is a <filename>bitbake-diffsigs</filename> command that
520 can process these <filename>.siginfo</filename> files.
521 If one file is specified, it will dump out the dependency
522 information in the file.
523 If two files are specified, it will compare the
524 two files and dump out the differences between the two.
525 This allows the question of "What changed between X and Y?" to be
526 answered easily.
527 </para>
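<para>
The kind of comparison involved in answering "What changed between X and Y?" can be
pictured with a small Python sketch.
It compares two plain dictionaries of variable values.
The dictionary layout and function name are assumptions for illustration and do not
reflect the actual contents of a <filename>.siginfo</filename> file or the output
format of <filename>bitbake-diffsigs</filename>.
</para>

```python
def diff_signature_data(old, new):
    """List the variables that were added, removed, or changed
    between two snapshots of signature data."""
    changes = []
    for key in sorted(set(old) | set(new)):
        if key not in old:
            changes.append("added {}".format(key))
        elif key not in new:
            changes.append("removed {}".format(key))
        elif old[key] != new[key]:
            changes.append(
                "changed {}: {!r} -> {!r}".format(key, old[key], new[key])
            )
    return changes


# Hypothetical snapshots taken before and after a metadata change.
snapshot_x = {"CFLAGS": "-O2", "PN": "foo"}
snapshot_y = {"CFLAGS": "-O3", "PN": "foo", "EXTRA_OECONF": "--disable-nls"}

report = diff_signature_data(snapshot_x, snapshot_y)
```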
528 </section>
529</section>
530
531<!--
532
533 <para>
160 The Yocto Project build process uses a shared state caching scheme to avoid having to 534 The Yocto Project build process uses a shared state caching scheme to avoid having to
161 rebuild software when it is not necessary. 535 rebuild software when it is not necessary.
162 Because the build time for a Yocto image can be significant, it is helpful to try and 536 Because the build time for a Yocto image can be significant, it is helpful to try and
@@ -222,6 +596,7 @@
222 <ulink url='http://git.yoctoproject.org/cgit.cgi/poky/commit/meta/classes/package.bbclass?id=737f8bbb4f27b4837047cb9b4fbfe01dfde36d54'>commit</ulink>. 596 <ulink url='http://git.yoctoproject.org/cgit.cgi/poky/commit/meta/classes/package.bbclass?id=737f8bbb4f27b4837047cb9b4fbfe01dfde36d54'>commit</ulink>.
223 </note> 597 </note>
224</section> 598</section>
599-->
225 600
226</chapter> 601</chapter>
227<!-- 602<!--