File Download Support

BitBake's fetch module is a standalone piece of library code that deals with the intricacies of downloading source code and files from remote systems. Fetching source code is one of the cornerstones of building software. As such, this module forms an important part of BitBake.

The current fetch module is called "fetch2" and refers to the fact that it is the second major version of the API. The original version is obsolete and has been removed from the codebase. Thus, in all cases, "fetch" refers to "fetch2" in this manual.
The Download (Fetch)

BitBake takes several steps when fetching source code or files. The fetcher codebase deals with two distinct processes in order: obtaining the files from somewhere (cached or otherwise) and then unpacking those files into a specific location, perhaps in a specific way. Getting and unpacking the files is often optionally followed by patching. Patching, however, is not covered by this module.

The code to execute the first part of this process, a fetch, looks something like the following:

    src_uri = (d.getVar('SRC_URI', True) or "").split()
    fetcher = bb.fetch2.Fetch(src_uri, d)
    fetcher.download()

This code sets up an instance of the fetch class. The instance uses a space-separated list of URLs from the SRC_URI variable and then calls the download method to download the files.

The instantiation of the fetch class is usually followed by:

    rootdir = d.getVar('WORKDIR', True)
    fetcher.unpack(rootdir)

This code unpacks the downloaded files to the directory specified by WORKDIR.

For convenience, the naming in these examples matches the variables used by OpenEmbedded. The SRC_URI and WORKDIR variables are not hard-coded into the fetcher; the fetcher can be (and is) called with different variable names. In OpenEmbedded, for example, the shared state (sstate) code uses the fetch module to fetch the sstate files.

When the download() method is called, BitBake tries to fulfill the URLs by looking for source files in a specific search order:

Pre-mirror Sites: BitBake first uses pre-mirrors to try to find source files. These locations are defined using the PREMIRRORS variable.
Source URI: If pre-mirrors fail, BitBake uses the original URL (e.g. from SRC_URI).
Mirror Sites: If fetch failures occur, BitBake next uses the mirror locations defined by the MIRRORS variable.

For each URL passed to the fetcher, the fetcher calls the submodule that handles that particular URL type. This behavior can be the source of some confusion when you are providing URLs for the SRC_URI variable.
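Because the submodule is chosen from the URL type alone, it can help to see that rule in isolation. The following is a minimal sketch, not BitBake's code: the `FETCHERS` registry and `choose_fetcher()` are hypothetical, and only a subset of the schemes covered in this chapter is listed. Parameters after ";" (such as "protocol=") play no part in the choice.

```python
# Hypothetical registry mapping URL schemes to fetcher submodule names.
# Only a subset of the schemes covered in this chapter is listed.
FETCHERS = {
    "file": "local",
    "http": "wget",
    "https": "wget",
    "ftp": "wget",
    "cvs": "cvs",
    "svn": "svn",
    "git": "git",
}


def choose_fetcher(url: str) -> str:
    """Pick a fetcher from the URL scheme alone.

    Parameters after ';' (such as protocol=) are ignored here; they are
    interpreted later by whichever fetcher was selected.
    """
    scheme, sep, _ = url.partition("://")
    if not sep or scheme not in FETCHERS:
        raise ValueError(f"no fetcher handles {url!r}")
    return FETCHERS[scheme]


# The first URL goes to wget, which knows nothing about Git;
# the second goes to the Git fetcher, which can use HTTP as a transport.
print(choose_fetcher("http://git.yoctoproject.org/git/poky;protocol=git"))  # wget
print(choose_fetcher("git://git.yoctoproject.org/git/poky;protocol=http"))  # git
```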
Consider the following two URLs:

    http://git.yoctoproject.org/git/poky;protocol=git
    git://git.yoctoproject.org/git/poky;protocol=http

In the former case, the URL is passed to the wget fetcher, which does not understand "git". Therefore, the latter case is the correct form, since the Git fetcher does know how to use HTTP as a transport.

Here are some examples that show commonly used mirror definitions:

    PREMIRRORS ?= "\
    bzr://.*/.* http://somemirror.org/sources/ \n \
    cvs://.*/.* http://somemirror.org/sources/ \n \
    git://.*/.* http://somemirror.org/sources/ \n \
    hg://.*/.* http://somemirror.org/sources/ \n \
    osc://.*/.* http://somemirror.org/sources/ \n \
    p4://.*/.* http://somemirror.org/sources/ \n \
    svn://.*/.* http://somemirror.org/sources/ \n"

    MIRRORS =+ "\
    ftp://.*/.* http://somemirror.org/sources/ \n \
    http://.*/.* http://somemirror.org/sources/ \n \
    https://.*/.* http://somemirror.org/sources/ \n"

It is useful to note that BitBake supports cross-URLs: it is possible to mirror a Git repository on an HTTP server as a tarball. This is what the git:// mapping in the previous example does.

Since network accesses are slow, BitBake maintains a cache of files downloaded from the network. Any source files that are not local (i.e. downloaded from the Internet) are placed into the download directory, which is specified by the DL_DIR variable.

File integrity is of key importance for reproducing builds. For non-local archive downloads, the fetcher code can verify sha256 and md5 checksums to ensure the archives have been downloaded correctly.
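The mirror variables and the pre-mirror, source, mirror search order fit together as follows. This is a simplified sketch under stated assumptions, not BitBake's implementation: `parse_mirrors()`, `mirror_urls()`, and `fetch_with_fallback()` are hypothetical, `try_fetch` stands in for a real fetcher, and a matching replacement is treated as a directory to which the original file name is appended. BitBake's real rewriting handles more cases (for example, deriving a tarball name for a mirrored Git repository), which this sketch does not model.

```python
import re
from typing import Callable, List, Optional, Tuple

Mirror = Tuple[str, str]  # (regular expression, replacement location)


def parse_mirrors(value: str) -> List[Mirror]:
    """Split a PREMIRRORS/MIRRORS-style value into (regex, replacement) pairs.

    Entries are separated by the literal two-character sequence '\\n';
    whitespace separates each regex from its replacement.
    """
    tokens = value.replace("\\n", " ").split()
    if len(tokens) % 2:
        raise ValueError("mirror entries must come in pairs")
    return list(zip(tokens[0::2], tokens[1::2]))


def mirror_urls(url: str, mirrors: List[Mirror]) -> List[str]:
    """Rewrite url against every matching mirror entry (simplified).

    The replacement is treated as a directory and the last path
    component of the original URL is appended to it.
    """
    filename = url.split(";", 1)[0].rstrip("/").rsplit("/", 1)[-1]
    return [
        replacement.rstrip("/") + "/" + filename
        for pattern, replacement in mirrors
        if re.match(pattern, url)
    ]


def fetch_with_fallback(
    url: str,
    premirrors: List[Mirror],
    mirrors: List[Mirror],
    try_fetch: Callable[[str], bool],
) -> Optional[str]:
    """Try pre-mirrors, then the original URL, then mirrors, in that order."""
    candidates = mirror_urls(url, premirrors) + [url] + mirror_urls(url, mirrors)
    for candidate in candidates:
        if try_fetch(candidate):  # try_fetch stands in for a real download
            return candidate
    return None


MIRRORS = r"""ftp://.*/.* http://somemirror.org/sources/ \n
http://.*/.* http://somemirror.org/sources/ \n
https://.*/.* http://somemirror.org/sources/ \n"""

# Pretend only the mirror copy is reachable.
reachable = {"http://somemirror.org/sources/foobar.tar.bz2"}
found = fetch_with_fallback(
    "http://example.com/foobar.tar.bz2",
    premirrors=[],
    mirrors=parse_mirrors(MIRRORS),
    try_fetch=lambda u: u in reachable,
)
print(found)  # http://somemirror.org/sources/foobar.tar.bz2
```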
You can specify these checksums by using the SRC_URI variable with the appropriate varflags as follows:

    SRC_URI[md5sum] = "value"
    SRC_URI[sha256sum] = "value"

You can also specify the checksums as parameters on the SRC_URI as shown below:

    SRC_URI = "http://example.com/foobar.tar.bz2;md5sum=4a8e0f237e961fd7785d19d07fdb994d"

If multiple URIs exist, you can specify the checksums either directly as in the previous example, or you can name the URLs. The following syntax shows how you name the URIs:

    SRC_URI = "http://example.com/foobar.tar.bz2;name=foo"
    SRC_URI[foo.md5sum] = "4a8e0f237e961fd7785d19d07fdb994d"

After a file has been downloaded and has had its checksum checked, a ".done" stamp is placed in DL_DIR. BitBake uses this stamp during subsequent builds to avoid downloading the file or comparing its checksum again.

It is assumed that local storage is safe from data corruption. If this were not the case, there would be bigger issues to worry about.

If BB_STRICT_CHECKSUM is set, any download without a checksum triggers an error message. The BB_NO_NETWORK variable can be used to make any attempted network access a fatal error, which is useful, for example, for checking that mirrors are complete.
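The checksum-and-stamp behavior can be sketched with Python's hashlib. This is a minimal illustration, not BitBake's code: `verify_download()` and its "stamp"/"verified" return values are hypothetical, the `<file>.done` name follows the text above, and the `strict` flag stands in for BB_STRICT_CHECKSUM.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict

# Maps the SRC_URI varflag names used above to hashlib algorithm names.
ALGORITHMS = {"md5sum": "md5", "sha256sum": "sha256"}


def file_digest(path: Path, algorithm: str) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, expected: Dict[str, str], strict: bool = False) -> str:
    """Check a downloaded file and write a '.done' stamp next to it.

    Returns "stamp" if an existing stamp short-circuits the check,
    or "verified" after the digests were compared and the stamp written.
    """
    stamp = path.with_name(path.name + ".done")
    if stamp.exists():
        return "stamp"  # local storage is trusted; no re-check
    if strict and not expected:
        # Stand-in for BB_STRICT_CHECKSUM: a missing checksum is an error.
        raise ValueError(f"no checksum given for {path.name}")
    for flag, want in expected.items():
        got = file_digest(path, ALGORITHMS[flag])
        if got != want:
            raise ValueError(f"{flag} mismatch for {path.name}: {got} != {want}")
    stamp.touch()
    return "verified"


with tempfile.TemporaryDirectory() as dl_dir:
    archive = Path(dl_dir) / "foobar.tar.bz2"
    archive.write_bytes(b"example contents\n")
    sums = {"sha256sum": hashlib.sha256(b"example contents\n").hexdigest()}
    first = verify_download(archive, sums)   # digests compared, stamp written
    second = verify_download(archive, sums)  # stamp found, check skipped
print(first, second)  # verified stamp
```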
The Unpack

The unpack process usually immediately follows the download. For all URLs except Git URLs, BitBake uses the common unpack method.

A number of parameters exist that you can specify within the URL to govern the behavior of the unpack stage:

unpack: Controls whether the URL components are unpacked. If set to "1", which is the default, the components are unpacked. If set to "0", the unpack stage leaves the file alone. This parameter is useful when you want an archive to be copied in and not be unpacked.
dos: Applies to .zip and .jar files and specifies whether to use DOS line ending conversion on text files.
basepath: Instructs the unpack stage to strip the specified directories from the source path when unpacking.
subdir: Unpacks the specific URL to the specified subdirectory within the root directory.

The unpack call automatically decompresses and extracts files with ".Z", ".z", ".gz", ".xz", ".zip", ".jar", ".ipk", ".rpm", ".srpm", ".deb" and ".bz2" extensions as well as various combinations of tarball extensions.

As mentioned, the Git fetcher has its own unpack method that is optimized to work with Git trees. Basically, this method works by cloning the tree into the final directory. The process is completed using references so that only one central copy of the Git metadata is needed.
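As a rough sketch of how the "unpack" and "subdir" parameters and the file extension combine to decide what happens to each file: the suffix table and the tool names in it are illustrative assumptions, not the command lines BitBake actually runs, and the "dos" and "basepath" parameters are not modeled. `plan_unpack()` is a hypothetical helper.

```python
import os
from typing import Dict, Tuple

# Illustrative suffix -> tool table. The tool names describe the kind of
# extraction needed; they are not BitBake's actual command lines.
# Tarball combinations come before bare compression suffixes so that
# "foo.tar.gz" is treated as a tarball rather than a single gzip file.
HANDLERS = [
    ((".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz2", ".tbz", ".tar.xz", ".txz"), "tar"),
    ((".zip", ".jar"), "unzip"),
    ((".rpm", ".srpm"), "rpm2cpio"),
    ((".deb", ".ipk"), "ar"),
    ((".gz", ".bz2", ".xz", ".z", ".Z"), "decompress"),
]


def plan_unpack(filename: str, params: Dict[str, str], rootdir: str) -> Tuple[str, str]:
    """Return (action, destination) for one SRC_URI entry."""
    dest = os.path.join(rootdir, params["subdir"]) if "subdir" in params else rootdir
    if params.get("unpack", "1") == "0":
        return ("copy", dest)  # unpack=0: copy the file in, leave it alone
    for suffixes, tool in HANDLERS:
        if filename.endswith(suffixes):
            return (tool, dest)
    return ("copy", dest)  # unrecognized type: just copy it


print(plan_unpack("foo-1.0.tar.gz", {}, "/work"))  # ('tar', '/work')
print(plan_unpack("foo-1.0.tar.gz", {"unpack": "0"}, "/work"))  # ('copy', '/work')
```

Checking the multi-part tarball suffixes first is the design point here: a naive lookup on the last extension alone would treat a ".tar.gz" archive as a single compressed file.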
Fetchers

As mentioned earlier, the URL prefix determines which fetcher submodule BitBake uses. Each submodule can support different URL parameters, which are described in the following sections.
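All of the fetchers below read their parameters from the ";"-separated tail of the URL. The hypothetical helper `split_url()` below sketches that split under a simplifying assumption; BitBake has its own URL decoding (which also handles user names and passwords), and this is not it.

```python
from typing import Dict, Tuple


def split_url(url: str) -> Tuple[str, str, Dict[str, str]]:
    """Split 'scheme://location;key=value;...' into scheme, location, params.

    A parameter given without '=' is kept with an empty value.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {url!r}")
    location, *pairs = rest.split(";")
    params: Dict[str, str] = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return scheme, location, params


print(split_url("git://git.oe.handhelds.org/git/vip.git;protocol=http"))
# ('git', 'git.oe.handhelds.org/git/vip.git', {'protocol': 'http'})
```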
Local file fetcher (file://)

This submodule handles URLs that begin with file://. The filename you specify within the URL can be either an absolute or a relative path to a file. If the filename is relative, the contents of the FILESPATH variable are used in the same way PATH is used to find executables. Failing that, FILESDIR is used to find the appropriate relative file.

FILESDIR is deprecated and can be replaced with FILESPATH. Because FILESDIR is likely to be removed, you should not use this variable in any new code.

If the file cannot be found, it is assumed that it is available in DL_DIR by the time the download() method is called.

If you specify a directory, the entire directory is unpacked.

Here are some example URLs:

    SRC_URI = "file://relativefile.patch"
    SRC_URI = "file://relativefile.patch;this=ignored"
    SRC_URI = "file:///Users/ich/very_important_software"
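The PATH-like lookup through FILESPATH can be sketched as follows, assuming a POSIX-style colon-separated list of directories. `find_local_file()` is a hypothetical helper: it does not model the deprecated FILESDIR fallback, and it returns None so that a caller could then assume the file will be in DL_DIR.

```python
import os
import tempfile
from typing import Optional


def find_local_file(name: str, filespath: str) -> Optional[str]:
    """Resolve a file:// path against a colon-separated FILESPATH.

    Absolute paths are used as-is; relative ones are tried in each
    directory in order, like PATH lookup. Returns None if not found.
    """
    if os.path.isabs(name):
        return name if os.path.exists(name) else None
    for directory in filespath.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: a more specific directory listed first.
    specific = os.path.join(top, "machine-specific")
    generic = os.path.join(top, "files")
    os.makedirs(specific)
    os.makedirs(generic)
    with open(os.path.join(generic, "relativefile.patch"), "w") as f:
        f.write("patch contents\n")
    filespath = f"{specific}:{generic}"
    found = find_local_file("relativefile.patch", filespath)
    found_rel = os.path.relpath(found, top)  # files/relativefile.patch
    missing = find_local_file("not-there.patch", filespath)
```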
CVS fetcher (cvs://)

This submodule handles checking out files from the CVS version control system. You can configure it using a number of different variables:

FETCHCMD_cvs: The name of the executable to use when running the cvs command. This name is usually "cvs".
SRCDATE: The date to use when fetching the CVS source code. A special value of "now" causes the checkout to be updated on every build.
CVSDIR: Specifies where a temporary checkout is saved. The location is often DL_DIR/cvs.
CVS_PROXY_HOST: The name to use as a "proxy=" parameter to the cvs command.
CVS_PROXY_PORT: The port number to use as a "proxyport=" parameter to the cvs command.

As well as the standard username and password URL syntax, you can also configure the fetcher with various URL parameters. The supported parameters are as follows:

"method": The protocol over which to communicate with the CVS server. By default, this protocol is "pserver". If "method" is set to "ext", BitBake examines the "rsh" parameter and sets CVS_RSH. You can use "dir" for local directories.
"module": Specifies the module to check out. You must supply this parameter.
"tag": Describes which CVS TAG should be used for the checkout. By default, the TAG is empty.
"date": Specifies a date. If no "date" is specified, the SRCDATE of the configuration is used to check out a specific date. The special value of "now" causes the checkout to be updated on every build.
"localdir": Used to rename the module. Effectively, you are renaming the output directory to which the module is unpacked. You are forcing the module into a special directory relative to CVSDIR.
"rsh": Used in conjunction with the "method" parameter.
"scmdata": When set to "keep", causes the CVS metadata to be maintained in the tarball the fetcher creates. The tarball is expanded into the work directory. By default, the CVS metadata is removed.
"fullpath": Controls whether the resulting checkout is at the module level, which is the default, or is at deeper paths. "norecurse": Causes the fetcher to only checkout the specified directory with no recurse into any subdirectories. "port": The port to which the CVS server connects. Some example URLs are as follows: SRC_URI = "cvs://CVSROOT;module=mymodule;tag=some-version;method=ext" SRC_URI = "cvs://CVSROOT;module=mymodule;date=20060126;localdir=usethat"
HTTP/FTP wget fetcher (http://, ftp://, https://)

This fetcher obtains files from web and FTP servers. Internally, the fetcher uses the wget utility.

The executable and parameters used are specified by the FETCHCMD_wget variable, which defaults to sensible values. The fetcher supports a parameter "downloadfilename" that allows the name of the downloaded file to be specified. Specifying the name of the downloaded file is useful for avoiding collisions in DL_DIR when dealing with multiple files that have the same name.

Some example URLs are as follows:

    SRC_URI = "http://oe.handhelds.org/not_there.aac"
    SRC_URI = "ftp://oe.handhelds.org/not_there_as_well.aac"
    SRC_URI = "ftp://you@oe.handhelds.org/home/you/secret.plan"
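The effect of "downloadfilename" on where a file lands in DL_DIR can be sketched as below. `local_download_path()` is a hypothetical helper that assumes the default name is simply the last component of the URL path; the example.com and example.org URLs are placeholders.

```python
import os
from urllib.parse import urlsplit


def local_download_path(url: str, dl_dir: str) -> str:
    """Where a wget-style download would land in DL_DIR (simplified)."""
    base, *pairs = url.split(";")
    params = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    # downloadfilename wins; otherwise use the last component of the URL path.
    name = params.get("downloadfilename") or os.path.basename(urlsplit(base).path)
    return os.path.join(dl_dir, name)


# Two hypothetical upstreams ship a file with the same name;
# downloadfilename keeps them from colliding in DL_DIR.
a = local_download_path("http://example.com/v1/foo.tar.gz;downloadfilename=a-foo.tar.gz", "/dl")
b = local_download_path("http://example.org/v1/foo.tar.gz", "/dl")
print(os.path.basename(a), os.path.basename(b))  # a-foo.tar.gz foo.tar.gz
```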
Subversion (SVN) Fetcher (svn://)

This fetcher submodule fetches code from the Subversion source control system. The executable used is specified by FETCHCMD_svn, which defaults to "svn". The fetcher's temporary working directory is set by SVNDIR, which is usually DL_DIR/svn.

The supported parameters are as follows:

"module": The name of the svn module to check out. You must provide this parameter. You can think of this parameter as the top-level directory of the repository data you want.
"protocol": The protocol to use, which defaults to "svn". Other options are "svn+ssh" and "rsh". For "rsh", the "rsh" parameter is also used.
"rev": The revision of the source code to check out.
"date": The date of the source code to check out. Specific revisions are generally much safer to check out than dates, as they do not involve timezones (i.e. they are much more deterministic).
"scmdata": When set to "keep", causes the ".svn" directories to be available during compile-time. By default, these directories are removed.

Following are two examples using svn:

    SRC_URI = "svn://svn.oe.handhelds.org/svn;module=vip;protocol=http;rev=667"
    SRC_URI = "svn://svn.oe.handhelds.org/svn/;module=opie;protocol=svn+ssh;date=20060126"
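To illustrate how "module", "protocol", and "rev" combine into a repository location, the sketch below builds an "svn checkout" argument list. This is an assumption-laden illustration, not the exact command BitBake runs: `svn_checkout_args()` is hypothetical, and the "date", "rsh", and "scmdata" parameters are ignored.

```python
from typing import List


def svn_checkout_args(url: str, fetchcmd: str = "svn") -> List[str]:
    """Build an illustrative 'svn checkout' argument list from an svn:// URL.

    Not the exact command BitBake runs: the "date", "rsh" and "scmdata"
    parameters are ignored here.
    """
    _, _, rest = url.partition("://")
    location, *pairs = rest.split(";")
    params = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    if "module" not in params:
        raise ValueError("svn:// URLs must set the 'module' parameter")
    proto = params.get("protocol", "svn")  # documented default
    repo = f"{proto}://{location.rstrip('/')}/{params['module']}"
    args = [fetchcmd, "checkout"]
    if "rev" in params:
        args += ["-r", params["rev"]]  # a fixed revision is the deterministic choice
    args.append(repo)
    return args


print(" ".join(svn_checkout_args(
    "svn://svn.oe.handhelds.org/svn;module=vip;protocol=http;rev=667")))
# svn checkout -r 667 http://svn.oe.handhelds.org/svn/vip
```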
Git Fetcher (git://)

This fetcher submodule fetches code from the Git source control system. The fetcher works by creating a bare clone of the remote into GITDIR, which is usually DL_DIR/git. This bare clone is then cloned into the work directory during the unpack stage when a specific tree is checked out. This is done using alternates and by reference to minimize the amount of duplicate data on the disk and make the unpack process fast. The executable used can be set with FETCHCMD_git.

This fetcher supports the following parameters:

"protocol": The protocol used to fetch the files. The default is "git" when a hostname is set. If a hostname is not set, the Git protocol is "file". You can also use "http", "https", "ssh" and "rsync".
"nocheckout": When set to "1", tells the fetcher not to check out source code when unpacking. Set this option for a URL where there is a custom routine to check out code. The default is "0".
"rebaseable": Indicates that the upstream Git repository can be rebased. You should set this parameter to "1" if revisions can become detached from branches. In this case, the source mirror tarball is created per revision, which is less efficient. Rebasing the upstream Git repository could cause the current revision to disappear from the upstream repository. This option reminds the fetcher to preserve the local cache carefully for future use. The default value for this parameter is "0".
"nobranch": When set to "1", tells the fetcher not to check the SHA validation for the branch. The default is "0". Set this option for a recipe that refers to a commit that is valid for a tag instead of a branch.
"bareclone": Tells the fetcher to clone a bare clone into the destination directory without checking out a working tree. Only the raw Git metadata is provided. This parameter implies the "nocheckout" parameter as well.
"branch": The branch(es) of the Git tree to clone. If unset, this is assumed to be "master".
The number of branch parameters must match the number of name parameters.
"rev": The revision to use for the checkout. The default is "master".
"tag": Specifies a tag to use for the checkout. To correctly resolve tags, BitBake must access the network. For that reason, tags are often not used. As far as Git is concerned, the "tag" parameter behaves effectively the same as the "rev" parameter.
"subpath": Limits the checkout to a specific subpath of the tree. By default, the whole tree is checked out.
"destsuffix": The name of the path in which to place the checkout. By default, the path is git/.

Here are some example URLs:

    SRC_URI = "git://git.oe.handhelds.org/git/vip.git;tag=version-1"
    SRC_URI = "git://git.oe.handhelds.org/git/vip.git;protocol=http"
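Several of the Git parameter defaults above can be summarized in one function. This is a hypothetical sketch, not the Git fetcher's code. `git_options()` is invented for illustration, "name" and "branch" are assumed to be comma-separated lists, and the default name "default" is an assumption. It encodes the documented rules: "git" or "file" as the default protocol depending on whether a hostname is set, "master" as the default branch, matching branch and name counts, "bareclone" implying "nocheckout", and "git/" as the default destsuffix.

```python
from typing import Dict


def git_options(url: str) -> Dict[str, object]:
    """Resolve a few git:// URL parameters using the defaults described above."""
    _, _, rest = url.partition("://")
    location, *pairs = rest.split(";")
    params = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    host = location.split("/", 1)[0]  # empty for git:///local/path
    # Assumption: names and branches are comma-separated lists, and the
    # default name is "default".
    names = params.get("name", "default").split(",")
    branches = params.get("branch", "master").split(",")
    if len(names) != len(branches):
        raise ValueError("the number of branch and name parameters must match")
    bareclone = params.get("bareclone", "0") == "1"
    return {
        "protocol": params.get("protocol", "git" if host else "file"),
        "branches": dict(zip(names, branches)),
        "bareclone": bareclone,
        # bareclone implies nocheckout
        "nocheckout": bareclone or params.get("nocheckout", "0") == "1",
        "destsuffix": params.get("destsuffix", "git/"),
    }


print(git_options("git://git.oe.handhelds.org/git/vip.git;protocol=http")["protocol"])  # http
print(git_options("git:///srv/git/vip.git")["protocol"])  # file
```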
Other Fetchers

Fetch submodules also exist for the following:

Bazaar (bzr://)
Perforce (p4://)
Git Submodules (gitsm://)
Trees using Git Annex (gitannex://)
Secure FTP (sftp://)
Secure Shell (ssh://)
Repo (repo://)
OSC (osc://)
Mercurial (hg://)

No documentation currently exists for these lesser-used fetcher submodules. However, you might find the code helpful and readable.
Auto Revisions

We need to document AUTOREV and SRCREV_FORMAT here.