author     Emil Ekmečić <eekmecic@snap.com>  2023-08-09 02:39:44 -0700
committer  Richard Purdie <richard.purdie@linuxfoundation.org>  2023-08-11 16:23:01 +0100
commit     ac5512b0acf3457a4c459d4d1711649053f4e618 (patch)
tree       dfa8627a0c3c6e2c0c8092c5cf91475dcaed8839
parent     71282bbc5331e7768c95d1dd6db94651de504734 (diff)
download   poky-ac5512b0acf3457a4c459d4d1711649053f4e618.tar.gz
bitbake: fetch2: add Google Cloud Platform (GCP) fetcher
This fetcher allows BitBake to fetch from a Google Cloud Storage bucket.
The fetcher expects a gs:// URI of the following form:

    SSTATE_MIRRORS = "file://.* gs://<bucket name>/PATH"

The fetcher uses the Google Cloud Storage Python Client and expects it
to be installed, configured, and authenticated prior to use.

If accepted, this patch should be merged together with the corresponding
oe-core patch titled "Add GCP fetcher to list of supported protocols".

Some comments on the patch:

Documentation for the fetcher is also added to the BitBake User Manual.

I'm still not completely sure about recommends_checksum() returning
True. As I've noted on the mailing list, it will throw warnings if the
fetcher is used in recipes without specifying a checksum. Please let me
know if this is the intended behavior or if it should be modified.

Here is how this fetcher conforms to the fetcher expectations described
at https://git.yoctoproject.org/poky/tree/bitbake/lib/bb/fetch2/README:

a) Yes, network fetching only happens in the fetcher.

b) The fetcher has nothing to do with the unpack phase, so there is no
   network access there.

c) This change doesn't affect the behavior of DL_DIR. The GCP fetcher
   only downloads to DL_DIR, in the same way that other fetchers,
   namely the S3 and Azure fetchers, do.

d) The fetcher is identical to the S3 and Azure fetchers in this
   context.

e) Yes, the fetcher output is deterministic: it downloads tarballs from
   a bucket and does not modify them in any way.

f) I set up a local proxy using tinyproxy and set the http_proxy
   variable to test whether the Python API respected the proxy. It
   appears that it did, as I could see traffic passing through the
   proxy. I also found posts indicating that the Google Cloud Python
   APIs support the classic Linux proxy variables, namely:
   - https://github.com/googleapis/google-api-python-client/issues/1260

g) Access is minimal: the fetcher only checks whether the file exists
   and downloads it if it does.

h) Not applicable; BitBake already knows which version it wants and the
   version information is encoded in the filename. The fetcher has no
   concept of versions.

i) Not applicable.

j) Not applicable.

k) No tests were added as part of this change. I didn't see any tests
   for the S3 or Azure changes either; is that OK?

l) I'm not 100% familiar, but I don't believe this fetcher uses any
   tools during parse time. Please correct me if I'm wrong.

(Bitbake rev: 8e7e5719c1de79eb488732818871add3a6fc238b)

Signed-off-by: Emil Ekmečić <eekmecic@snap.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
-rw-r--r--  bitbake/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst  36
-rw-r--r--  bitbake/lib/bb/fetch2/__init__.py                                  4
-rw-r--r--  bitbake/lib/bb/fetch2/gcp.py                                      98
3 files changed, 137 insertions(+), 1 deletion(-)
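
The commit message notes that the Google Cloud Storage Python Client must be installed, configured, and authenticated before the fetcher is used, and that it honours the usual proxy environment variables. A minimal standalone sketch of that precondition check, using the same client calls the fetcher itself makes (the bucket name and object path are placeholders, not taken from this patch):

    # Sketch only: verify that the Google Cloud Storage Python Client is
    # installed and authenticated (Application Default Credentials) before
    # pointing BitBake at a bucket. Bucket and object names are placeholders.
    import os

    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client(project=None)   # same call the fetcher makes

    bucket = client.bucket("example-sstate-bucket")      # placeholder bucket
    blob = bucket.blob("PATH/sstate-example.tar.zst")    # placeholder object

    print("object exists:", blob.exists())
    if blob.exists():
        blob.download_to_filename("/tmp/sstate-example.tar.zst")
        print("downloaded", os.path.getsize("/tmp/sstate-example.tar.zst"), "bytes")

If this runs cleanly, the same credentials and proxy settings should also work when BitBake invokes the fetcher.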
diff --git a/bitbake/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/bitbake/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
index c061bd70ea..f5723d6767 100644
--- a/bitbake/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
+++ b/bitbake/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
@@ -688,6 +688,40 @@ Here is an example URL::
 
 It can also be used when setting mirrors definitions using the :term:`PREMIRRORS` variable.
 
+.. _gcp-fetcher:
+
+GCP Fetcher (``gs://``)
+--------------------------
+
+This submodule fetches data from a
+`Google Cloud Storage Bucket <https://cloud.google.com/storage/docs/buckets>`__.
+It uses the `Google Cloud Storage Python Client <https://cloud.google.com/python/docs/reference/storage/latest>`__
+to check the status of objects in the bucket and download them.
+The use of the Python client makes it substantially faster than using command
+line tools such as gsutil.
+
+The fetcher requires the Google Cloud Storage Python Client to be installed, along
+with the gsutil tool.
+
+The fetcher requires that the machine has valid credentials for accessing the
+chosen bucket. Instructions for authentication can be found in the
+`Google Cloud documentation <https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev>`__.
+
+The fetcher can be used for fetching sstate artifacts from a GCS bucket by
+specifying the :term:`SSTATE_MIRRORS` variable as shown below::
+
+   SSTATE_MIRRORS ?= "\
+       file://.* gs://<bucket name>/PATH \
+   "
+
+The fetcher can also be used in recipes::
+
+   SRC_URI = "gs://<bucket name>/<foo_container>/<bar_file>"
+
+However, the checksum of the file should also be provided::
+
+   SRC_URI[sha256sum] = "<sha256 string>"
+
 .. _crate-fetcher:
 
 Crate Fetcher (``crate://``)
@@ -791,6 +825,8 @@ Fetch submodules also exist for the following:
 
 - OSC (``osc://``)
 
+- S3 (``s3://``)
+
 - Secure FTP (``sftp://``)
 
 - Secure Shell (``ssh://``)
diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 2428a26fa6..e4c1d20627 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -1290,7 +1290,7 @@ class FetchData(object):
 
             if checksum_name in self.parm:
                 checksum_expected = self.parm[checksum_name]
-            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate"]:
+            elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3", "az", "crate", "gs"]:
                 checksum_expected = None
             else:
                 checksum_expected = d.getVarFlag("SRC_URI", checksum_name)
@@ -1976,6 +1976,7 @@ from . import npm
 from . import npmsw
 from . import az
 from . import crate
+from . import gcp
 
 methods.append(local.Local())
 methods.append(wget.Wget())
@@ -1997,3 +1998,4 @@ methods.append(npm.Npm())
 methods.append(npmsw.NpmShrinkWrap())
 methods.append(az.Az())
 methods.append(crate.Crate())
+methods.append(gcp.GCP())
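
With the changes above, a gs:// URI is dispatched to the new GCP method and, like the other remote fetchers, is expected to carry a checksum. A quick parse-time check of that wiring, as a sketch only (it assumes BitBake's lib/ directory is on sys.path; the bucket, object path, and checksum value are placeholders):

    # Sketch: assumes bitbake's lib/ directory is on sys.path; the bucket,
    # object path and checksum value are placeholders. No network access or
    # google-cloud-storage install is needed just to resolve the method.
    import bb.data
    import bb.fetch2

    d = bb.data.init()
    d.setVar("DL_DIR", "/tmp/bb-downloads")

    url = "gs://example-bucket/sstate-cache/example.tar.zst;sha256sum=<sha256>"
    ud = bb.fetch2.FetchData(url, d)

    print(type(ud.method).__name__)   # expected: GCP
    print(ud.sha256_expected)         # checksum taken from the URI parameter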
diff --git a/bitbake/lib/bb/fetch2/gcp.py b/bitbake/lib/bb/fetch2/gcp.py
new file mode 100644
index 0000000000..f42c81fda8
--- /dev/null
+++ b/bitbake/lib/bb/fetch2/gcp.py
@@ -0,0 +1,98 @@
1"""
2BitBake 'Fetch' implementation for Google Cloup Platform Storage.
3
4Class for fetching files from Google Cloud Storage using the
5Google Cloud Storage Python Client. The GCS Python Client must
6be correctly installed, configured and authenticated prior to use.
7Additionally, gsutil must also be installed.
8
9"""
10
11# Copyright (C) 2023, Snap Inc.
12#
13# Based in part on bb.fetch2.s3:
14# Copyright (C) 2017 Andre McCurdy
15#
16# SPDX-License-Identifier: GPL-2.0-only
17#
18# Based on functions from the base bb module, Copyright 2003 Holger Schurig
19
20import os
21import bb
22import urllib.parse, urllib.error
23from bb.fetch2 import FetchMethod
24from bb.fetch2 import FetchError
25from bb.fetch2 import logger
26
27class GCP(FetchMethod):
28 """
29 Class to fetch urls via GCP's Python API.
30 """
31 def __init__(self):
32 self.gcp_client = None
33
34 def supports(self, ud, d):
35 """
36 Check to see if a given url can be fetched with GCP.
37 """
38 return ud.type in ['gs']
39
40 def recommends_checksum(self, urldata):
41 return True
42
43 def urldata_init(self, ud, d):
44 if 'downloadfilename' in ud.parm:
45 ud.basename = ud.parm['downloadfilename']
46 else:
47 ud.basename = os.path.basename(ud.path)
48
49 ud.localfile = d.expand(urllib.parse.unquote(ud.basename))
50
51 def get_gcp_client(self):
52 from google.cloud import storage
53 self.gcp_client = storage.Client(project=None)
54
55 def download(self, ud, d):
56 """
57 Fetch urls using the GCP API.
58 Assumes localpath was called first.
59 """
60 logger.debug2(f"Trying to download gs://{ud.host}{ud.path} to {ud.localpath}")
61 if self.gcp_client is None:
62 self.get_gcp_client()
63
64 bb.fetch2.check_network_access(d, "gsutil stat", ud.url)
65
66 # Path sometimes has leading slash, so strip it
67 path = ud.path.lstrip("/")
68 blob = self.gcp_client.bucket(ud.host).blob(path)
69 blob.download_to_filename(ud.localpath)
70
71 # Additional sanity checks copied from the wget class (although there
72 # are no known issues which mean these are required, treat the GCP API
73 # tool with a little healthy suspicion).
74 if not os.path.exists(ud.localpath):
75 raise FetchError(f"The GCP API returned success for gs://{ud.host}{ud.path} but {ud.localpath} doesn't exist?!")
76
77 if os.path.getsize(ud.localpath) == 0:
78 os.remove(ud.localpath)
79 raise FetchError(f"The downloaded file for gs://{ud.host}{ud.path} resulted in a zero size file?! Deleting and failing since this isn't right.")
80
81 return True
82
83 def checkstatus(self, fetch, ud, d):
84 """
85 Check the status of a URL.
86 """
87 logger.debug2(f"Checking status of gs://{ud.host}{ud.path}")
88 if self.gcp_client is None:
89 self.get_gcp_client()
90
91 bb.fetch2.check_network_access(d, "gsutil stat", ud.url)
92
93 # Path sometimes has leading slash, so strip it
94 path = ud.path.lstrip("/")
95 if self.gcp_client.bucket(ud.host).blob(path).exists() == False:
96 raise FetchError(f"The GCP API reported that gs://{ud.host}{ud.path} does not exist")
97 else:
98 return True
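
Regarding point (k) in the commit message (no tests were added), a network test along the lines of the existing fetcher tests in bitbake/lib/bb/tests/fetch.py could exercise the new module end to end. The following is only a sketch: the bucket, object, and checksum are placeholders, and a real test needs network access plus valid GCP credentials:

    # Sketch of a possible unit test, modeled on the network tests in
    # bitbake/lib/bb/tests/fetch.py; not part of this patch. The bucket,
    # object path and sha256sum are placeholders and must be replaced with
    # real values (the checksum has to match the object for download() to
    # succeed).
    import os
    import tempfile
    import unittest

    import bb.data
    import bb.fetch2

    class GCPFetcherTest(unittest.TestCase):
        def setUp(self):
            self.d = bb.data.init()
            self.dldir = tempfile.mkdtemp(prefix="bb-gcp-test-")
            self.d.setVar("DL_DIR", self.dldir)

        def test_fetch(self):
            url = "gs://example-bucket/sstate-cache/example.tar.zst;sha256sum=<sha256>"
            fetcher = bb.fetch2.Fetch([url], self.d)
            fetcher.checkstatus()              # exercises GCP.checkstatus()
            fetcher.download()                 # exercises GCP.download()
            localpath = fetcher.localpath(url)
            self.assertTrue(os.path.exists(localpath))
            self.assertGreater(os.path.getsize(localpath), 0)

    if __name__ == "__main__":
        unittest.main()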