author | Etienne Cordonnier <ecordonnier@snap.com> | 2023-02-01 15:19:00 +0100 |
---|---|---|
committer | Richard Purdie <richard.purdie@linuxfoundation.org> | 2023-02-17 15:05:08 +0000 |
commit | b643d2bc178efab1a192d2db3e2ea99e9c3e5dde (patch) | |
tree | 1c5f6557b4911729df72ecfd7d639aee892f363a /bitbake | |
parent | a6623b496976710bf4ac27227b2f9b797f76f00d (diff) | |
download | poky-b643d2bc178efab1a192d2db3e2ea99e9c3e5dde.tar.gz |
bitbake: siggen: Fix inefficient string concatenation
As discussed in https://stackoverflow.com/a/4435752/1710392 , CPython
has an optimization for statements in the form "a = a + b" or "a += b".
This line does not seem to get optimized, because it has the form a = a + b + c:
data = data + "./" + f.split("/./")[1]
For that reason, it copies data on every iteration, potentially copying megabytes
each time.
Changing this line causes SignatureGeneratorBasic::get_taskhash to take 0.06 seconds
instead of 45 seconds on my test setup where SRC_URI points to a big directory.
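The difference between the two statement forms can be reproduced outside bitbake with a small timing sketch. Everything here is hypothetical (the function names, the generated paths, the input size); it is not the siggen.py code, and the measured gap will vary with the CPython version and the size of the input:

```python
import time


def build_chained(pieces):
    # Form "a = a + b + c": data + "./" is evaluated first, while the name
    # "data" still refers to the old string, so a new string has to be built.
    data = ""
    for p in pieces:
        data = data + "./" + p
    return data


def build_inplace(pieces):
    # Form "a += b": the short right-hand side is built first and then
    # appended, which CPython may be able to do without copying "data".
    data = ""
    for p in pieces:
        data += "./" + p
    return data


def timed(func, pieces):
    # Return the built string together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(pieces)
    return result, time.perf_counter() - start


# Hypothetical input standing in for the file list of a big SRC_URI directory.
pieces = ["dir/file%d" % i for i in range(5000)]

chained, t_chained = timed(build_chained, pieces)
inplace, t_inplace = timed(build_inplace, pieces)

# Both forms must produce the same string; only the cost differs.
assert chained == inplace
print("a = a + b + c: %.4f s" % t_chained)
print("a += b:        %.4f s" % t_inplace)
```

Growing the list makes the gap more visible, since the chained form copies an ever longer string on each pass.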
Note that PEP8 explicitly recommends not relying on this optimization, which is specific to CPython:
"do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b"
However, the PEP8-recommended form using "join()" does not avoid the copy either and takes 45 seconds in my test setup:
data = ''.join((data, "./", f.split("/./")[1]))
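Calling join() inside the loop still rebuilds the whole string on every pass. For comparison, here is a sketch of two alternatives that do not depend on the CPython optimization: collecting the pieces and joining once at the end, and feeding the hash incrementally. This is not what the patch does, and the function names and the base/pieces parameters are assumptions made for illustration only:

```python
import hashlib


def hash_with_concat(base, pieces):
    # Mirrors the patched form: in-place += followed by a single hash.
    data = base
    for p in pieces:
        data += "./" + p
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def hash_with_single_join(base, pieces):
    # Collect the fragments in a list and join them exactly once.
    parts = [base]
    for p in pieces:
        parts.append("./")
        parts.append(p)
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


def hash_incremental(base, pieces):
    # Feed the hash object piece by piece; no large string is ever built.
    h = hashlib.sha256(base.encode("utf-8"))
    for p in pieces:
        h.update(("./" + p).encode("utf-8"))
    return h.hexdigest()


# Hypothetical sample input; all three variants should agree on the digest.
sample = ["a/b.c", "d/e.h"]
assert hash_with_concat("base", sample) == hash_with_single_join("base", sample)
assert hash_with_concat("base", sample) == hash_incremental("base", sample)
```

All three variants hash the same byte sequence, so they yield the same digest; they differ only in how much intermediate copying they do.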
I have also changed the other lines to use +=, for consistency only; those were in the form a = a + b
and were already optimized.
Co-authored-by: JJ Robertson <jrobertson@snap.com>
(Bitbake rev: 590ae6fde9da75db3a368e5c0d47920696c33ebf)
Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 195750f2ca355e29d51219c58ecb2c1d83692717)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Diffstat (limited to 'bitbake')
-rw-r--r-- | bitbake/lib/bb/siggen.py | 10 |
1 files changed, 5 insertions, 5 deletions
diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index 9a20fc8e5f..cea3a5380b 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -329,19 +329,19 @@ class SignatureGeneratorBasic(SignatureGenerator):
 
         data = self.basehash[tid]
         for dep in self.runtaskdeps[tid]:
-            data = data + self.get_unihash(dep)
+            data += self.get_unihash(dep)
 
         for (f, cs) in self.file_checksum_values[tid]:
             if cs:
                 if "/./" in f:
-                    data = data + "./" + f.split("/./")[1]
+                    data += "./" + f.split("/./")[1]
-                data = data + cs
+                data += cs
 
         if tid in self.taints:
             if self.taints[tid].startswith("nostamp:"):
-                data = data + self.taints[tid][8:]
+                data += self.taints[tid][8:]
             else:
-                data = data + self.taints[tid]
+                data += self.taints[tid]
 
         h = hashlib.sha256(data.encode("utf-8")).hexdigest()
         self.taskhash[tid] = h