author: Etienne Cordonnier <ecordonnier@snap.com> 2023-02-01 15:19:00 +0100
committer: Richard Purdie <richard.purdie@linuxfoundation.org> 2023-02-17 15:05:08 +0000
commit: b643d2bc178efab1a192d2db3e2ea99e9c3e5dde
tree: 1c5f6557b4911729df72ecfd7d639aee892f363a /bitbake
parent: a6623b496976710bf4ac27227b2f9b797f76f00d
bitbake: siggen: Fix inefficient string concatenation
As discussed in https://stackoverflow.com/a/4435752/1710392 , CPython has an
optimization for statements of the form "a = a + b" or "a += b". It seems that
the following line does not get optimized, because it has the form
a = a + b + c:

    data = data + "./" + f.split("/./")[1]

As a result, data is copied on every iteration, potentially copying megabytes
of data each time.

Changing this line causes SignatureGeneratorBasic::get_taskhash to take 0.06
seconds instead of 45 seconds on my test setup, where SRC_URI points to a big
directory.

Note that PEP8 explicitly recommends not relying on this optimization, which is
specific to CPython:

"do not rely on CPython’s efficient implementation of in-place string
concatenation for statements in the form a += b or a = a + b"

However, the PEP8 recommended form using "join()" also does not avoid the copy
and takes 45 seconds in my test setup:

    data = ''.join((data, "./", f.split("/./")[1]))

I have changed the other lines to use += as well, for consistency only; those
were of the form a = a + b and were already optimized.

Co-authored-by: JJ Robertson <jrobertson@snap.com>
(Bitbake rev: 590ae6fde9da75db3a368e5c0d47920696c33ebf)
Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 195750f2ca355e29d51219c58ecb2c1d83692717)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
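The three forms compared in the message above can be tried in isolation with a small sketch like the following. The function names and the synthetic input are illustrative assumptions, not part of bitbake, and the timings it prints depend on the interpreter and its version, so they should not be expected to match the 45 s / 0.06 s figures quoted above.

```python
import timeit


def build_chained(parts):
    # Form "a = a + b + c": the first "+" builds a temporary from the
    # whole of data, so the in-place shortcut may not apply.
    data = ""
    for p in parts:
        data = data + "./" + p
    return data


def build_inplace(parts):
    # Form "a += b": the right-hand side is built first, then appended.
    data = ""
    for p in parts:
        data += "./" + p
    return data


def build_join_each(parts):
    # Calling join() on every iteration still rebuilds the whole string.
    data = ""
    for p in parts:
        data = "".join((data, "./", p))
    return data


if __name__ == "__main__":
    # Synthetic stand-in for a long list of file paths (assumption).
    parts = ["file%d" % i for i in range(5000)]
    for fn in (build_chained, build_inplace, build_join_each):
        elapsed = timeit.timeit(lambda: fn(parts), number=3)
        print("%-16s %.4f s" % (fn.__name__, elapsed))
```

All three functions return the same string; only the amount of copying differs, and the gap grows with the size of the accumulated data.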
Diffstat (limited to 'bitbake')
-rw-r--r--  bitbake/lib/bb/siggen.py  10
1 files changed, 5 insertions, 5 deletions
diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index 9a20fc8e5f..cea3a5380b 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -329,19 +329,19 @@ class SignatureGeneratorBasic(SignatureGenerator):
 
         data = self.basehash[tid]
         for dep in self.runtaskdeps[tid]:
-            data = data + self.get_unihash(dep)
+            data += self.get_unihash(dep)
 
         for (f, cs) in self.file_checksum_values[tid]:
             if cs:
                 if "/./" in f:
-                    data = data + "./" + f.split("/./")[1]
+                    data += "./" + f.split("/./")[1]
-                data = data + cs
+                data += cs
 
         if tid in self.taints:
             if self.taints[tid].startswith("nostamp:"):
-                data = data + self.taints[tid][8:]
+                data += self.taints[tid][8:]
             else:
-                data = data + self.taints[tid]
+                data += self.taints[tid]
 
         h = hashlib.sha256(data.encode("utf-8")).hexdigest()
         self.taskhash[tid] = h
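For comparison only, here is a minimal sketch of the same hashing logic that does not depend on CPython's in-place concatenation at all: it collects the pieces in a list and calls join() once at the end. This is not what the patch does (the patch uses +=). The function taskhash_from_parts and its parameters are hypothetical stand-ins for the self.basehash, self.runtaskdeps, self.file_checksum_values and self.taints lookups in the hunk above.

```python
import hashlib


def taskhash_from_parts(basehash, dep_unihashes, file_checksums, taint=None):
    """Hypothetical standalone version of the loop in the hunk above."""
    parts = [basehash]
    parts.extend(dep_unihashes)

    for f, cs in file_checksums:
        if cs:
            if "/./" in f:
                # Keep only the path after the "/./" marker, as in the diff.
                parts.append("./" + f.split("/./")[1])
            parts.append(cs)

    if taint is not None:
        if taint.startswith("nostamp:"):
            parts.append(taint[8:])
        else:
            parts.append(taint)

    # A single join copies each piece once, whatever the interpreter.
    data = "".join(parts)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()
```

Because the pieces are joined in the same order in which the loop appends them, the resulting digest is the same as with repeated concatenation.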