author Victor Kamensky <kamensky@cisco.com> 2020-10-07 13:38:37 -0700
committer Richard Purdie <richard.purdie@linuxfoundation.org> 2020-10-08 11:28:58 +0100
commit f6f5d092053e4c47e36f54a468b1d8d9b0e9118b
tree c8a51f99d645ad99f8f0907eadd8d23091c1179b /meta
parent fe74a4edd2db299557c9c0ffb1a804da444c47bd
qemu: add 34Kf-64tlb fictitious cpu type

In Yocto Project PR 13992 it was reported that qemumips on the autobuilder
runs almost twice as slowly as qemumips64 and sometimes hits a timeout.

Investigating qemu-system with perf, gdb, and SystemTap, and comparing the
behavior of the qemumips and qemumips64 machines, showed that the qemu soft
mmu code behaves quite differently: in the qemumips case the tlbwr
instruction is called 16 times more often. It turns out that in the
qemumips64 case qemu runs with a cpu type that has 64 TLBs, whereas in the
qemumips case it runs with a cpu type that has only 16 TLBs.

The idea of the proposed qemu patch is to introduce a fictitious 34Kf-64tlb
cpu type that is defined exactly like 34Kf but has 64 TLBs instead of the
original 16.

Testing core-image-full-cmdline:do_testimage with 34Kf-64tlb shows an
improvement of 40% or so in test execution real time.

Note for future porters of the patch: the easiest way to update the patch
and stay in sync with the 34Kf definition is to copy the 34Kf machine
definition and apply the following changes to it (just change the CP0C1_MMU
bits value from 15 to 63):

    [kamensky@coreos-lnx2 qemu]$ diff ~/34Kf.c ~/34Kf-64tlb.c
    2c2
    < .name = "34Kf",
    > .name = "34Kf-64tlb",
    6c6
    < .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (15 << CP0C1_MMU) |
    > .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |

Fixes https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992

Upstream Status: Inappropriate

(From OE-Core rev: 4470a04943352224955f17e004962f0f9e1c9b0c)

Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

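The change from 15 to 63 in the CP0C1_MMU bits makes sense once the field encoding is spelled out: the MIPS32 Config1 MMU size field holds the number of TLB entries minus one. The following is a minimal sketch of that encoding, not code from the QEMU tree. The macro values (field at bit 25, six bits wide) are assumptions taken from the MIPS32 register layout and are defined locally so the snippet stands alone; `mmu_field_for` and `tlb_entries` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the MMU size field occupies bits 30:25 of CP0 Config1. */
#define CP0C1_MMU      25
#define CP0C1_MMU_MASK 0x3fu

/* Hypothetical helper: encode a TLB entry count (1..64) into the field. */
static uint32_t mmu_field_for(unsigned entries)
{
    assert(entries >= 1 && entries <= 64);
    return (uint32_t)(entries - 1) << CP0C1_MMU;
}

/* Hypothetical helper: decode the TLB entry count from a Config1 value. */
static unsigned tlb_entries(uint32_t config1)
{
    return 1u + ((config1 >> CP0C1_MMU) & CP0C1_MMU_MASK);
}
```

Under these assumptions, `tlb_entries(15u << CP0C1_MMU)` yields 16 (the 34Kf value) and `tlb_entries(63u << CP0C1_MMU)` yields 64 (the 34Kf-64tlb value). 63 is also the largest value a six-bit field can hold, so 64 entries is the ceiling for this encoding.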
Diffstat (limited to 'meta')
-rw-r--r-- meta/recipes-devtools/qemu/qemu.inc | 1
-rw-r--r-- meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch | 118
2 files changed, 119 insertions, 0 deletions
diff --git a/meta/recipes-devtools/qemu/qemu.inc b/meta/recipes-devtools/qemu/qemu.inc
index bbb9038961..6c0edcb706 100644
--- a/meta/recipes-devtools/qemu/qemu.inc
+++ b/meta/recipes-devtools/qemu/qemu.inc
@@ -31,6 +31,7 @@ SRC_URI = "https://download.qemu.org/${BPN}-${PV}.tar.xz \
            file://0001-qemu-Do-not-include-file-if-not-exists.patch \
            file://find_datadir.patch \
            file://usb-fix-setup_len-init.patch \
+           file://0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch \
            "
 UPSTREAM_CHECK_REGEX = "qemu-(?P<pver>\d+(\.\d+)+)\.tar"
 
diff --git a/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch b/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
new file mode 100644
index 0000000000..b6312e1543
--- /dev/null
+++ b/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
@@ -0,0 +1,118 @@
+From b3fcc7d96523ad8e3ea28c09d495ef08529d01ce Mon Sep 17 00:00:00 2001
+From: Victor Kamensky <kamensky@cisco.com>
+Date: Wed, 7 Oct 2020 10:19:42 -0700
+Subject: [PATCH] mips: add 34Kf-64tlb fictitious cpu type like 34Kf but with
+ 64 TLBs
+
+In Yocto Project CI runs it was observed that a test run
+of a 32-bit mips image takes almost twice as long as a 64-bit
+mips image under the same logical load, and CI execution
+hits a timeout.
+
+See https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
+
+The Yocto Project uses the 34Kf cpu type to run the 32-bit mips image,
+and the MIPS64R2-generic cpu type to run the 64-bit mips64 image.
+
+Investigating the differences in qemu behavior between mips
+and mips64 produced two prominent observations. Under
+logically similar load (same definition and configuration
+of the user-land image), in the mips case the get_physical_address
+function is called almost twice as often, meaning
+twice as many memory accesses are involved. Also, the
+number of tlbwr instructions executed (r4k_helper_tlbwr
+qemu function) is almost 16 times bigger in the mips case than in
+mips64.
+
+It turns out that the 34Kf cpu has 16 TLBs, but
+MIPS64R2-generic has 64 TLBs. That explains why
+so many more tlbwr instructions had to be executed by the kernel
+TLB refill handler in the 32-bit mips case.
+
+The idea of the fix is to come up with a new fictitious 34Kf-64tlb
+cpu type that behaves exactly like 34Kf but
+contains 64 TLBs to reduce TLB thrashing. After all, adding
+more TLBs to the soft mmu is easy.
+
+An experiment with a significant, non-trivial load in the Yocto
+environment, running the do_testimage load, shows that the 34Kf-64tlb
+cpu performs 40% or so better than the original 34Kf cpu with respect to
+test execution real time.
+
+It is not ideal to have a cpu type that does not exist in the
+wild, but given the performance gains it seems to be justified.
+
+Signed-off-by: Victor Kamensky <kamensky@cisco.com>
+---
+ target/mips/translate_init.inc.c | 55 ++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 55 insertions(+)
+
+diff --git a/target/mips/translate_init.inc.c b/target/mips/translate_init.inc.c
+index 637caccd89..b73ab48231 100644
+--- a/target/mips/translate_init.inc.c
++++ b/target/mips/translate_init.inc.c
+@@ -297,6 +297,61 @@ const mips_def_t mips_defs[] =
+         .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_MT,
+         .mmu_type = MMU_TYPE_R4000,
+     },
++    /*
++     * Verbatim copy of the "34Kf" cpu, with only the number of TLB entries
++     * bumped from 16 to 64 (see the CP0C1_MMU bits in the CP0_Config1 value)
++     * to improve performance by reducing the number of TLB refill exceptions
++     * and eliminating the need to run all the corresponding TLB refill
++     * handling instructions.
++     */
++    {
++        .name = "34Kf-64tlb",
++        .CP0_PRid = 0x00019500,
++        .CP0_Config0 = MIPS_CONFIG0 | (0x1 << CP0C0_AR) |
++                       (MMU_TYPE_R4000 << CP0C0_MT),
++        .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |
++                       (0 << CP0C1_IS) | (3 << CP0C1_IL) | (1 << CP0C1_IA) |
++                       (0 << CP0C1_DS) | (3 << CP0C1_DL) | (1 << CP0C1_DA) |
++                       (1 << CP0C1_CA),
++        .CP0_Config2 = MIPS_CONFIG2,
++        .CP0_Config3 = MIPS_CONFIG3 | (1 << CP0C3_VInt) | (1 << CP0C3_MT) |
++                       (1 << CP0C3_DSPP),
++        .CP0_LLAddr_rw_bitmask = 0,
++        .CP0_LLAddr_shift = 0,
++        .SYNCI_Step = 32,
++        .CCRes = 2,
++        .CP0_Status_rw_bitmask = 0x3778FF1F,
++        .CP0_TCStatus_rw_bitmask = (0 << CP0TCSt_TCU3) | (0 << CP0TCSt_TCU2) |
++                    (1 << CP0TCSt_TCU1) | (1 << CP0TCSt_TCU0) |
++                    (0 << CP0TCSt_TMX) | (1 << CP0TCSt_DT) |
++                    (1 << CP0TCSt_DA) | (1 << CP0TCSt_A) |
++                    (0x3 << CP0TCSt_TKSU) | (1 << CP0TCSt_IXMT) |
++                    (0xff << CP0TCSt_TASID),
++        .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_L) | (1 << FCR0_W) |
++                    (1 << FCR0_D) | (1 << FCR0_S) | (0x95 << FCR0_PRID),
++        .CP1_fcr31 = 0,
++        .CP1_fcr31_rw_bitmask = 0xFF83FFFF,
++        .CP0_SRSCtl = (0xf << CP0SRSCtl_HSS),
++        .CP0_SRSConf0_rw_bitmask = 0x3fffffff,
++        .CP0_SRSConf0 = (1U << CP0SRSC0_M) | (0x3fe << CP0SRSC0_SRS3) |
++                    (0x3fe << CP0SRSC0_SRS2) | (0x3fe << CP0SRSC0_SRS1),
++        .CP0_SRSConf1_rw_bitmask = 0x3fffffff,
++        .CP0_SRSConf1 = (1U << CP0SRSC1_M) | (0x3fe << CP0SRSC1_SRS6) |
++                    (0x3fe << CP0SRSC1_SRS5) | (0x3fe << CP0SRSC1_SRS4),
++        .CP0_SRSConf2_rw_bitmask = 0x3fffffff,
++        .CP0_SRSConf2 = (1U << CP0SRSC2_M) | (0x3fe << CP0SRSC2_SRS9) |
++                    (0x3fe << CP0SRSC2_SRS8) | (0x3fe << CP0SRSC2_SRS7),
++        .CP0_SRSConf3_rw_bitmask = 0x3fffffff,
++        .CP0_SRSConf3 = (1U << CP0SRSC3_M) | (0x3fe << CP0SRSC3_SRS12) |
++                    (0x3fe << CP0SRSC3_SRS11) | (0x3fe << CP0SRSC3_SRS10),
++        .CP0_SRSConf4_rw_bitmask = 0x3fffffff,
++        .CP0_SRSConf4 = (0x3fe << CP0SRSC4_SRS15) |
++                    (0x3fe << CP0SRSC4_SRS14) | (0x3fe << CP0SRSC4_SRS13),
++        .SEGBITS = 32,
++        .PABITS = 32,
++        .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_MT,
++        .mmu_type = MMU_TYPE_R4000,
++    },
+     {
+         .name = "74Kf",
+         .CP0_PRid = 0x00019700,
+--
+2.14.5
+
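
The TLB thrashing argument in the patch description can be illustrated with a toy model. The sketch below is not QEMU code and does not reproduce the measurements quoted above. It assumes a fully associative TLB in which every miss writes the new translation into a pseudo-randomly chosen slot (in the spirit of tlbwr, which writes the entry selected by the Random register), and a workload that simply cycles over a fixed working set. All names (`toy_tlb`, `count_refills`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TLB 64

/* Toy TLB: each slot holds one virtual page-pair number, or -1 when empty. */
struct toy_tlb {
    int32_t  slot[MAX_TLB];
    unsigned entries;   /* number of slots in use, 1..MAX_TLB */
    uint32_t rng;       /* xorshift32 state, stands in for the Random register */
};

static uint32_t next_random(struct toy_tlb *t)
{
    uint32_t x = t->rng;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    t->rng = x;
    return x;
}

static void toy_tlb_init(struct toy_tlb *t, unsigned entries)
{
    assert(entries >= 1 && entries <= MAX_TLB);
    for (unsigned i = 0; i < MAX_TLB; i++) {
        t->slot[i] = -1;
    }
    t->entries = entries;
    t->rng = 2463534242u;   /* fixed non-zero seed keeps runs reproducible */
}

/* Returns 1 when the lookup missed and a refill (one "tlbwr") was needed. */
static int toy_tlb_access(struct toy_tlb *t, int32_t vpn2)
{
    for (unsigned i = 0; i < t->entries; i++) {
        if (t->slot[i] == vpn2) {
            return 0;
        }
    }
    t->slot[next_random(t) % t->entries] = vpn2;
    return 1;
}

/* Cycle over `working_set` page pairs and count how many refills occur. */
static unsigned long count_refills(unsigned entries, unsigned working_set,
                                   unsigned long accesses)
{
    struct toy_tlb t;
    unsigned long refills = 0;

    toy_tlb_init(&t, entries);
    for (unsigned long i = 0; i < accesses; i++) {
        refills += (unsigned long)toy_tlb_access(&t, (int32_t)(i % working_set));
    }
    return refills;
}
```

Comparing `count_refills(16, 48, 100000)` with `count_refills(64, 48, 100000)` shows the effect: a working set of 48 page pairs can never fit in 16 slots, so at least 32 of every 48 accesses miss, while with 64 slots the misses stop once every page pair has found a slot. Real guest workloads are far less regular, so the sketch only shows the direction of the effect, not its size.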