...
QEMU: 9.1.0 (with this patch to fix a performance issue we found: https://lore.kernel.org/qemu-devel/20241007172317.1439564-2-pbonzini@redhat.com/ - it is already merged on master and will be available in the next stable release and QEMU 9.2.0)
Android aarch64 image: aosp_cf_arm64_only_phone-img-12425990.zip
Android x64 image: aosp_cf_x86_64_phone-img-12426306.zip
...
While performing this investigation, we discovered a recent regression that degraded performance when booting aarch64 Android, making any execution with -smp > 1 slower than -smp 1. In addition, an overhead was present when booting x64 Android.
In short, QEMU must be built with -mcx16 to ensure we use cmpxchg16b on x64 hosts. Otherwise, any atomic instruction will be serialized, blocking all other vCPUs.
Commit introducing the regression: https://gitlab.com/qemu-project/qemu/-/commit/c2bf2ccb266dc9ae4a6da75b845f54535417e109
Series fixing the regression: https://lore.kernel.org/qemu-devel/20241007172317.1439564-2-pbonzini@redhat.com/
This series was merged in QEMU and will be available in the next stable release and QEMU 9.2.0. Meanwhile, build QEMU from the master branch.
The following results were obtained with this fix.
...
qemu-system-x86_64 -accel tcg -cpu max
-smp 1: 1036s (x34)
-smp 2: 410s (x13)
-smp 4: 280s (x9)
-smp 6: 260s (x8)
-smp 8: 260s (x8)
We can see that the speedup compared to -smp 2 is not linear. While booting, the QEMU process barely reaches 500% CPU usage in top. This is a limitation of the Android boot sequence, which does not seem able to use more than 4 cores.
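To put numbers on this plateau, the boot times listed above can be converted into speedups relative to -smp 1. The following is a minimal Python sketch that uses only the timings reported above; the function and variable names are ours, chosen for illustration.

```python
# Boot times in seconds for qemu-system-x86_64, copied from the results above.
X64_BOOT_SECONDS = {1: 1036, 2: 410, 4: 280, 6: 260, 8: 260}


def speedups(boot_seconds, baseline_smp=1):
    """Return {smp: baseline_time / time} for each measured -smp value."""
    baseline = boot_seconds[baseline_smp]
    return {smp: baseline / t for smp, t in boot_seconds.items()}


if __name__ == "__main__":
    for smp, factor in sorted(speedups(X64_BOOT_SECONDS).items()):
        print(f"-smp {smp}: {factor:.2f}x faster than -smp 1")
```

Going from -smp 4 to -smp 8 only moves the boot time from 280s to 260s, which matches the plateau described above.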
...
qemu-system-aarch64 -accel tcg -cpu max,pauth=off
-smp 1: 1034s (x34)
-smp 2: 512s (x17)
-smp 4: 380s (x12)
-smp 6: 360s (x12)
-smp 8: 375s (x12)
We can see that disabling pointer authentication results in much faster execution, as expected.
Performance is close to what we observe when booting the x64 version, with a small overhead for aarch64.
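The aarch64 overhead can be quantified per -smp value by dividing the aarch64 boot time by the x64 one. This is a rough Python sketch based only on the two result lists above; the names are illustrative, and it assumes both lists were measured under otherwise comparable conditions.

```python
# Boot times in seconds, copied from the two result lists above.
X64 = {1: 1036, 2: 410, 4: 280, 6: 260, 8: 260}
AARCH64_PAUTH_OFF = {1: 1034, 2: 512, 4: 380, 6: 360, 8: 375}


def relative_overhead(guest, reference):
    """Return {smp: guest_time / reference_time - 1} for -smp values in both."""
    common = sorted(guest.keys() & reference.keys())
    return {smp: guest[smp] / reference[smp] - 1.0 for smp in common}


if __name__ == "__main__":
    for smp, extra in relative_overhead(AARCH64_PAUTH_OFF, X64).items():
        print(f"-smp {smp}: aarch64 boot time is {extra:+.1%} vs x64")
```

A positive value means the aarch64 guest took longer to boot than the x64 guest at the same -smp setting.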
...
4 cores
-cpu max (,pauth=off on aarch64)
ensure that cmpxchg16b is used on x64 (massive difference with -smp > 1). This series https://lore.kernel.org/qemu-devel/20241007172317.1439564-2-pbonzini@redhat.com/ was merged in QEMU and will be available in the next stable release and 9.2.0. Meanwhile, use a QEMU compiled from the master branch.
Performance difference between aarch64 and x64 can be explained by TLB management on aarch64, and some helpers.