lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20231108112909.591987-1-m.szyprowski@samsung.com>
Date:   Wed,  8 Nov 2023 12:29:09 +0100
From:   Marek Szyprowski <m.szyprowski@...sung.com>
To:     linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Cc:     Marek Szyprowski <m.szyprowski@...sung.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Andrew Jones <ajones@...tanamicro.com>,
        Heiko Stuebner <heiko@...ech.de>,
        Evan Green <evan@...osinc.com>,
        Jisheng Zhang <jszhang@...nel.org>,
        Clément Léger <cleger@...osinc.com>
Subject: [PATCH] riscv: fix potential panic during CPU hot-plug

Commit 584ea6564bca ("RISC-V: Probe for unaligned access speed") added a
test for unaligned access speed. It is being performed when given CPU is
onlined. Then, another test for misaligned access emulation has been
added in commit 71c54b3d169d ("riscv: report misaligned accesses
emulation to hwprobe"). This unaligned access speed doesn't really change
after the boot, so it is sufficient to do this test only once. This has
been partially added by commit c20d36cc2a20 ("riscv: don't probe
unaligned access speed if already done"), but this optimisation works
only if RISCV_HWPROBE_MISALIGNED_EMULATED is returned by the latter
check. Otherwise the 'misaligned_access_speed' pcpu varliable is
overwritten with RISCV_HWPROBE_MISALIGNED_UNKNOWN value, what makes the
first check to be always performed.

Recently I've noticed that the first check introduced a regression in the
CPU hot-plug mechanism. This can be observed as a following panic on
QEmu:

CPU1: off
cpu1: Ratio of byte access time to unaligned word access is 7.00, unaligned accesses are fast
CPU1: off
CPU1: failed to come online
------------[ cut here ]------------
WARNING: CPU: 0 PID: 215 at kernel/smp.c:437 __flush_smp_call_function_queue+0x90/0x292
Modules linked in:
CPU: 0 PID: 215 Comm: bash Not tainted 6.5.0-rc1+ #7524
Hardware name: riscv-virtio,qemu (DT)
epc : __flush_smp_call_function_queue+0x90/0x292
 ra : smpcfd_dying_cpu+0xe/0x1c
...
[<ffffffff800c904a>] __flush_smp_call_function_queue+0x90/0x292
[<ffffffff800c9ae8>] smpcfd_dying_cpu+0xe/0x1c
[<ffffffff80012b2e>] cpuhp_invoke_callback+0x124/0x322
[<ffffffff800142f8>] _cpu_up+0x218/0x322
[<ffffffff8001445e>] cpu_up+0x5c/0x98
[<ffffffff80014bb8>] cpu_device_up+0x14/0x1c
[<ffffffff8066d68e>] cpu_subsys_online+0x10/0x18
[<ffffffff806680e8>] device_online+0x56/0x72
[<ffffffff80668190>] online_store+0x8c/0xae
[<ffffffff80662dde>] dev_attr_store+0xe/0x1a
[<ffffffff802b53bc>] sysfs_kf_write+0x2e/0x4c
[<ffffffff802b4750>] kernfs_fop_write_iter+0xe8/0x14e
[<ffffffff80230170>] vfs_write+0x2a4/0x3e0
[<ffffffff802303e8>] ksys_write+0x5e/0xc8
[<ffffffff80230460>] sys_write+0xe/0x16
[<ffffffff80a4a8de>] do_trap_ecall_u+0xe0/0xf4
[<ffffffff80003e8c>] ret_from_exception+0x0/0x64
irq event stamp: 12335
hardirqs last  enabled at (12335): [<ffffffff8007e542>] console_unlock+0x156/0x186
hardirqs last disabled at (12334): [<ffffffff8007e52c>] console_unlock+0x140/0x186
softirqs last  enabled at (12154): [<ffffffff80a560d0>] __do_softirq+0x3b0/0x470
softirqs last disabled at (12147): [<ffffffff80019340>] __irq_exit_rcu+0xa6/0xd0
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
kernel BUG at kernel/irq_work.c:245!
Kernel BUG [#1]
Modules linked in:
CPU: 0 PID: 215 Comm: bash Tainted: G        W          6.5.0-rc1+ #7524
Hardware name: riscv-virtio,qemu (DT)
epc : irq_work_run_list+0x30/0x32
 ra : irq_work_run+0x2a/0x4c
...
[<ffffffff8011a346>] irq_work_run_list+0x30/0x32
[<ffffffff8011a372>] irq_work_run+0x2a/0x4c
[<ffffffff800c9aec>] smpcfd_dying_cpu+0x12/0x1c
[<ffffffff80012b2e>] cpuhp_invoke_callback+0x124/0x322
[<ffffffff800142f8>] _cpu_up+0x218/0x322
[<ffffffff8001445e>] cpu_up+0x5c/0x98
[<ffffffff80014bb8>] cpu_device_up+0x14/0x1c
[<ffffffff8066d68e>] cpu_subsys_online+0x10/0x18
[<ffffffff806680e8>] device_online+0x56/0x72
[<ffffffff80668190>] online_store+0x8c/0xae
[<ffffffff80662dde>] dev_attr_store+0xe/0x1a
[<ffffffff802b53bc>] sysfs_kf_write+0x2e/0x4c
[<ffffffff802b4750>] kernfs_fop_write_iter+0xe8/0x14e
[<ffffffff80230170>] vfs_write+0x2a4/0x3e0
[<ffffffff802303e8>] ksys_write+0x5e/0xc8
[<ffffffff80230460>] sys_write+0xe/0x16
[<ffffffff80a4a8de>] do_trap_ecall_u+0xe0/0xf4
[<ffffffff80003e8c>] ret_from_exception+0x0/0x64
Code: 8526 6084 f0ef f71f fce5 60e2 6442 64a2 6105 8082 (9002) 1101
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception in interrupt
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0-1
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

To avoid calling those checks in the CPU hot-plug paths again and again,
simply move the check at the beginning of the check_unaligned_access()
function and rely on the value determined during the system boot.

Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Fixes: c20d36cc2a20 ("riscv: don't probe unaligned access speed if already done")
Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe")
Signed-off-by: Marek Szyprowski <m.szyprowski@...sung.com>
---
 arch/riscv/kernel/cpufeature.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 2f51dc4eaf77..3c502840a548 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -571,13 +571,13 @@ void check_unaligned_access(int cpu)
 	void *src;
 	long speed = RISCV_HWPROBE_MISALIGNED_SLOW;
 
-	if (check_unaligned_access_emulated(cpu))
-		return;
-
 	/* We are already set since the last check */
 	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN)
 		return;
 
+	if (check_unaligned_access_emulated(cpu))
+		return;
+
 	page = alloc_pages(GFP_NOWAIT, get_order(MISALIGNED_BUFFER_SIZE));
 	if (!page) {
 		pr_warn("Can't alloc pages to measure memcpy performance");
-- 
2.34.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ