[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3c7523a3-2de2-3a76-2f46-9e4cf38f40b6@huawei.com>
Date: Thu, 25 Nov 2021 15:23:30 +0800
From: Xiaoming Ni <nixiaoming@...wei.com>
To: Martin Kennedy <hurricos@...il.com>
CC: <Yuantian.Tang@...scale.com>, <benh@...nel.crashing.org>,
<chenhui.zhao@...escale.com>, <chenjianguo3@...wei.com>,
<gregkh@...uxfoundation.org>, <linux-kernel@...r.kernel.org>,
<linuxppc-dev@...ts.ozlabs.org>, <liuwenliang@...wei.com>,
<mpe@...erman.id.au>, <oss@...error.net>,
<paul.gortmaker@...driver.com>, <paulus@...ba.org>,
<stable@...r.kernel.org>, <wangle6@...wei.com>,
"Christian Lamparter" <chunkeey@...il.com>
Subject: Re: [PATCH v2 2/2] powerpc:85xx: fix timebase sync issue when
CONFIG_HOTPLUG_CPU=n
On 2021/11/25 12:20, Martin Kennedy wrote:
> Hi there,
>
> I have bisected OpenWrt master, and then the Linux kernel down to this
> change, to confirm that this change causes a kernel panic on my
> P1020RDB-based, dual-core Aerohive HiveAP 370, at initialization of
> the second CPU:
>
> :
> [ 0.000000] Linux version 5.10.80 (labby@...on)
> (powerpc-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0
> r18111+1-ebb6f9287e) 11.2.0, GNU ld (GNU Binutils) 2.37) #0 SMP Thu
> Nov 25 02:49:35 2021
> [ 0.000000] Using P1020 RDB machine description
> :
> [ 0.627233] smp: Bringing up secondary CPUs ...
> [ 0.681659] kernel tried to execute user page (0) - exploit attempt? (uid: 0)
> [ 0.766618] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
> [ 0.848899] Faulting instruction address: 0x00000000
> [ 0.908273] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 0.972851] BE PAGE_SIZE=4K SMP NR_CPUS=2 P1020 RDB
> [ 1.031179] Modules linked in:
> [ 1.067640] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.80 #0
> [ 1.139507] NIP: 00000000 LR: c0021d2c CTR: 00000000
> [ 1.199921] REGS: c1051cf0 TRAP: 0400 Not tainted (5.10.80)
> [ 1.269705] MSR: 00021000 <CE,ME> CR: 84020202 XER: 00000000
> [ 1.340534]
> [ 1.340534] GPR00: c0021cb8 c1051da8 c1048000 00000001 00029000
> 00000000 00000001 00000000
> [ 1.340534] GPR08: 00000001 00000000 c08b0000 00000040 22000208
> 00000000 c00032c4 00000000
> [ 1.340534] GPR16: 00000000 00000000 00000000 00000000 00000000
> 00000000 00029000 00000001
> [ 1.340534] GPR24: 1ffff240 20000000 dffff240 c080a1f4 00000001
> c08ae0a8 00000001 dffff240
> [ 1.758220] NIP [00000000] 0x0
> [ 1.794688] LR [c0021d2c] smp_85xx_kick_cpu+0xe8/0x568
> [ 1.856126] Call Trace:
> [ 1.885295] [c1051da8] [c0021cb8] smp_85xx_kick_cpu+0x74/0x568 (unreliable)
> [ 1.968633] [c1051de8] [c0011460] __cpu_up+0xc0/0x228
> [ 2.029038] [c1051e18] [c0031bbc] bringup_cpu+0x30/0x224
> [ 2.092572] [c1051e48] [c0031f3c] cpu_up.constprop.0+0x180/0x33c
> [ 2.164443] [c1051e88] [c00322e8] bringup_nonboot_cpus+0x88/0xc8
> [ 2.236326] [c1051eb8] [c07e67bc] smp_init+0x30/0x78
> [ 2.295698] [c1051ed8] [c07d9e28] kernel_init_freeable+0x118/0x2a8
> [ 2.369641] [c1051f18] [c00032d8] kernel_init+0x14/0x124
> [ 2.433176] [c1051f38] [c0010278] ret_from_kernel_thread+0x14/0x1c
> [ 2.507125] Instruction dump:
> [ 2.542541] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX
> [ 2.635242] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX
> [ 2.727952] ---[ end trace 9b796a4bafb6bc14 ]---
> [ 2.783149]
> [ 3.800879] Kernel panic - not syncing: Fatal exception
> [ 3.862353] Rebooting in 1 seconds..
> [ 5.905097] System Halted, OK to turn off power
>
> Without this patch, the kernel no longer panics:
>
> [ 0.627232] smp: Bringing up secondary CPUs ...
> [ 0.681857] smp: Brought up 1 node, 2 CPUs
>
> Here is the kernel configuration for this built kernel:
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob_plain;f=target/linux/mpc85xx/config-5.10;hb=HEAD
>
> In case a force-push is needed for the source repository
> (https://github.com/Hurricos/openwrt/commit/ad19bdfc77d60ee1c52b41bb4345fdd02284c4cf),
> here is the device tree for this board:
> https://paste.c-net.org/TrousersSliced
>
> Martin
> .
>
When CONFIG_FSL_PMC is set to n, cpu_up_prepare is not assigned to
mpc85xx_pm_ops. I suspect that this is the cause of the current null
pointer access.
I do not have the corresponding board test environment. Can you help me
to test whether the following patch solves the problem?
diff --git a/arch/powerpc/platforms/85xx/smp.c
b/arch/powerpc/platforms/85xx/smp.c
index 83f4a6389a28..d7081e9af65c 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -220,7 +220,7 @@ static int smp_85xx_start_cpu(int cpu)
local_irq_save(flags);
hard_irq_disable();
- if (qoriq_pm_ops)
+ if (qoriq_pm_ops && qoriq_pm_ops->cpu_up_prepare)
qoriq_pm_ops->cpu_up_prepare(cpu);
/* if cpu is not spinning, reset it */
@@ -292,7 +292,7 @@ static int smp_85xx_kick_cpu(int nr)
booting_thread_hwid = cpu_thread_in_core(nr);
primary = cpu_first_thread_sibling(nr);
- if (qoriq_pm_ops)
+ if (qoriq_pm_ops && qoriq_pm_ops->cpu_up_prepare)
qoriq_pm_ops->cpu_up_prepare(nr);
/*
Powered by blists - more mailing lists