[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z6YPpbRF_U0TxAbf@hovoldconsulting.com>
Date: Fri, 7 Feb 2025 14:50:29 +0100
From: Johan Hovold <johan@...nel.org>
To: Jon Hunter <jonathanh@...dia.com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Alan Stern <stern@...land.harvard.edu>,
Bjorn Helgaas <helgaas@...nel.org>,
Linux PCI <linux-pci@...r.kernel.org>,
Ulf Hansson <ulf.hansson@...aro.org>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Kevin Xie <kevin.xie@...rfivetech.com>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v1] PM: sleep: core: Synchronize runtime PM status of
parents and children
On Fri, Feb 07, 2025 at 01:38:58PM +0000, Jon Hunter wrote:
> On 28/01/2025 19:24, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > resume phase") overlooked the case in which the parent of a device with
> > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > suspended before a transition into a system-wide sleep state. In that
> > case, if the child is resumed during the subsequent transition from
> > that state into the working state, its runtime PM status will be set to
> > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > accordingly, even though the parent will be resumed too, because of the
> > dev_pm_skip_suspend() check in device_resume_noirq().
> >
> > Address this problem by tracking the need to set the runtime PM status
> > to RPM_ACTIVE during system-wide resume transitions for devices with
> > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> >
> > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> > Reported-by: Johan Hovold <johan@...nel.org>
> > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> I am seeing the following crash during suspend on a couple of our boards (with mainline/next) and bisect is pointing to this commit ...
>
> [ 216.311009] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [ 216.468986] Call trace:
> [ 216.471179] simple_pm_bus_runtime_suspend+0x14/0x48 (P)
> [ 216.476775] pm_generic_runtime_suspend+0x2c/0x44
> [ 216.481499] pm_runtime_force_suspend+0x54/0x14c
> [ 216.486049] device_suspend_noirq+0x6c/0x278
> [ 216.490253] dpm_suspend_noirq+0xc0/0x198
> [ 216.494278] suspend_devices_and_enter+0x210/0x4c0
> [ 216.499348] pm_suspend+0x164/0x1c8
> [ 216.503023] state_store+0x8c/0xfc
> [ 216.506260] kobj_attr_store+0x18/0x2c
> [ 216.509940] sysfs_kf_write+0x44/0x54
> [ 216.513699] kernfs_fop_write_iter+0x118/0x1a8
> [ 216.518163] vfs_write+0x2b0/0x35c
> [ 216.521399] ksys_write+0x68/0xfc
> [ 216.524810] __arm64_sys_write+0x1c/0x28
> [ 216.528574] invoke_syscall+0x48/0x110
> [ 216.532253] el0_svc_common.constprop.0+0x40/0xe8
> [ 216.536628] do_el0_svc+0x20/0x2c
> [ 216.540299] el0_svc+0x30/0xd0
> [ 216.543016] el0t_64_sync_handler+0x144/0x168
> [ 216.547736] el0t_64_sync+0x198/0x19c
> [ 216.551327] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
> [ 216.557197] ---[ end trace 0000000000000000 ]---
>
> I have not looked any further, but if you have any thoughts, let me know.
Yeah, I hit something like this yesterday as well and did confirm that
reverting this commit makes the problem go away. Haven't had time to dig
much further.
[ 110.522368] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 110.531751] Mem abort info:
[ 110.534785] ESR = 0x0000000096000004
[ 110.538799] EC = 0x25: DABT (current EL), IL = 32 bits
[ 110.544421] SET = 0, FnV = 0
[ 110.547716] EA = 0, S1PTW = 0
[ 110.551097] FSC = 0x04: level 0 translation fault
[ 110.556274] Data abort info:
[ 110.559385] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 110.565188] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 110.570536] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 110.576157] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000892256000
[ 110.582946] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 110.590348] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
...
[ 110.742358] CPU: 2 UID: 0 PID: 420 Comm: suspend-test.sh Not tainted 6.13.0 #118
[ 110.750067] Hardware name: Qualcomm CRD, BIOS 6.0.231221.BOOT.MXF.2.4-00348.1-HAMOA-1 12/21/2023
[ 110.759198] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 110.766462] pc : simple_pm_bus_runtime_suspend+0x14/0x48
[ 110.772048] lr : pm_generic_runtime_suspend+0x2c/0x44
[ 110.777352] sp : ffff8000839d3a20
[ 110.780866] x29: ffff8000839d3a20 x28: ffff0edf80fff810 x27: ffffa2dc2d1f1d30
[ 110.788303] x26: ffffa2dc2dd82000 x25: 0000000000000002 x24: ffffa2dc2cc7aca0
[ 110.795741] x23: ffffa2dc2dd1e000 x22: 0000000000000000 x21: ffffa2dc2d090e50
[ 110.803177] x20: ffffa2dc2c612498 x19: ffff0edf80fff810 x18: 0000000000000030
[ 110.810615] x17: 0005002c00000000 x16: ffffa2dc2c614ce4 x15: ffffffffffffffff
[ 110.818052] x14: 0000000000000000 x13: ffff0edf80fff980 x12: 705f64706e65675f
[ 110.825490] x11: ffffa2dc2d9c5890 x10: 0000000000000000 x9 : 0000000000000000
[ 110.832927] x8 : ffffa2dc2d2af000 x7 : ffff8000839d3a10 x6 : ffff8000839d39b0
[ 110.840364] x5 : ffff8000839d4000 x4 : 0000000000000004 x3 : ffff0edf953e0000
[ 110.847801] x2 : ffffa2dc2c4e5784 x1 : 0000000000000000 x0 : 0000000000000000
[ 110.855238] Call trace:
[ 110.857861] simple_pm_bus_runtime_suspend+0x14/0x48 (P)
[ 110.863425] pm_generic_runtime_suspend+0x2c/0x44
[ 110.868362] pm_runtime_force_suspend+0x54/0x100
[ 110.873217] dpm_run_callback+0xb4/0x228
[ 110.877347] device_suspend_noirq+0x70/0x2a8
[ 110.881844] dpm_noirq_suspend_devices+0xe0/0x230
[ 110.886778] dpm_suspend_noirq+0x24/0x98
[ 110.890904] suspend_devices_and_enter+0x368/0x678
[ 110.895941] pm_suspend+0x1b4/0x348
[ 110.899627] state_store+0x8c/0xfc
[ 110.903228] kobj_attr_store+0x18/0x2c
[ 110.907195] sysfs_kf_write+0x4c/0x78
[ 110.911074] kernfs_fop_write_iter+0x120/0x1b4
[ 110.915735] vfs_write+0x2ac/0x358
[ 110.919352] ksys_write+0x68/0xfc
[ 110.922873] __arm64_sys_write+0x1c/0x28
[ 110.927002] invoke_syscall+0x48/0x110
[ 110.930969] el0_svc_common.constprop.0+0x40/0xe0
[ 110.935907] do_el0_svc+0x1c/0x28
[ 110.939427] el0_svc+0x48/0x114
[ 110.942769] el0t_64_sync_handler+0xc8/0xcc
[ 110.947180] el0t_64_sync+0x198/0x19c
[ 110.951059] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
[ 110.957428] ---[ end trace 0000000000000000 ]---
Johan
Powered by blists - more mailing lists