[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1c2433d4-7e0f-4395-b841-b8eac7c25651@nvidia.com>
Date: Fri, 7 Feb 2025 13:38:58 +0000
From: Jon Hunter <jonathanh@...dia.com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Alan Stern <stern@...land.harvard.edu>, Bjorn Helgaas <helgaas@...nel.org>,
Linux PCI <linux-pci@...r.kernel.org>, Ulf Hansson <ulf.hansson@...aro.org>,
Johan Hovold <johan@...nel.org>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Kevin Xie <kevin.xie@...rfivetech.com>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v1] PM: sleep: core: Synchronize runtime PM status of
parents and children
Hi Rafael,
On 28/01/2025 19:24, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
> Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> resume phase") overlooked the case in which the parent of a device with
> DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> suspended before a transition into a system-wide sleep state. In that
> case, if the child is resumed during the subsequent transition from
> that state into the working state, its runtime PM status will be set to
> RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> accordingly, even though the parent will be resumed too, because of the
> dev_pm_skip_suspend() check in device_resume_noirq().
>
> Address this problem by tracking the need to set the runtime PM status
> to RPM_ACTIVE during system-wide resume transitions for devices with
> DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
>
> Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@hovoldconsulting.com/
> Reported-by: Johan Hovold <johan@...nel.org>
> Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
> drivers/base/power/main.c | 29 ++++++++++++++++++++---------
> include/linux/pm.h | 1 +
> 2 files changed, 21 insertions(+), 9 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -656,13 +656,15 @@
> * so change its status accordingly.
> *
> * Otherwise, the device is going to be resumed, so set its PM-runtime
> - * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> - * to avoid confusing drivers that don't use it.
> + * status to "active" unless its power.set_active flag is clear, in
> + * which case it is not necessary to update its PM-runtime status.
> */
> - if (skip_resume)
> + if (skip_resume) {
> pm_runtime_set_suspended(dev);
> - else if (dev_pm_skip_suspend(dev))
> + } else if (dev->power.set_active) {
> pm_runtime_set_active(dev);
> + dev->power.set_active = false;
> + }
>
> if (dev->pm_domain) {
> info = "noirq power domain ";
> @@ -1189,18 +1191,24 @@
> return PMSG_ON;
> }
>
> -static void dpm_superior_set_must_resume(struct device *dev)
> +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> {
> struct device_link *link;
> int idx;
>
> - if (dev->parent)
> + if (dev->parent) {
> dev->parent->power.must_resume = true;
> + if (set_active)
> + dev->parent->power.set_active = true;
> + }
>
> idx = device_links_read_lock();
>
> - list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> + list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> link->supplier->power.must_resume = true;
> + if (set_active)
> + link->supplier->power.set_active = true;
> + }
>
> device_links_read_unlock(idx);
> }
> @@ -1278,8 +1286,11 @@
> dev->power.may_skip_resume))
> dev->power.must_resume = true;
>
> - if (dev->power.must_resume)
> - dpm_superior_set_must_resume(dev);
> + if (dev->power.must_resume) {
> + dev->power.set_active = dev->power.set_active ||
> + dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);
> + dpm_superior_set_must_resume(dev, dev->power.set_active);
> + }
>
> Complete:
> complete_all(&dev->power.completion);
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -683,6 +683,7 @@
> bool no_pm_callbacks:1; /* Owned by the PM core */
> bool async_in_progress:1; /* Owned by the PM core */
> bool must_resume:1; /* Owned by the PM core */
> + bool set_active:1; /* Owned by the PM core */
> bool may_skip_resume:1; /* Set by subsystems */
> #else
> bool should_wakeup:1;
I am seeing the following crash during suspend on a couple of our boards (with mainline/next) and bisect is pointing to this commit ...
[ 216.311009] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 216.311229] Mem abort info:
[ 216.311298] ESR = 0x0000000096000004
[ 216.311393] EC = 0x25: DABT (current EL), IL = 32 bits
[ 216.311515] SET = 0, FnV = 0
[ 216.311593] EA = 0, S1PTW = 0
[ 216.311668] FSC = 0x04: level 0 translation fault
[ 216.311770] Data abort info:
[ 216.311855] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 216.311972] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 216.312081] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 216.312267] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010b905000
[ 216.312411] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 216.312620] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 216.312747] Modules linked in: snd_soc_tegra210_admaif snd_soc_tegra186_asrc snd_soc_tegra_pcm snd_soc_tegra210_mixer snd_soc_tegra210_adx snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_dmic snd_soc_tegra210_amx snd_soc_tegra210_i2s snd_soc_tegra210_sfc tegra_drm drm_dp_aux_bus cec drm_display_helper drm_client_lib drm_kms_helper snd_soc_tegra210_ahub tegra210_adma drm snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_soc_rt5659 backlight snd_soc_simple_card_utils snd_soc_rl6231 ucsi_ccg typec_ucsi typec pwm_tegra ina3221 tegra_aconnect pwm_fan snd_hda_codec_hdmi snd_hda_tegra snd_hda_codec snd_hda_core phy_tegra194_p2u tegra_xudc at24 lm90 tegra_bpmp_thermal host1x pcie_tegra194 ip_tables x_tables ipv6
[ 216.354364] CPU: 1 UID: 0 PID: 14450 Comm: rtcwake Tainted: G W 6.14.0-rc1-next-20250206-g808eb958781e #1
[ 216.365388] Tainted: [W]=WARN
[ 216.368279] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[ 216.375099] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 216.381928] pc : simple_pm_bus_runtime_suspend+0x14/0x48
[ 216.387698] lr : pm_generic_runtime_suspend+0x2c/0x44
[ 216.392962] sp : ffff80009003baf0
[ 216.396103] x29: ffff80009003baf0 x28: ffff000088043480 x27: 0000000000000000
[ 216.403453] x26: ffff800082c11350 x25: ffff800082e0e000 x24: ffff80008093d634
[ 216.410803] x23: 0000000000000000 x22: 0000000000000002 x21: ffff800082e0e000
[ 216.418411] x20: ffff800080938af8 x19: ffff0000808ec410 x18: ffffffffffff4ac8
[ 216.425328] x17: 2033303a35353a36 x16: 312031312d39302d x15: 000047edca5d49da
[ 216.432937] x14: ffff000088043500 x13: 0000000000000001 x12: 0000000000000000
[ 216.440110] x11: 000000321a7856a0 x10: 0000000000000aa0 x9 : ffff80009003b8e0
[ 216.447219] x8 : ffff000088043f80 x7 : ffff0003fdf08a40 x6 : 0000000000000000
[ 216.454291] x5 : 0000000000000000 x4 : ffff800081ffe8c0 x3 : ffff0000808ec410
[ 216.461723] x2 : ffff8000805d133c x1 : 0000000000000000 x0 : 0000000000000000
[ 216.468986] Call trace:
[ 216.471179] simple_pm_bus_runtime_suspend+0x14/0x48 (P)
[ 216.476775] pm_generic_runtime_suspend+0x2c/0x44
[ 216.481499] pm_runtime_force_suspend+0x54/0x14c
[ 216.486049] device_suspend_noirq+0x6c/0x278
[ 216.490253] dpm_suspend_noirq+0xc0/0x198
[ 216.494278] suspend_devices_and_enter+0x210/0x4c0
[ 216.499348] pm_suspend+0x164/0x1c8
[ 216.503023] state_store+0x8c/0xfc
[ 216.506260] kobj_attr_store+0x18/0x2c
[ 216.509940] sysfs_kf_write+0x44/0x54
[ 216.513699] kernfs_fop_write_iter+0x118/0x1a8
[ 216.518163] vfs_write+0x2b0/0x35c
[ 216.521399] ksys_write+0x68/0xfc
[ 216.524810] __arm64_sys_write+0x1c/0x28
[ 216.528574] invoke_syscall+0x48/0x110
[ 216.532253] el0_svc_common.constprop.0+0x40/0xe8
[ 216.536628] do_el0_svc+0x20/0x2c
[ 216.540299] el0_svc+0x30/0xd0
[ 216.543016] el0t_64_sync_handler+0x144/0x168
[ 216.547736] el0t_64_sync+0x198/0x19c
[ 216.551327] Code: a9be7bfd 910003fd a90153f3 f9403c00 (f9400014)
[ 216.557197] ---[ end trace 0000000000000000 ]---
I have not looked any further, but if you have any thoughts, let me know.
Thanks
Jon
--
nvpublic
Powered by blists - more mailing lists