[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2872d91-039b-399c-af88-c20bf605e172@quicinc.com>
Date: Fri, 11 Oct 2024 16:14:10 +0530
From: Krishna Chaitanya Chundru <quic_krichai@...cinc.com>
To: Bjorn Helgaas <helgaas@...nel.org>, Mayank Rana <quic_mrana@...cinc.com>
CC: <kevin.xie@...rfivetech.com>, <lpieralisi@...nel.org>, <kw@...ux.com>,
<manivannan.sadhasivam@...aro.org>, <robh@...nel.org>,
<bhelgaas@...gle.com>, <linux-pci@...r.kernel.org>,
<linux-kernel@...r.kernel.org>,
Marek Szyprowski
<m.szyprowski@...sung.com>
Subject: Re: [PATCH] PCI: starfive: Enable PCIe controller's PM runtime before
probing host bridge
On 10/11/2024 2:22 AM, Bjorn Helgaas wrote:
> On Thu, Oct 10, 2024 at 01:29:50PM -0700, Mayank Rana wrote:
>> Commit 02787a3b4d10 ("PCI/PM: Enable runtime power management for host
>> bridges") enables runtime PM for host bridge enforcing dependency chain
>> between PCIe controller, host bridge and endpoint devices. With this,
>> Starfive PCIe controller driver's probe enables host bridge (child device)
>> PM runtime before parent's PM runtime (Starfive PCIe controller device)
>> causing below warning and callstack:
>
> I don't want the bisection hole that would result if we kept
> 02787a3b4d10 ("PCI/PM: Enable runtime power management for host
> bridges") and applied this patch on top of it.
>
> If this is the fix, we'll apply it *first*, followed by 02787a3b4d10
> (which will obviously become a different commit), so the locking
> problem below described below should never exist in -next or the
> upstream tree.
>
> So we need to audit other drivers to make sure they don't have theBjorn, I have checked all the drivers in the controller folder where
they are using pm_runtime_enable(), this is the only driver which needs
to be fixed. once this patched was taken can we take "PCI/PM: Enable
runtime power management for host bridges"
- Krishna Chaitanya.
> same problem as starfive, then make a series with this patch and any
> others that need similar fixes, followed by the "PCI/PM: Enable
> runtime power management for host bridges" patch.
>
> Presumably this patch fixes a problem all by itself, even without
> considering 02787a3b4d10, so the commit message should describe that
> problem.
>
>> pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device
>> with active children
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.12.0-rc1+ #15438 Not tainted
>> ------------------------------------------------------
>> systemd-udevd/159 is trying to acquire lock:
>> ffffffff81822520 (console_owner){-.-.}-{0:0}, at:
>> console_lock_spinning_enable+0x3a/0x60
>>
>> but task is already holding lock:
>> ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
>> pm_runtime_enable+0x1e/0xb6
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&dev->power.lock){-...}-{2:2}:
>> lock_acquire.part.0+0xa2/0x1d4
>> lock_acquire+0x44/0x5a
>> _raw_spin_lock_irqsave+0x3a/0x64
>> __pm_runtime_resume+0x40/0x86
>> __uart_start+0x40/0xb2
>> uart_write+0x90/0x220
>> n_tty_write+0x10a/0x40e
>> file_tty_write.constprop.0+0x10c/0x230
>> redirected_tty_write+0x84/0xbc
>> do_iter_readv_writev+0x100/0x166
>> vfs_writev+0xc6/0x398
>> do_writev+0x5c/0xca
>> __riscv_sys_writev+0x16/0x1e
>> do_trap_ecall_u+0x1b6/0x1e2
>> _new_vmalloc_restore_context_a0+0xc2/0xce
>>
>> -> #1 (&port_lock_key){-.-.}-{2:2}:
>> lock_acquire.part.0+0xa2/0x1d4
>> lock_acquire+0x44/0x5a
>> _raw_spin_lock_irqsave+0x3a/0x64
>> serial8250_console_write+0x2a0/0x474
>> univ8250_console_write+0x22/0x2a
>> console_flush_all+0x2f6/0x3c8
>> console_unlock+0x80/0x1a8
>> vprintk_emit+0x10e/0x2e0
>> vprintk_default+0x16/0x1e
>> vprintk+0x1e/0x3c
>> _printk+0x36/0x50
>> register_console+0x292/0x418
>> serial_core_register_port+0x6d6/0x6dc
>> serial_ctrl_register_port+0xc/0x14
>> uart_add_one_port+0xc/0x14
>> serial8250_register_8250_port+0x288/0x428
>> dw8250_probe+0x422/0x518
>> platform_probe+0x4e/0x92
>> really_probe+0x10a/0x2da
>> __driver_probe_device.part.0+0xb2/0xe8
>> driver_probe_device+0x78/0xc4
>> __device_attach_driver+0x66/0xc6
>> bus_for_each_drv+0x5c/0xb0
>> __device_attach+0x84/0x13c
>> device_initial_probe+0xe/0x16
>> bus_probe_device+0x88/0x8a
>> deferred_probe_work_func+0xd4/0xee
>> process_one_work+0x1e0/0x534
>> worker_thread+0x166/0x2cc
>> kthread+0xc4/0xe0
>> ret_from_fork+0xe/0x18
>>
>> -> #0 (console_owner){-.-.}-{0:0}:
>> check_noncircular+0x10e/0x122
>> __lock_acquire+0x105c/0x1f4a
>> lock_acquire.part.0+0xa2/0x1d4
>> lock_acquire+0x44/0x5a
>> console_lock_spinning_enable+0x58/0x60
>> console_flush_all+0x2cc/0x3c8
>> console_unlock+0x80/0x1a8
>> vprintk_emit+0x10e/0x2e0
>> dev_vprintk_emit+0xea/0x112
>> dev_printk_emit+0x2e/0x48
>> __dev_printk+0x40/0x5c
>> _dev_warn+0x46/0x60
>> pm_runtime_enable+0x98/0xb6
>> starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
>> platform_probe+0x4e/0x92
>> really_probe+0x10a/0x2da
>> __driver_probe_device.part.0+0xb2/0xe8
>> driver_probe_device+0x78/0xc4
>> __driver_attach+0x54/0x162
>> bus_for_each_dev+0x58/0xa4
>> driver_attach+0x1a/0x22
>> bus_add_driver+0xec/0x1ce
>> driver_register+0x3e/0xd8
>> __platform_driver_register+0x1c/0x24
>> starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
>> do_one_initcall+0x5e/0x28c
>> do_init_module+0x52/0x1ba
>> load_module+0x1440/0x18f0
>> init_module_from_file+0x76/0xae
>> idempotent_init_module+0x18c/0x24a
>> __riscv_sys_finit_module+0x52/0x82
>> do_trap_ecall_u+0x1b6/0x1e2
>> _new_vmalloc_restore_context_a0+0xc2/0xce
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>> console_owner --> &port_lock_key --> &dev->power.lock
>>
>> Possible unsafe locking scenario:
>>
>> CPU0 CPU1
>> ---- ----
>> lock(&dev->power.lock);
>> lock(&port_lock_key);
>> lock(&dev->power.lock);
>> lock(console_owner);
>>
>> *** DEADLOCK ***
>>
>> 4 locks held by systemd-udevd/159:
>> #0: ffffffd6c0b3d8f8 (&dev->mutex){....}-{3:3}, at:
>> __driver_attach+0x4c/0x162
>> #1: ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
>> pm_runtime_enable+0x1e/0xb6
>> #2: ffffffff818223b0 (console_lock){+.+.}-{0:0}, at:
>> dev_vprintk_emit+0xea/0x112
>> #3: ffffffff81822448 (console_srcu){....}-{0:0}, at:
>> console_flush_all+0x4e/0x3c8
>>
>> stack backtrace:
>> CPU: 1 UID: 0 PID: 159 Comm: systemd-udevd Not tainted 6.12.0-rc1+ #15438
>> Hardware name: StarFive VisionFive 2 v1.2A (DT)
>> Call Trace:
>> [<ffffffff80006a02>] dump_backtrace+0x1c/0x24
>> [<ffffffff80b70b3e>] show_stack+0x2c/0x38
>> [<ffffffff80b7f8f8>] dump_stack_lvl+0x7a/0xb4
>> [<ffffffff80b7f946>] dump_stack+0x14/0x1c
>> [<ffffffff8007fbc2>] print_circular_bug+0x2aa/0x350
>> [<ffffffff8007fd76>] check_noncircular+0x10e/0x122
>> [<ffffffff80082a3c>] __lock_acquire+0x105c/0x1f4a
>> [<ffffffff80084148>] lock_acquire.part.0+0xa2/0x1d4
>> [<ffffffff800842be>] lock_acquire+0x44/0x5a
>> [<ffffffff8008b3e8>] console_lock_spinning_enable+0x58/0x60
>> [<ffffffff8008c0c2>] console_flush_all+0x2cc/0x3c8
>> [<ffffffff8008c23e>] console_unlock+0x80/0x1a8
>> [<ffffffff8008c710>] vprintk_emit+0x10e/0x2e0
>> [<ffffffff80b79ec8>] dev_vprintk_emit+0xea/0x112
>> [<ffffffff80b79f1e>] dev_printk_emit+0x2e/0x48
>> [<ffffffff80b7a006>] __dev_printk+0x40/0x5c
>> [<ffffffff80b7a2be>] _dev_warn+0x46/0x60
>> [<ffffffff807037ae>] pm_runtime_enable+0x98/0xb6
>> [<ffffffff02763240>] starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
>> [<ffffffff806f83f6>] platform_probe+0x4e/0x92
>> [<ffffffff80b7a680>] really_probe+0x10a/0x2da
>> [<ffffffff80b7a902>] __driver_probe_device.part.0+0xb2/0xe8
>> [<ffffffff806f6112>] driver_probe_device+0x78/0xc4
>> [<ffffffff806f6278>] __driver_attach+0x54/0x162
>> [<ffffffff806f42e6>] bus_for_each_dev+0x58/0xa4
>> [<ffffffff806f5c9e>] driver_attach+0x1a/0x22
>> [<ffffffff806f54ce>] bus_add_driver+0xec/0x1ce
>> [<ffffffff806f7112>] driver_register+0x3e/0xd8
>> [<ffffffff806f80cc>] __platform_driver_register+0x1c/0x24
>> [<ffffffff027ea020>] starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
>> [<ffffffff800027ba>] do_one_initcall+0x5e/0x28c
>> [<ffffffff800bb4d8>] do_init_module+0x52/0x1ba
>> [<ffffffff800bcc8a>] load_module+0x1440/0x18f0
>> [<ffffffff800bd2f0>] init_module_from_file+0x76/0xae
>> [<ffffffff800bd4b4>] idempotent_init_module+0x18c/0x24a
>> [<ffffffff800bd5fc>] __riscv_sys_finit_module+0x52/0x82
>> [<ffffffff80b804a0>] do_trap_ecall_u+0x1b6/0x1e2
>> [<ffffffff80b8c536>] _new_vmalloc_restore_context_a0+0xc2/0xce
>>
>> Fix this issue by enabling starfive pcie controller device's PM runtime
>> status before calling into pci_host_probe() through plda_pcie_host_init().
>>
>> Reported-by: Marek Szyprowski <m.szyprowski@...sung.com>
>> Fixes: 02787a3b4d10 ("PCI/PM: Enable runtime power management for host bridges")
>> Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>
>> Signed-off-by: Mayank Rana <quic_mrana@...cinc.com>
>> ---
>> drivers/pci/controller/plda/pcie-starfive.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/controller/plda/pcie-starfive.c b/drivers/pci/controller/plda/pcie-starfive.c
>> index 0567ec373a3e..e73c1b7bc8ef 100644
>> --- a/drivers/pci/controller/plda/pcie-starfive.c
>> +++ b/drivers/pci/controller/plda/pcie-starfive.c
>> @@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct platform_device *pdev)
>> if (ret)
>> return ret;
>>
>> + pm_runtime_enable(&pdev->dev);
>> + pm_runtime_get_sync(&pdev->dev);
>> +
>> plda->host_ops = &sf_host_ops;
>> plda->num_events = PLDA_MAX_EVENT_NUM;
>> /* mask doorbell event */
>> @@ -413,11 +416,12 @@ static int starfive_pcie_probe(struct platform_device *pdev)
>> plda->events_bitmap <<= PLDA_NUM_DMA_EVENTS;
>> ret = plda_pcie_host_init(&pcie->plda, &starfive_pcie_ops,
>> &stf_pcie_event);
>> - if (ret)
>> + if (ret) {
>> + pm_runtime_put_sync(&pdev->dev);
>> + pm_runtime_disable(&pdev->dev);
>> return ret;
>> + }
>>
>> - pm_runtime_enable(&pdev->dev);
>> - pm_runtime_get_sync(&pdev->dev);
>> platform_set_drvdata(pdev, pcie);
>>
>> return 0;
>> --
>> 2.25.1
>>
Powered by blists - more mailing lists