Message-ID: <5b7b8c9f-48c5-45cd-8366-c8c048eaa757@oss.qualcomm.com>
Date: Thu, 11 Sep 2025 14:04:27 +0530
From: Praveen Talari <praveen.talari@....qualcomm.com>
To: Alexey Klimov <alexey.klimov@...aro.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
Praveen Talari <quic_ptalari@...cinc.com>,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-serial@...r.kernel.org
Cc: psodagud@...cinc.com, djaggi@...cinc.com, quic_msavaliy@...cinc.com,
quic_vtanuku@...cinc.com, quic_arandive@...cinc.com,
quic_mnaresh@...cinc.com, quic_shazhuss@...cinc.com, krzk@...nel.org
Subject: Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime resume
Hi Alexey,
Thank you for the update.
On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>
> (adding Krzysztof to c/c)
>
> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>> __synchronize_irq(), conflicting with the active wakeup state and
>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>> leading to system instability.
>>>
>>> The critical call trace leading to the deadlock is:
>>>
>>> Call trace:
>>> __switch_to+0xe0/0x120
>>> __schedule+0x39c/0x978
>>> schedule+0x5c/0xf8
>>> __synchronize_irq+0x88/0xb4
>>> disable_irq+0x3c/0x4c
>>> msm_pinmux_set_mux+0x508/0x644
>>> pinmux_enable_setting+0x190/0x2dc
>>> pinctrl_commit_state+0x13c/0x208
>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>> geni_se_resources_on+0xe8/0x154
>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>> pm_generic_runtime_resume+0x2c/0x44
>>> __genpd_runtime_resume+0x30/0x80
>>> genpd_runtime_resume+0x114/0x29c
>>> __rpm_callback+0x48/0x1d8
>>> rpm_callback+0x6c/0x78
>>> rpm_resume+0x530/0x750
>>> __pm_runtime_resume+0x50/0x94
>>> handle_threaded_wake_irq+0x30/0x94
>>> irq_thread_fn+0x2c/0xa8
>>> irq_thread+0x160/0x248
>>> kthread+0x110/0x114
>>> ret_from_fork+0x10/0x20
>>>
>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>> disable_irq_wake() before enabling resources. This preemptively
>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>> interrupt management calls to proceed without conflict. An error path
>>> re-enables the wakeup IRQ if resource enablement fails.
>>>
>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>> are disabled. This ensures the interrupt is configured as a wakeup
>>> source only once the device has fully entered its low-power state. An
>>> error path handles disabling the wakeup IRQ if the suspend operation
>>> fails.
>>>
>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>> Signed-off-by: Praveen Talari <praveen.talari@....qualcomm.com>
>>
>> You forgot:
>>
>> Reported-by: Alexey Klimov <alexey.klimov@...aro.org>
>>
>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>> this should be picked for current -rc cycle since regression is
>> introduced during latest merge window.
>>
>> I also would like to test it on qrb2210 rb1 where this regression is
>> reproducible.
>
> It doesn't seem that it fixes the regression on RB1 board:
>
> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
> Workqueue: async async_run_entry_fn
> Call trace:
> __switch_to+0xf0/0x1c0 (T)
> __schedule+0x358/0x99c
> schedule+0x34/0x11c
> rpm_resume+0x17c/0x6a0
> rpm_resume+0x2c4/0x6a0
> rpm_resume+0x2c4/0x6a0
> rpm_resume+0x2c4/0x6a0
> __pm_runtime_resume+0x50/0x9c
> __driver_probe_device+0x58/0x120
> driver_probe_device+0x3c/0x154
> __driver_attach_async_helper+0x4c/0xc0
> async_run_entry_fn+0x34/0xe0
> process_one_work+0x148/0x284
> worker_thread+0x2c4/0x3e0
> kthread+0x12c/0x210
> ret_from_fork+0x10/0x20
> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
> Call trace:
> __switch_to+0xf0/0x1c0 (T)
> __schedule+0x358/0x99c
> schedule+0x34/0x11c
> __synchronize_irq+0x90/0xcc
> disable_irq+0x3c/0x4c
> msm_pinmux_set_mux+0x3b4/0x45c
> pinmux_enable_setting+0x1fc/0x2d8
> pinctrl_commit_state+0xa0/0x260
> pinctrl_pm_select_default_state+0x4c/0xa0
> geni_se_resources_on+0xe8/0x154
> geni_serial_resource_state+0x8c/0xbc
> qcom_geni_serial_runtime_resume+0x3c/0x88
> pm_generic_runtime_resume+0x2c/0x44
> __rpm_callback+0x48/0x1e0
> rpm_callback+0x74/0x80
> rpm_resume+0x3bc/0x6a0
> __pm_runtime_resume+0x50/0x9c
> handle_threaded_wake_irq+0x30/0x80
> irq_thread_fn+0x2c/0xb0
> irq_thread+0x170/0x334
> kthread+0x12c/0x210
> ret_from_fork+0x10/0x20
I can see that your call stack and mine are mostly similar, but they
differ in the frames just above qcom_geni_serial_runtime_resume().
Your dump:
> qcom_geni_serial_runtime_resume+0x3c/0x88
> pm_generic_runtime_resume+0x2c/0x44
> __rpm_callback+0x48/0x1e0
> rpm_callback+0x74/0x80
> rpm_resume+0x3bc/0x6a0
> __pm_runtime_resume+0x50/0x9c
> handle_threaded_wake_irq+0x30/0x80
Mine:
>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>> pm_generic_runtime_resume+0x2c/0x44
>>> __genpd_runtime_resume+0x30/0x80
>>> genpd_runtime_resume+0x114/0x29c
>>> __rpm_callback+0x48/0x1d8
>>> rpm_callback+0x6c/0x78
>>> rpm_resume+0x530/0x750
Could you please share the DT file for this board, if possible?
Is there any use case enabled on this SE instance?
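For reference, the ordering described in the patch description quoted above can be modeled in plain userspace C. This is only a sketch under stated assumptions: every name in it (model_port, model_runtime_resume() and so on) is a hypothetical stand-in, it is not the driver code, and it calls no kernel API. It only shows the intended sequence, i.e. wake is disarmed before resources are turned on, armed again if that step fails, and armed in suspend only after resources are off.

```c
/* Userspace model only. All names are hypothetical stand-ins; this is
 * not the qcom_geni_serial driver and it uses no kernel API. */
#include <assert.h>
#include <stdbool.h>

enum model_event { EV_DISABLE_WAKE, EV_ENABLE_WAKE, EV_RES_ON, EV_RES_OFF };
enum { MODEL_LOG_MAX = 8 };

struct model_port {
	bool wake_armed;    /* models the IRQ being armed as a wakeup source */
	bool resources_on;  /* models clocks/pins being enabled */
	bool fail_next;     /* test knob: make the next resource step fail */
	enum model_event log[MODEL_LOG_MAX];
	int nlog;
};

/* Record each step so the ordering can be inspected afterwards. */
static void model_log(struct model_port *p, enum model_event ev)
{
	assert(p->nlog < MODEL_LOG_MAX);
	p->log[p->nlog++] = ev;
}

static void model_disable_irq_wake(struct model_port *p)
{
	p->wake_armed = false;
	model_log(p, EV_DISABLE_WAKE);
}

static void model_enable_irq_wake(struct model_port *p)
{
	p->wake_armed = true;
	model_log(p, EV_ENABLE_WAKE);
}

static int model_resources_on(struct model_port *p)
{
	if (p->fail_next) {
		p->fail_next = false;
		return -1;
	}
	p->resources_on = true;
	model_log(p, EV_RES_ON);
	return 0;
}

static int model_resources_off(struct model_port *p)
{
	if (p->fail_next) {
		p->fail_next = false;
		return -1;
	}
	p->resources_on = false;
	model_log(p, EV_RES_OFF);
	return 0;
}

/* Resume: disarm wake first, then enable resources; re-arm on failure. */
int model_runtime_resume(struct model_port *p)
{
	int ret;

	model_disable_irq_wake(p);
	ret = model_resources_on(p);
	if (ret)
		model_enable_irq_wake(p);
	return ret;
}

/* Suspend: disable resources first; arm wake only if that succeeded. */
int model_runtime_suspend(struct model_port *p)
{
	int ret = model_resources_off(p);

	if (ret)
		return ret;
	model_enable_irq_wake(p);
	return 0;
}
```

The model says nothing about whether this ordering avoids the hang on RB1. It is only meant to make the intended sequence easy to compare against what the traces show.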
Thanks,
Praveen Talari
>
> I see exactly the same behaviour with these changes applied.
>
> root@rb1:~# uname -a
> Linux rb1 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13 SMP PREEMPT Tue Sep 9 20:14:22 BST 2025 aarch64 GNU/Linux
>
> I see the same behaviour with linux-next but my local tree is a bit old,
> maybe there are some dependencies.
>
> Best regards,
> Alexey