[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DCT9VWQYD4VM.1NV5FJJCJG4PI@linaro.org>
Date: Mon, 15 Sep 2025 10:39:40 +0100
From: "Alexey Klimov" <alexey.klimov@...aro.org>
To: "Praveen Talari" <praveen.talari@....qualcomm.com>, "Praveen Talari"
<quic_ptalari@...cinc.com>
Cc: "Greg Kroah-Hartman" <gregkh@...uxfoundation.org>, "Jiri Slaby"
<jirislaby@...nel.org>, "Bryan O'Donoghue" <bryan.odonoghue@...aro.org>,
<linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-serial@...r.kernel.org>, <psodagud@...cinc.com>,
<djaggi@...cinc.com>, <quic_msavaliy@...cinc.com>,
<quic_vtanuku@...cinc.com>, <quic_arandive@...cinc.com>,
<quic_shazhuss@...cinc.com>, <krzk@...nel.org>
Subject: Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime
resume
(removing <quic_mnaresh@...cinc.com> from c/c -- too many mail not delivered)
Hi Praveen,
On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
> Hi Alexey,
>
> Really appreciate you waiting!
>
> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>> Hi Praveen,
>>
>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>> Hi Alexy,
>>>
>>> Thank you for update.
>>>
>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>
>>>> (adding Krzysztof to c/c)
>>>>
>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>> leading to system instability.
>>>>>>
>>>>>> The critical call trace leading to the deadlock is:
>>>>>>
>>>>>> Call trace:
>>>>>> __switch_to+0xe0/0x120
>>>>>> __schedule+0x39c/0x978
>>>>>> schedule+0x5c/0xf8
>>>>>> __synchronize_irq+0x88/0xb4
>>>>>> disable_irq+0x3c/0x4c
>>>>>> msm_pinmux_set_mux+0x508/0x644
>>>>>> pinmux_enable_setting+0x190/0x2dc
>>>>>> pinctrl_commit_state+0x13c/0x208
>>>>>> pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>> geni_se_resources_on+0xe8/0x154
>>>>>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>> pm_generic_runtime_resume+0x2c/0x44
>>>>>> __genpd_runtime_resume+0x30/0x80
>>>>>> genpd_runtime_resume+0x114/0x29c
>>>>>> __rpm_callback+0x48/0x1d8
>>>>>> rpm_callback+0x6c/0x78
>>>>>> rpm_resume+0x530/0x750
>>>>>> __pm_runtime_resume+0x50/0x94
>>>>>> handle_threaded_wake_irq+0x30/0x94
>>>>>> irq_thread_fn+0x2c/xa8
>>>>>> irq_thread+0x160/x248
>>>>>> kthread+0x110/x114
>>>>>> ret_from_fork+0x10/x20
>>>>>>
>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>> interrupt management calls to proceed without conflict. An error path
>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>
>>>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>> source only once the device has fully entered its low-power state. An
>>>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>>>> fails.
>>>>>>
>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>>>> Signed-off-by: Praveen Talari <praveen.talari@....qualcomm.com>
>>>>>
>>>>> You forgot:
>>>>>
>>>>> Reported-by: Alexey Klimov <alexey.klimov@...aro.org>
>>>>>
>>>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>>>> this should be picked for current -rc cycle since regression is
>>>>> introduced during latest merge window.
>>>>>
>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>> reproduciable.
>>>>
>>>> It doesn't seem that it fixes the regression on RB1 board:
>>>>
>>>> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>>> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
>>>> Workqueue: async async_run_entry_fn
>>>> Call trace:
>>>> __switch_to+0xf0/0x1c0 (T)
>>>> __schedule+0x358/0x99c
>>>> schedule+0x34/0x11c
>>>> rpm_resume+0x17c/0x6a0
>>>> rpm_resume+0x2c4/0x6a0
>>>> rpm_resume+0x2c4/0x6a0
>>>> rpm_resume+0x2c4/0x6a0
>>>> __pm_runtime_resume+0x50/0x9c
>>>> __driver_probe_device+0x58/0x120
>>>> driver_probe_device+0x3c/0x154
>>>> __driver_attach_async_helper+0x4c/0xc0
>>>> async_run_entry_fn+0x34/0xe0
>>>> process_one_work+0x148/0x284
>>>> worker_thread+0x2c4/0x3e0
>>>> kthread+0x12c/0x210
>>>> ret_from_fork+0x10/0x20
>>>> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>>> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
>>>> Call trace:
>>>> __switch_to+0xf0/0x1c0 (T)
>>>> __schedule+0x358/0x99c
>>>> schedule+0x34/0x11c
>>>> __synchronize_irq+0x90/0xcc
>>>> disable_irq+0x3c/0x4c
>>>> msm_pinmux_set_mux+0x3b4/0x45c
>>>> pinmux_enable_setting+0x1fc/0x2d8
>>>> pinctrl_commit_state+0xa0/0x260
>>>> pinctrl_pm_select_default_state+0x4c/0xa0
>>>> geni_se_resources_on+0xe8/0x154
>>>> geni_serial_resource_state+0x8c/0xbc
>>>> qcom_geni_serial_runtime_resume+0x3c/0x88
>>>> pm_generic_runtime_resume+0x2c/0x44
>>>> __rpm_callback+0x48/0x1e0
>>>> rpm_callback+0x74/0x80
>>>> rpm_resume+0x3bc/0x6a0
>>>> __pm_runtime_resume+0x50/0x9c
>>>> handle_threaded_wake_irq+0x30/0x80
>>>> irq_thread_fn+0x2c/0xb0
>>>> irq_thread+0x170/0x334
>>>> kthread+0x12c/0x210
>>>> ret_from_fork+0x10/0x20
>>>
>>> I can see call stack is mostly similar for yours and mine but not
>>> completely at initial calls.
>>>
>>> Yours dump:
>>> > qcom_geni_serial_runtime_resume+0x3c/0x88
>>> > pm_generic_runtime_resume+0x2c/0x44
>>> > __rpm_callback+0x48/0x1e0
>>> > rpm_callback+0x74/0x80
>>> > rpm_resume+0x3bc/0x6a0
>>> > __pm_runtime_resume+0x50/0x9c
>>> > handle_threaded_wake_irq+0x30/0x80
>>>
>>> Mine:
>>> >>> qcom_geni_serial_runtime_resume+0x4c/0x88
>>> >>> pm_generic_runtime_resume+0x2c/0x44
>>> >>> __genpd_runtime_resume+0x30/0x80
>>> >>> genpd_runtime_resume+0x114/0x29c
>>> >>> __rpm_callback+0x48/0x1d8
>>> >>> rpm_callback+0x6c/0x78
>>> >>> rpm_resume+0x530/0x750
>>>
>>>
>>> Can you please share what is DT file for this Board if possible?
>>> is there any usecase enabled on this SE instance?
>>
>> Well, yeah, sorry, I didn't really compared backtraces line to line and
>> behaviour was exactly the same. I thought that the purpose was to fix
>> the regression reported earlier.
>>
>> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>>
>> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
>> it is worth checking it as well.
>> For testing here I didn't use anything extra (the only change was wifi fix
>> from Loic); I tested -master and linux-next usually.
>>
>> If you can tell me what is SE instance I may be able to answer. But
>> as far as I know it is not a part of any infrastructure or CI machinery.
>> I just boot the board and see if it works, if it does then I rebuild and
>> test my changes (audio).
>
> I'm actively working on this and experimenting various scenarios with
> wakeup. I’ll share the updated patch as soon as possible.
>
> Should we include fix in V2 or new version(V1) if the fix originates
> from a different subsystem(pinctrol)?
Wait, I am a bit lost. Are there two regresssions? And is this patch only
targets one of the them?
Are there two fixes now for different problems?
If they are not related (independent) then I'd split it but it not something
exceptional -- just standard rules should apply.
Thanks,
Alexey
Powered by blists - more mailing lists