[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aMl1YGt91p1rAaG6@trex>
Date: Tue, 16 Sep 2025 16:34:08 +0200
From: Jorge Ramirez <jorge.ramirez@....qualcomm.com>
To: Alexey Klimov <alexey.klimov@...aro.org>
Cc: Praveen Talari <praveen.talari@....qualcomm.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jirislaby@...nel.org>,
Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
Praveen Talari <quic_ptalari@...cinc.com>,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-serial@...r.kernel.org, psodagud@...cinc.com, djaggi@...cinc.com,
quic_msavaliy@...cinc.com, quic_vtanuku@...cinc.com,
quic_arandive@...cinc.com, quic_mnaresh@...cinc.com,
quic_shazhuss@...cinc.com, krzk@...nel.org
Subject: Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime
resume
On 11/09/25 10:00:27, Alexey Klimov wrote:
> Hi Praveen,
>
> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> > Hi Alexy,
> >
> > Thank you for update.
> >
> > On 9/10/2025 1:35 AM, Alexey Klimov wrote:
> >>
> >> (adding Krzysztof to c/c)
> >>
> >> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
> >>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
> >>>> A deadlock is observed in the qcom_geni_serial driver during runtime
> >>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
> >>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
> >>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
> >>>> __synchronize_irq(), conflicting with the active wakeup state and
> >>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
> >>>> leading to system instability.
> >>>>
> >>>> The critical call trace leading to the deadlock is:
> >>>>
> >>>> Call trace:
> >>>> __switch_to+0xe0/0x120
> >>>> __schedule+0x39c/0x978
> >>>> schedule+0x5c/0xf8
> >>>> __synchronize_irq+0x88/0xb4
> >>>> disable_irq+0x3c/0x4c
> >>>> msm_pinmux_set_mux+0x508/0x644
> >>>> pinmux_enable_setting+0x190/0x2dc
> >>>> pinctrl_commit_state+0x13c/0x208
> >>>> pinctrl_pm_select_default_state+0x4c/0xa4
> >>>> geni_se_resources_on+0xe8/0x154
> >>>> qcom_geni_serial_runtime_resume+0x4c/0x88
> >>>> pm_generic_runtime_resume+0x2c/0x44
> >>>> __genpd_runtime_resume+0x30/0x80
> >>>> genpd_runtime_resume+0x114/0x29c
> >>>> __rpm_callback+0x48/0x1d8
> >>>> rpm_callback+0x6c/0x78
> >>>> rpm_resume+0x530/0x750
> >>>> __pm_runtime_resume+0x50/0x94
> >>>> handle_threaded_wake_irq+0x30/0x94
> >>>> irq_thread_fn+0x2c/xa8
> >>>> irq_thread+0x160/x248
> >>>> kthread+0x110/x114
> >>>> ret_from_fork+0x10/x20
> >>>>
> >>>> To resolve this, explicitly manage the wakeup IRQ state within the
> >>>> runtime suspend/resume callbacks. In the runtime resume callback, call
> >>>> disable_irq_wake() before enabling resources. This preemptively
> >>>> removes the "wakeup" capability from the IRQ, allowing subsequent
> >>>> interrupt management calls to proceed without conflict. An error path
> >>>> re-enables the wakeup IRQ if resource enablement fails.
> >>>>
> >>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
> >>>> are disabled. This ensures the interrupt is configured as a wakeup
> >>>> source only once the device has fully entered its low-power state. An
> >>>> error path handles disabling the wakeup IRQ if the suspend operation
> >>>> fails.
> >>>>
> >>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
> >>>> Signed-off-by: Praveen Talari <praveen.talari@....qualcomm.com>
> >>>
> >>> You forgot:
> >>>
> >>> Reported-by: Alexey Klimov <alexey.klimov@...aro.org>
> >>>
> >>> Also, not sure where this change will go, via Greg or Jiri, but ideally
> >>> this should be picked for current -rc cycle since regression is
> >>> introduced during latest merge window.
> >>>
> >>> I also would like to test it on qrb2210 rb1 where this regression is
> >>> reproduciable.
> >>
> >> It doesn't seem that it fixes the regression on RB1 board:
> >>
> >> INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
> >> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> task:kworker/u16:3 state:D stack:0 pid:50 tgid:50 ppid:2 task_flags:0x4208060 flags:0x00000010
> >> Workqueue: async async_run_entry_fn
> >> Call trace:
> >> __switch_to+0xf0/0x1c0 (T)
> >> __schedule+0x358/0x99c
> >> schedule+0x34/0x11c
> >> rpm_resume+0x17c/0x6a0
> >> rpm_resume+0x2c4/0x6a0
> >> rpm_resume+0x2c4/0x6a0
> >> rpm_resume+0x2c4/0x6a0
> >> __pm_runtime_resume+0x50/0x9c
> >> __driver_probe_device+0x58/0x120
> >> driver_probe_device+0x3c/0x154
> >> __driver_attach_async_helper+0x4c/0xc0
> >> async_run_entry_fn+0x34/0xe0
> >> process_one_work+0x148/0x284
> >> worker_thread+0x2c4/0x3e0
> >> kthread+0x12c/0x210
> >> ret_from_fork+0x10/0x20
> >> INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
> >> Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> task:irq/92-4a8c000. state:D stack:0 pid:79 tgid:79 ppid:2 task_flags:0x208040 flags:0x00000010
> >> Call trace:
> >> __switch_to+0xf0/0x1c0 (T)
> >> __schedule+0x358/0x99c
> >> schedule+0x34/0x11c
> >> __synchronize_irq+0x90/0xcc
> >> disable_irq+0x3c/0x4c
> >> msm_pinmux_set_mux+0x3b4/0x45c
> >> pinmux_enable_setting+0x1fc/0x2d8
> >> pinctrl_commit_state+0xa0/0x260
> >> pinctrl_pm_select_default_state+0x4c/0xa0
> >> geni_se_resources_on+0xe8/0x154
> >> geni_serial_resource_state+0x8c/0xbc
> >> qcom_geni_serial_runtime_resume+0x3c/0x88
> >> pm_generic_runtime_resume+0x2c/0x44
> >> __rpm_callback+0x48/0x1e0
> >> rpm_callback+0x74/0x80
> >> rpm_resume+0x3bc/0x6a0
> >> __pm_runtime_resume+0x50/0x9c
> >> handle_threaded_wake_irq+0x30/0x80
> >> irq_thread_fn+0x2c/0xb0
> >> irq_thread+0x170/0x334
> >> kthread+0x12c/0x210
> >> ret_from_fork+0x10/0x20
> >
> > I can see call stack is mostly similar for yours and mine but not
> > completely at initial calls.
> >
> > Yours dump:
> > > qcom_geni_serial_runtime_resume+0x3c/0x88
> > > pm_generic_runtime_resume+0x2c/0x44
> > > __rpm_callback+0x48/0x1e0
> > > rpm_callback+0x74/0x80
> > > rpm_resume+0x3bc/0x6a0
> > > __pm_runtime_resume+0x50/0x9c
> > > handle_threaded_wake_irq+0x30/0x80
> >
> > Mine:
> > >>> qcom_geni_serial_runtime_resume+0x4c/0x88
> > >>> pm_generic_runtime_resume+0x2c/0x44
> > >>> __genpd_runtime_resume+0x30/0x80
> > >>> genpd_runtime_resume+0x114/0x29c
> > >>> __rpm_callback+0x48/0x1d8
> > >>> rpm_callback+0x6c/0x78
> > >>> rpm_resume+0x530/0x750
> >
> >
> > Can you please share what is DT file for this Board if possible?
> > is there any usecase enabled on this SE instance?
>
> Well, yeah, sorry, I didn't really compared backtraces line to line and
> behaviour was exactly the same. I thought that the purpose was to fix
> the regression reported earlier.
>
> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>
> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
> it is worth checking it as well.
> For testing here I didn't use anything extra (the only change was wifi fix
> from Loic); I tested -master and linux-next usually.
>
> If you can tell me what is SE instance I may be able to answer. But
> as far as I know it is not a part of any infrastructure or CI machinery.
> I just boot the board and see if it works, if it does then I rebuild and
> test my changes (audio).
>
> Best regards,
> Alexey
>
will there be a fix any time soon Praveen? reverting "serial: qcom-geni:
Enable PM runtime for serial driver" does fix the problem on RB1.
Otherwise I suggest that we revert this commit on linux-next.
Powered by blists - more mailing lists