lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <DCPUJPHR8NUB.1SRB4D7ONSRBY@linaro.org>
Date: Thu, 11 Sep 2025 10:00:27 +0100
From: "Alexey Klimov" <alexey.klimov@...aro.org>
To: "Praveen Talari" <praveen.talari@....qualcomm.com>, "Greg Kroah-Hartman"
 <gregkh@...uxfoundation.org>, "Jiri Slaby" <jirislaby@...nel.org>, "Bryan
 O'Donoghue" <bryan.odonoghue@...aro.org>, "Praveen Talari"
 <quic_ptalari@...cinc.com>, <linux-arm-msm@...r.kernel.org>,
 <linux-kernel@...r.kernel.org>, <linux-serial@...r.kernel.org>
Cc: <psodagud@...cinc.com>, <djaggi@...cinc.com>,
 <quic_msavaliy@...cinc.com>, <quic_vtanuku@...cinc.com>,
 <quic_arandive@...cinc.com>, <quic_mnaresh@...cinc.com>,
 <quic_shazhuss@...cinc.com>, <krzk@...nel.org>
Subject: Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime
 resume

Hi Praveen,

On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
> Hi Alexy,
>
> Thank you for update.
>
> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>> 
>> (adding Krzysztof to c/c)
>> 
>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>> A deadlock is observed in the qcom_geni_serial driver during runtime
>>>> resume. This occurs when the pinctrl subsystem reconfigures device pins
>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>> leading to system instability.
>>>>
>>>> The critical call trace leading to the deadlock is:
>>>>
>>>>      Call trace:
>>>>      __switch_to+0xe0/0x120
>>>>      __schedule+0x39c/0x978
>>>>      schedule+0x5c/0xf8
>>>>      __synchronize_irq+0x88/0xb4
>>>>      disable_irq+0x3c/0x4c
>>>>      msm_pinmux_set_mux+0x508/0x644
>>>>      pinmux_enable_setting+0x190/0x2dc
>>>>      pinctrl_commit_state+0x13c/0x208
>>>>      pinctrl_pm_select_default_state+0x4c/0xa4
>>>>      geni_se_resources_on+0xe8/0x154
>>>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>      pm_generic_runtime_resume+0x2c/0x44
>>>>      __genpd_runtime_resume+0x30/0x80
>>>>      genpd_runtime_resume+0x114/0x29c
>>>>      __rpm_callback+0x48/0x1d8
>>>>      rpm_callback+0x6c/0x78
>>>>      rpm_resume+0x530/0x750
>>>>      __pm_runtime_resume+0x50/0x94
>>>>      handle_threaded_wake_irq+0x30/0x94
>>>>      irq_thread_fn+0x2c/xa8
>>>>      irq_thread+0x160/x248
>>>>      kthread+0x110/x114
>>>>      ret_from_fork+0x10/x20
>>>>
>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>> runtime suspend/resume callbacks. In the runtime resume callback, call
>>>> disable_irq_wake() before enabling resources. This preemptively
>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>> interrupt management calls to proceed without conflict. An error path
>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>
>>>> Conversely, in runtime suspend, call enable_irq_wake() after resources
>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>> source only once the device has fully entered its low-power state. An
>>>> error path handles disabling the wakeup IRQ if the suspend operation
>>>> fails.
>>>>
>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for serial driver")
>>>> Signed-off-by: Praveen Talari <praveen.talari@....qualcomm.com>
>>>
>>> You forgot:
>>>
>>> Reported-by: Alexey Klimov <alexey.klimov@...aro.org>
>>>
>>> Also, not sure where this change will go, via Greg or Jiri, but ideally
>>> this should be picked for current -rc cycle since regression is
>>> introduced during latest merge window.
>>>
>>> I also would like to test it on qrb2210 rb1 where this regression is
>>> reproduciable.
>> 
>> It doesn't seem that it fixes the regression on RB1 board:
>> 
>>   INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    ppid:2      task_flags:0x4208060 flags:0x00000010
>>   Workqueue: async async_run_entry_fn
>>   Call trace:
>>    __switch_to+0xf0/0x1c0 (T)
>>    __schedule+0x358/0x99c
>>    schedule+0x34/0x11c
>>    rpm_resume+0x17c/0x6a0
>>    rpm_resume+0x2c4/0x6a0
>>    rpm_resume+0x2c4/0x6a0
>>    rpm_resume+0x2c4/0x6a0
>>    __pm_runtime_resume+0x50/0x9c
>>    __driver_probe_device+0x58/0x120
>>    driver_probe_device+0x3c/0x154
>>    __driver_attach_async_helper+0x4c/0xc0
>>    async_run_entry_fn+0x34/0xe0
>>    process_one_work+0x148/0x284
>>    worker_thread+0x2c4/0x3e0
>>    kthread+0x12c/0x210
>>    ret_from_fork+0x10/0x20
>>   INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>         Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>   task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    ppid:2      task_flags:0x208040 flags:0x00000010
>>   Call trace:
>>    __switch_to+0xf0/0x1c0 (T)
>>    __schedule+0x358/0x99c
>>    schedule+0x34/0x11c
>>    __synchronize_irq+0x90/0xcc
>>    disable_irq+0x3c/0x4c
>>    msm_pinmux_set_mux+0x3b4/0x45c
>>    pinmux_enable_setting+0x1fc/0x2d8
>>    pinctrl_commit_state+0xa0/0x260
>>    pinctrl_pm_select_default_state+0x4c/0xa0
>>    geni_se_resources_on+0xe8/0x154
>>    geni_serial_resource_state+0x8c/0xbc
>>    qcom_geni_serial_runtime_resume+0x3c/0x88
>>    pm_generic_runtime_resume+0x2c/0x44
>>    __rpm_callback+0x48/0x1e0
>>    rpm_callback+0x74/0x80
>>    rpm_resume+0x3bc/0x6a0
>>    __pm_runtime_resume+0x50/0x9c
>>    handle_threaded_wake_irq+0x30/0x80
>>    irq_thread_fn+0x2c/0xb0
>>    irq_thread+0x170/0x334
>>    kthread+0x12c/0x210
>>    ret_from_fork+0x10/0x20
>
> I can see call stack is mostly similar for yours and mine but not 
> completely at initial calls.
>
> Yours dump:
>  >    qcom_geni_serial_runtime_resume+0x3c/0x88
>  >    pm_generic_runtime_resume+0x2c/0x44
>  >    __rpm_callback+0x48/0x1e0
>  >    rpm_callback+0x74/0x80
>  >    rpm_resume+0x3bc/0x6a0
>  >    __pm_runtime_resume+0x50/0x9c
>  >    handle_threaded_wake_irq+0x30/0x80
>
> Mine:
>  >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>  >>>      pm_generic_runtime_resume+0x2c/0x44
>  >>>      __genpd_runtime_resume+0x30/0x80
>  >>>      genpd_runtime_resume+0x114/0x29c
>  >>>      __rpm_callback+0x48/0x1d8
>  >>>      rpm_callback+0x6c/0x78
>  >>>      rpm_resume+0x530/0x750
>
>
> Can you please share what is DT file for this Board if possible?
> is there any usecase enabled on this SE instance?

Well, yeah, sorry, I didn't really compared backtraces line to line and
behaviour was exactly the same. I thought that the purpose was to fix
the regression reported earlier.

RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.

The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
it is worth checking it as well.
For testing here I didn't use anything extra (the only change was wifi fix
from Loic); I tested -master and linux-next usually.

If you can tell me what is SE instance I may be able to answer. But
as far as I know it is not a part of any infrastructure or CI machinery.
I just boot the board and see if it works, if it does then I rebuild and
test my changes (audio).

Best regards,
Alexey


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ