lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <df05da7e-fd9d-48a6-bffc-e84749cd8e96@oss.qualcomm.com>
Date: Tue, 16 Sep 2025 12:20:25 +0530
From: Praveen Talari <praveen.talari@....qualcomm.com>
To: Alexey Klimov <alexey.klimov@...aro.org>,
        Praveen Talari <quic_ptalari@...cinc.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jiri Slaby <jirislaby@...nel.org>,
        Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
        linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-serial@...r.kernel.org, psodagud@...cinc.com, djaggi@...cinc.com,
        quic_msavaliy@...cinc.com, quic_vtanuku@...cinc.com,
        quic_arandive@...cinc.com, quic_shazhuss@...cinc.com, krzk@...nel.org
Subject: Re: [PATCH v1] serial: qcom-geni: Fix pinctrl deadlock on runtime
 resume

Hi Alexey

Thank you for your support.

On 9/15/2025 7:55 PM, Praveen Talari wrote:
> Hi Alexey,
> 
> On 9/15/2025 3:09 PM, Alexey Klimov wrote:
>> (removing <quic_mnaresh@...cinc.com> from c/c -- too many mail not 
>> delivered)
>>
>> Hi Praveen,
>>
>> On Mon Sep 15, 2025 at 7:58 AM BST, Praveen Talari wrote:
>>> Hi Alexey,
>>>
>>> Really appreciate you waiting!
>>>
>>> On 9/11/2025 2:30 PM, Alexey Klimov wrote:
>>>> Hi Praveen,
>>>>
>>>> On Thu Sep 11, 2025 at 9:34 AM BST, Praveen Talari wrote:
>>>>> Hi Alexy,
>>>>>
>>>>> Thank you for update.
>>>>>
>>>>> On 9/10/2025 1:35 AM, Alexey Klimov wrote:
>>>>>>
>>>>>> (adding Krzysztof to c/c)
>>>>>>
>>>>>> On Mon Sep 8, 2025 at 6:43 PM BST, Alexey Klimov wrote:
>>>>>>> On Mon Sep 8, 2025 at 5:45 PM BST, Praveen Talari wrote:
>>>>>>>> A deadlock is observed in the qcom_geni_serial driver during 
>>>>>>>> runtime
>>>>>>>> resume. This occurs when the pinctrl subsystem reconfigures 
>>>>>>>> device pins
>>>>>>>> via msm_pinmux_set_mux() while the serial device's interrupt is an
>>>>>>>> active wakeup source. msm_pinmux_set_mux() calls disable_irq() or
>>>>>>>> __synchronize_irq(), conflicting with the active wakeup state and
>>>>>>>> causing the IRQ thread to enter an uninterruptible (D-state) sleep,
>>>>>>>> leading to system instability.
>>>>>>>>
>>>>>>>> The critical call trace leading to the deadlock is:
>>>>>>>>
>>>>>>>>        Call trace:
>>>>>>>>        __switch_to+0xe0/0x120
>>>>>>>>        __schedule+0x39c/0x978
>>>>>>>>        schedule+0x5c/0xf8
>>>>>>>>        __synchronize_irq+0x88/0xb4
>>>>>>>>        disable_irq+0x3c/0x4c
>>>>>>>>        msm_pinmux_set_mux+0x508/0x644
>>>>>>>>        pinmux_enable_setting+0x190/0x2dc
>>>>>>>>        pinctrl_commit_state+0x13c/0x208
>>>>>>>>        pinctrl_pm_select_default_state+0x4c/0xa4
>>>>>>>>        geni_se_resources_on+0xe8/0x154
>>>>>>>>        qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>>>>        pm_generic_runtime_resume+0x2c/0x44
>>>>>>>>        __genpd_runtime_resume+0x30/0x80
>>>>>>>>        genpd_runtime_resume+0x114/0x29c
>>>>>>>>        __rpm_callback+0x48/0x1d8
>>>>>>>>        rpm_callback+0x6c/0x78
>>>>>>>>        rpm_resume+0x530/0x750
>>>>>>>>        __pm_runtime_resume+0x50/0x94
>>>>>>>>        handle_threaded_wake_irq+0x30/0x94
>>>>>>>>        irq_thread_fn+0x2c/xa8
>>>>>>>>        irq_thread+0x160/x248
>>>>>>>>        kthread+0x110/x114
>>>>>>>>        ret_from_fork+0x10/x20
>>>>>>>>
>>>>>>>> To resolve this, explicitly manage the wakeup IRQ state within the
>>>>>>>> runtime suspend/resume callbacks. In the runtime resume 
>>>>>>>> callback, call
>>>>>>>> disable_irq_wake() before enabling resources. This preemptively
>>>>>>>> removes the "wakeup" capability from the IRQ, allowing subsequent
>>>>>>>> interrupt management calls to proceed without conflict. An error 
>>>>>>>> path
>>>>>>>> re-enables the wakeup IRQ if resource enablement fails.
>>>>>>>>
>>>>>>>> Conversely, in runtime suspend, call enable_irq_wake() after 
>>>>>>>> resources
>>>>>>>> are disabled. This ensures the interrupt is configured as a wakeup
>>>>>>>> source only once the device has fully entered its low-power 
>>>>>>>> state. An
>>>>>>>> error path handles disabling the wakeup IRQ if the suspend 
>>>>>>>> operation
>>>>>>>> fails.
>>>>>>>>
>>>>>>>> Fixes: 1afa70632c39 ("serial: qcom-geni: Enable PM runtime for 
>>>>>>>> serial driver")
>>>>>>>> Signed-off-by: Praveen Talari <praveen.talari@....qualcomm.com>
>>>>>>>
>>>>>>> You forgot:
>>>>>>>
>>>>>>> Reported-by: Alexey Klimov <alexey.klimov@...aro.org>
>>>>>>>
>>>>>>> Also, not sure where this change will go, via Greg or Jiri, but 
>>>>>>> ideally
>>>>>>> this should be picked for current -rc cycle since regression is
>>>>>>> introduced during latest merge window.
>>>>>>>
>>>>>>> I also would like to test it on qrb2210 rb1 where this regression is
>>>>>>> reproduciable.

Since I don't have this board, could you kindly validate the new change 
and run a quick test on your end?

diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c 
b/drivers/pinctrl/qcom/pinctrl-msm.c
index 83eb075b6bfa..3d6601dc6fcc 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -215,7 +215,7 @@ static int msm_pinmux_set_mux(struct pinctrl_dev 
*pctldev,
          */
         if (d && i != gpio_func &&
             !test_and_set_bit(d->hwirq, pctrl->disabled_for_mux))
-               disable_irq(irq);
+               disable_irq_nosync(irq);

         raw_spin_lock_irqsave(&pctrl->lock, flags);

Thanks,
Praveen Talari

>>>>>>
>>>>>> It doesn't seem that it fixes the regression on RB1 board:
>>>>>>
>>>>>>     INFO: task kworker/u16:3:50 blocked for more than 120 seconds.
>>>>>>           Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>>>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>>>>>> this message.
>>>>>>     task:kworker/u16:3   state:D stack:0     pid:50    tgid:50    
>>>>>> ppid:2      task_flags:0x4208060 flags:0x00000010
>>>>>>     Workqueue: async async_run_entry_fn
>>>>>>     Call trace:
>>>>>>      __switch_to+0xf0/0x1c0 (T)
>>>>>>      __schedule+0x358/0x99c
>>>>>>      schedule+0x34/0x11c
>>>>>>      rpm_resume+0x17c/0x6a0
>>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>>      rpm_resume+0x2c4/0x6a0
>>>>>>      __pm_runtime_resume+0x50/0x9c
>>>>>>      __driver_probe_device+0x58/0x120
>>>>>>      driver_probe_device+0x3c/0x154
>>>>>>      __driver_attach_async_helper+0x4c/0xc0
>>>>>>      async_run_entry_fn+0x34/0xe0
>>>>>>      process_one_work+0x148/0x284
>>>>>>      worker_thread+0x2c4/0x3e0
>>>>>>      kthread+0x12c/0x210
>>>>>>      ret_from_fork+0x10/0x20
>>>>>>     INFO: task irq/92-4a8c000.:79 blocked for more than 120 seconds.
>>>>>>           Not tainted 6.17.0-rc5-00018-g9dd1835ecda5-dirty #13
>>>>>>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>>>>>> this message.
>>>>>>     task:irq/92-4a8c000. state:D stack:0     pid:79    tgid:79    
>>>>>> ppid:2      task_flags:0x208040 flags:0x00000010
>>>>>>     Call trace:
>>>>>>      __switch_to+0xf0/0x1c0 (T)
>>>>>>      __schedule+0x358/0x99c
>>>>>>      schedule+0x34/0x11c
>>>>>>      __synchronize_irq+0x90/0xcc
>>>>>>      disable_irq+0x3c/0x4c
>>>>>>      msm_pinmux_set_mux+0x3b4/0x45c
>>>>>>      pinmux_enable_setting+0x1fc/0x2d8
>>>>>>      pinctrl_commit_state+0xa0/0x260
>>>>>>      pinctrl_pm_select_default_state+0x4c/0xa0
>>>>>>      geni_se_resources_on+0xe8/0x154
>>>>>>      geni_serial_resource_state+0x8c/0xbc
>>>>>>      qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>>>      pm_generic_runtime_resume+0x2c/0x44
>>>>>>      __rpm_callback+0x48/0x1e0
>>>>>>      rpm_callback+0x74/0x80
>>>>>>      rpm_resume+0x3bc/0x6a0
>>>>>>      __pm_runtime_resume+0x50/0x9c
>>>>>>      handle_threaded_wake_irq+0x30/0x80
>>>>>>      irq_thread_fn+0x2c/0xb0
>>>>>>      irq_thread+0x170/0x334
>>>>>>      kthread+0x12c/0x210
>>>>>>      ret_from_fork+0x10/0x20
>>>>>
>>>>> I can see call stack is mostly similar for yours and mine but not
>>>>> completely at initial calls.
>>>>>
>>>>> Yours dump:
>>>>>    >    qcom_geni_serial_runtime_resume+0x3c/0x88
>>>>>    >    pm_generic_runtime_resume+0x2c/0x44
>>>>>    >    __rpm_callback+0x48/0x1e0
>>>>>    >    rpm_callback+0x74/0x80
>>>>>    >    rpm_resume+0x3bc/0x6a0
>>>>>    >    __pm_runtime_resume+0x50/0x9c
>>>>>    >    handle_threaded_wake_irq+0x30/0x80
>>>>>
>>>>> Mine:
>>>>>    >>>      qcom_geni_serial_runtime_resume+0x4c/0x88
>>>>>    >>>      pm_generic_runtime_resume+0x2c/0x44
>>>>>    >>>      __genpd_runtime_resume+0x30/0x80
>>>>>    >>>      genpd_runtime_resume+0x114/0x29c
>>>>>    >>>      __rpm_callback+0x48/0x1d8
>>>>>    >>>      rpm_callback+0x6c/0x78
>>>>>    >>>      rpm_resume+0x530/0x750
>>>>>
>>>>>
>>>>> Can you please share what is DT file for this Board if possible?
>>>>> is there any usecase enabled on this SE instance?
>>>>
>>>> Well, yeah, sorry, I didn't really compared backtraces line to line and
>>>> behaviour was exactly the same. I thought that the purpose was to fix
>>>> the regression reported earlier.
>>>>
>>>> RB1 main dts files are qrb2210-rb1.dts and qcm2290.dtsi.
>>>>
>>>> The similar board RB2 uses qrb4210-rb2.dts and sm4250.dtsi+sm6115.dtsi,
>>>> it is worth checking it as well.
>>>> For testing here I didn't use anything extra (the only change was 
>>>> wifi fix
>>>> from Loic); I tested -master and linux-next usually.
>>>>
>>>> If you can tell me what is SE instance I may be able to answer. But
>>>> as far as I know it is not a part of any infrastructure or CI 
>>>> machinery.
>>>> I just boot the board and see if it works, if it does then I rebuild 
>>>> and
>>>> test my changes (audio).
>>>
>>> I'm actively working on this and experimenting various scenarios with
>>> wakeup. I’ll share the updated patch as soon as possible.
>>>
>>> Should we include fix in V2 or new version(V1) if the fix originates
>>> from a different subsystem(pinctrol)?
>>
>> Wait, I am a bit lost. Are there two regresssions? And is this patch only
>> targets one of the them?
> I am simulated on different target(SC7280) and it is same issue only.
>> Are there two fixes now for different problems?
> The problem is same.
>> If they are not related (independent) then I'd split it but it not 
>> something
>> exceptional -- just standard rules should apply.
> I am fixing from this issue from pinctrol subsystem.
> 
> Please guide me on this.
> Should we include fix in V2 or new version(V1) if the fix originates
> from a different subsystem(pinctrol)?
> 
> Thanks,
> Praveen
>>
>> Thanks,
>> Alexey

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ