Message-ID: <547a2c5c-fa11-4110-ae6f-17c12d6809f2@oss.qualcomm.com>
Date: Tue, 3 Feb 2026 14:08:24 +0800
From: Baochen Qiang <baochen.qiang@....qualcomm.com>
To: Jayasaikiran Banigallapati <bjsaikiran@...il.com>,
        Baochen Qiang <baochen.qiang@....qualcomm.com>, jjohnson@...nel.org,
        kvalo@...nel.org
Cc: linux-wireless@...r.kernel.org, ath12k@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] wifi: ath12k: fix CMA error and MHI state mismatch during
 resume



On 2/3/2026 1:51 PM, Jayasaikiran Banigallapati wrote:
> 
> On 2/3/26 11:00, Baochen Qiang wrote:
>>
>> On 2/3/2026 1:02 PM, Jayasaikiran Banigallapati wrote:
>>> On 2/3/26 08:21, Baochen Qiang wrote:
>>>> On 2/2/2026 11:17 PM, Saikiran wrote:
>>>>> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
>>>>> system suspend/resume support but caused a critical regression where
>>>>> CMA pages are corrupted during resume.
>>>>>
>>>>> 1. CMA page corruption:
>>>>>      Calling mhi_unprepare_after_power_down() during suspend (via
>>>>>      ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
>>>>>      DMA buffers. When these pages are accessed during resume, the kernel
>>>>>      detects corruption (Bad page state).
>>>> How? The FBC image and RDDM image get re-allocated at resume, no?
>>>>
>>>> To clarify, the BUG: Bad page state crash actually occurs during the suspend phase,
>>>> specifically when ath12k_mhi_stop() calls mhi_unprepare_after_power_down().
>>>>
>>>> The stack trace shows the panic happens inside mhi_free_bhie_table() while trying to
>>>> free the pages:
>>>>
>>>>   mhi_free_bhie_table+0x50/0xa0 [mhi]
>>>>   mhi_unprepare_after_power_down+0x30/0x70 [mhi]
>>>>   ath12k_mhi_stop+0xf8/0x210 [ath12k]
>>>>   ath12k_core_suspend_late+0x94/0xc0 [ath12k]
>>>>
>>>> The kernel reports nonzero _refcount when attempting to free the CMA pages (fbc_image/
>>>> rddm_image). This suggests that something is still holding a reference to these pages
>>>> when DEINIT attempts to free them, causing the kernel to panic before we reach the
>>>> resume stage.
>> This seems like a bug in either the MHI stack or the kernel DMA/MM subsystems, rather
>> than in ath12k.
>>
>>>> Since the pages cannot be safely freed during suspend, skipping DEINIT (and using
>>>> MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation. This also aligns with the
>>>> existing comment in ath12k_mhi_stop which suggests using mhi_power_down_keep_dev() for
>>>> suspend.
>> First of all, this is a workaround rather than a fix. Ideally we should root-cause
>> the issue and fix it in the right way.
> 
> 
> The original comment in existing code:
> 
> 
> /* During suspend we need to use mhi_power_down_keep_dev()
>  * workaround, otherwise ath12k_core_resume() will timeout
>  * during resume.
>  */
> 
> This patch aligns the code with this existing intent. The driver was previously
> calling DEINIT (and freeing resources) despite the comment advising to use keep_dev.
>
> If the intention of the driver authors was to use keep_dev for suspend, then my
> understanding is that DEINIT is incorrect here (correct me if I am wrong),
> regardless of the underlying MM behavior.

keep_dev means not to destroy the mhi_device instance while going to suspend. The purpose
is to get rid of the PROBE_DEFER problem in MHI during resume. You may want to check the
upstream discussion to learn about the history.

> 
>>
>> Secondly, the workaround here seems problematic: you skip INIT during resume. However, note
>> that several hardware registers need to be re-programmed during this stage. How could the
>> target work if its power is cut off during suspend and the register context is not restored
>> during resume?
> 
> 
> In my testing, WiFi functionality was fully restored after resume.
> 
> The device associates and passes traffic immediately.

I can imagine two reasons: either the WLAN target's power is not cut off during suspend, or
you did not hit the issue scenario. For the latter, I mean you may need to trigger a
firmware crash to see if RDDM works normally, since you skip the RDDM register context
restore during resume.

> 
> My understanding is that:
> 
> ATH12K_MHI_INIT primarily handles host memory allocation (which we preserved by skipping
> DEINIT).

In addition to memory allocation, there is also register programming. See
mhi_prepare_for_power_up() and mhi_rddm_prepare().
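
To illustrate the concern, here is a toy model in C. It is only a sketch: every type and
function name below is hypothetical and none of it is the MHI or ath12k API. It assumes
that INIT both allocates host memory and programs the RDDM register context, that a power
cut during suspend loses that register context, and that normal traffic does not depend on
it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the MHI or ath12k API. */
struct toy_target {
	bool host_bufs_allocated;  /* host memory, e.g. FBC/RDDM images */
	bool rddm_regs_programmed; /* device-side register context for RDDM */
	bool powered;
};

/* Models INIT: allocate host memory and program registers (assumption). */
static void toy_init(struct toy_target *t)
{
	assert(t);
	t->host_bufs_allocated = true;
	t->rddm_regs_programmed = true;
}

/* Models suspend with DEINIT skipped: host memory is kept, but register
 * context is lost if the target's power is cut (assumption).
 */
static void toy_suspend_skip_deinit(struct toy_target *t, bool power_cut)
{
	assert(t);
	t->powered = false;
	if (power_cut)
		t->rddm_regs_programmed = false;
}

/* Models resume: optionally run INIT again, then power on. */
static void toy_resume(struct toy_target *t, bool run_init)
{
	assert(t);
	if (run_init)
		toy_init(t);
	t->powered = true;
}

/* In this model, traffic only needs power and host buffers. */
static bool toy_traffic_works(const struct toy_target *t)
{
	return t->powered && t->host_bufs_allocated;
}

/* In this model, a crash dump also needs the RDDM register context. */
static bool toy_rddm_works(const struct toy_target *t)
{
	return t->powered && t->rddm_regs_programmed;
}
```

Under these assumptions, a resume that skips INIT after a power cut still passes traffic
but leaves RDDM broken, which would only show up once a firmware crash is triggered.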

> 
> ATH12K_MHI_POWER_ON calls mhi_sync_power_up(). This function triggers the MHI state machine,
> which handles the necessary BHI/BHIE programming and firmware download (SBL) sequence.
>
> Since mhi_sync_power_up() is still called during resume, the target is correctly
> re-initialized and registers are programmed, even if we skip the redundant host memory
> allocation step (INIT).
> 
> Thanks & Regards,
> Saikiran
> 

