Message-ID: <aa252da5-a9fc-4c86-ae78-1de1f7c34bb2@gmail.com>
Date: Tue, 3 Feb 2026 11:21:49 +0530
From: Jayasaikiran Banigallapati <bjsaikiran@...il.com>
To: Baochen Qiang <baochen.qiang@....qualcomm.com>, jjohnson@...nel.org,
kvalo@...nel.org
Cc: quic_bqiang@...cinc.com, linux-wireless@...r.kernel.org,
ath12k@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] wifi: ath12k: fix CMA error and MHI state mismatch during
resume
On 2/3/26 11:00, Baochen Qiang wrote:
>
> On 2/3/2026 1:02 PM, Jayasaikiran Banigallapati wrote:
>> On 2/3/26 08:21, Baochen Qiang wrote:
>>> On 2/2/2026 11:17 PM, Saikiran wrote:
>>>> Commit 8d5f4da8d70b ("wifi: ath12k: support suspend/resume") introduced
>>>> system suspend/resume support but caused a critical regression where
>>>> CMA pages are corrupted during resume.
>>>>
>>>> 1. CMA page corruption:
>>>> Calling mhi_unprepare_after_power_down() during suspend (via
>>>> ATH12K_MHI_DEINIT) prematurely frees the fbc_image and rddm_image
>>>> DMA buffers. When these pages are accessed during resume, the kernel
>>>> detects corruption (Bad page state).
>>> How, FBC image and RDDM image get re-allocated at resume, no?
>>>
>>> To clarify, the BUG: Bad page state crash actually occurs during the suspend phase,
>>> specifically when ath12k_mhi_stop() calls mhi_unprepare_after_power_down().
>>>
>>> The stack trace shows the panic happens inside mhi_free_bhie_table() while trying to
>>> free the pages:
>>>
>>> mhi_free_bhie_table+0x50/0xa0 [mhi]
>>> mhi_unprepare_after_power_down+0x30/0x70 [mhi]
>>> ath12k_mhi_stop+0xf8/0x210 [ath12k]
>>> ath12k_core_suspend_late+0x94/0xc0 [ath12k]
>>>
>>> The kernel reports nonzero _refcount when attempting to free the CMA pages (fbc_image/
>>> rddm_image). This suggests that something is still holding a reference to these pages
>>> when DEINIT attempts to free them, causing the kernel to panic before we reach the
>>> resume stage.
> this seems like a bug either in MHI stack or in kernel DMA/MM subsystems, rather than in
> ath12k
>
>>> Since the pages cannot be safely freed during suspend, skipping DEINIT (and using
>>> MHI_POWER_OFF_KEEP_DEV) avoids this invalid free operation. This also aligns with the
>>> existing comment in ath12k_mhi_stop which suggests using mhi_power_down_keep_dev() for
>>> suspend.
> first of all, this is a workaround rather than fix. Ideally we should try to root cause
> the issue and fix it in the right way.

The existing comment in ath12k_mhi_stop() reads:

	/* During suspend we need to use mhi_power_down_keep_dev()
	 * workaround, otherwise ath12k_core_resume() will timeout
	 * during resume.
	 */

This patch aligns the code with this existing intent. The driver was
previously calling DEINIT (and freeing resources) despite the comment
advising to use keep_dev.

If the intention of the driver authors was to use keep_dev for suspend,
then my understanding is that DEINIT is incorrect here, regardless of
the underlying MM behavior (correct me if I am wrong).

>
> Secondly the workaround here seems problematic: you skip INIT during resume. However note
> several hardware registers need to be re-programmed during this stage, how could the
> target work if its power is cutoff during suspend and the register context is not restored
> during resume?

In my testing, WiFi functionality was fully restored after resume.
The device associates and passes traffic immediately.

My understanding is that:

ATH12K_MHI_INIT primarily handles host memory allocation (which we
preserved by skipping DEINIT).

ATH12K_MHI_POWER_ON calls mhi_sync_power_up(). This function triggers
the MHI state machine, which handles the necessary BHI/BHIE programming
and firmware download (SBL) sequence.

Since mhi_sync_power_up() is still called during resume, the target is
correctly re-initialized and registers are programmed, even if we skip
the redundant host memory allocation step (INIT).
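
To make the ordering I am describing concrete, here is a toy model in C.
It is only a sketch of my understanding: every name in it (toy_mhi,
toy_stop, toy_resume, and so on) is hypothetical and is not the ath12k
or MHI API. It models only host-side buffer bookkeeping and says nothing
about target-side register state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the ath12k or MHI API. */
struct toy_mhi {
	bool host_bufs_allocated; /* stands in for fbc_image / rddm_image */
	bool powered_on;
	int alloc_count;          /* number of host buffer allocations so far */
};

/* Models the INIT step: allocate host-side image buffers. */
void toy_init(struct toy_mhi *m)
{
	assert(!m->host_bufs_allocated);
	m->host_bufs_allocated = true;
	m->alloc_count++;
}

/* Models the DEINIT step: free host-side image buffers. */
void toy_deinit(struct toy_mhi *m)
{
	assert(m->host_bufs_allocated && !m->powered_on);
	m->host_bufs_allocated = false;
}

/* Models POWER_ON: only valid when the host buffers exist. */
bool toy_power_on(struct toy_mhi *m)
{
	if (!m->host_bufs_allocated || m->powered_on)
		return false;
	m->powered_on = true;
	return true;
}

/* Models POWER_OFF and POWER_OFF_KEEP_DEV alike in this toy. */
void toy_power_off(struct toy_mhi *m)
{
	m->powered_on = false;
}

/* Initial bring-up: INIT followed by POWER_ON. */
bool toy_start(struct toy_mhi *m)
{
	toy_init(m);
	return toy_power_on(m);
}

/* Stop path: on suspend, power off but skip DEINIT so buffers survive. */
void toy_stop(struct toy_mhi *m, bool is_suspend)
{
	toy_power_off(m);
	if (!is_suspend)
		toy_deinit(m);
}

/* Resume path: skip INIT when the buffers were preserved by suspend. */
bool toy_resume(struct toy_mhi *m)
{
	if (!m->host_bufs_allocated)
		toy_init(m);
	return toy_power_on(m);
}
```

In this model, toy_start() followed by toy_stop(m, true) and
toy_resume(m) leaves alloc_count at 1, i.e. the buffers are neither
freed nor re-allocated across the suspend cycle, while power-on still
runs on resume.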
Thanks & Regards,
Saikiran