Message-ID: <9eba0149-290d-4010-8791-d4d8d8be3786@collabora.com>
Date: Mon, 7 Jul 2025 18:11:21 +0500
From: Muhammad Usama Anjum <usama.anjum@...labora.com>
To: Baochen Qiang <baochen.qiang@....qualcomm.com>,
Manivannan Sadhasivam <mani@...nel.org>, Jeff Johnson <jjohnson@...nel.org>,
Jeff Hugo <jeff.hugo@....qualcomm.com>,
Youssef Samir <quic_yabdulra@...cinc.com>,
Matthew Leung <quic_mattleun@...cinc.com>, Yan Zhen <yanzhen@...o.com>,
Alexander Wilhelm <alexander.wilhelm@...termo.com>,
Alex Elder <elder@...nel.org>, Kunwu Chan <chentao@...inos.cn>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Siddartha Mohanadoss <smohanad@...eaurora.org>,
Sujeev Dias <sdias@...eaurora.org>, Julia Lawall <julia.lawall@...6.fr>,
John Crispin <john@...ozen.org>, Muna Sinada <quic_msinada@...cinc.com>,
Venkateswara Naralasetty <quic_vnaralas@...cinc.com>,
Maharaja Kennadyrajan <quic_mkenna@...cinc.com>, mhi@...ts.linux.dev,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-wireless@...r.kernel.org, ath11k@...ts.infradead.org
Cc: kernel@...labora.com
Subject: Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again

On 7/7/25 2:00 PM, Baochen Qiang wrote:
>
>
> On 7/7/2025 4:19 PM, Muhammad Usama Anjum wrote:
>> On 7/3/25 6:59 AM, Baochen Qiang wrote:
>>>
>>>
>>> On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote:
>>>> Thank you for reviewing.
>>>>
>>>> On 7/2/25 8:50 AM, Baochen Qiang wrote:
>>>>>
>>>>>
>>>>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
>>>>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>>>>>> deallocated and there is high possibility that we'll not be able to get
>>>>>> the same memory allocated from dma when there is high memory pressure.
>>>>>>
>>>>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>>>>>
>>>>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>>>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@...labora.com>
>>>>>> ---
>>>>>> drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>>>>> 1 file changed, 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>>> mutex_unlock(&ab->core_lock);
>>>>>>
>>>>>> ath11k_dp_free(ab);
>>>>>> - ath11k_hal_srng_deinit(ab);
>>>>>>
>>>>>> ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>>>
>>>>>> - ret = ath11k_hal_srng_init(ab);
>>>>>> - if (ret)
>>>>>> - return ret;
>>>>>> -
>>>>>
>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>>>> Why is it being done in the resume handler? Shouldn't those parameters be cleaned up
>>>> in the suspend handler? So when the device wakes up, its state is already correct.
>>>>
>>>
>>> Hmm... not quite understand your question. Can you elaborate?
>>
>> I'm trying to understand whether the hal could be cleaned up in the suspend handler. For example:
>> * The driver has been loaded and has been working fine.
>> * The user requests suspend, so all devices are suspended.
>> * In the ath11k suspend handler, we do the necessary cleanup of state
>> such as the hal.
>> * When the device resumes after a long time, the hal already has the correct state,
>> so we don't need to deinit and init again.
>
> The hal cleanup is not only needed by suspend/resume, but is also a step of the
> reset/recover process. So if we move the cleanup to the suspend handler, similar work
> needs to be done for reset/recover as well.
That makes sense.

Clearing the whole hal structure except ab->hal.srng_config doesn't seem right,
though. I tested it and it crashes the whole system.

On the contrary, with only the current patch applied, there is no abnormality.
num_shadow_reg_configured and avail_blk_resource are non-zero. If I set them to 0,
the driver still keeps working:
ab->hal.num_shadow_reg_configured = 0;
ab->hal.avail_blk_resource = 0;
ab->hal.current_blk_index = 0;
Since you suggested setting these three to zero, is there any other variable in the
hal structure that should be zeroed as well?
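
To make the question concrete, here is a minimal userspace sketch of what I mean by
resetting only the book-keeping fields. struct hal_sketch and hal_sketch_reset_state()
are hypothetical stand-ins, not the real struct ath11k_hal or an existing driver
function; only the three counter names come from this thread, and the two pointer
members merely represent "things allocated once at init that must survive".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's hal structure. */
struct hal_sketch {
	void *srng_config;   /* allocated once at init, must be preserved */
	void *dma_buf;       /* stand-in for the DMA memory we do not want to realloc */

	/* Book-keeping state named in this thread. */
	unsigned int num_shadow_reg_configured;
	unsigned int avail_blk_resource;
	unsigned int current_blk_index;
};

/*
 * Reset only the book-keeping state. The allocations are deliberately left
 * untouched, so nothing has to be re-allocated under memory pressure.
 */
static void hal_sketch_reset_state(struct hal_sketch *hal)
{
	hal->num_shadow_reg_configured = 0;
	hal->avail_blk_resource = 0;
	hal->current_blk_index = 0;
}
```

In the driver, a helper like this would presumably be called where the
_deinit()/_init() pair is removed by this patch. Whether these three fields are
sufficient is exactly the open question above.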
>
>>
>>>
>>>> I'm not sure why it worked every time when I tested it on my device.
>>>>
>>>>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
>>>>> _init() needs to be kept as the later operation needs a clean state of them.
>>>> So should we just memset these 3?
>>>
>>> More than those, I think. We need to take care of all entries in hal, since the current
>>> code memsets them all.
>> I see. So memset the whole ath11k hal structure other than the config.
>>
>>>
>>>>
>>>>
>>>>>
>>>>>> clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>>>>>
>>>>>> ret = ath11k_core_qmi_firmware_ready(ab);
>>>>>
>>>>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
>>>>> which is a little odd since there is no _init() anymore with this change, though this is
>>>>> how the current logic works (I mean the hal is currently deinitialized in the error path).
>>>>>
>>>>> Thinking about it more, if we hit the error path, it seems the only option is to remove the
>>>>> ath11k module. In that case _deinit() would be called again in ath11k_pci_remove(), leading
>>>>> to issues (at least I see a double free of hal->srng_config). But this is another topic
>>>>> which can be fixed in a separate patch.
>>>>
>>>> I don't think this is a problem, as the HAL is already initialized before the system is
>>>> suspended. So by removing the deinit()/init() pair, the HAL still remains initialized. Or
>>>> maybe I've missed something?
>>>
>>> Yeah, it is OK in normal path. However in error path we face issues.
>> For example:
>> * When the driver was initialized the first time, the hal was initialized.
>> * Then the system was suspended and the hal was not deinitialized.
>> * At system resume, the hal is already initialized. We can memset some status variables, but
>> it has been initialized since the first time (considering that this patch removes the
>> deinit/init pair).
>> * So at this stage, if some error occurs, we can call the deinit in the error paths.
>>
>>
>
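
On the double free of hal->srng_config in the error path mentioned above: a common way
to make a repeated deinit harmless is to clear the pointer after freeing it. Below is a
minimal userspace sketch of that idea. All names (hal_cfg_sketch and its helpers) are
hypothetical, and calloc()/free() stand in for the kernel allocators; this is not the
actual ath11k code and not necessarily how the separate fix should look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in holding only the config allocation. */
struct hal_cfg_sketch {
	int *srng_config;
};

/* Allocate the config array; returns 0 on success, -1 on failure. */
static int hal_cfg_sketch_init(struct hal_cfg_sketch *hal, size_t n)
{
	hal->srng_config = calloc(n, sizeof(*hal->srng_config));
	return hal->srng_config ? 0 : -1;
}

/*
 * Idempotent deinit: free(NULL) is a no-op, so clearing the pointer
 * after the first free makes a second call safe.
 */
static void hal_cfg_sketch_deinit(struct hal_cfg_sketch *hal)
{
	free(hal->srng_config);
	hal->srng_config = NULL;
}
```

With this pattern, an error path and a later remove path could both call the deinit
without freeing the same memory twice.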