lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0dfa628a-b7b2-4aba-885b-a28b7a9b66d4@collabora.com>
Date: Thu, 24 Apr 2025 09:45:19 +0500
From: Muhammad Usama Anjum <usama.anjum@...labora.com>
To: Jeff Johnson <jeff.johnson@....qualcomm.com>,
 Baochen Qiang <quic_bqiang@...cinc.com>, Jeff Johnson <jjohnson@...nel.org>,
 Kalle Valo <kvalo@...nel.org>, Anilkumar Kolli <quic_akolli@...cinc.com>
Cc: kernel@...labora.com, linux-wireless@...r.kernel.org,
 ath11k@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] wifi: ath11k: Fix memory reuse logic

On 4/23/25 7:28 PM, Jeff Johnson wrote:
> On 4/23/2025 1:15 AM, Baochen Qiang wrote:
>> On 4/23/2025 2:59 PM, Muhammad Usama Anjum wrote:
> 
> For starters, can we make the subject a bit more specific, i.e.
> Fix MHI target memory reuse logic
Will do.

> 
>>> Firmware requests 2 segments at first. The first segment is of 6799360
>>> whose allocation fails due to dma remapping not available. The success
> 
> the memory allocation succeeds but the remapping fails? that seems like some
> specific information that actually isn't very useful. From the perspective of
> the driver all we really care about is that dma_alloc_coherent() fails, not
> why it fails.
I'm explaining the code flow here to define what's wrong. If I hadn't
mentioned this, I would have been asked how this bug gets triggered.

> 
>>> is returned to firmware. Then firmware asks for 22 smaller segments
>>> instead of 2 big ones. Those get allocated successfully. At suspend/
>>> hibernation time, these segments aren't freed as they will be reused
>>> by firmware after resuming.
>>>
>>> After resume the firmware asks for 2 segments again with first segment
>>> of 6799360 and vaddr is not NULL. We compare the type and size with
>>
>> suggest to rephrase as:
>>
>> After resume the firmware asks for 2 segments again with first segment
>> of 6799360. Since chunk->vaddr is not NULL, we compare the type and size with
>>
>>> previous type and size to know if it can be reused or not.
>>> Unfortunately, we detect that it cannot be reuses and this first smaller
>>
>> s/reuses/reused/
>>
>>> segment is freed. Then we continue to allocate 6799360 size memory from
>>> dma which fails and we call ath11k_qmi_free_target_mem_chunk() which
>>
>> it is odd with 'from dma' ...
>>
>> I think just say 'allocate 6799360 size memory' is good enough.
>>
>>> frees the second smaller segment as well. Later success is returned to
>>> firmware which asks for 22 smaller segments again. But as we had freed 2
>>> segments already, we'll allocate the first 2 new smaller segments again
>>> and reuse the remaining 20. Hence we aren't reusing the all 22 small
>>> segments, but only 20.
>>>
>>> This patch is correcting the skip logic when vaddr is set, but size/type
> 
> see
> <https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes>
> 
> Specifically:
> Describe your changes in imperative mood, e.g. “make xyzzy do frotz” instead
> of “[This patch] makes xyzzy do frotz” or “[I] changed xyzzy to do frotz”, as
> if you are giving orders to the codebase to change its behaviour.
> 
>>> don't match. In this case, we should use the same skip and success logic
> 
> who is "we"? the driver is performing the action. As part of changing the text
> to be in imperative mood this should go away.
> 
>>> as used when dma_alloc_coherent fails without freeing the memory area.
> 
> add () to function references
> 
Will do

>>>
>>> We had got reports that memory allocation in this function failed at
>>> resume [1] which made us debug why the reuse logic is wrong. Those
>>
>> The link is just v1 of this patch, it is not the report. If there is no public report,
>> just don't mention it.
>>
>>> failures weren't because of the bigger chunk allocation failure as they
>>> are skipped. Rather these failures were because of smaller chunk
>>> allocation failures. This issue is in the kernel side as because of
>>> memory pressure or fragmentation, the dma memory allocation fails. This
>>> patch fixes freeing and allocation of 2 smaller chunks.
>>
>> I know you are describing why you start to debug this issue. But I don't think it is
>> needed in the commit message. No matter kernel allocation fails or succeeds, the issue is
>> there, and the description above is sufficient to make the issue clear.
> 
> Concur with this.
> 
>>
>>>
>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> blank line needed.
>>
>>> [1] https://lore.kernel.org/all/b30bc7f6-845d-4f9d-967e-c04a2b5f13f5@collabora.com
>>>
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@...labora.com>
>>> ---
>>> Changes since v1:
>>> - Update description
>>>
>>> Fixes: 5962f370ce41 ("ath11k: Reuse the available memory after firmware reload")
>>> I think we should keep fixes tag as ^ claimed that its adding reuse
>>> support. But it left a bug in reuse which we are fixing.
>>>
>>> Feel free to add it or leave it as it is.
>>
>> Jeff, what do you think?
> 
> I would drop the tag. As I understand it, the issue described is due to memory
> fragmentation/starvation, and not due to the fact that ath11k does not
> actually reuse the first two segments.
Ath11k not reusing the segment is also an logical issue. The patch
adding reuse logic claims that it added reuse logic. But it had missed
this use case.

Anyways not a blocker for me. I just wanted this patch to get backported
to stable kernels as its fixing the reuse logic and of course issue
arising from fragmentation.

> 
>>
>>> ---
>>>  drivers/net/wireless/ath/ath11k/qmi.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath11k/qmi.c b/drivers/net/wireless/ath/ath11k/qmi.c
>>> index 47b9d4126d3a9..3c26f4dcf5d29 100644
>>> --- a/drivers/net/wireless/ath/ath11k/qmi.c
>>> +++ b/drivers/net/wireless/ath/ath11k/qmi.c
>>> @@ -1990,8 +1990,16 @@ static int ath11k_qmi_alloc_target_mem_chunk(struct ath11k_base *ab)
>>>  		 */
>>>  		if (chunk->vaddr) {
>>>  			if (chunk->prev_type == chunk->type &&
>>> -			    chunk->prev_size == chunk->size)
>>> +			    chunk->prev_size == chunk->size) {
> 
> please don't change this...

> 
>>>  				continue;
>>> +			} else if (ab->qmi.mem_seg_count <= ATH11K_QMI_FW_MEM_REQ_SEGMENT_CNT) {
> 
> ...instead just use if here. we normally don't use else after a statement that
> changes the code flow (return, goto, continue, etc.)
Will send v3.

> 
> 
>>> +				ath11k_dbg(ab, ATH11K_DBG_QMI,
>>> +					   "size/type mismatch (current %d %u) (prev %d %u), try later with small size\n",
>>> +					    chunk->size, chunk->type,
>>> +					    chunk->prev_size, chunk->prev_type);
>>> +				ab->qmi.target_mem_delayed = true;
>>> +				return 0;
>>> +			}
>>>  
>>>  			/* cannot reuse the existing chunk */
>>>  			dma_free_coherent(ab->dev, chunk->prev_size,
>>
>>
> 


-- 
Regards,
Usama

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ