[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1fc423d1-762c-4e0f-8cf1-d4610c547596@collabora.com>
Date: Wed, 23 Apr 2025 11:26:18 +0500
From: Muhammad Usama Anjum <usama.anjum@...labora.com>
To: Baochen Qiang <quic_bqiang@...cinc.com>,
Jeff Johnson <jjohnson@...nel.org>, Kalle Valo <kvalo@...nel.org>,
Anilkumar Kolli <quic_akolli@...cinc.com>
Cc: kernel@...labora.com, linux-wireless@...r.kernel.org,
ath11k@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] wifi: ath11k: Fix memory reuse logic
On 4/22/25 1:01 PM, Baochen Qiang wrote:
>
>
> On 4/22/2025 3:46 PM, Muhammad Usama Anjum wrote:
>> Hi,
>>
>> Thank you for excellent review.
>>
>> On 4/22/25 7:15 AM, Baochen Qiang wrote:
>>>
>>>
>>> On 4/18/2025 8:09 PM, Muhammad Usama Anjum wrote:
>>>> Firmware requests 2 segments at first. 1st segment is of 6799360 whose
>>>> allocation fails and we return success to firmware. Then firmware asks
>>>
>>> Host won't fail in case DMA remapping is enabled. Better to rephrase to make it clear that
>>> the big segment allocation fails in case DMA remapping is not working, usually due to
>>> IOMMU not present or any necessary kernel config not enabled.
>> IOMMU is turned off. I'll make description better.
>>
>>>
>>>> for 22 smaller segments. Those get allocated. At suspend/hibernation
>>>> time, these segments aren't freed as they are reused by firmware.
>>>>
>>>> After resume the firmware asks for 2 segments again with first segment
>>>> of 6799360 and with same vaddr of the first smaller segment which we had
>>>
>>> Not follow you here. What do you mean by 'same vaddr'? firmware does not care about vaddr
>>> at all.
>> So we get request to allocate memory of size = 6799360 and vaddr =
>> 0xABC). We fail it. Then we get request to allocate memory of size =
>> 500000 and vaddr is same 0xABC which gets allocated successfully.
>>
>> When we resume, firmware asks again for 6799360 with 0xABC vaddr even
>> though we had allocated memory of 500000 size at 0xABC. I'm referring to
>> this vaddr that its same.
>
> OK, get your point. But like I said, firmware doesn't case about vaddr, so it is not
> asking for a 'same vaddr'.
>
> IMO just mentioning vaddr is not NULL is sufficient.
Okay. I'll update the description to avoid confusion.
>
>>
>>>
>>>> allocated. Hence vaddr isn't NULL and we compare the type and size if it
>>>> can be reused. Unfornately, we detect that we cannot reuse it and this
>>>
>>> s/Unfornately/Unfortunately/
>>>
>>>> first smaller segment is freed. Then we continue to allocate 6799360 size
>>>> memory from dma which fails and we call ath11k_qmi_free_target_mem_chunk()
>>>> which frees the second smaller segment as well. Later success is returned
>>>> to firmware which asks for 22 smaller segments again. But as we had freed
>>>> 2 segments already, we'll allocate the first 2 new segments again and
>>>> reuse the remaining 20.
>>>>
>>>> This patch is correctiong the skip logic when vaddr is set, but size/type
>>>
>>> s/correctiong/correcting/
>>>
>>>> don't match. In this case, we should use the same skip and success logic
>>>> as used when dma_alloc_coherent fails without freeing the memory area.
>>>>
>>>> We had got reports that memory allocation in this function failed at
>>>
>>> any public link to the report?
>> There's no public report. I've attached the logs. You'll find following
>> error logs in it:
>>
>> ath11k_pci 0000:03:00.0: failed to allocate dma memory for qmi (524288 B
>> type 1)
>> ath11k_pci 0000:03:00.0: failed to allocate qmi target memory: -22
>>
>>
>>>
>>>> resume which made us debug why the reuse logic is wrong. Those failures
>>>> weren't because of the bigger chunk allocation failure as they are
>>>> skipped. Rather these failures were because of smaller chunk allocation
>>>> failures. This patch fixes freeing and allocation of 2 smaller chunks.
>>>
>>> any you saying kernels fail to alloc a smaller chunk? why? is system memory exhausted or
>>> too fragmented?
>> Yes, the smaller chunk doesn't get allocated. I've not been able to
>> reproduce it on my setup. Both system memory exhaustion and
>> fragmentation are the suspects.
>
> so it is kernel failing to allocate the buffer, not any issue in ath12k leading to this.
> Please help make this clear to avoid confusion.
We caught the bug as kernel was unable to allocate memory at resume.
Later found out that with the optimization in ath11k, we shouldn't be
trying to allocate memory in the first place. That's why I've sent this
patch.
Let me update the description and send v2.
--
Regards,
Usama
Powered by blists - more mailing lists