Message-ID: <a403eb91-c90d-444c-b508-c428a8ef1447@collabora.com>
Date: Fri, 25 Apr 2025 16:41:43 +0500
From: Muhammad Usama Anjum <usama.anjum@...labora.com>
To: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc: usama.anjum@...labora.com, Johannes Berg <johannes@...solutions.net>,
Jeff Johnson <jjohnson@...nel.org>, Jeffrey Hugo <quic_jhugo@...cinc.com>,
Yan Zhen <yanzhen@...o.com>, Youssef Samir <quic_yabdulra@...cinc.com>,
Qiang Yu <quic_qianyu@...cinc.com>, Alex Elder <elder@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Kunwu Chan <chentao@...inos.cn>, kernel@...labora.com, mhi@...ts.linux.dev,
linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-wireless@...r.kernel.org, ath11k@...ts.infradead.org
Subject: Re: [PATCH v2] bus: mhi: host: don't free bhie tables during
suspend/hibernation
On 4/25/25 1:59 PM, Manivannan Sadhasivam wrote:
> On Fri, Apr 25, 2025 at 12:42:38PM +0500, Muhammad Usama Anjum wrote:
>> On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote:
>>> On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote:
>>>> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote:
>>>>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote:
>>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table
>>>>>> allocation. There is a crash report where at resume time, the memory
>>>>>> from the dma doesn't get allocated and MHI fails to re-initialize.
>>>>>> There may be fragmentation of some kind which fails the allocation
>>>>>> call.
>>>>>>
>>>>>
>>>>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot
>>>>> workaround that in the device drivers. What is the guarantee that other drivers
>>>>> will also continue to work? Will you go ahead and patch all of them which
>>>>> release memory during suspend?
>>>>>
>>>>> Please investigate why the allocation fails. Even this is not a device issue, so
>>>>> we cannot add quirks :/
>>>> This isn't a platform specific quirk. We are only hitting it because
>>>> there is high memory pressure during suspend/resume. This dma allocation
>>>> failure can happen with memory pressure on any device.
>>>>
>>>
>>> Yes.
>> Thanks for understanding.
>>
>>>
>>>> The purpose of this patch is just to make the driver more robust to memory
>>>> pressure during resume.
>>>>
>>>> I'm not sure about MHI. But other drivers already have such patches as
>>>> dma_direct_alloc() is susceptible to failures when memory pressure is
>>>> high. This patch was motivated from ath12k [1] and ath11k [2].
>>>>
>>>
>>> Even if we patch the MHI driver, the issue is going to trip some other driver.
>>> How does the DMA memory go low during resume? Is some other driver
>>> consuming more than it did during probe()?
>> Think of it like this. The first probe happens just after boot, when most of
>> the RAM is free. Then let's say the user launches applications which consume
>> not only the entire RAM but also the swap. The DMA memory area is the first
>> ~4GB on x86_64 (if I'm not mistaken). Now at resume time, when we want to
>> allocate memory from DMA, it may not be available at all, or because of
>> fragmentation we cannot allocate that much contiguous memory.
>>
>
> Looks like you have a workload that consumes the limited DMA coherent memory.
> Most likely the GPU applications I think.
>
>> In our testing and real world cases, right now only the wifi driver is
>> misbehaving. Wifi is also very important. So we are hoping to make the wifi
>> driver robust.
>>
>
> Sounds fair. If you want to move forward, please modify the existing
> mhi_power_down_keep_dev() to include this partial unprepare as well:
>
> mhi_power_down_unprepare_keep_dev()
>
> Since both APIs are anyway going to be used together, I don't see a need to
> introduce yet another API.
I've looked at the usages of mhi_power_down_keep_dev(). It is used by both
ath12k and ath11k, so we would have to look at ath12k as well before
we can change mhi_power_down_keep_dev(). Unfortunately, I don't have a
device using ath12k at hand.
Should we keep this new API, or what should we do instead?
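
For reference, the idea of keeping the table across suspend can be shown
with a small userspace sketch. This is not the MHI code: the toy_* names
and the struct are made up for illustration, and malloc() stands in for
the DMA allocation. It only shows that power-up reuses a buffer which
power-down was told to keep, so no new large allocation is needed at
resume time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a controller that owns one large table. */
struct toy_ctrl {
	void *table;         /* stands in for the BHIe table */
	size_t table_size;   /* requested size in bytes */
	unsigned int allocs; /* number of real allocations performed */
};

/* Allocate the table only if a previous power-down did not keep it. */
static int toy_power_up(struct toy_ctrl *c)
{
	if (c->table)
		return 0; /* reuse the buffer kept across suspend */

	c->table = malloc(c->table_size);
	if (!c->table)
		return -1; /* would be the resume-time failure */

	c->allocs++;
	return 0;
}

/* Free the table unless the caller asks to keep it (suspend path). */
static void toy_power_down(struct toy_ctrl *c, bool keep_table)
{
	if (keep_table)
		return;

	free(c->table);
	c->table = NULL;
}
```

With keep_table set on the suspend path, the only allocation happens at
the first power-up, when memory is still plentiful. A full power-down
still frees everything.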
>
> - Mani
>
>>>
>>>> [1]
>>>> https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@quicinc.com/
>>>> [2]
>>>> https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@quicinc.com/
>>>>
>>>> What do you think can be the way forward for this patch?
>>>>
>>>
>>> Let's try first to analyze why the memory pressure happens during suspend. As I
>>> can see, even if we fix the MHI driver, you are likely to hit this issue
>>> somewhere else.
>>>
>>> - Mani
>>>
>>>>>
>>>
>>> [...]
>>>
>>>>> Did you intend to leak this information? If not, please remove it from
>>>>> stacktrace.
>>>> The device isn't private. It's fine.
>>>>
>>>
>>> Okay.
>>>
>>> - Mani
>>>
>>
>>
>> --
>> Regards,
>> Usama
>
--
Regards,
Usama