[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c1fdbd16-4197-4a2e-a33d-6b29cc285f0a@oss.qualcomm.com>
Date: Tue, 22 Apr 2025 08:22:47 -0600
From: Jeff Hugo <jeff.hugo@....qualcomm.com>
To: Muhammad Usama Anjum <usama.anjum@...labora.com>,
Krishna Chaitanya Chundru <quic_krichai@...cinc.com>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Johannes Berg <johannes@...solutions.net>,
Jeff Johnson
<jjohnson@...nel.org>,
Jeffrey Hugo <quic_jhugo@...cinc.com>, Yan Zhen <yanzhen@...o.com>,
Youssef Samir <quic_yabdulra@...cinc.com>,
Qiang Yu <quic_qianyu@...cinc.com>, Alex Elder <elder@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Kunwu Chan <chentao@...inos.cn>
Cc: kernel@...labora.com, mhi@...ts.linux.dev, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-wireless@...r.kernel.org,
ath11k@...ts.infradead.org
Subject: Re: [PATCH v2] bus: mhi: host: don't free bhie tables during
suspend/hibernation
On 4/22/2025 1:23 AM, Muhammad Usama Anjum wrote:
> On 4/18/25 7:08 PM, Jeff Hugo wrote:
>> On 4/18/2025 2:10 AM, Muhammad Usama Anjum wrote:
>>> On 4/14/25 7:14 PM, Jeff Hugo wrote:
>>>> On 4/14/2025 1:32 AM, Muhammad Usama Anjum wrote:
>>>>> On 4/12/25 6:22 AM, Krishna Chaitanya Chundru wrote:
>>>>>>
>>>>>> On 4/12/2025 12:02 AM, Muhammad Usama Anjum wrote:
>>>>>>> On 4/11/25 1:39 PM, Krishna Chaitanya Chundru wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 4/11/2025 12:32 PM, Muhammad Usama Anjum wrote:
>>>>>>>>> On 4/11/25 8:37 AM, Krishna Chaitanya Chundru wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/10/2025 8:26 PM, Muhammad Usama Anjum wrote:
>>>>>>>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table
>>>>>>>>>>> allocation. There is a crash report where at resume time, the
>>>>>>>>>>> memory
>>>>>>>>>>> from the dma doesn't get allocated and MHI fails to re-
>>>>>>>>>>> initialize.
>>>>>>>>>>> There may be fragmentation of some kind which fails the
>>>>>>>>>>> allocation
>>>>>>>>>>> call.
>>>>>>>>>>>
>>>>>>>>>>> To fix it, don't free the memory at power down during suspend /
>>>>>>>>>>> hibernation. Instead, use the same allocated memory again after
>>>>>>>>>>> every
>>>>>>>>>>> resume / hibernation. This patch has been tested with resume and
>>>>>>>>>>> hibernation both.
>>>>>>>>>>>
>>>>>>>>>>> The rddm is of constant size for a given hardware. While the
>>>>>>>>>>> fbc_image
>>>>>>>>>>> size depends on the firmware. If the firmware changes, we'll
>>>>>>>>>>> free and
>>>>>>>>>> If firmware image will change between suspend and resume ?
>>>>>>>>> Yes, correct.
>>>>>>>>>
>>>>>>>> why the firmware image size will change between suspend & resume?
>>>>>>>> who will update the firmware image after bootup?
>>>>>>>> It is not expected behaviour.
>>>>>>> I was trying to research if the firmware can change or not. I've not
>>>>>>> found any documentation on it.
>>>>>>>
>>>>>>> If the firmare is updated in filesystem before suspend/hibernate,
>>>>>>> would
>>>>>>> the new firwmare be loaded the next time kernel resumes as the older
>>>>>>> firmware is no where to be found?
>>>>>>>
>>>>>>> What do you think about this?
>>>>>>>
>>>>>> I don't think firmware can be updated before suspend/hibernate. I
>>>>>> don't
>>>>>> see any reason why it can be updated. If you think it can be updated
>>>>>> please quote relevant doc.
>>>>> I've not found any documentation on it. Let's wait for others to review
>>>>> and it it cannot be updated, I'll remove this part.
>>>>>
>>>>
>>>> Wouldn't this be trivial to test? Boot the device, go modify the
>>>> firmware on the filesystem, then go through a suspend cycle.
>>> I just tested this. I've used an old firmware from last year vs the
>>> latest one.
>>>
>>> Firmware A: old firmware size: 5349376
>>> Firmware B: new firmware size: 5165056
>>>
>>> A here has bigger size.
>>>
>>> 1. I loaded A at boot and then replaced the firmwares in filesystem with
>>> B before syspend. At resume time, B was loaded fine by freeing the
>>> bigger memory area and allocating the smaller one.
>>>
>>> 2. I loaded B and then replaced A in its place before suspend. At resume
>>> time, memory was freed and larger memory was allocated. But driver
>>> wasn't able to initialize correctly:
>>>
>>> [ 184.051902] ath11k_pci 0000:03:00.0: timeout while waiting for
>>> restart complete
>>> [ 184.051916] ath11k_pci 0000:03:00.0: failed to resume core: -110
>>> [ 184.051923] ath11k_pci 0000:03:00.0: PM: dpm_run_callback():
>>> pci_pm_resume returns -110
>>> [ 184.051945] ath11k_pci 0000:03:00.0: PM: failed to resume async:
>>> error -110
>>> [ 187.251911] ath11k_pci 0000:03:00.0: wmi command 16387 timeout
>>> [ 187.251924] ath11k_pci 0000:03:00.0: failed to send
>>> WMI_PDEV_SET_PARAM cmd
>>> [ 187.251933] ath11k_pci 0000:03:00.0: failed to enable dynamic bw: -11
>>>
>>> So should we generalize above that changing firmware at
>>> suspend/hibernation time isn't supported. If firmware package is
>>> updated, does user restarts every time?
>>
>> You may want to review how other devices handle this. I can think of
>> these threads as potential reference
>>
>> https://lore.kernel.org/all/
>> CAPM=9twyvq3EWkwUeoTdMMj76u_sRPmUDHWrzbzEZFQ8eL++BQ@...l.gmail.com/
>> https://lore.kernel.org/all/20250207012531.621369-1-airlied@gmail.com/
> They are talking about firmware cache which is not being used in the
> wireless drivers. In my kernel config, firwmare cache is enabeld. But
> everytime kernel needs to read the firwamre, it reads from the filesystem.
>
> What can be the way forward for this patch? Assuming my previous
> experiment with changed firmwares across suspend/resume failed, I should
> remove reuse logic and send again?
Perhaps you need to refactor the wireless drivers?
I'm not convinced your patch is valid. If FW needs to be reloaded due
to suspend/resume, it seems like the proper thing is to load the same FW
that was loaded at device boot. Per your testing, loading changed FW
can cause a failure. Even if it doesn't fail, will the changed firmware
cause a "breakage" from the user perspective by modifying the device
behavior?
This does not seem to be a problem that is relevant to all MHI devices,
so whatever the end solution ends up being, I think that it should not
be blanket applied to all of MHI.
-Jeff
Powered by blists - more mailing lists