linux-kernel - Re: [PATCH v2 1/3] bus: mhi: host: keep bhi buffer through suspend cycle

Message-ID: <9c9d0302-bbb8-468f-8be5-5a3e0015528f@collabora.com>
Date: Thu, 17 Jul 2025 15:00:14 +0500
From: Muhammad Usama Anjum <usama.anjum@...labora.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Manivannan Sadhasivam <mani@...nel.org>,
 Jeff Hugo <jeff.hugo@....qualcomm.com>,
 Youssef Samir <quic_yabdulra@...cinc.com>,
 Matthew Leung <quic_mattleun@...cinc.com>,
 Alexander Wilhelm <alexander.wilhelm@...termo.com>,
 Kunwu Chan <chentao@...inos.cn>,
 Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>,
 Jacek Lawrynowicz <jacek.lawrynowicz@...ux.intel.com>,
 Yan Zhen <yanzhen@...o.com>, Sujeev Dias <sdias@...eaurora.org>,
 Siddartha Mohanadoss <smohanad@...eaurora.org>, mhi@...ts.linux.dev,
 linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel@...labora.com, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/3] bus: mhi: host: keep bhi buffer through suspend
 cycle

Hi Greg,

On 7/16/25 2:34 PM, Greg Kroah-Hartman wrote:
> On Tue, Jul 15, 2025 at 06:25:07PM +0500, Muhammad Usama Anjum wrote:
>> When there is memory pressure, at resume time dma_alloc_coherent()
>> returns error which in turn fails the loading of firmware and hence
>> the driver crashes:
>>
>> kernel: kworker/u33:5: page allocation failure: order:7,
>> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>> kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
>> kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
>> kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
>> kernel: Call Trace:
>> kernel:  <TASK>
>> kernel:  dump_stack_lvl+0x4e/0x70
>> kernel:  warn_alloc+0x164/0x190
>> kernel:  ? srso_return_thunk+0x5/0x5f
>> kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
>> kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
>> kernel:  __alloc_pages_noprof+0x321/0x350
>> kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
>> kernel:  dma_direct_alloc+0x70/0x270
>> kernel:  mhi_fw_load_handler+0x126/0x340 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
>> kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
>> kernel:  ? srso_return_thunk+0x5/0x5f
>> kernel:  process_one_work+0x17e/0x330
>> kernel:  worker_thread+0x2ce/0x3f0
>> kernel:  ? __pfx_worker_thread+0x10/0x10
>> kernel:  kthread+0xd2/0x100
>> kernel:  ? __pfx_kthread+0x10/0x10
>> kernel:  ret_from_fork+0x34/0x50
>> kernel:  ? __pfx_kthread+0x10/0x10
>> kernel:  ret_from_fork_asm+0x1a/0x30
>> kernel:  </TASK>
>> kernel: Mem-Info:
>> kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
>>     active_file:359315 inactive_file:2487001 isolated_file:0
>>     unevictable:637 dirty:19 writeback:0
>>     slab_reclaimable:160391 slab_unreclaimable:39729
>>     mapped:175836 shmem:51039 pagetables:4415
>>     sec_pagetables:0 bounce:0
>>     kernel_misc_reclaimable:0
>>     free:125666 free_pcp:0 free_cma:0
> 
> This is not a "crash", it is a warning that your huge memory allocation
> did not succeed.  Properly handle this issue (and if you know it's going
> to happen, turn the warning off in your allocation), and you should be
> fine.
Yes, the system itself is fine, but the wifi/sound drivers fail to
reinitialize after resume.
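
For scale, the numbers in the trace above can be checked with a few lines
of Python. This is only a rough sketch: it assumes 4 KiB pages (the trace
does not state the page size), and it adds up the Mem-Info counters
naively even though some of them may overlap (for example mapped and
shmem pages can also be counted in the anon/file lists).

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the trace

# Page counts copied from the Mem-Info dump in the trace above.
mem_info_pages = {
    "active_anon": 513809,
    "inactive_anon": 152,
    "active_file": 359315,
    "inactive_file": 2487001,
    "unevictable": 637,
    "slab_reclaimable": 160391,
    "slab_unreclaimable": 39729,
    "mapped": 175836,
    "shmem": 51039,
    "pagetables": 4415,
}
free_pages = 125666


def pages_to_bytes(pages):
    """Convert a page count to bytes using the assumed page size."""
    return pages * PAGE_SIZE


# "order:7" means 2**7 physically contiguous pages in a single allocation.
order7_bytes = pages_to_bytes(2 ** 7)

# Naive sum of all listed counters; overlapping counters are not corrected.
used_bytes = pages_to_bytes(sum(mem_info_pages.values()))
free_bytes = pages_to_bytes(free_pages)

print(f"order-7 request: {order7_bytes // 1024} KiB contiguous")
print(f"sum of counters: {used_bytes / 1e9:.1f} GB")
print(f"free:            {free_bytes / 1e6:.0f} MB")
```

Under these assumptions the failed request is 512 KiB of contiguous
memory, the counters sum to about 15.5 GB, and about 515 MB is free. So
the failure is about finding one contiguous 512 KiB block in DMA32
memory, not about the total amount of free memory.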

> 
>> In above example, if we sum all the consumed memory, it comes out
>> to be 15.5GB and free memory is ~ 500MB from a total of 16GB RAM.
>> Even though memory is present. But all of the dma memory has been
>> exhausted or fragmented.
> 
> What caused that to happen?
The page cache grows excessively when user-space applications open and
read large amounts of file data, and it stays large even after those
files are no longer being actively read. I haven't found any
documentation on limiting the size of the page cache or on preventing it
from occupying DMA-capable memory. Perhaps the MM developers can provide
more insight.

I can reproduce this issue by running stress tests that create and
sequentially read files. On a system with 16GB of RAM, the page cache can
easily grow to 10–12GB. Since the kernel manages the page cache, it's unclear
why it doesn't reclaim inactive cache more aggressively.

> 
>> Fix it by allocating it only once and then reuse the same allocated
>> memory. As we'll allocate this memory only once, this memory will stay
>> allocated.
> 
> As others said, no, don't consume memory for no good reason, that just
> means that other devices will fail more frequently.  If all
> devices/drivers did this, you wouldn't have memory to work either.
Makes sense.

> 
> thanks,
> 
> greg k-h