linux-kernel - Re: [PATCH v2 1/3] bus: mhi: host: keep bhi buffer through suspend cycle


Message-ID: <2025071722-panther-legwarmer-d2be@gregkh>
Date: Thu, 17 Jul 2025 13:50:52 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Muhammad Usama Anjum <usama.anjum@...labora.com>
Cc: Manivannan Sadhasivam <mani@...nel.org>,
	Jeff Hugo <jeff.hugo@....qualcomm.com>,
	Youssef Samir <quic_yabdulra@...cinc.com>,
	Matthew Leung <quic_mattleun@...cinc.com>,
	Alexander Wilhelm <alexander.wilhelm@...termo.com>,
	Kunwu Chan <chentao@...inos.cn>,
	Krishna Chaitanya Chundru <krishna.chundru@....qualcomm.com>,
	Jacek Lawrynowicz <jacek.lawrynowicz@...ux.intel.com>,
	Yan Zhen <yanzhen@...o.com>, Sujeev Dias <sdias@...eaurora.org>,
	Siddartha Mohanadoss <smohanad@...eaurora.org>, mhi@...ts.linux.dev,
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
	kernel@...labora.com, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/3] bus: mhi: host: keep bhi buffer through suspend
 cycle

On Thu, Jul 17, 2025 at 03:00:14PM +0500, Muhammad Usama Anjum wrote:
> Hi Greg,
> 
> On 7/16/25 2:34 PM, Greg Kroah-Hartman wrote:
> > On Tue, Jul 15, 2025 at 06:25:07PM +0500, Muhammad Usama Anjum wrote:
> >> When there is memory pressure, at resume time dma_alloc_coherent()
> >> returns error which in turn fails the loading of firmware and hence
> >> the driver crashes:
> >>
> >> kernel: kworker/u33:5: page allocation failure: order:7,
> >> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> >> kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
> >> kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> >> kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> >> kernel: Call Trace:
> >> kernel:  <TASK>
> >> kernel:  dump_stack_lvl+0x4e/0x70
> >> kernel:  warn_alloc+0x164/0x190
> >> kernel:  ? srso_return_thunk+0x5/0x5f
> >> kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
> >> kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> >> kernel:  __alloc_pages_noprof+0x321/0x350
> >> kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
> >> kernel:  dma_direct_alloc+0x70/0x270
> >> kernel:  mhi_fw_load_handler+0x126/0x340 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
> >> kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi a96cb91daba500cc77f86bad60c1f332dc3babdf]
> >> kernel:  ? srso_return_thunk+0x5/0x5f
> >> kernel:  process_one_work+0x17e/0x330
> >> kernel:  worker_thread+0x2ce/0x3f0
> >> kernel:  ? __pfx_worker_thread+0x10/0x10
> >> kernel:  kthread+0xd2/0x100
> >> kernel:  ? __pfx_kthread+0x10/0x10
> >> kernel:  ret_from_fork+0x34/0x50
> >> kernel:  ? __pfx_kthread+0x10/0x10
> >> kernel:  ret_from_fork_asm+0x1a/0x30
> >> kernel:  </TASK>
> >> kernel: Mem-Info:
> >> kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
> >>     active_file:359315 inactive_file:2487001 isolated_file:0
> >>     unevictable:637 dirty:19 writeback:0
> >>     slab_reclaimable:160391 slab_unreclaimable:39729
> >>     mapped:175836 shmem:51039 pagetables:4415
> >>     sec_pagetables:0 bounce:0
> >>     kernel_misc_reclaimable:0
> >>     free:125666 free_pcp:0 free_cma:0
> > 
> > This is not a "crash", it is a warning that your huge memory allocation
> > did not succeed.  Properly handle this issue (and if you know it's going
> > to happen, turn the warning off in your allocation), and you should be
> > fine.
> Yes, the system is fine. But wifi/sound drivers fail to reinitialize.
> 
> > 
> >> In the above example, summing all the consumed memory gives
> >> 15.5GB, with ~500MB free out of a total of 16GB of RAM. So memory
> >> is present, but all of the DMA memory has been exhausted or
> >> fragmented.
> > 
> > What caused that to happen?
> Excessive use of the page cache occurs when user-space applications open
> and consume large amounts of file system memory, even if those files are
> no longer being actively read. I haven't found any documentation on limiting
> the size of the page cache or preventing it from occupying DMA-capable
> memory—perhaps the MM developers can provide more insight.
> 
> I can reproduce this issue by running stress tests that create and
> sequentially read files. On a system with 16GB of RAM, the page cache can
> easily grow to 10–12GB. Since the kernel manages the page cache, it's unclear
> why it doesn't reclaim inactive cache more aggressively.

It should be reclaiming this, as it's just cache, not really used
memory.  I think something isn't tuned properly for your system, OR your
drivers are asking for way too much memory.  Either way, the correct
solution is NOT to have the drivers consume even more memory, that just
makes the overall system less useful.

good luck!

greg k-h
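
For reference, the figures quoted in the thread can be checked with a little arithmetic on the Mem-Info counters. The following is only a sketch: it assumes 4 KiB pages (typical for x86-64, but not stated in the log), and it treats active_file + inactive_file as a rough estimate of page cache size. The helper names are made up for illustration.

```python
# Sketch: convert the page counts from the quoted Mem-Info dump into sizes.
# ASSUMPTION: 4 KiB pages; the log itself does not state the page size.
PAGE_SIZE = 4096


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB (hypothetical helper)."""
    return pages * page_size / (1024 * 1024)


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of one physically contiguous order-N allocation."""
    return (1 << order) * page_size


# Values copied from the quoted log.
free_pages = 125666
active_file = 359315
inactive_file = 2487001
failed_order = 7

free_mib = pages_to_mib(free_pages)
request_kib = order_to_bytes(failed_order) // 1024
# Rough page cache estimate: file-backed pages on the LRU lists.
page_cache_gib = pages_to_mib(active_file + inactive_file) / 1024

print(f"free memory:      {free_mib:.1f} MiB")
print(f"failed request:   {request_kib} KiB contiguous (order {failed_order})")
print(f"file-backed LRU:  {page_cache_gib:.2f} GiB")
```

Under these assumptions the failed order:7 request is a single 512 KiB contiguous block, while roughly 490 MiB is nominally free and roughly 10.9 GiB sits in file-backed pages. That is consistent with the "~ 500MB" and "10–12GB" figures in the quoted text, and with the failure being about fragmentation or zone availability rather than total free memory.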