linux-kernel - Re: Excessive page cache occupies DMA32 memory


Message-ID: <c93b34ca-1abf-4db0-90f9-3802ac02c25a@arm.com>
Date: Tue, 22 Jul 2025 11:03:18 +0100
From: Robin Murphy <robin.murphy@....com>
To: Greg KH <gregkh@...uxfoundation.org>,
 Muhammad Usama Anjum <usama.anjum@...labora.com>
Cc: Matthew Wilcox <willy@...radead.org>,
 Baochen Qiang <baochen.qiang@....qualcomm.com>,
 Jeff Hugo <jeff.hugo@....qualcomm.com>,
 Manivannan Sadhasivam <mani@...nel.org>, Jeff Johnson <jjohnson@...nel.org>,
 Marek Szyprowski <m.szyprowski@...sung.com>, linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, kernel@...labora.com,
 Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
 iommu@...ts.linux.dev
Subject: Re: Excessive page cache occupies DMA32 memory

On 2025-07-22 8:24 am, Greg KH wrote:
> On Tue, Jul 22, 2025 at 11:05:11AM +0500, Muhammad Usama Anjum wrote:
>> Adding ath/mhi and dma API developers to the discussion.
>>
>> On 7/22/25 10:32 AM, Greg KH wrote:
>>> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>>>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>>>> Hello,
>>>>>
>>>>> When 10-12GB out of total 16GB RAM is being used as page cache
>>>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>>>> dma memory at resume as dma memory is either occupied by the page cache or
>>>>> fragmented. Example:
>>>>>
>>>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>>>>
>>>> Just to be clear, this is not a page cache problem.  The driver is asking
>>>> us to do a 512kB allocation without doing I/O!  This is a ridiculous
>>>> request that should be expected to fail.
>>>>
>>>> The solution, whatever it may be, is not related to the page cache.
>>>> I reject your diagnosis.  Almost all of the page cache is clean and
>>>> could be dropped (as far as I can tell from the output below).
>>>>
>>>> Now, I'm not too familiar with how the page allocator chooses to fail
>>>> this request.  Maybe it should be trying harder to drop bits of the page
>>>> cache.  Maybe it should be doing some compaction.
>> That's very thoughtful. I'll look into why the page allocator isn't dropping
>> cache or doing compaction.
>>
>>>> I am not inclined to
>>>> go digging on your behalf, because frankly I'm offended by the suggestion
>>>> that the page cache is at fault.
>> I apologize—that wasn't my intention.
>>
>>>>
>>>> Perhaps somebody else will help you, or you can dig into this yourself.
>>>
>>> I'm with Matthew, this really looks like a driver bug somehow.  If there
>>> is page cache memory that is "clean", the driver should be able to
>>> access it just fine if really required.
>>>
>>> What exact driver(s) is having this problem?  What is the exact error,
>>> and on what lines of code?
>> The issue occurs on both ath11k and mhi drivers during resume, when
>> dma_alloc_coherent(GFP_KERNEL) fails and returns -ENOMEM. This failure has
>> been observed at multiple points in these drivers.
>>
>> For example, in the mhi driver, the failure is triggered when the
>> MHI's st_worker gets scheduled-in at resume.
>>
>> mhi_pm_st_worker()
>> -> mhi_fw_load_handler()
>>     -> mhi_load_image_bhi()
>>        -> mhi_alloc_bhi_buffer()
>>           -> dma_alloc_coherent(GFP_KERNEL) returns -ENOMEM
> 
> And what is the exact size you are asking for here?
> What is the dma ops set to for your system?  Are you sure that is
> working properly for your platform?  What platform is this exactly?
> 
> The driver isn't asking for DMA32 here, so that shouldn't be the issue,
> so why do you feel it is?  Have you tried using the tracing stuff for
> dma allocations to see exactly what is going on for this failure?

I'm guessing the device has a 32-bit DMA mask, and the allocation ends 
up in __dma_direct_alloc_pages(), which then adds GFP_DMA32 in order 
to try to satisfy the mask via regular page allocation. How GFP_KERNEL 
turns into GFP_NOIO, though, given that the DMA layer certainly isn't 
(knowingly) messing with __GFP_IO or __GFP_FS, is more of a mystery... I 
suppose "during resume" is the red flag there - is this worker perhaps 
trying to run too early in some restricted context before the rest of 
the system has fully woken up?
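
To make the flag arithmetic concrete, here is a rough standalone sketch (not 
kernel code) of how a GFP_KERNEL request could end up reported as 
mode:0xc04. Everything prefixed sk_/SK_ is made up for illustration. The bit 
values are assumptions chosen to match the mode printed in the log above, 
and the "io_restricted" switch simply stands in for whatever is clearing 
__GFP_IO and __GFP_FS during resume, which is still the open question.

```c
#include <assert.h>

/* Illustrative flag bits; assumed values, picked to match mode:0xc04. */
#define SK_GFP_DMA32          0x004u
#define SK_GFP_IO             0x040u
#define SK_GFP_FS             0x080u
#define SK_GFP_DIRECT_RECLAIM 0x400u
#define SK_GFP_KSWAPD_RECLAIM 0x800u

#define SK_GFP_NOIO   (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_KERNEL (SK_GFP_NOIO | SK_GFP_IO | SK_GFP_FS)

/*
 * Hypothetical model of the two adjustments discussed above:
 *  - a DMA mask of 32 bits or fewer adds the DMA32 zone flag;
 *  - a restricted context (io_restricted != 0) strips the IO and FS bits.
 */
unsigned int sk_effective_gfp(unsigned int gfp,
                              unsigned long long dma_mask,
                              int io_restricted)
{
    if (dma_mask <= 0xffffffffULL)
        gfp |= SK_GFP_DMA32;
    if (io_restricted)
        gfp &= ~(SK_GFP_IO | SK_GFP_FS);
    return gfp;
}

/* Size in bytes of an order-N allocation for a given page size. */
unsigned long sk_alloc_bytes(unsigned int order, unsigned long page_size)
{
    assert(order < 32); /* keep the shift well defined */
    return page_size << order;
}
```

With these assumed values, sk_effective_gfp(SK_GFP_KERNEL, 0xffffffffULL, 1) 
gives 0xc04, i.e. GFP_NOIO|GFP_DMA32 as in the failure message, and 
sk_alloc_bytes(7, 4096) gives 524288, the 512kB that Matthew mentioned for 
order:7 with 4kB pages.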

Thanks,
Robin.

> 
> I think you need to do a bit more debugging :)
> 
> thanks,
> 
> greg k-h