Message-ID: <87bjogdy4w.fsf@linux.dev>
Date: Fri, 15 Aug 2025 15:21:35 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,  "linux-mm@...ck.org"
 <linux-mm@...ck.org>,  "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>
Subject: Re: Regression caused by commit 4687fdbb805a ("mm/filemap: Support
 VM_HUGEPAGE for file mappings")

Roman Gushchin <roman.gushchin@...ux.dev> writes:

> Matthew Wilcox <willy@...radead.org> writes:
>
>> On Fri, Aug 15, 2025 at 11:43:25AM -0700, Roman Gushchin wrote:
>>> The commit 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file
>>> mappings") causes a regression in our production for containers
>>> which are running short on memory. In some cases they are getting
>>> stuck for hours in a vicious reclaim cycle. Reverting this commit
>>> fixes the problem.
>>> 
>>> As I understand, the intention of the commit is to allocate large folios
>>> whenever possible, and the idea is to ignore device-specific readahead
>>> settings and the mmap_miss logic to achieve that, which makes total
>>> sense.
>>> 
>>> However, under heavy memory pressure there must be a mechanism to
>>> revert to order-0 folios, otherwise the memory pressure inevitably
>>> increases. Maybe the mmap_miss heuristics should still be applied? Any
>>> other ideas on how to fix it?
>>
>> What's supposed to happen is that we should have logic like:
>>
>>                         if (order > min_order)
>>                                 alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
>>
>> so we try a little bit to free memory if we can't allocate an order-9
>> folio immediately, but we shouldn't be retrying for hours.  Maybe
>> that got lost somewhere along the line because I don't see it now.
>
> Yeah, I see it in __filemap_get_folio(), but not in ra_alloc_folio().
> I'll prepare a fix for this.

Actually, I was wrong. It's there, hidden in readahead_gfp_mask(), and it's
not conditional on the folio order. However, it either isn't helping or
isn't enough.
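
To make the difference between the two variants concrete, here is a minimal
user-space sketch. It is not kernel code: the sketch_* names and the flag bit
values are made-up placeholders, chosen only to model "conditional on order"
versus "always set", and they do not match the kernel's real gfp_t encoding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-ins. The bit values are hypothetical placeholders,
 * NOT the kernel's real __GFP_NORETRY / __GFP_NOWARN values.
 */
typedef uint32_t sketch_gfp_t;

#define SKETCH_GFP_NORETRY 0x1u
#define SKETCH_GFP_NOWARN  0x2u
#define SKETCH_GFP_BASE    0x100u /* stands in for the mapping's base gfp mask */

/*
 * Variant quoted above: relax the allocation only when asking for a folio
 * larger than the minimum order, so order-0 allocations keep retrying.
 */
sketch_gfp_t sketch_gfp_conditional(sketch_gfp_t base, unsigned int order,
				    unsigned int min_order)
{
	if (order > min_order)
		base |= SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;
	return base;
}

/*
 * Variant described for the readahead path: the flags are added
 * regardless of the folio order being requested.
 */
sketch_gfp_t sketch_gfp_readahead(sketch_gfp_t base)
{
	return base | SKETCH_GFP_NORETRY | SKETCH_GFP_NOWARN;
}
```

With these placeholders, the conditional variant leaves an order-0 request
untouched and only relaxes an order-9 request, while the readahead variant
relaxes both. So the missing flags alone would not explain the reclaim loop.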