linux-kernel - Re: [PATCH 22/25] fs/buffer: prevent WARN_ON in __alloc_pages_slowpath() when BS > PS

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <aP0PachXS8Qxjo9Q@casper.infradead.org>
Date: Sat, 25 Oct 2025 18:56:57 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Baokun Li <libaokun@...weicloud.com>
Cc: "Darrick J. Wong" <djwong@...nel.org>, linux-ext4@...r.kernel.org,
	tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
	linux-kernel@...r.kernel.org, kernel@...kajraghav.com,
	mcgrof@...nel.org, linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org, yi.zhang@...wei.com, yangerkun@...wei.com,
	chengzhihao1@...wei.com, libaokun1@...wei.com,
	catherine.hoang@...cle.com
Subject: Re: [PATCH 22/25] fs/buffer: prevent WARN_ON in
 __alloc_pages_slowpath() when BS > PS

On Sat, Oct 25, 2025 at 02:32:45PM +0800, Baokun Li wrote:
> On 2025-10-25 12:45, Matthew Wilcox wrote:
> > On Sat, Oct 25, 2025 at 11:22:18AM +0800, libaokun@...weicloud.com wrote:
> >> +	while (1) {
> >> +		folio = __filemap_get_folio(mapping, index, fgp_flags,
> >> +					    gfp & ~__GFP_NOFAIL);
> >> +		if (!IS_ERR(folio) || !(gfp & __GFP_NOFAIL))
> >> +			return folio;
> >> +
> >> +		if (PTR_ERR(folio) != -ENOMEM && PTR_ERR(folio) != -EAGAIN)
> >> +			return folio;
> >> +
> >> +		memalloc_retry_wait(gfp);
> >> +	}
> > No, absolutely not.  We're not having open-coded GFP_NOFAIL semantics.
> > The right way forward is for ext4 to use iomap, not for buffer heads
> > to support large block sizes.
> 
> ext4 only calls getblk_unmovable or __getblk when reading critical
> metadata. Both of these functions set __GFP_NOFAIL to ensure that
> metadata reads do not fail due to memory pressure.

If filesystems actually require __GFP_NOFAIL for high-order allocations,
then this is a new requirement that needs to be communicated to the MM
developers, not hacked around in filesystems (or the VFS).  And that
communication needs to be a separate thread with a clear subject line
to attract the right attention, not buried in patch 22/25.

For what it's worth, I think you have a good case.  This really is
a new requirement (BS > PS), and in this scenario we should be able to
reclaim page cache memory of the appropriate order to satisfy the NOFAIL
requirement.  There will be concerns that other users will then be able to
make high-order NOFAIL allocations without triggering a warning, but I
think this use case will eventually prevail.

> Both functions eventually call grow_dev_folio(), which is why we
> handle the __GFP_NOFAIL logic there. xfs_buf_alloc_backing_mem()
> has similar logic, but XFS manages its own metadata, allowing it
> to use vmalloc for memory allocation.

The other possibility is that we switch ext4 away from the buffer cache
entirely.  This is a big job!  I know Catherine has been working on
a generic replacement for the buffer cache, but I'm not sure if it's
ready yet.