[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <262BB988-A30B-4A4A-B96B-E604D86CA18C@dilger.ca>
Date: Fri, 26 Sep 2025 13:17:51 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Deepanshu Kartikey <kartikey406@...il.com>
Cc: Theodore Ts'o <tytso@....edu>,
linux-ext4 <linux-ext4@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] ext4: fix allocation failure in ext4_mb_load_buddy_gfp
On Sep 24, 2025, at 8:06 PM, Deepanshu Kartikey <kartikey406@...il.com> wrote:
>
>
> Hi Andreas,
>
> Thank you for pointing out the fundamental issue with my approach. You're right that removing __GFP_NOFAIL creates a worse problem by potentially triggering filesystem errors.
>
> I understand your suggestion about allowing the function to return errors so the caller can retry, but I need more specific guidance on the implementation approach.
>
> Questions:
>
> 1. **Function signature change**: Should ext4_discard_preallocations() be changed from void to int to return error codes? This would require updating all 13+ callers I found.
Changing internal function signatures is never a problem for Linux, except
userland syscall APIs. This is even less of a concern if all of the callers
are inside ext4.
I notice that ext4_discard_preallocations() also has an unused "int needed"
argument that could be removed.
> 2. **Caller modifications**: How should the various callers (ext4_truncate, ext4_clear_inode, ext4_release_file, etc.) handle allocation failures during memory pressure? Should they:
> - Retry the operation later?
> - Skip preallocation cleanup temporarily?
> - Handle it differently based on the calling context?
Looking at this more closely, it appears that ext4_discard_preallocations()
should not fail outright, since this would leak space in the filesystem.
I guess this goes back to a question of whether a warning message on the console
when the kernel is totally out of memory is a bad thing? The whole point of
__GFP_NOFAIL was to put the retry in the control of the MM layer, instead of
having the caller loop and doing the same thing.
> 3. **Memory pressure detection**: Is checking (current->flags & PF_MEMALLOC) the right approach to detect when we're in memory reclaim context?
>
> 4. **Scope of changes**: Would you prefer:
> - A minimal fix that just handles the allocation failure gracefully?
> - A more comprehensive rework of the error handling throughout the preallocation discard path?
I was thinking one option might be to have reserved memory to handle this
specific case, so that *one* preallocation can make progress at a time.
However, it isn't clear if this one allocation would be enough to guarantee
forward progress, or if there needs to be a pool at every step along the way.
Just freeing some other buddy bitmap by ext4 and retrying is not guaranteed
to make progress, since OOM means it is likely that *other* threads are also
trying hard to allocate memory.
> I want to make sure I understand the preferred approach before submitting v2, especially since this affects multiple call sites throughout the ext4 codebase.
I think the important question is what the impact is of the original problem
that was being fixed?
Another option would be to return -EAGAIN or -ENOMEM (or similar) from this
function and then the whole "flush THIS inode from cache" would be dropped,
and memory reclaim should progress to some other inode that doesn't have
preallocated blocks, or some other memory.
Cheers, Andreas
Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)
Powered by blists - more mailing lists