[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5397779c-9a89-4dd3-9937-208fefb58f78@suse.cz>
Date: Tue, 8 Jul 2025 10:50:31 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Dave Chinner <david@...morbit.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, Zi Yan
<ziy@...dia.com>, Barry Song <baohua@...nel.org>,
Carlos Maiolino <cem@...nel.org>, linux-xfs@...r.kernel.org,
syzbot <syzbot+359a67b608de1ef72f65@...kaller.appspotmail.com>,
akpm@...ux-foundation.org, apopple@...dia.com, byungchul@...com,
david@...hat.com, gourry@...rry.net, joshua.hahnjy@...il.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, matthew.brost@...el.com,
rakie.kim@...com, syzkaller-bugs@...glegroups.com,
ying.huang@...ux.alibaba.com, Harry Yoo <harry.yoo@...cle.com>,
Michal Hocko <mhocko@...e.com>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [syzbot] [mm?] WARNING in xfs_init_fs_context
On 7/8/25 00:10, Dave Chinner wrote:
> On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
>> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
>> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ?
>> >
>> > mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
>> > if (!mp)
>> > return -ENOMEM;
>> >
>> > This looks an allocation attempt which can fail safely.
>
> It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being
> specified.
If you mean the "too small to fail" behavior then it's generally true,
except in some corner cases like being an oom victim, in which case the
allocation can fail - the userspace process is doomed anyway. But a (small)
kernel allocation not handling NULL would still need __GFP_NOFAIL to prevent
that corner case.
>> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc()
>> to kmalloc()") dropped the xfs wrapper. This allocation didn't use
>> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order
>> nofail issue for another allocation site that had to use xlog_kvmalloc().
>
> I don't see how high-order allocation behaviour is relevant here.
>
> Pahole says the struct xfs_mount is 4224 bytes in length. It is an
> order-1 allocation and if we've fragmented memory so badly that slab
> can't allocate an order-1 page then *lots* of other stuff is going
> to be stalling. (e.g. slab pages for inodes are typically order-3,
> same as the kmalloc-8kk slab).
Elsewhere in this thread we figured it out since I wrote the quoted reply.
4224 bytes means kmalloc-8k where the fallback allocation (the one that
passes on the __GFP_NOFAIL) order is 1 normally. But due to KASAN enabled
its metadata means the per-object size goes above 8k and thus the fallback
order will be 2. It's a corner case that wasn't anticipated and existed for
years without known reports. We'll need to deal with it somehow.
> Note that the size of the structure is largely because of the
> embedded cpumask for inodegc:
>
> struct cpumask m_inodegc_cpumask; /* 3104 1024 */
>
> This should probably be pulled out into a dynamically allocated
> inodegc specific structure. Then the struct xfs_mount is only a
> order-0 allocation and should never fail, regardless of
> __GFP_NOFAIL being specified or not.
>
>> I think either this allocation really can fail as the code (return
>> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use
>> kvmalloc() - I think the wrapper for that can be removed now too after
>> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc:
>> make kmalloc fast path real fast path").
>
> I know about that - I have patches that I'm testing that replace
> xlog_kvmalloc() with kvmalloc calls.
Great, thanks!
> -Dave.
Powered by blists - more mailing lists