linux-kernel - Re: [syzbot] [mm?] WARNING in xfs_init_fs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5397779c-9a89-4dd3-9937-208fefb58f78@suse.cz>
Date: Tue, 8 Jul 2025 10:50:31 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Dave Chinner <david@...morbit.com>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, Zi Yan
 <ziy@...dia.com>, Barry Song <baohua@...nel.org>,
 Carlos Maiolino <cem@...nel.org>, linux-xfs@...r.kernel.org,
 syzbot <syzbot+359a67b608de1ef72f65@...kaller.appspotmail.com>,
 akpm@...ux-foundation.org, apopple@...dia.com, byungchul@...com,
 david@...hat.com, gourry@...rry.net, joshua.hahnjy@...il.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org, matthew.brost@...el.com,
 rakie.kim@...com, syzkaller-bugs@...glegroups.com,
 ying.huang@...ux.alibaba.com, Harry Yoo <harry.yoo@...cle.com>,
 Michal Hocko <mhocko@...e.com>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [syzbot] [mm?] WARNING in xfs_init_fs_context

On 7/8/25 00:10, Dave Chinner wrote:
> On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
>> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
>> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ?
>> > 
>> > 	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL);
>> > 	if (!mp)
>> > 		return -ENOMEM;
>> > 
>> > This looks an allocation attempt which can fail safely.
> 
> It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being
> specified.

If you mean the "too small to fail" behavior then it's generally true,
except in some corner cases like being an oom victim, in which case the
allocation can fail - the userspace process is doomed anyway. But a (small)
kernel allocation not handling NULL would still need __GFP_NOFAIL to prevent
that corner case.

>> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc()
>> to kmalloc()") dropped the xfs wrapper. This allocation didn't use
>> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order
>> nofail issue for another allocation site that had to use xlog_kvmalloc().
> 
> I don't see how high-order allocation behaviour is relevant here.
> 
> Pahole says the struct xfs_mount is 4224 bytes in length. It is an
> order-1 allocation and if we've fragmented memory so badly that slab
> can't allocate an order-1 page then *lots* of other stuff is going
> to be stalling. (e.g. slab pages for inodes are typically order-3,
> same as the kmalloc-8kk slab).

Elsewhere in this thread we figured it out since I wrote the quoted reply.
4224 bytes means kmalloc-8k where the fallback allocation (the one that
passes on the __GFP_NOFAIL) order is 1 normally. But due to KASAN enabled
its metadata means the per-object size goes above 8k and thus the fallback
order will be 2. It's a corner case that wasn't anticipated and existed for
years without known reports. We'll need to deal with it somehow.

> Note that the size of the structure is largely because of the
> embedded cpumask for inodegc:
> 
> 	struct cpumask             m_inodegc_cpumask;    /*  3104  1024 */
> 
> This should probably be pulled out into a dynamically allocated
> inodegc specific structure. Then the struct xfs_mount is only a
> order-0 allocation and should never fail, regardless of
> __GFP_NOFAIL being specified or not.
> 
>> I think either this allocation really can fail as the code (return
>> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use
>> kvmalloc() - I think the wrapper for that can be removed now too after
>> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc:
>> make kmalloc fast path real fast path").
> 
> I know about that - I have patches that I'm testing that replace
> xlog_kvmalloc() with kvmalloc calls.

Great, thanks!

> -Dave.