linux-kernel - Re: [syzbot] [mm?] WARNING in xfs_init_fs


Message-ID: <aGeQTWHSjpc1JvbZ@hyeyoo>
Date: Fri, 4 Jul 2025 17:26:53 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, Zi Yan <ziy@...dia.com>,
        Barry Song <baohua@...nel.org>, Carlos Maiolino <cem@...nel.org>,
        linux-xfs@...r.kernel.org, Dave Chinner <david@...morbit.com>,
        syzbot <syzbot+359a67b608de1ef72f65@...kaller.appspotmail.com>,
        akpm@...ux-foundation.org, apopple@...dia.com, byungchul@...com,
        david@...hat.com, gourry@...rry.net, joshua.hahnjy@...il.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        matthew.brost@...el.com, rakie.kim@...com,
        syzkaller-bugs@...glegroups.com, ying.huang@...ux.alibaba.com,
        Michal Hocko <mhocko@...e.com>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [syzbot] [mm?] WARNING in xfs_init_fs_context

On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
> +CC xfs and few more
> 
> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
> > On 2025/07/02 0:01, Zi Yan wrote:
> >>>  __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
> >>>  alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
> >>>  alloc_slab_page mm/slub.c:2451 [inline]
> >>>  allocate_slab+0xe2/0x3b0 mm/slub.c:2627
> >>>  new_slab mm/slub.c:2673 [inline]
> >>
> >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it.
> >> In allocate_slab(), the first allocation without __GFP_NOFAIL
> >> failed, the retry used __GFP_NOFAIL but kmem_cache order
> >> was greater than 1, which led to the warning above.
> >>
> >> Maybe allocate_slab() should just fail when kmem_cache
> >> order is too big and first trial fails? I am no expert,
> >> so add Vlastimil for help.
> 
> Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead
> to subsystems like xfs to reintroduce their own forever retrying
> wrappers again. I think it's doing the best it can for the fallback
> attempt by using the minimum order, so the warning will never happen due
> to the calculated optimal order being too large, but only if the
> kmalloc()/kmem_cache_alloc() requested/object size is too large itself.

Right. The warning would trigger only if the object size is bigger
than 8k (PAGE_SIZE * 2).
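
To make the 8k threshold concrete, here is a minimal user-space sketch of the arithmetic, not the kernel code itself. It assumes 4 KiB pages and that the fallback order is simply the smallest order whose pages hold one object; the names sketch_min_order() and sketch_nofail_would_warn() are made up for illustration.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on a typical x86-64 build. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int sketch_min_order(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Models the condition discussed in this thread: a __GFP_NOFAIL
 * allocation warns when the requested order is greater than 1.
 */
static int sketch_nofail_would_warn(size_t size)
{
	return sketch_min_order(size) > 1;
}
```

Under these assumptions a 4480-byte object needs order 1 and an 8192-byte object still fits in order 1, so only sizes above 8192 bytes would reach order 2 and trip the warning.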

> Hm but perhaps enabling slab_debug can inflate it over the threshold, is
> it the case here?

CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 panic_on_warn=1"

CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set

It seems no slab_debug is involved here.

I downloaded the config and built the kernel, and
sizeof(struct xfs_mount) is 4480 bytes. Shouldn't it have been
allocated using order 1?

I'm not sure why the min order was greater than 1.
Not sure what I'm missing...

> I think in that rare case we could convert such
> fallback allocations to large kmalloc to avoid adding the debugging
> overhead - we can't easily create an individual slab page without the
> debugging layout for a kmalloc cache with debugging enabled.

Yeah, that should be doable when the size is exactly 8k or very close to 8k.
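
As a rough illustration of the "exactly 8k or very close to 8k" case, the sketch below checks whether an object that fits in order 1 on its own would cross the 8k limit once per-object debugging metadata is added. The debug_overhead parameter is a hypothetical number; the real overhead (red zones, tracking data, alignment padding) depends on which debug flags are active, and the helper name is invented for this example.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, so order 1 covers 8192 bytes. */
#define SKETCH_NOFAIL_LIMIT (2UL * 4096UL)

/*
 * Returns 1 when the bare object fits within the order-1 limit but the
 * object plus a hypothetical debug_overhead does not. Only such objects
 * would be candidates for the large kmalloc fallback discussed above.
 */
static int sketch_debug_pushes_over(size_t object_size, size_t debug_overhead)
{
	return object_size <= SKETCH_NOFAIL_LIMIT &&
	       object_size + debug_overhead > SKETCH_NOFAIL_LIMIT;
}
```

With an assumed 64 bytes of overhead, an 8192-byte object would cross the limit while a 4480-byte one would not, which matches the observation that struct xfs_mount alone does not explain the warning.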

-- 
Cheers,
Harry / Hyeonggon