Message-ID: <aO4loolcE5u2gSM0@hyeyoo>
Date: Tue, 14 Oct 2025 19:27:46 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Hao Ge <hao.ge@...ux.dev>
Cc: Vlastimil Babka <vbabka@...e.cz>, Alexei Starovoitov <ast@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Shakeel Butt <shakeel.butt@...ux.dev>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>,
        Suren Baghdasaryan <surenb@...gle.com>, cgroups@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Hao Ge <gehao@...inos.cn>
Subject: Re: [PATCH] slab: Introduce __SECOND_OBJEXT_FLAG for objext_flags

On Tue, Oct 14, 2025 at 05:31:24PM +0800, Hao Ge wrote:
> From: Hao Ge <gehao@...inos.cn>
> 
> We should not reuse the first bit for OBJEXTS_ALLOC_FAIL.
> This is because the following scenarios may be encountered:
> 
> Under heavy system load, certain sequences of events can trigger the

Hi Hao, thanks for catching it!

It's late at night and my brain is tired, so I may be missing something,
but let me leave a comment anyway...

> VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS, folio) check:

Should we check (folio->memcg_data != OBJEXTS_ALLOC_FAIL) &&
(folio->memcg_data & MEMCG_DATA_OBJEXTS) instead then?

Not clearing a valid folio->memcg_data is considered an error, but freeing a
folio that is marked OBJEXTS_ALLOC_FAIL isn't.
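
A minimal userspace sketch of that check, for illustration only. It assumes
CONFIG_MEMCG, where MEMCG_DATA_OBJEXTS is bit 0 and OBJEXTS_ALLOC_FAIL aliases
it (as in the commit named in the Fixes: tag). The helper name
stale_objexts() is hypothetical and is not existing kernel code:

```c
#include <stdbool.h>

/* Values mirror include/linux/memcontrol.h with CONFIG_MEMCG (assumed). */
#define MEMCG_DATA_OBJEXTS	(1UL << 0)
#define OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS

/*
 * Hypothetical helper: true only if memcg_data has bit 0 set together
 * with some upper bits, i.e. it looks like a real slabobj_ext vector
 * pointer that was not cleared before the folio was freed.
 * A bare OBJEXTS_ALLOC_FAIL (value 1) is not treated as an error.
 */
static bool stale_objexts(unsigned long memcg_data)
{
	return memcg_data != OBJEXTS_ALLOC_FAIL &&
	       (memcg_data & MEMCG_DATA_OBJEXTS);
}
```

With the "memcg:1" value from the dump below, this predicate would be false,
so a VM_BUG_ON_FOLIO() built on it would not fire for that folio.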

> 1. High system pressure may cause objext allocation failure for a slab.
> 2. When objext allocation fails, slab->obj_exts is set to
>    OBJEXTS_ALLOC_FAIL (value 1).
> 3. Later, this slab may enter the release process.
> 4. During release of the associated folio, the existing
>    VM_BUG_ON_FOLIO check validates folio->memcg_data.
>    If the MEMCG_DATA_OBJEXTS bit is unexpectedly
>    set here, the bug check gets triggered.
>
> We have obtained the following logs:
> [ 7108.343437] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0002deb97600 pfn:0x31eb96
> [ 7108.343482] head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [ 7108.343500] memcg:1
> [ 7108.343507] flags: 0x17ffff800000040(head|node=0|zone=2|lastcpupid=0xfffff)
> [ 7108.343523] raw: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
> [ 7108.343528] raw: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
> [ 7108.343534] head: 017ffff800000040 ffff0000c000cac0 dead000000000100 0000000000000000
> [ 7108.343539] head: ffff0002deb97600 0000000000240000 00000000ffffffff 0000000000000001
> [ 7108.343562] head: 017ffff800000001 fffffdffcb7ae581 00000000ffffffff 00000000ffffffff
> [ 7108.343569] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000002
> [ 7108.343574] page dumped because: VM_BUG_ON_FOLIO(folio->memcg_data & MEMCG_DATA_OBJEXTS)
> [ 7108.343601] ------------[ cut here ]------------
> [ 7108.343607] kernel BUG at ./include/linux/memcontrol.h:537!
> [ 7108.343617] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
> [ 7108.345751] Modules linked in: squashfs isofs vhost_vsock vhost_net vmw_vsock_virtio_transport_common vfio_iommu_type1 vhost vfio vsock vhost_iotlb iommufd tap binfmt_misc nfsv3 nfs_acl nfs lockd grace netfs tls rds dns_resolver tun brd overlay ntfs3 exfat btrfs blake2b_generic xor xor_neon raid6_pq loop sctp ip6_udp_tunnel udp_tunnel nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables rfkill ip_set sunrpc vfat fat joydev sg sch_fq_codel nfnetlink virtio_gpu drm_client_lib virtio_dma_buf drm_shmem_helper sr_mod drm_kms_helper cdrom drm ghash_ce virtio_net virtio_scsi backlight virtio_console virtio_blk net_failover failover virtio_mmio dm_mirror dm_region_hash dm_log dm_multipath dm_mod fuse i2c_dev virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring autofs4 aes_neon_bs aes_ce_blk [last unloaded: hwpoison_inject]
> [ 7108.355662] CPU: 7 UID: 0 PID: 4470 Comm: kylin-process-m Kdump: loaded Not tainted 6.18.0-rc1-dirty #54 PREEMPT(voluntary)
> [ 7108.356864] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
> [ 7108.357621] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 7108.358981] pc : __free_frozen_pages+0xf18/0x18e8
> [ 7108.359834] lr : __free_frozen_pages+0xf18/0x18e8
> [ 7108.360379] sp : ffff8000a2bb7580
> [ 7108.360786] x29: ffff8000a2bb7580 x28: fffffdffcb7ae580 x27: fffffdffcb7ae580
> [ 7108.362013] x26: fffffdffcb7ae588 x25: 1fffffbff96f5cb1 x24: 1fffffbff96f5cb0
> [ 7108.362804] x23: ffff8000839d6ba0 x22: ffff8000839d6000 x21: 0000000000000000
> [ 7108.363596] x20: 0000000000000000 x19: 0000000000000001 x18: 0000000000000000
> [ 7108.364393] x17: 445f47434d454d20 x16: 2620617461645f67 x15: 636d656d3e2d6f69
> [ 7108.365498] x14: 6c6f66284f494c4f x13: 0000000000000001 x12: ffff600063fece93
> [ 7108.366317] x11: 1fffe00063fece92 x10: ffff600063fece92 x9 : dfff800000000000
> [ 7108.367610] x8 : 00009fff9c01316e x7 : ffff00031ff67493 x6 : 0000000000000001
> [ 7108.368455] x5 : ffff00031ff67490 x4 : ffff600063fece93 x3 : 0000000000000000
> [ 7108.369276] x2 : 0000000000000000 x1 : ffff000103fe5d40 x0 : 000000000000004c
> [ 7108.370140] Call trace:
> [ 7108.370463]  __free_frozen_pages+0xf18/0x18e8 (P)
> [ 7108.371011]  free_frozen_pages+0x1c/0x30
> [ 7108.372040]  __free_slab+0xd0/0x250
> [ 7108.372471]  free_slab+0x38/0x118
> [ 7108.372882]  free_to_partial_list+0x1d4/0x340
> [ 7108.373813]  __slab_free+0x24c/0x348
> [ 7108.374253]  ___cache_free+0xf0/0x110
> [ 7108.374699]  qlist_free_all+0x78/0x130
> [ 7108.375156]  kasan_quarantine_reduce+0x114/0x148
> [ 7108.375695]  __kasan_slab_alloc+0x7c/0xb0
> [ 7108.376668]  kmem_cache_alloc_noprof+0x164/0x5c8
> [ 7108.377206]  __alloc_object+0x44/0x1f8
> [ 7108.377659]  __create_object+0x34/0xc8
> [ 7108.378196]  kmemleak_alloc+0xb8/0xd8
> [ 7108.378644]  kmem_cache_alloc_noprof+0x368/0x5c8
> [ 7108.379224]  getname_flags.part.0+0xa4/0x610
> [ 7108.379733]  getname_flags+0x80/0xd8
> [ 7108.380169]  do_sys_openat2+0xb4/0x178
> [ 7108.380921]  __arm64_sys_openat+0x134/0x1d0
> [ 7108.381952]  invoke_syscall+0xd4/0x258
> [ 7108.382408]  el0_svc_common.constprop.0+0xb4/0x240
> [ 7108.382965]  do_el0_svc+0x48/0x68
> [ 7108.383375]  el0_svc+0x40/0xe0
> [ 7108.383757]  el0t_64_sync_handler+0xa0/0xe8
> [ 7108.384465]  el0t_64_sync+0x1ac/0x1b0
> [ 7108.385284] Code: 91398021 aa1b03e0 91138021 97fd35e3 (d4210000)
> [ 7108.386553] SMP: stopping secondary CPUs
> [ 7108.389714] Starting crashdump kernel...
> [ 7108.390190] Bye!
> 
> So, introduce __SECOND_OBJEXT_FLAG for objext_flags, adjust
> the corresponding order accordingly, and ensure that OBJEXTS_ALLOC_FAIL
> is no longer reused.
>
> Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")

Hmm, using a new bit was suggested at that time, but wouldn't that
require bumping up the alignment when allocating the slabobj_ext array?
(see alloc_slab_obj_exts())
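
The alignment concern can be sketched with plain arithmetic. The flag values
below are copied from the quoted diff for the CONFIG_MEMCG case. The
OLD_/NEW_ names and min_vector_align() are made up for illustration, and the
premise that the flags live in the low bits of the slabobj_ext vector pointer
(so the vector needs at least OBJEXTS_FLAGS_MASK + 1 bytes of alignment) is
an assumption to verify against alloc_slab_obj_exts():

```c
/* Flag layout with CONFIG_MEMCG, copied from the quoted diff. */
#define __NR_MEMCG_DATA_FLAGS	(1UL << 2)
#define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
#define __SECOND_OBJEXT_FLAG	(__FIRST_OBJEXT_FLAG << 1)

/* Before the patch: one objext flag above the memcg data flags. */
#define OLD_NR_OBJEXTS_FLAGS	(__FIRST_OBJEXT_FLAG << 1)
/* After the patch: two objext flags above the memcg data flags. */
#define NEW_NR_OBJEXTS_FLAGS	(__SECOND_OBJEXT_FLAG << 1)

#define OLD_OBJEXTS_FLAGS_MASK	(OLD_NR_OBJEXTS_FLAGS - 1)
#define NEW_OBJEXTS_FLAGS_MASK	(NEW_NR_OBJEXTS_FLAGS - 1)

/*
 * Hypothetical helper: smallest power-of-two alignment that keeps all
 * flag bits of the vector pointer zero (assumed requirement).
 */
static unsigned long min_vector_align(unsigned long flags_mask)
{
	return flags_mask + 1;
}
```

Under these assumptions the mask grows from 0x7 to 0xf, so the required
alignment of the vector would go from 8 to 16 bytes.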

And we can still distinguish two cases where

1) MEMCG_DATA_OBJEXTS is set, but upper bits are not set,
   so it should mean obj_exts allocation failed (OBJEXTS_ALLOC_FAIL),
   thus do not report error, or

2) MEMCG_DATA_OBJEXTS is set, and upper bits are also set, so someone
   did not clear a valid folio->memcg_data before freeing the folio
   (report error).

without introducing a new bit, right?

> Signed-off-by: Hao Ge <gehao@...inos.cn>
> ---
>  include/linux/memcontrol.h | 16 ++++++----------
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 873e510d6f8d..8ea023944fac 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -341,27 +341,23 @@ enum page_memcg_data_flags {
>  	__NR_MEMCG_DATA_FLAGS  = (1UL << 2),
>  };
>  
> -#define __OBJEXTS_ALLOC_FAIL	MEMCG_DATA_OBJEXTS
>  #define __FIRST_OBJEXT_FLAG	__NR_MEMCG_DATA_FLAGS
> +#define __SECOND_OBJEXT_FLAG    (__FIRST_OBJEXT_FLAG << 1)
>  
>  #else /* CONFIG_MEMCG */
>  
> -#define __OBJEXTS_ALLOC_FAIL	(1UL << 0)
>  #define __FIRST_OBJEXT_FLAG	(1UL << 0)
> +#define __SECOND_OBJEXT_FLAG	(1UL << 0)
>  
>  #endif /* CONFIG_MEMCG */
>  
>  enum objext_flags {
> -	/*
> -	 * Use bit 0 with zero other bits to signal that slabobj_ext vector
> -	 * failed to allocate. The same bit 0 with valid upper bits means
> -	 * MEMCG_DATA_OBJEXTS.
> -	 */
> -	OBJEXTS_ALLOC_FAIL = __OBJEXTS_ALLOC_FAIL,
> +	/* slabobj_ext vector failed to allocate */
> +	OBJEXTS_ALLOC_FAIL = __FIRST_OBJEXT_FLAG,
>  	/* slabobj_ext vector allocated with kmalloc_nolock() */
> -	OBJEXTS_NOSPIN_ALLOC = __FIRST_OBJEXT_FLAG,
> +	OBJEXTS_NOSPIN_ALLOC = __SECOND_OBJEXT_FLAG,
>  	/* the next bit after the last actual flag */
> -	__NR_OBJEXTS_FLAGS  = (__FIRST_OBJEXT_FLAG << 1),
> +	__NR_OBJEXTS_FLAGS  = (__SECOND_OBJEXT_FLAG << 1),
>  };
>  
>  #define OBJEXTS_FLAGS_MASK (__NR_OBJEXTS_FLAGS - 1)
> -- 
> 2.25.1
> 

-- 
Cheers,
Harry / Hyeonggon
