linux-kernel - Re: [linus:master] [mm, slub] 0af8489b02: kernel_BUG_at

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e443af8a-6fde-bd5a-3e5b-e30f7d5ac578@suse.cz>
Date:   Thu, 12 Jan 2023 09:49:34 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Oliver Sang <oliver.sang@...el.com>
Cc:     Hyeonggon Yoo <42.hyeyoo@...il.com>, oe-lkp@...ts.linux.dev,
        lkp@...el.com, Mike Rapoport <rppt@...ux.ibm.com>,
        Christoph Lameter <cl@...ux.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Matthew Wilcox <willy@...radead.org>
Subject: Re: [linus:master] [mm, slub] 0af8489b02:
 kernel_BUG_at_include/linux/mm.h

On 1/12/23 08:47, Oliver Sang wrote:
> hi, Vlastimil,
> 
> On Tue, Jan 10, 2023 at 03:09:36PM +0100, Vlastimil Babka wrote:
>> On 1/10/23 14:53, Oliver Sang wrote:
>> > hi all,
>> > 
>> > On Mon, Jan 09, 2023 at 10:01:15PM +0800, Oliver Sang wrote:
>> >> 
>> >> On Fri, Jan 06, 2023 at 11:13:15AM +0100, Vlastimil Babka wrote:
>> >> 
>> >> > And if the rate at the parent (has it increased thanks to the
>> >> > DEBUG_PAGEALLOC?) is sufficient to bisect to the truly first bad commit. Thanks!
>> >> 
>> >> got it. Thanks for suggestion!
>> >> 
>> >> since 0af8489b02 is based on v6.1-rc2, we will test (both rectorture and boot)
>> >> with same config upon v6.1-rc2 to see if it's really clean there.
>> >> if so we will use dmesg.invalid_opcode:#[##] to trigger new bisect.
>> >> 
>> >> will keep you updated. Thanks
>> > 
>> > by more tests, we cannot make sure the v6.1-rc2 is clean, so we also checked
>> > v6.1-rc1 and v6.0. from results, we have low confidence that we can make a
>> > successful bisection based on them [1][2]. could you suggest?
>> 
>> So am I reading it right, that the probleam appears to be introduced between
>> v6.0 (0 failures) and v6.1-rc1 (>0 failures)? But agree that with such low
>> incidence, it's hard to bisect.
> 
> yeah, you are reading it right :)
> 
>> 
>> > a further information not sure if it's helpful, [1][2] are both i386 based.
>> > we also tried to run boot tests on x86_64 upon commit 0af8489b02, whatever
>> > with or without CONFIG_DEBUG_PAGEALLOC/CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT,
>> > we never obseve similar issues (also run 999 times).
>> 
>> Yeah it looks very much like something that manifests only on i386 (perhaps
>> only in QEMU as well?) and never x86_64.
>> 
>> What might be interesting then is v6.1-rc1 with further modified config to
>> enabled CONFIG_SLUB_DEBUG and CONFIG_SLUB_DEBUG_ON. Maybe it will catch the
>> culprit sooner. Or maybe it will obscure the bug instead, unfortunately.
> 
> oh, seems, unfortunalately, 'obscure' happen :(
> 
> we enabled CONFIG_SLUB_DEBUG and CONFIG_SLUB_DEBUG_ON, along with
> CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
> 
> boot (we also add the test for v6.2-rc3):
> =========================================================================================
> compiler/kconfig/rootfs/sleep/tbox_group/testcase:
>   gcc-11/i386-randconfig-a012-20221226+CONFIG_DEBUG_PAGEALLOC+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT+CONFIG_SLUB_DEBUG_ON/debian-11.1-i386-20220923.cgz/1/vm-snb/boot
> 
>             v6.0                    v6.1-rc1                    v6.1-rc2 56d5a2b9ba85a390473e86b4fe4 0af8489b0216fa1dd83e264bef8                    v6.2-rc3
> ---------------- --------------------------- --------------------------- --------------------------- --------------------------- ---------------------------
>        fail:runs  %reproduction    fail:runs  %reproduction    fail:runs  %reproduction    fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
>            |             |             |             |             |             |             |             |             |             |             |
>          43:999          3%          68:999          4%          84:999          6%          99:999          5%          94:999          4%          86:999   dmesg.invalid_opcode:#[##]
>           4:999         -0%           2:999          0%           7:999          0%           8:999          0%           4:999         -0%            :999   dmesg.kernel_BUG_at_include/linux/mm.h
>           3:999          0%           4:999          0%           3:999          0%           7:999          0%           5:999          1%           9:999   dmesg.kernel_BUG_at_include/linux/page-flags.h
>          34:999          3%          61:999          4%          73:999          5%          81:999          5%          85:999          4%          74:999   dmesg.kernel_BUG_at_lib/list_debug.c
>            :999          0%            :999          0%            :999          0%           1:999          0%            :999          0%            :999   dmesg.kernel_BUG_at_mm/internal.h
>           3:999         -0%           1:999         -0%            :999         -0%           2:999         -0%            :999         -0%           2:999   dmesg.kernel_BUG_at_mm/page_alloc.c
>            :999          0%            :999          0%           2:999          0%            :999          0%            :999          0%           2:999   dmesg.kernel_BUG_at_mm/usercopy.c
> 
> 
> since now even the v6.0 is not clean, attached one dmesg FYI

Interesting one, btw.

> [   38.392164][  T193] page:e6c3e400 refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0x9e80
> [   38.393441][  T193] head:e6c3e2c0 order:3 compound_mapcount:0 compound_pincount:0

Yay, struct page pointers no longer hashed for printing, thanks to slub_debug.
Head page is order-3 (8 pages), yet e6c3e400 would be 9th page, nonsense.

> [   38.394491][  T193] flags: 0x10200(slab|head|zone=0)
> [   38.395181][  T193] raw: 00010000 e6c3e2c1 00000122 00000400 00000000 00000000 ffffffff 00000001

We have head flag, compound_head is e6c3e2c0+PageTail, there's again
00000400 in mapping that's specifically set by prep_compound_tail().

> [   38.396386][  T193] raw: 00000000 00000000
> [   38.396956][  T193] head: 00010200 e6ac7904 c01007f0 c0101a40 c9e79a00 00150004 ffffffff 00000001

00150004 are slub's counters suggesting it's been used for a while, we
should not be racing with e6c3e2c0's allocation here!

> [   38.398161][  T193] head: 00000000 00000000
> [   38.398775][  T193] page dumped because: VM_BUG_ON_PAGE(PageTail(page))
> [   38.399724][  T193] page_owner tracks the page as allocated
> [   38.400525][  T193] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 193, tgid 193 (systemd-udevd), ts 38392157598, free_ts 0

This is a dump of e6c3e400, not the "head".

> [   38.403183][  T193]  post_alloc_hook+0x29a/0x320
> [   38.403891][  T193]  get_page_from_freelist+0x226/0x310
> [   38.404644][  T193]  __alloc_pages+0xdd/0x360
> [   38.405256][  T193]  alloc_slab_page+0x12d/0x200
> [   38.405936][  T193]  allocate_slab+0x6a/0x670
> [   38.406586][  T193]  new_slab+0x48/0xc0
> [   38.407127][  T193]  ___slab_alloc+0x9ba/0x1610
> [   38.407942][  T193]  __slab_alloc+0x41/0x70
> [   38.408776][  T193]  __kmalloc_track_caller+0x57b/0x710
> [   38.409538][  T193]  kmalloc_reserve+0x3f/0x100
> [   38.410339][  T193]  __alloc_skb+0xad/0x310
> [   38.410925][  T193]  netlink_sendmsg+0x384/0x6f0
> [   38.411602][  T193]  sock_sendmsg+0x79/0xe0
> [   38.420425][  T193]  ____sys_sendmsg+0x1ef/0x390
> [   38.421069][  T193]  ___sys_sendmsg+0xb0/0x120
> [   38.421726][  T193]  __sys_sendmsg+0x84/0x100
> [   38.422372][  T193] page_owner free stack trace missing
> [   38.423113][  T193] ------------[ cut here ]------------
> [   38.423899][  T193] kernel BUG at include/linux/page-flags.h:319!
> [   38.424778][  T193] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   38.425610][  T193] CPU: 1 PID: 193 Comm: systemd-udevd Tainted: G S                 6.0.0 #1 b46e496fab9f0be3d12299e0bf71444aabf1c433
> [   38.427187][  T193] EIP: folio_flags+0x31/0x70
> [   38.427951][  T193] Code: 48 83 05 e8 89 bd c5 01 ba a4 c0 2b c4 55 89 e5 83 15 ec 89 bd c5 00 e8 8d 3f f6 ff 83 05 f8 89 bd c5 01 83 15 fc 89 bd c5 00 <0f> 0b 83 05 00 8a bd c5 01 b8 38 1e a6 c4 83 15 04 8a bd c5 00 e8
> [   38.430464][  T193] EAX: 00000000 EBX: e6c3e400 ECX: 00000000 EDX: ee7ead00
> [   38.431382][  T193] ESI: 00000000 EDI: 000c2cc0 EBP: c995fb30 ESP: c995fb30
> [   38.432350][  T193] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210246
> [   38.433348][  T193] CR0: 80050033 CR2: b5c4f018 CR3: 09cab000 CR4: 000006d0
> [   38.434288][  T193] Call Trace:
> [   38.434737][  T193]  alloc_slab_page+0x7f/0x200

Aha, we have crashed while allocating a fresh new slab page, probably at
__folio_set_slab(folio), the page doesn't have het PageSlab() flag set.

So we have just took the page from a freelist (that modifies lru.next
overlapping compound_head, we can still see the poison in lru.prev),
verified that it's page->mapping is 0 (in page_expected_state()). Few
moments later there's a compound_head set and mapping set in a way that only
prep_compound_tail() does.
But the head that would be connected seems to have done its
prep_compound_tail()'s a while ago (nevermind that it would have to be an
off-by-one). So that just makes no sense.

> [   38.435361][  T193]  allocate_slab+0x6a/0x670
> [   38.436007][  T193]  new_slab+0x48/0xc0
> [   38.436536][  T193]  ___slab_alloc+0x9ba/0x1610
> [   38.437293][  T193]  ? __might_sleep+0x5a/0x100
> [   38.437912][  T193]  ? __alloc_skb+0xad/0x310
> [   38.438513][  T193]  ? lock_release+0xa1/0x190
> [   38.439127][  T193]  __slab_alloc+0x41/0x70
> [   38.439845][  T193]  __kmalloc_track_caller+0x57b/0x710
> [   38.440557][  T193]  ? kmem_cache_alloc+0x2d6/0x690
> [   38.441217][  T193]  ? __alloc_skb+0xad/0x310
> [   38.441821][  T193]  kmalloc_reserve+0x3f/0x100
> [   38.442569][  T193]  ? security_capable+0x40/0xd0
> [   38.443214][  T193]  __alloc_skb+0xad/0x310
> [   38.443801][  T193]  ? ns_capable_common+0x41/0x80
> [   38.444452][  T193]  netlink_sendmsg+0x384/0x6f0
> [   38.445089][  T193]  ? iovec_from_user+0xea/0x1b0
> [   38.445737][  T193]  sock_sendmsg+0x79/0xe0
> [   38.446313][  T193]  ____sys_sendmsg+0x1ef/0x390
> [   38.446955][  T193]  ___sys_sendmsg+0xb0/0x120
> [   38.447586][  T193]  __sys_sendmsg+0x84/0x100
> [   38.448225][  T193]  __do_sys_socketcall+0x20d/0x5e0
> [   38.448908][  T193]  __ia32_sys_socketcall+0x1b/0x30
> [   38.449593][  T193]  __do_fast_syscall_32+0x72/0xd0
> [   38.450264][  T193]  ? syscall_exit_to_user_mode+0x5f/0x90
> [   38.451001][  T193]  ? __do_fast_syscall_32+0x7c/0xd0
> [   38.451702][  T193]  ? __do_fast_syscall_32+0x7c/0xd0
> [   38.452388][  T193]  ? lockdep_hardirqs_on_prepare+0x242/0x400
> [   38.453184][  T193]  ? syscall_exit_to_user_mode+0x5f/0x90
> [   38.453922][  T193]  ? __do_fast_syscall_32+0x7c/0xd0
> [   38.454609][  T193]  ? lockdep_hardirqs_on_prepare+0x242/0x400
> [   38.455390][  T193]  ? syscall_exit_to_user_mode+0x5f/0x90
> [   38.460224][  T193]  ? __do_fast_syscall_32+0x7c/0xd0
> [   38.460922][  T193]  ? irqentry_exit_to_user_mode+0x23/0x30
> [   38.461675][  T193]  do_fast_syscall_32+0x32/0x70
> [   38.462327][  T193]  do_SYSENTER_32+0x15/0x20
> [   38.462936][  T193]  entry_SYSENTER_32+0xa2/0xfb
> [   38.463567][  T193] EIP: 0xb7ed4549
> [   38.464077][  T193] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> [   38.466578][  T193] EAX: ffffffda EBX: 00000010 ECX: bfc1aa40 EDX: 00000000
> [   38.467499][  T193] ESI: b7e0f000 EDI: 00000000 EBP: 00534540 ESP: bfc1aa30
> [   38.468437][  T193] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00200282
> [   38.469442][  T193] Modules linked in:
> [   38.470183][  T193] ---[ end trace 0000000000000000 ]---
> [   38.470910][  T193] EIP: folio_flags+0x31/0x70
> [   38.471704][  T193] Code: 48 83 05 e8 89 bd c5 01 ba a4 c0 2b c4 55 89 e5 83 15 ec 89 bd c5 00 e8 8d 3f f6 ff 83 05 f8 89 bd c5 01 83 15 fc 89 bd c5 00 <0f> 0b 83 05 00 8a bd c5 01 b8 38 1e a6 c4 83 15 04 8a bd c5 00 e8
> [   38.474274][  T193] EAX: 00000000 EBX: e6c3e400 ECX: 00000000 EDX: ee7ead00
> [   38.475233][  T193] ESI: 00000000 EDI: 000c2cc0 EBP: c995fb30 ESP: c995fb30
> [   38.476218][  T193] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210246
> [   38.477246][  T193] CR0: 80050033 CR2: b5c4f018 CR3: 09cab000 CR4: 000006d0
> [   38.478206][  T193] Kernel panic - not syncing: Fatal exception
> [   38.479153][  T193] Kernel Offset: disabled