Message-ID: <202301170941.49728982-oliver.sang@intel.com>
Date: Tue, 17 Jan 2023 15:10:05 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Mike Kravetz <mike.kravetz@...cle.com>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
<linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jann Horn <jannh@...gle.com>,
Youquan Song <youquan.song@...el.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Jan Kara <jack@...e.cz>, John Hubbard <jhubbard@...dia.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
"Matthew Wilcox" <willy@...radead.org>,
Michal Hocko <mhocko@...nel.org>,
Muchun Song <songmuchun@...edance.com>,
Andrew Morton <akpm@...ux-foundation.org>,
<linux-mm@...ck.org>, Vlastimil Babka <vbabka@...e.cz>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
Feng Tang <feng.tang@...el.com>,
Fengwei Yin <fengwei.yin@...el.com>
Subject: [linus:master] [hugetlb] 7118fc2906: kernel_BUG_at_lib/list_debug.c
+Vlastimil Babka, Hyeonggon Yoo, Feng Tang and Fengwei Yin
Hi, Mike Kravetz,
We previously reported
"[linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h" [1].
Vlastimil, Hyeonggon, Feng and Fengwei gave us a lot of great guidance based on
it and, in particular, after enabling the config options below per Vlastimil's suggestion
CONFIG_DEBUG_PAGEALLOC
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
CONFIG_SLUB_DEBUG
CONFIG_SLUB_DEBUG_ON
and running more tests, we realized that "0af8489b02" is not the real culprit.
A new bisection was triggered, and it finally pointed to this commit, "7118fc2906".
Although the two reports are for different issues
("kernel_BUG_at_include/linux/mm.h" for 0af8489b02 vs.
"kernel_BUG_at_lib/list_debug.c" for this commit),
Feng and Fengwei helped to confirm that they are similar.
They will supply more technical analysis later.
Please note that the issues do not always happen
(~10% of runs on this commit or on 0af8489b02).
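Because the failure shows up in only about 10% of boots, a single clean boot says little about whether a given kernel is good. As a rough sketch, assuming each boot fails independently with the same probability (the report does not establish this), the number of boots needed to see at least one failure with a chosen confidence can be estimated as below. The function name and the 99% default are illustrative choices, not part of the LKP tooling.

```python
import math


def boots_needed(fail_rate: float, confidence: float = 0.99) -> int:
    """Smallest n such that 1 - (1 - fail_rate)**n >= confidence.

    Assumes independent boots with a constant failure probability.
    """
    if not 0.0 < fail_rate < 1.0:
        raise ValueError("fail_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    # ~10% is the reproduction rate quoted in this report.
    print(boots_needed(0.10))
```

Under these assumptions, the ~10% rate quoted above works out to 44 boots for 99% confidence, so a commit should be called good only after many clean boots.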
=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
gcc-11/i386-randconfig-a012-20221226+CONFIG_DEBUG_PAGEALLOC+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT+CONFIG_SLUB_DEBUG_ON/debian-11.1-i386-20220923.cgz/1/vm-snb/boot
48b8d744ea841b8a 7118fc2906e2925d7edb5ed9c8a 0af8489b0216fa1dd83e264bef8
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |            |              |            |              |
            :999        10%           97:999         9%           94:999  dmesg.invalid_opcode:#[##]
            :999         0%             :999         0%            4:999  dmesg.kernel_BUG_at_include/linux/mm.h
            :999         0%            2:999         1%            5:999  dmesg.kernel_BUG_at_include/linux/page-flags.h
            :999         9%           90:999         9%           85:999  dmesg.kernel_BUG_at_lib/list_debug.c
            :999         0%            4:999         0%             :999  dmesg.kernel_BUG_at_mm/page_alloc.c
            :999         0%            1:999         0%             :999  dmesg.kernel_BUG_at_mm/slub.c
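The fail:runs cells in the table above can be turned into rates with a few lines of code. The sketch below is only an illustration: the helper name is made up, and treating an empty count before the colon as zero failures is an assumption about how the table is printed.

```python
def fail_rate_percent(cell: str) -> int:
    """Convert a 'fail:runs' cell such as '97:999' to a rounded percentage.

    An empty failure count (':999') is treated as zero failures; this is an
    assumption about the table format, not something the report states.
    """
    fails, runs = cell.strip().split(":")
    n_runs = int(runs)
    if n_runs <= 0:
        raise ValueError("runs must be positive")
    n_fails = int(fails) if fails else 0
    return round(100 * n_fails / n_runs)


if __name__ == "__main__":
    # Cells copied from the dmesg.kernel_BUG_at_lib/list_debug.c row.
    for cell in (":999", "90:999", "85:999"):
        print(cell, fail_rate_percent(cell))
```

For this commit, the list_debug.c row has 90 failures in 999 boots, which rounds to the 9% shown in the table.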
[1] https://lore.kernel.org/all/202212312021.bc1efe86-oliver.sang@intel.com/
Below is the detailed report.
Greetings,
FYI, we noticed kernel_BUG_at_lib/list_debug.c due to commit (built with gcc-11):
commit: 7118fc2906e2925d7edb5ed9c8a57f2a5f23b849 ("hugetlb: address ref count racing in prep_compound_gigantic_page")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linux-next/master c12e2e5b76b2e739ccdf196bee960412b45d5f85]
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202301170941.49728982-oliver.sang@intel.com
[ 31.031172][ T210] ------------[ cut here ]------------
[ 31.032147][ T210] kernel BUG at lib/list_debug.c:54!
[ 31.033124][ T210] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 31.034237][ T210] CPU: 1 PID: 210 Comm: systemd-udevd Tainted: G S 5.13.0-00219-g7118fc2906e2 #1
[ 31.036108][ T210] EIP: __list_del_entry_valid.cold (lib/list_debug.c:54 (discriminator 3))
[ 31.037237][ T210] Code: 01 89 54 24 08 c7 04 24 08 c7 04 24 83 15 ec 4b 83 15 ec 4b 61 80 f9 ff 61 80 f9 ff 9a c5 01 83 9a c5 01 83 c5 00 0f 0b c5 00 <0f> 0b 9a c5 01 83 9a c5 01 83 c5 00 83 05 c5 00 83 05 01 b8 44 f4
All code
========
0: 01 89 54 24 08 c7 add %ecx,-0x38f7dbac(%rcx)
6: 04 24 add $0x24,%al
8: 08 c7 or %al,%bh
a: 04 24 add $0x24,%al
c: 83 15 ec 4b 83 15 ec adcl $0xffffffec,0x15834bec(%rip) # 0x15834bff
13: 4b 61 rex.WXB (bad)
15: 80 f9 ff cmp $0xff,%cl
18: 61 (bad)
19: 80 f9 ff cmp $0xff,%cl
1c: 9a (bad)
1d: c5 01 83 (bad)
20: 9a (bad)
21: c5 01 83 (bad)
24: c5 00 0f (bad)
27: 0b c5 or %ebp,%eax
29:* 00 0f add %cl,(%rdi) <-- trapping instruction
2b: 0b 9a c5 01 83 9a or -0x657cfe3b(%rdx),%ebx
31: c5 01 83 (bad)
34: c5 00 83 (bad)
37: 05 c5 00 83 05 add $0x58300c5,%eax
3c: 01 .byte 0x1
3d: b8 .byte 0xb8
3e: 44 f4 rex.R hlt
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 9a (bad)
3: c5 01 83 (bad)
6: 9a (bad)
7: c5 01 83 (bad)
a: c5 00 83 (bad)
d: 05 c5 00 83 05 add $0x58300c5,%eax
12: 01 .byte 0x1
13: b8 .byte 0xb8
14: 44 f4 rex.R hlt
[ 31.044796][ T210] EAX: 00000044 EBX: e6d0e564 ECX: 00000000 EDX: 00000001
[ 31.046040][ T210] ESI: ee7f4360 EDI: ee7f4328 EBP: ca1b795c ESP: ca1b7950
[ 31.047314][ T210] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210046
[ 31.048702][ T210] CR0: 80050033 CR2: 005d5dbc CR3: 0a287000 CR4: 000406d0
[ 31.049987][ T210] Call Trace:
[ 31.050583][ T210] __rmqueue_pcplist (include/linux/list.h:132 include/linux/list.h:146 mm/page_alloc.c:3644)
[ 31.051469][ T210] rmqueue_pcplist+0x13c/0x3a0
[ 31.052536][ T210] ? rmqueue_pcplist+0x4b/0x3a0
[ 31.053479][ T210] rmqueue+0x323/0xd20
[ 31.054230][ T210] get_page_from_freelist (mm/page_alloc.c:4162)
[ 31.055219][ T210] __alloc_pages (mm/page_alloc.c:5374)
[ 31.055998][ T210] allocate_slab (include/linux/gfp.h:558 include/linux/gfp.h:572 include/linux/gfp.h:585 mm/slub.c:1702 mm/slub.c:1842)
[ 31.056834][ T210] new_slab (mm/slub.c:1907)
[ 31.057538][ T210] new_slab_objects (mm/slub.c:2652)
[ 31.058396][ T210] ___slab_alloc+0xf8/0x520
[ 31.059340][ T210] ? lock_release (kernel/locking/lockdep.c:5534)
[ 31.060073][ T210] ? __d_alloc (fs/dcache.c:1745)
[ 31.060817][ T210] ? rcu_read_unlock (include/linux/rcupdate.h:272 (discriminator 7) include/linux/rcupdate.h:711 (discriminator 7))
[ 31.061621][ T210] ? get_obj_cgroup_from_current (mm/memcontrol.c:2931)
[ 31.062657][ T210] __slab_alloc+0x9b/0x100
[ 31.063622][ T210] ? __d_alloc (fs/dcache.c:1745)
[ 31.064402][ T210] kmem_cache_alloc (mm/slub.c:2936 mm/slub.c:2978 mm/slub.c:2983)
[ 31.065230][ T210] ? __d_alloc (fs/dcache.c:1745)
[ 31.065990][ T210] __d_alloc (fs/dcache.c:1745)
[ 31.066733][ T210] d_alloc (fs/dcache.c:1824)
[ 31.067415][ T210] d_alloc_parallel (fs/dcache.c:2575)
[ 31.068288][ T210] ? __init_waitqueue_head (kernel/sched/wait.c:13)
[ 31.069218][ T210] __lookup_slow (fs/namei.c:1615)
[ 31.070026][ T210] lookup_slow (fs/namei.c:1646)
[ 31.070786][ T210] walk_component (fs/namei.c:1942)
[ 31.071614][ T210] ? inode_permission (fs/namei.c:522)
[ 31.072484][ T210] link_path_walk (fs/namei.c:2269)
[ 31.073324][ T210] path_openat (fs/namei.c:3490 (discriminator 2))
[ 31.074100][ T210] ? __lock_acquired (kernel/locking/lockdep.c:5723)
[ 31.074949][ T210] do_filp_open (fs/namei.c:3521)
[ 31.075754][ T210] do_sys_openat2 (fs/open.c:1188)
[ 31.076615][ T210] do_sys_open (fs/open.c:1203)
[ 31.077356][ T210] __ia32_sys_openat (fs/open.c:1214)
[ 31.078206][ T210] __do_fast_syscall_32 (arch/x86/entry/common.c:78 arch/x86/entry/common.c:143)
[ 31.079126][ T210] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4109 kernel/locking/lockdep.c:4169)
[ 31.080178][ T210] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 19))
[ 31.081085][ T210] ? __call_rcu (kernel/rcu/tree.c:3072 (discriminator 1))
[ 31.081899][ T210] ? __fput (fs/file_table.c:58 fs/file_table.c:298)
[ 31.082630][ T210] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4109 kernel/locking/lockdep.c:4169)
[ 31.083685][ T210] ? syscall_exit_to_user_mode (kernel/entry/common.c:132 kernel/entry/common.c:304)
[ 31.084726][ T210] ? __do_fast_syscall_32 (arch/x86/entry/common.c:147)
[ 31.085681][ T210] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4109 kernel/locking/lockdep.c:4169)
[ 31.086738][ T210] ? irqentry_exit_to_user_mode (kernel/entry/common.c:132 kernel/entry/common.c:317)
[ 31.087741][ T210] do_fast_syscall_32 (arch/x86/entry/common.c:168)
[ 31.088627][ T210] do_SYSENTER_32 (arch/x86/entry/common.c:211)
[ 31.089427][ T210] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:872)
[ 31.090288][ T210] EIP: 0xb7f04549
[ 31.090926][ T210] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
All code
========
0: 03 74 c0 01 add 0x1(%rax,%rax,8),%esi
4: 10 05 03 74 b8 01 adc %al,0x1b87403(%rip) # 0x1b8740d
a: 10 06 adc %al,(%rsi)
c: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
10: 10 07 adc %al,(%rdi)
12: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
16: 10 08 adc %cl,(%rax)
18: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
1c: 00 00 add %al,(%rax)
1e: 00 00 add %al,(%rax)
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24: 89 e5 mov %esp,%ebp
26: 0f 34 sysenter
28: cd 80 int $0x80
2a:* 5d pop %rbp <-- trapping instruction
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 retq
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 8d 76 00 lea 0x0(%rsi),%esi
35: 58 pop %rax
36: b8 77 00 00 00 mov $0x77,%eax
3b: cd 80 int $0x80
3d: 90 nop
3e: 8d .byte 0x8d
3f: 76 .byte 0x76
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 retq
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 8d 76 00 lea 0x0(%rsi),%esi
b: 58 pop %rax
c: b8 77 00 00 00 mov $0x77,%eax
11: cd 80 int $0x80
13: 90 nop
14: 8d .byte 0x8d
15: 76 .byte 0x76
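In the Code: lines of the oops above, the byte wrapped in angle brackets is the first byte of the instruction at the faulting EIP. In the first dump that is 0f 0b (ud2), the instruction that BUG() emits on x86, which is why the report also shows "invalid opcode". Below is a small sketch for locating that byte in a Code: line. The function name is hypothetical, and the parsing assumes the space-separated hex format shown in this log.

```python
from typing import Tuple


def trapping_offset(code_line: str) -> Tuple[int, bytes]:
    """Return (offset, bytes_from_fault) for an oops 'Code:' byte dump.

    Expects space-separated hex bytes with exactly one byte written as <xx>.
    """
    tokens = code_line.split()
    marked = [i for i, t in enumerate(tokens)
              if t.startswith("<") and t.endswith(">")]
    if len(marked) != 1:
        raise ValueError("expected exactly one <xx> marker")
    offset = marked[0]
    raw = bytes(int(t.strip("<>"), 16) for t in tokens)
    return offset, raw[offset:]


if __name__ == "__main__":
    # Tail of the first Code: line in this report, shortened for the example.
    sample = "c5 00 0f 0b c5 00 <0f> 0b 9a c5 01 83"
    offset, tail = trapping_offset(sample)
    print(hex(offset), tail[:2].hex())
```

The bytes returned from the marked position onward are the ones listed under "Code starting with the faulting instruction".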
To reproduce:
# build kernel
cd linux
cp config-5.13.0-00219-g7118fc2906e2 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
View attachment "config-5.13.0-00219-g7118fc2906e2" of type "text/plain" (139470 bytes)
View attachment "job-script" of type "text/plain" (5541 bytes)
Download attachment "dmesg.xz" of type "application/x-xz" (54388 bytes)