Message-ID: <202301170941.49728982-oliver.sang@intel.com>
Date:   Tue, 17 Jan 2023 15:10:05 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jann Horn <jannh@...gle.com>,
        Youquan Song <youquan.song@...el.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Jan Kara <jack@...e.cz>, John Hubbard <jhubbard@...dia.com>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        "Matthew Wilcox" <willy@...radead.org>,
        Michal Hocko <mhocko@...nel.org>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        <linux-mm@...ck.org>, Vlastimil Babka <vbabka@...e.cz>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>,
        Feng Tang <feng.tang@...el.com>,
        Fengwei Yin <fengwei.yin@...el.com>
Subject: [linus:master] [hugetlb]  7118fc2906: kernel_BUG_at_lib/list_debug.c


+Vlastimil Babka, Hyeonggon Yoo, Feng Tang and Fengwei Yin

Hi, Mike Kravetz,

We previously reported
"[linus:master] [mm, slub] 0af8489b02: kernel_BUG_at_include/linux/mm.h" [1].

Vlastimil, Hyeonggon, Feng and Fengwei gave us a lot of great guidance based on
that report. In particular, after enabling the config options below per
Vlastimil's suggestion
  CONFIG_DEBUG_PAGEALLOC
  CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT
  CONFIG_SLUB_DEBUG
  CONFIG_SLUB_DEBUG_ON
and running more tests, we realized that "0af8489b02" is not the real culprit.

A new bisection was then triggered, and it finally pointed to this commit, "7118fc2906".

Although the two reports are for different issues
("kernel_BUG_at_include/linux/mm.h" for 0af8489b02 vs.
"kernel_BUG_at_lib/list_debug.c" for this commit),
Feng and Fengwei helped to confirm that they are similar.
They will supply a more detailed technical analysis later.

Please note that the issues do not always reproduce
(the reproduction rate is ~10% on this commit or 0af8489b02).

=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
  gcc-11/i386-randconfig-a012-20221226+CONFIG_DEBUG_PAGEALLOC+CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT+CONFIG_SLUB_DEBUG_ON/debian-11.1-i386-20220923.cgz/1/vm-snb/boot

48b8d744ea841b8a 7118fc2906e2925d7edb5ed9c8a 0af8489b0216fa1dd83e264bef8
---------------- --------------------------- ---------------------------
       fail:runs  %reproduction    fail:runs  %reproduction    fail:runs
           |             |             |             |             |
           :999         10%          97:999          9%          94:999   dmesg.invalid_opcode:#[##]
           :999          0%            :999          0%           4:999   dmesg.kernel_BUG_at_include/linux/mm.h
           :999          0%           2:999          1%           5:999   dmesg.kernel_BUG_at_include/linux/page-flags.h
           :999          9%          90:999          9%          85:999   dmesg.kernel_BUG_at_lib/list_debug.c
           :999          0%           4:999          0%            :999   dmesg.kernel_BUG_at_mm/page_alloc.c
           :999          0%           1:999          0%            :999   dmesg.kernel_BUG_at_mm/slub.c

[1] https://lore.kernel.org/all/202212312021.bc1efe86-oliver.sang@intel.com/


Below is the detailed report.


Greetings,

FYI, we noticed kernel_BUG_at_lib/list_debug.c due to commit (built with gcc-11):

commit: 7118fc2906e2925d7edb5ed9c8a57f2a5f23b849 ("hugetlb: address ref count racing in prep_compound_gigantic_page")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linux-next/master c12e2e5b76b2e739ccdf196bee960412b45d5f85]

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202301170941.49728982-oliver.sang@intel.com


[   31.031172][  T210] ------------[ cut here ]------------
[   31.032147][  T210] kernel BUG at lib/list_debug.c:54!
[   31.033124][  T210] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   31.034237][  T210] CPU: 1 PID: 210 Comm: systemd-udevd Tainted: G S                5.13.0-00219-g7118fc2906e2 #1
[ 31.036108][ T210] EIP: __list_del_entry_valid.cold (lib/list_debug.c:54 (discriminator 3)) 
[ 31.037237][ T210] Code: 01 89 54 24 08 c7 04 24 08 c7 04 24 83 15 ec 4b 83 15 ec 4b 61 80 f9 ff 61 80 f9 ff 9a c5 01 83 9a c5 01 83 c5 00 0f 0b c5 00 <0f> 0b 9a c5 01 83 9a c5 01 83 c5 00 83 05 c5 00 83 05 01 b8 44 f4
All code
========
   0:	01 89 54 24 08 c7    	add    %ecx,-0x38f7dbac(%rcx)
   6:	04 24                	add    $0x24,%al
   8:	08 c7                	or     %al,%bh
   a:	04 24                	add    $0x24,%al
   c:	83 15 ec 4b 83 15 ec 	adcl   $0xffffffec,0x15834bec(%rip)        # 0x15834bff
  13:	4b 61                	rex.WXB (bad) 
  15:	80 f9 ff             	cmp    $0xff,%cl
  18:	61                   	(bad)  
  19:	80 f9 ff             	cmp    $0xff,%cl
  1c:	9a                   	(bad)  
  1d:	c5 01 83             	(bad)  
  20:	9a                   	(bad)  
  21:	c5 01 83             	(bad)  
  24:	c5 00 0f             	(bad)  
  27:	0b c5                	or     %ebp,%eax
  29:*	00 0f                	add    %cl,(%rdi)		<-- trapping instruction
  2b:	0b 9a c5 01 83 9a    	or     -0x657cfe3b(%rdx),%ebx
  31:	c5 01 83             	(bad)  
  34:	c5 00 83             	(bad)  
  37:	05 c5 00 83 05       	add    $0x58300c5,%eax
  3c:	01                   	.byte 0x1
  3d:	b8                   	.byte 0xb8
  3e:	44 f4                	rex.R hlt 

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	9a                   	(bad)  
   3:	c5 01 83             	(bad)  
   6:	9a                   	(bad)  
   7:	c5 01 83             	(bad)  
   a:	c5 00 83             	(bad)  
   d:	05 c5 00 83 05       	add    $0x58300c5,%eax
  12:	01                   	.byte 0x1
  13:	b8                   	.byte 0xb8
  14:	44 f4                	rex.R hlt 
[   31.044796][  T210] EAX: 00000044 EBX: e6d0e564 ECX: 00000000 EDX: 00000001
[   31.046040][  T210] ESI: ee7f4360 EDI: ee7f4328 EBP: ca1b795c ESP: ca1b7950
[   31.047314][  T210] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00210046
[   31.048702][  T210] CR0: 80050033 CR2: 005d5dbc CR3: 0a287000 CR4: 000406d0
[   31.049987][  T210] Call Trace:
[ 31.050583][ T210] __rmqueue_pcplist (include/linux/list.h:132 include/linux/list.h:146 mm/page_alloc.c:3644) 
[ 31.051469][ T210] rmqueue_pcplist+0x13c/0x3a0 
[ 31.052536][ T210] ? rmqueue_pcplist+0x4b/0x3a0 
[ 31.053479][ T210] rmqueue+0x323/0xd20 
[ 31.054230][ T210] get_page_from_freelist (mm/page_alloc.c:4162) 
[ 31.055219][ T210] __alloc_pages (mm/page_alloc.c:5374) 
[ 31.055998][ T210] allocate_slab (include/linux/gfp.h:558 include/linux/gfp.h:572 include/linux/gfp.h:585 mm/slub.c:1702 mm/slub.c:1842) 
[ 31.056834][ T210] new_slab (mm/slub.c:1907) 
[ 31.057538][ T210] new_slab_objects (mm/slub.c:2652) 
[ 31.058396][ T210] ___slab_alloc+0xf8/0x520 
[ 31.059340][ T210] ? lock_release (kernel/locking/lockdep.c:5534) 
[ 31.060073][ T210] ? __d_alloc (fs/dcache.c:1745) 
[ 31.060817][ T210] ? rcu_read_unlock (include/linux/rcupdate.h:272 (discriminator 7) include/linux/rcupdate.h:711 (discriminator 7)) 
[ 31.061621][ T210] ? get_obj_cgroup_from_current (mm/memcontrol.c:2931) 
[ 31.062657][ T210] __slab_alloc+0x9b/0x100 
[ 31.063622][ T210] ? __d_alloc (fs/dcache.c:1745) 
[ 31.064402][ T210] kmem_cache_alloc (mm/slub.c:2936 mm/slub.c:2978 mm/slub.c:2983) 
[ 31.065230][ T210] ? __d_alloc (fs/dcache.c:1745) 
[ 31.065990][ T210] __d_alloc (fs/dcache.c:1745) 
[ 31.066733][ T210] d_alloc (fs/dcache.c:1824) 
[ 31.067415][ T210] d_alloc_parallel (fs/dcache.c:2575) 
[ 31.068288][ T210] ? __init_waitqueue_head (kernel/sched/wait.c:13) 
[ 31.069218][ T210] __lookup_slow (fs/namei.c:1615) 
[ 31.070026][ T210] lookup_slow (fs/namei.c:1646) 
[ 31.070786][ T210] walk_component (fs/namei.c:1942) 
[ 31.071614][ T210] ? inode_permission (fs/namei.c:522) 
[ 31.072484][ T210] link_path_walk (fs/namei.c:2269) 
[ 31.073324][ T210] path_openat (fs/namei.c:3490 (discriminator 2)) 
[ 31.074100][ T210] ? __lock_acquired (kernel/locking/lockdep.c:5723) 
[ 31.074949][ T210] do_filp_open (fs/namei.c:3521) 
[ 31.075754][ T210] do_sys_openat2 (fs/open.c:1188) 
[ 31.076615][ T210] do_sys_open (fs/open.c:1203) 
[ 31.077356][ T210] __ia32_sys_openat (fs/open.c:1214) 
[ 31.078206][ T210] __do_fast_syscall_32 (arch/x86/entry/common.c:78 arch/x86/entry/common.c:143) 
[ 31.079126][ T210] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4109 kernel/locking/lockdep.c:4169) 
[ 31.080178][ T210] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 19)) 
[ 31.081085][ T210] ? __call_rcu (kernel/rcu/tree.c:3072 (discriminator 1)) 
[ 31.081899][ T210] ? __fput (fs/file_table.c:58 fs/file_table.c:298) 
[ 31.082630][ T210] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4109 kernel/locking/lockdep.c:4169) 
[ 31.083685][ T210] ? syscall_exit_to_user_mode (kernel/entry/common.c:132 kernel/entry/common.c:304) 
[ 31.084726][ T210] ? __do_fast_syscall_32 (arch/x86/entry/common.c:147) 
[ 31.085681][ T210] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4109 kernel/locking/lockdep.c:4169) 
[ 31.086738][ T210] ? irqentry_exit_to_user_mode (kernel/entry/common.c:132 kernel/entry/common.c:317) 
[ 31.087741][ T210] do_fast_syscall_32 (arch/x86/entry/common.c:168) 
[ 31.088627][ T210] do_SYSENTER_32 (arch/x86/entry/common.c:211) 
[ 31.089427][ T210] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:872) 
[   31.090288][  T210] EIP: 0xb7f04549
[ 31.090926][ T210] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
All code
========
   0:	03 74 c0 01          	add    0x1(%rax,%rax,8),%esi
   4:	10 05 03 74 b8 01    	adc    %al,0x1b87403(%rip)        # 0x1b8740d
   a:	10 06                	adc    %al,(%rsi)
   c:	03 74 b4 01          	add    0x1(%rsp,%rsi,4),%esi
  10:	10 07                	adc    %al,(%rdi)
  12:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
  16:	10 08                	adc    %cl,(%rax)
  18:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
  1c:	00 00                	add    %al,(%rax)
  1e:	00 00                	add    %al,(%rax)
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:	89 e5                	mov    %esp,%ebp
  26:	0f 34                	sysenter 
  28:	cd 80                	int    $0x80
  2a:*	5d                   	pop    %rbp		<-- trapping instruction
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	retq   
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	8d 76 00             	lea    0x0(%rsi),%esi
  35:	58                   	pop    %rax
  36:	b8 77 00 00 00       	mov    $0x77,%eax
  3b:	cd 80                	int    $0x80
  3d:	90                   	nop
  3e:	8d                   	.byte 0x8d
  3f:	76                   	.byte 0x76

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	retq   
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	8d 76 00             	lea    0x0(%rsi),%esi
   b:	58                   	pop    %rax
   c:	b8 77 00 00 00       	mov    $0x77,%eax
  11:	cd 80                	int    $0x80
  13:	90                   	nop
  14:	8d                   	.byte 0x8d
  15:	76                   	.byte 0x76


To reproduce:

        # build kernel
        cd linux
        cp config-5.13.0-00219-g7118fc2906e2 .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached to this email

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



View attachment "config-5.13.0-00219-g7118fc2906e2" of type "text/plain" (139470 bytes)

View attachment "job-script" of type "text/plain" (5541 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (54388 bytes)
