lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <202307111535.4823c255-oliver.sang@intel.com>
Date:   Tue, 11 Jul 2023 16:48:06 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Frederic Weisbecker <frederic@...nel.org>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>, <rcu@...r.kernel.org>,
        <oliver.sang@...el.com>
Subject: [linus:master] [rcu/nocb]  7625926086:
 WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_try_bypass



Hello,

kernel test robot noticed "WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_try_bypass" on:

commit: 7625926086765123251f765d91fc3a70617d334d ("rcu/nocb: Fix shrinker race against callback enqueuer")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master Linux 6.5-rc1]
[test failed on linux-next/master fe57d0d86f03a8b2afe2869a95477d0ed1824c96]

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 300s
	group: group-00

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace):


the WARNING only happens in a low rate (10 out of 204 runs as below).
but since we never observed the issue on parent, and we also noticed
there is a WARN_ON_ONCE in code change, so just report what we observed
in our tests FYI.

5c83cedbaaad6dfe 7625926086765123251f765d91f
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :202          5%          10:204   dmesg.RIP:rcu_nocb_bypass_lock
           :202          5%          10:204   dmesg.RIP:rcu_nocb_try_bypass
           :202          5%          10:204   dmesg.WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_bypass_lock
           :202          5%          10:204   dmesg.WARNING:at_kernel/rcu/tree_nocb.h:#rcu_nocb_try_bypass


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202307111535.4823c255-oliver.sang@intel.com



[ 315.569574][ T16] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree_nocb.h:124 rcu_nocb_try_bypass (kernel/rcu/tree_nocb.h:124 kernel/rcu/tree_nocb.h:513) 
[  315.569582][   T16] Modules linked in: crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel rapl qemu_fw_cfg intel_agp intel_gtt
[  315.569591][   T16] CPU: 1 PID: 16 Comm: rcuop/0 Not tainted 6.4.0-rc1-00002-g762592608676 #1
[ 315.569594][ T16] RIP: 0010:rcu_nocb_try_bypass (kernel/rcu/tree_nocb.h:124 kernel/rcu/tree_nocb.h:513) 
[ 315.569965][ T49] WARNING: CPU: 0 PID: 49 at kernel/rcu/tree_nocb.h:104 rcu_nocb_bypass_lock (kernel/rcu/tree_nocb.h:104 (discriminator 11)) 
[ 315.570472][ T16] Code: 41 0f b6 34 24 48 8b 54 24 08 e9 23 01 00 00 65 8b 05 32 7b f3 7e 3b 83 90 05 00 00 74 02 0f 0b 8b 83 48 02 00 00 85 c0 74 06 <0f> 0b f3 90 eb f0 48 89 df e8 44 7f ff ff 48 8b 3c 24 4c 8b a3 90
All code
========
   0:	41 0f b6 34 24       	movzbl (%r12),%esi
   5:	48 8b 54 24 08       	mov    0x8(%rsp),%rdx
   a:	e9 23 01 00 00       	jmp    0x132
   f:	65 8b 05 32 7b f3 7e 	mov    %gs:0x7ef37b32(%rip),%eax        # 0x7ef37b48
  16:	3b 83 90 05 00 00    	cmp    0x590(%rbx),%eax
  1c:	74 02                	je     0x20
  1e:	0f 0b                	ud2
  20:	8b 83 48 02 00 00    	mov    0x248(%rbx),%eax
  26:	85 c0                	test   %eax,%eax
  28:	74 06                	je     0x30
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	f3 90                	pause
  2e:	eb f0                	jmp    0x20
  30:	48 89 df             	mov    %rbx,%rdi
  33:	e8 44 7f ff ff       	call   0xffffffffffff7f7c
  38:	48 8b 3c 24          	mov    (%rsp),%rdi
  3c:	4c                   	rex.WR
  3d:	8b                   	.byte 0x8b
  3e:	a3                   	.byte 0xa3
  3f:	90                   	nop

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	f3 90                	pause
   4:	eb f0                	jmp    0xfffffffffffffff6
   6:	48 89 df             	mov    %rbx,%rdi
   9:	e8 44 7f ff ff       	call   0xffffffffffff7f52
   e:	48 8b 3c 24          	mov    (%rsp),%rdi
  12:	4c                   	rex.WR
  13:	8b                   	.byte 0x8b
  14:	a3                   	.byte 0xa3
  15:	90                   	nop
[  315.570981][   T49] Modules linked in: crc32c_intel
[  315.571642][   T16] RSP: 0000:ffffc9000008bd60 EFLAGS: 00010002
[  315.572102][   T49]  polyval_clmulni
[  315.572412][   T16]
[  315.572413][   T16] RAX: 0000000000000001 RBX: ffff88842fd2e9c0 RCX: 0000000000000040
[  315.572911][   T49]  polyval_generic
[  315.573917][   T16] RDX: 0000000100000eb8 RSI: ffff88842fd2ebe0 RDI: ffff88842fd2ea60
[  315.574200][   T49]  ghash_clmulni_intel
[  315.574517][   T16] RBP: 0000000100000eb8 R08: 0000000000000001 R09: 0000000000000001
[  315.574715][   T49]  rapl qemu_fw_cfg
[  315.574841][   T16] R10: 0000000000000000 R11: ffff88817e599608 R12: ffffc9000008bdcf
[  315.575254][   T49]  intel_agp
[  315.575450][   T16] R13: 0000000000000001 R14: 0000000000000002 R15: 0000000000000002
[  315.575863][   T49]  intel_gtt
[  315.576078][   T16] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[  315.576491][   T49]
[  315.576694][   T16] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  315.577105][   T49] CPU: 0 PID: 49 Comm: kswapd0 Not tainted 6.4.0-rc1-00002-g762592608676 #1
[  315.577277][   T16] CR2: 00000000f7f6f2e8 CR3: 0000000165fc4000 CR4: 00000000000406a0
[ 315.577689][ T49] RIP: 0010:rcu_nocb_bypass_lock (kernel/rcu/tree_nocb.h:104 (discriminator 11)) 
[  315.577861][   T16] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 315.578340][ T49] Code: 0b 48 8d ab 40 03 00 00 48 89 ef e8 2f 78 af 00 85 c0 75 27 f0 ff 83 48 02 00 00 65 8b 05 85 fb f3 7e 39 83 90 05 00 00 74 02 <0f> 0b 48 89 ef e8 67 76 af 00 f0 ff 8b 48 02 00 00 5b 5d c3 cc 83
All code
========
   0:	0b 48 8d             	or     -0x73(%rax),%ecx
   3:	ab                   	stos   %eax,%es:(%rdi)
   4:	40 03 00             	rex add (%rax),%eax
   7:	00 48 89             	add    %cl,-0x77(%rax)
   a:	ef                   	out    %eax,(%dx)
   b:	e8 2f 78 af 00       	call   0xaf783f
  10:	85 c0                	test   %eax,%eax
  12:	75 27                	jne    0x3b
  14:	f0 ff 83 48 02 00 00 	lock incl 0x248(%rbx)
  1b:	65 8b 05 85 fb f3 7e 	mov    %gs:0x7ef3fb85(%rip),%eax        # 0x7ef3fba7
  22:	39 83 90 05 00 00    	cmp    %eax,0x590(%rbx)
  28:	74 02                	je     0x2c
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	48 89 ef             	mov    %rbp,%rdi
  2f:	e8 67 76 af 00       	call   0xaf769b
  34:	f0 ff 8b 48 02 00 00 	lock decl 0x248(%rbx)
  3b:	5b                   	pop    %rbx
  3c:	5d                   	pop    %rbp
  3d:	c3                   	ret
  3e:	cc                   	int3
  3f:	83                   	.byte 0x83

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	48 89 ef             	mov    %rbp,%rdi
   5:	e8 67 76 af 00       	call   0xaf7671
   a:	f0 ff 8b 48 02 00 00 	lock decl 0x248(%rbx)
  11:	5b                   	pop    %rbx
  12:	5d                   	pop    %rbp
  13:	c3                   	ret
  14:	cc                   	int3
  15:	83                   	.byte 0x83
[  315.578467][   T16] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  315.578809][   T49] RSP: 0018:ffffc9000019bcb0 EFLAGS: 00010002
[  315.579259][   T16] Call Trace:
[  315.579261][   T16]  <TASK>
[  315.579673][   T49]
[  315.579674][   T49] RAX: 0000000000000000 RBX: ffff88842fd2e9c0 RCX: 0000000000000000
[ 315.579980][ T16] ? find_next_bit (arch/x86/events/core.c:816 arch/x86/events/core.c:965) 
[  315.580390][   T49] RDX: ffff88842fd2ebe0 RSI: ffff88842fd2ebe0 RDI: ffff88842fd2ed00
[ 315.581398][ T16] __call_rcu_common (kernel/rcu/tree.c:2644) 
[  315.581812][   T49] RBP: ffff88842fd2ed00 R08: 0000000000000001 R09: 0000000000000001
[ 315.582136][ T16] slab_free+0x53/0xf7 
[  315.582311][   T49] R10: ffff88842fd2ebe0 R11: 0000000000000001 R12: 0000000000000000
[ 315.582469][ T16] rcu_do_batch (kernel/rcu/tree.c:2117) 
[  315.582594][   T49] R13: 0000000100000eb8 R14: ffffc9000019bd88 R15: 0000000000000202
[ 315.583009][ T16] nocb_cb_wait (kernel/rcu/tree_nocb.h:955) 
[  315.583301][   T49] FS:  0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[ 315.583716][ T16] ? nocb_cb_wait (kernel/rcu/tree_nocb.h:1003) 
[  315.583975][   T49] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 315.584388][ T16] rcu_nocb_cb_kthread (kernel/rcu/tree_nocb.h:1010 (discriminator 24)) 
[  315.584686][   T49] CR2: 00000000590fb014 CR3: 0000000179d50000 CR4: 00000000000406b0
[ 315.585100][ T16] kthread (kernel/kthread.c:381) 
[  315.585339][   T49] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 315.585752][ T16] ? kthread_complete_and_exit (kernel/kthread.c:332) 
[  315.585990][   T49] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 315.586462][ T16] ret_from_fork (arch/x86/entry/entry_64.S:314) 
[  315.586710][   T49] Call Trace:
[  315.586711][   T49]  <TASK>
[  315.587058][   T16]  </TASK>
[ 315.587313][ T49] rcu_nocb_flush_bypass (kernel/rcu/tree_nocb.h:381) 
[  315.587727][   T16] irq event stamp: 38560087
[ 315.587934][ T49] lazy_rcu_shrink_scan (kernel/rcu/tree_nocb.h:1367 (discriminator 6)) 
[ 315.588346][ T16] hardirqs last enabled at (38560086): _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) 
[ 315.588643][ T49] do_shrink_slab+0x1d0/0x2ea 
[ 315.589056][ T16] hardirqs last disabled at (38560087): __call_rcu_common (kernel/rcu/tree.c:2629 (discriminator 3)) 
[ 315.589292][ T49] shrink_slab+0xa7/0x136 
[ 315.589466][ T16] softirqs last enabled at (38560080): local_bh_enable (include/linux/bottom_half.h:33) 
[ 315.589624][ T49] shrink_node (arch/x86/include/asm/current.h:41 mm/vmscan.c:565 mm/vmscan.c:6503) 
[ 315.589784][ T16] softirqs last disabled at (38560082): local_bh_disable (include/linux/bottom_half.h:20) 
[ 315.590054][ T49] balance_pgdat (mm/vmscan.c:7296 mm/vmscan.c:7477) 
[  315.590296][   T16] ---[ end trace 0000000000000000 ]---
[ 315.590570][ T49] kswapd (mm/vmscan.c:7737) 
[ 315.594237][ T49] ? balance_pgdat (mm/vmscan.c:7670) 
[ 315.594498][ T49] kthread (kernel/kthread.c:381) 
[ 315.594711][ T49] ? kthread_complete_and_exit (kernel/kthread.c:332) 
[ 315.595014][ T49] ret_from_fork (arch/x86/entry/entry_64.S:314) 
[  315.595260][   T49]  </TASK>
[  315.595427][   T49] irq event stamp: 123174
[ 315.595658][ T49] hardirqs last enabled at (123173): raw_spin_rq_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 kernel/sched/sched.h:1378) 
[ 315.596191][ T49] hardirqs last disabled at (123174): lazy_rcu_shrink_scan (kernel/rcu/tree_nocb.h:1366 (discriminator 3)) 
[ 315.596724][ T49] softirqs last enabled at (122534): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600) 
[ 315.597222][ T49] softirqs last disabled at (122527): __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650) 
[  315.597720][   T49] ---[ end trace 0000000000000000 ]---
[  315.622775][  T212] [main] kernel became tainted! (512/0) Last seed was 144492271
[  315.622787][  T212]
[  315.624666][  T212] trinity: Detected kernel tainting. Last seed was 144492271
[  315.624673][  T212]
[  315.626388][  T212] [main] 1282601 iterations. [F:915248 S:367378 HI:102957]


To reproduce:

        # build kernel
	cd linux
	cp config-6.4.0-rc1-00002-g762592608676 .config
	make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


View attachment "config-6.4.0-rc1-00002-g762592608676" of type "text/plain" (164469 bytes)

View attachment "job-script" of type "text/plain" (4727 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (26660 bytes)

View attachment "trinity" of type "text/plain" (23803 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ