[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZxCt/JQkGZHo0n2o@xsang-OptiPlex-9020>
Date: Thu, 17 Oct 2024 14:26:04 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
CC: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Peter Zijlstra
<peterz@...radead.org>, <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
<linux-kernel@...r.kernel.org>, <rcu@...r.kernel.org>,
<oliver.sang@...el.com>
Subject: Re: [peterz-queue:sched/lazy] [sched, x86] 74d850cd4c:
WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
hi, Paul,
On Wed, Oct 16, 2024 at 09:12:35AM -0700, Paul E. McKenney wrote:
> On Wed, Oct 16, 2024 at 05:40:48PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2024-10-15 15:47:20 [+0800], kernel test robot wrote:
> > > Hello,
> > Hi,
> >
> > it took me a while to reproduce this because it does not trigger without
> > the ltp userland and this is not downloaded properly so I had to
> > workaround it. However…
> >
> > > [ 98.006999][ T543] busted-torture: rtc: 00000000639e821e ver: 6796 tfle: 0 rta: 6796 rtaf: 0 rtf: 6787 rtmbe: 1 rtmbkf: 0/0 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 0 nt: 89 barrier: 0/0:0 read-exits: 64 nocb-toggles: 0:0
> > > [ 98.017386][ T543] busted-torture: !!!
> > > [ 98.017662][ T543] ------------[ cut here ]------------
> > > [ 98.019330][ T543] WARNING: CPU: 0 PID: 543 at kernel/rcu/rcutorture.c:2258 rcu_torture_stats_print+0x24c/0x610 [rcutorture]
> > > [ 98.021831][ T543] Modules linked in: rcutorture torture
> > …
> > > [ 98.237025][ T543] ------------[ cut here ]------------
> > > [ 98.238052][ T543] WARNING: CPU: 0 PID: 543 at kernel/rcu/rcutorture.c:2263 rcu_torture_stats_print+0x373/0x610 [rcutorture]
> > > [ 98.240000][ T543] Modules linked in: rcutorture torture
> > …
> >
> > It took me a while to figure out that this test is using
> > rcu_busted_torture_deferred_free() which in turn invokes the callback
> > before the grace period. Buh.
> > So it looks like LAZY preempt triggers this more reliably than the
> > normal preempt version…
>
> Agreed!
>
> Please don't run any rcutorture scenario whose name contains "BUSTED"
> or "busted" unless you are trying to test rcutorture's ability to find
> broken RCU implementations.
thanks for information!
we will refine our test to avoid rcutorture 'busted' related testing.
BTW, for this commit, our bot finished several bisect. we happened to report
based on this 300s-cpuhotplug-busted test since we found the parent is quite
clean:
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
vm-snb/rcutorture/debian-12-x86_64-20240206.cgz/x86_64-randconfig-014-20241011/clang-18/300s/cpuhotplug/busted
693ed5cdf46ac062 74d850cd4c5bc26ce83511b4247
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:6 100% 6:6 rcutorture.cpuhotplug-busted.fail
:6 100% 6:6 dmesg.RIP:rcu_torture_stats_print[rcutorture]
:6 100% 6:6 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
in fact, we found this commit also causes some new issues in other tests, such
like below 300s-cpuhotplug-trivial test
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/torture_type:
vm-snb/rcutorture/debian-11.1-i386-20220923.cgz/x86_64-randconfig-103-20241011/gcc-12/300s/cpuhotplug/trivial
693ed5cdf46ac062 74d850cd4c5bc26ce83511b4247
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 0% 6:6 dmesg.RIP:rcu_torture_stats_print[rcutorture]
6:6 0% 6:6 dmesg.RIP:synchronize_rcu_trivial[rcutorture]
:6 100% 6:6 dmesg.RIP:torture_sched_setaffinity
6:6 0% 6:6 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_stats_print[rcutorture]
6:6 0% 6:6 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#synchronize_rcu_trivial[rcutorture]
:6 100% 6:6 dmesg.WARNING:at_kernel/rcu/update.c:#torture_sched_setaffinity <------ new stat
the config for this 300s-cpuhotplug-trivial test is attached as
config-6.12.0-rc1-00037-g74d850cd4c5b
which is a randconfig, too
it has diff with config from parent as below [1]
also attached a full dmesg, from it:
[ 92.540279][ T461] ------------[ cut here ]------------
[ 92.540718][ T461] torture_sched_setaffinity: sched_setaffinity(461) returned -22
[ 92.541280][ T461] WARNING: CPU: 0 PID: 461 at kernel/rcu/update.c:535 torture_sched_setaffinity+0x3b/0x4f
[ 92.541998][ T461] Modules linked in: rcutorture torture crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni sha1_ssse3 rapl bochs drm_vram_helper
drm_ttm_helper evbug ttm input_leds drm_kms_helper serio_raw qemu_fw_cfg drm ipv6
[ 92.543610][ T461] CPU: 0 UID: 0 PID: 461 Comm: rcu_torture_fak Tainted: G W T 6.12.0-rc1-00037-g74d850cd4c5b #1
[ 92.544421][ T461] Tainted: [W]=WARN, [T]=RANDSTRUCT
[ 92.544781][ T461] RIP: 0010:torture_sched_setaffinity+0x3b/0x4f
[ 92.545216][ T461] Code: c0 74 29 80 3d cd 7b d5 01 00 75 20 89 c1 c6 05 c2 7b d5 01 01 89 ea 48 c7 c6 c0 92 01 82 48 c7 c7 a6 03 5d 82 e8 2a 26 f7 ff <0
f> 0b 48 63 c3 5b 5d 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc f3 0f
[ 92.546555][ T461] RSP: 0018:ffffc90001cf3e70 EFLAGS: 00210246
[ 92.546995][ T461] RAX: 0000000000000000 RBX: ffffffffffffffea RCX: 0000000000000000
[ 92.547568][ T461] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 92.551112][ T461] RBP: 00000000000001cd R08: 0000000000000000 R09: 0000000000000000
[ 92.551698][ T461] R10: 0000000000000000 R11: 5f65727574726f74 R12: 0000000000000002
[ 92.552290][ T461] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90001fbbb50
[ 92.552867][ T461] FS: 0000000000000000(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[ 92.553467][ T461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 92.553909][ T461] CR2: 00000000f6a3e040 CR3: 000000011fd2f000 CR4: 00000000000406b0
[ 92.554446][ T461] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 92.555018][ T461] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 92.555555][ T461] Call Trace:
[ 92.555809][ T461] <TASK>
[ 92.556024][ T461] ? __warn+0xa1/0x16e
[ 92.556327][ T461] ? torture_sched_setaffinity+0x3b/0x4f
[ 92.556726][ T461] ? report_bug+0x13b/0x1d1
[ 92.557070][ T461] ? handle_bug+0x53/0x8c
[ 92.557390][ T461] ? exc_invalid_op+0x17/0x74
[ 92.557732][ T461] ? asm_exc_invalid_op+0x1a/0x20
[ 92.558140][ T461] ? torture_sched_setaffinity+0x3b/0x4f
[ 92.558558][ T461] synchronize_rcu_trivial+0x67/0xaf [rcutorture]
[ 92.559068][ T461] ? rcu_lock_release+0xc/0xc [rcutorture]
[ 92.559494][ T461] rcu_torture_fakewriter+0x264/0x469 [rcutorture]
[ 92.559972][ T461] kthread+0xff/0x107
[ 92.560271][ T461] ? list_del_init+0x2e/0x2e
[ 92.560608][ T461] ret_from_fork+0x23/0x35
[ 92.560943][ T461] ? list_del_init+0x2e/0x2e
[ 92.561294][ T461] ret_from_fork_asm+0x11/0x20
[ 92.561665][ T461] </TASK>
[ 92.561899][ T461] irq event stamp: 36775
[ 92.562218][ T461] hardirqs last enabled at (36783): [<ffffffff8111d295>] __up_console_sem+0x4e/0x5a
[ 92.562956][ T461] hardirqs last disabled at (36792): [<ffffffff8111d27a>] __up_console_sem+0x33/0x5a
[ 92.563663][ T461] softirqs last enabled at (36554): [<ffffffff810b148d>] handle_softirqs+0x393/0x3e0
[ 92.564375][ T461] softirqs last disabled at (36539): [<ffffffff810b1594>] __irq_exit_rcu+0x52/0xac
[ 92.565083][ T461] ---[ end trace 0000000000000000 ]---
[1]
--- /pkg/linux/x86_64-randconfig-103-20241011/gcc-12/693ed5cdf46ac062aa7713700c77019c60f0ae41/.config 2024-10-12 20:01:25.345180715 +0800
+++ /pkg/linux/x86_64-randconfig-103-20241011/gcc-12/74d850cd4c5bc26ce83511b4247ea786eb73217c/.config 2024-10-12 16:00:42.428850680 +0800
@@ -121,12 +121,15 @@ CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
# CONFIG_BPF_PRELOAD is not set
# end of BPF subsystem
-CONFIG_PREEMPT_NONE_BUILD=y
-CONFIG_PREEMPT_NONE=y
+CONFIG_PREEMPT_BUILD=y
+CONFIG_ARCH_HAS_PREEMPT_LAZY=y
+# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
+CONFIG_PREEMPT_LAZY=y
# CONFIG_PREEMPT_RT is not set
CONFIG_PREEMPT_COUNT=y
+CONFIG_PREEMPTION=y
# CONFIG_PREEMPT_DYNAMIC is not set
CONFIG_SCHED_CORE=y
@@ -148,6 +151,7 @@ CONFIG_HAVE_SCHED_AVG_IRQ=y
# RCU Subsystem
#
CONFIG_TREE_RCU=y
+CONFIG_PREEMPT_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
@@ -162,6 +166,9 @@ CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
CONFIG_RCU_FANOUT=64
CONFIG_RCU_FANOUT_LEAF=16
+CONFIG_RCU_BOOST=y
+CONFIG_RCU_BOOST_DELAY=500
+# CONFIG_RCU_EXP_KTHREAD is not set
# CONFIG_RCU_NOCB_CPU is not set
# CONFIG_TASKS_TRACE_RCU_READ_MB is not set
# CONFIG_RCU_DOUBLE_CHECK_CB_TIME is not set
@@ -6722,6 +6729,7 @@ CONFIG_SCHEDSTATS=y
# end of Scheduler Debugging
# CONFIG_DEBUG_TIMEKEEPING is not set
+CONFIG_DEBUG_PREEMPT=y
#
# Lock Debugging (spinlocks, mutexes, etc...)
@@ -6820,6 +6828,7 @@ CONFIG_FTRACE=y
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
+# CONFIG_PREEMPT_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_HWLAT_TRACER is not set
# CONFIG_OSNOISE_TRACER is not set
>
> Thanx, Paul
>
View attachment "config-6.12.0-rc1-00037-g74d850cd4c5b" of type "text/plain" (173946 bytes)
Download attachment "dmesg.xz" of type "application/x-xz" (46864 bytes)
Powered by blists - more mailing lists