lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc50b62c-813b-45e6-ad7b-227cbd0415ba@paulmck-laptop>
Date:   Wed, 22 Nov 2023 10:43:52 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     sfr@...b.auug.org.au, peterz@...radead.org
Cc:     linux-kernel@...r.kernel.org
Subject: [BUG] RCU CPU stall warning (heads up)

Hello!

Just FYI for the moment.

I hit the following three times out of 15 ten-hour TREE03 rcutorture
runs on next-20231121, which suggests an MTBF of about 50 hours.  This is
new over the past week or so.

The symptom is that the RCU grace-period kthread is marked as running
("R"), but remains stuck in schedule() for the remainder of the run.

My next steps will be to retry on today's -next, and if that reproduces
the bug, attempt to bisect.

But I figured that I should send this out on the off-chance that it is
a known problem.

						Thanx, Paul

[  456.859071] rcu-torture: rtc: 0000000002666dc0 ver: 14234 tfle: 0 rta: 14235 rtaf: 0 rtf: 14225 rtmbe: 0 rtmbkf: 0/10105 rtbe: 0 rtbke: 0 rtbf: 0 rtb: 39 nt: 792049 onoff: 324/324:329/330 18,4182:5,9107 84652:154698 (HZ=1000) barrier: 1987/1988:0 read-exits: 416 nocb-toggles: 0:0
[  456.865422] rcu-torture: Reader Pipe:  1075314187 71402 0 0 0 0 0 0 0 0 0
[  456.867182] rcu-torture: Reader Batch:  1075090167 295422 0 0 0 0 0 0 0 0 0
[  456.868930] rcu-torture: Free-Block Circulation:  14234 14233 14232 14231 14230 14229 14228 14227 14226 14225 0
[  464.018060] rcu: INFO: rcu_preempt self-detected stall on CPU
[  464.019191] rcu: 	11-...!: (1 GPs behind) idle=b41c/0/0x1 softirq=0/0 fqs=88
[  464.020489] rcu: 	(t=21003 jiffies g=302741 q=71371 ncpus=11)
[  464.021549] rcu: rcu_preempt kthread starved for 20595 jiffies! g302741 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[  464.023457] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  464.025117] rcu: RCU grace-period kthread stack dump:
[  464.026030] task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
[  464.028032] Call Trace:
[  464.028507]  <TASK>
[  464.028900]  __schedule+0x306/0x880
[  464.029559]  schedule+0x2b/0xb0
[  464.030139]  schedule_timeout+0x86/0x160
[  464.030857]  ? __pfx_process_timeout+0x10/0x10
[  464.031673]  rcu_gp_fqs_loop+0x12a/0x5f0
[  464.032406]  ? __pfx_rcu_gp_kthread+0x10/0x10
[  464.033207]  rcu_gp_kthread+0x191/0x240
[  464.033905]  kthread+0xd3/0x100
[  464.034494]  ? __pfx_kthread+0x10/0x10
[  464.035189]  ret_from_fork+0x2f/0x50
[  464.035837]  ? __pfx_kthread+0x10/0x10
[  464.036534]  ret_from_fork_asm+0x1a/0x30
[  464.037264]  </TASK>
[  464.037665] rcu: Stack dump where RCU GP kthread last ran:
[  464.038671] Sending NMI from CPU 11 to CPUs 0:
[  464.039507] NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20
[  464.040505] CPU: 11 PID: 0 Comm: swapper/11 Not tainted 6.7.0-rc2-next-20231121 #7188
[  464.042806] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[  464.044842] RIP: 0010:default_idle+0xf/0x20
[  464.045618] Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 03 a8 36 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
[  464.048992] RSP: 0000:ffffac70400fbef0 EFLAGS: 00000202
[  464.049952] RAX: ffff9d381f4c0000 RBX: ffff9d380124c740 RCX: 0000000000000000
[  464.051264] RDX: 4000000000000000 RSI: ffffffffb4afa83b RDI: 000000000c18b414
[  464.052557] RBP: 000000000000000b R08: 000000000c18b414 R09: 0000000000000000
[  464.053856] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  464.055169] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  464.056463] FS:  0000000000000000(0000) GS:ffff9d381f4c0000(0000) knlGS:0000000000000000
[  464.057936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  464.058988] CR2: 0000000000000000 CR3: 000000000204e000 CR4: 00000000000006f0
[  464.060314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  464.061607] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  464.062907] Call Trace:
[  464.063378]  <IRQ>
[  464.063754]  ? rcu_dump_cpu_stacks+0xf1/0x150
[  464.064629]  ? rcu_sched_clock_irq+0x56e/0xee0
[  464.065450]  ? __kmem_cache_alloc_node+0x44/0x1e0
[  464.066316]  ? rcu_torture_timer+0x4e/0x80
[  464.067063]  ? __pfx_rcu_torture_timer+0x10/0x10
[  464.067910]  ? kvm_sched_clock_read+0x11/0x20
[  464.068713]  ? sched_clock+0x10/0x30
[  464.069395]  ? sched_clock_cpu+0xf/0x190
[  464.070116]  ? update_process_times+0x45/0x70
[  464.070907]  ? tick_nohz_highres_handler+0xb6/0xe0
[  464.071790]  ? __pfx_tick_nohz_highres_handler+0x10/0x10
[  464.072772]  ? __hrtimer_run_queues+0x10d/0x2a0
[  464.073606]  ? hrtimer_interrupt+0xfe/0x240
[  464.074386]  ? __sysvec_apic_timer_interrupt+0x53/0x140
[  464.075353]  ? sysvec_apic_timer_interrupt+0x6b/0x80
[  464.076270]  </IRQ>
[  464.076655]  <TASK>
[  464.077040]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  464.078021]  ? default_idle+0xf/0x20
[  464.078692]  default_idle_call+0x2d/0xe0
[  464.079420]  do_idle+0x180/0x1c0
[  464.080021]  cpu_startup_entry+0x24/0x30
[  464.080756]  start_secondary+0xf7/0x100
[  464.081479]  secondary_startup_64_no_verify+0x178/0x17b
[  464.082445]  </TASK>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ