Date: Sun, 7 Apr 2024 22:06:43 +0800
From: "zhaowenhui (A)" <zhaowenhui8@...wei.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira <bristot@...hat.com>,
	Valentin Schneider <vschneid@...hat.com>, "open list:SCHEDULER"
	<linux-kernel@...r.kernel.org>
Subject: [bug report] WARNING: CPU: 0 PID: 49573 at kernel/sched/rt.c:802
 rq_offline_rt+0x24d/0x260

Hello,
Recently, one of our machines triggered a warning in __disable_runtime()
(inlined into rq_offline_rt(), which is why the trace shows that function).
The dmesg output is as follows:
[  991.697692] WARNING: CPU: 0 PID: 49573 at kernel/sched/rt.c:802 rq_offline_rt+0x24d/0x260
[  991.697795] CPU: 0 PID: 49573 Comm: kworker/1:0 Kdump: loaded Not tainted 6.9.0-rc1+ #4
[  991.697798] Hardware name: SuperCloud R5210 G12/X12DPi-N6, BIOS 1.1c 08/30/2021
[  991.697800] Workqueue: events cpuset_hotplug_workfn
[  991.697803] RIP: 0010:rq_offline_rt+0x24d/0x260
[  991.697825] Call Trace:
[  991.697827]  <TASK>
[  991.697830]  ? __warn+0x7c/0x130
[  991.697835]  ? rq_offline_rt+0x24d/0x260
[  991.697837]  ? report_bug+0xf8/0x1e0
[  991.697842]  ? handle_bug+0x3f/0x70
[  991.697858]  set_rq_offline.part.125+0x2d/0x70
[  991.697864]  rq_attach_root+0xda/0x110
[  991.697867]  cpu_attach_domain+0x433/0x860
[  991.697870]  ? psi_task_switch+0x11d/0x260
[  991.697873]  ? __kmalloc_node+0x1dc/0x390
[  991.697877]  ? alloc_cpumask_var_node+0x1b/0x30
[  991.697880]  partition_sched_domains_locked+0x2a8/0x3a0
[  991.697883]  ? css_next_child+0x61/0x80
[  991.697885]  rebuild_sched_domains_locked+0x608/0x800
[  991.697890]  ? percpu_rwsem_wait+0x160/0x160
[  991.697895]  rebuild_sched_domains+0x1b/0x30
[  991.697897]  cpuset_hotplug_workfn+0x4b6/0x1160
[  991.697899]  ? balance_push+0x4e/0x120
[  991.697903]  ? finish_task_switch+0x8d/0x2d0
[  991.697905]  ? __switch_to+0x126/0x4f0
[  991.697909]  process_scheduled_works+0xad/0x430
[  991.697917]  worker_thread+0x105/0x270
[  991.697922]  kthread+0xde/0x110
[  991.697928]  ret_from_fork+0x2d/0x50
[  991.697935]  ret_from_fork_asm+0x11/0x20
[  991.697940]  </TASK>
[  991.697941] ---[ end trace 0000000000000000 ]---

The corresponding code is:
802    WARN_ON_ONCE(want);
In kernels older than linux-6.1 this check is still a BUG_ON() rather than
a WARN_ON_ONCE(), so on those versions the same condition triggers a panic.
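As we read kernel/sched/rt.c, `want` is the RT runtime this CPU's rt_rq lent to (positive) or borrowed from (negative) the other CPUs of its root domain under RT_RUNTIME_SHARE. Before the rq goes offline, __disable_runtime() greedily takes that runtime back from the other CPUs in rd->span, and WARN_ON_ONCE(want) fires if it cannot all be found. The sketch below is only a simplified, single-threaded user-space model of that loop, not kernel code: model_disable_runtime() and MODEL_RUNTIME_INF are made-up names, one array stands in for a single task group's rt_rqs across the root domain, and locking and the task-group iteration are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RUNTIME_INF; the kernel uses an all-ones u64 instead. */
#define MODEL_RUNTIME_INF INT64_MAX

/*
 * Simplified model of the reclaim loop in __disable_runtime().
 * rt_runtime[i] is the current rt_runtime of one task group's rt_rq on the
 * i-th CPU of the root domain, budget is the group's configured runtime
 * (rt_b->rt_runtime) and self is the CPU being taken offline.
 * Returns the leftover "want"; non-zero is what WARN_ON_ONCE(want) reports.
 */
static int64_t model_disable_runtime(int64_t *rt_runtime, size_t ncpus,
                                     size_t self, int64_t budget)
{
    assert(self < ncpus);

    /* Already disabled, or nothing was lent or borrowed. */
    if (rt_runtime[self] == MODEL_RUNTIME_INF || rt_runtime[self] == budget) {
        rt_runtime[self] = MODEL_RUNTIME_INF;
        return 0;
    }

    /* > 0: runtime this CPU lent out; < 0: runtime it borrowed. */
    int64_t want = budget - rt_runtime[self];

    /* Greedy pass over the other CPUs that still have finite runtime. */
    for (size_t i = 0; i < ncpus && want != 0; i++) {
        if (i == self || rt_runtime[i] == MODEL_RUNTIME_INF)
            continue;
        if (want > 0) {
            /* Take back as much as this CPU can give. */
            int64_t diff = rt_runtime[i] < want ? rt_runtime[i] : want;
            rt_runtime[i] -= diff;
            want -= diff;
        } else {
            /* Hand borrowed runtime back in one go. */
            rt_runtime[i] -= want;
            want = 0;
        }
    }

    /* Disable borrowing on this CPU by pretending it has infinite runtime. */
    rt_runtime[self] = MODEL_RUNTIME_INF;
    return want;
}
```

For instance, with a budget of 5000, the array {2000, MODEL_RUNTIME_INF} with self = 0 returns 3000: the runtime this CPU lent out no longer sits on any CPU the loop can reclaim from, which is the situation the warning reports.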

More information:
1. RT_RUNTIME_SHARE is enabled.
2. We continuously create and remove cpu cgroups, use cgexec to run short
tasks such as "tree" or "ps" in them, and set cpu.rt_runtime_us in these
cgroups to values between 2000 and 6000.
3. CPUs are frequently taken offline and brought back online; each of
these operations rebuilds the sched domains (as in the trace above), which
takes the rq offline and reaches __disable_runtime().
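A rough C sketch of the control-file writes behind steps 2 and 3, assuming a cgroup v1 cpu hierarchy built with CONFIG_RT_GROUP_SCHED (so cpu.rt_runtime_us exists) and the usual /sys/devices/system/cpu/cpuN/online hotplug files. The helper names are hypothetical, directories are passed in rather than hard-coded, RT_RUNTIME_SHARE is assumed to be enabled separately through the scheduler features file in debugfs, and real use needs root on a test machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Join dir and leaf into buf; truncation is treated as a bug in this sketch. */
static void build_path(char *buf, size_t len, const char *dir, const char *leaf)
{
    int n = snprintf(buf, len, "%s/%s", dir, leaf);
    assert(n > 0 && (size_t)n < len);
}

/* Write a short string to a control file; cgroupfs/sysfs errors show up on close. */
static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(val, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Offline (0) or online (1) a CPU through <cpu_dir>/online. */
static int set_cpu_online(const char *cpu_dir, int online)
{
    char path[512];
    build_path(path, sizeof(path), cpu_dir, "online");
    return write_str(path, online ? "1" : "0");
}

/* Create a cpu cgroup (v1 hierarchy) and set its RT budget in microseconds. */
static int make_rt_cgroup(const char *cg_dir, long rt_runtime_us)
{
    char path[512], val[32];
    if (mkdir(cg_dir, 0755) != 0 && errno != EEXIST)
        return -1;
    build_path(path, sizeof(path), cg_dir, "cpu.rt_runtime_us");
    snprintf(val, sizeof(val), "%ld", rt_runtime_us);
    return write_str(path, val);
}

/* Attach a process to the group, similar in effect to starting it via cgexec. */
static int move_pid(const char *cg_dir, pid_t pid)
{
    char path[512], val[32];
    build_path(path, sizeof(path), cg_dir, "cgroup.procs");
    snprintf(val, sizeof(val), "%ld", (long)pid);
    return write_str(path, val);
}

/* Remove a cgroup once its tasks have exited. */
static int remove_rt_cgroup(const char *cg_dir)
{
    return rmdir(cg_dir);
}
```

A churn loop could call make_rt_cgroup() with a value in the 2000 to 6000 range, fork a short-lived child that calls move_pid() on itself (getpid()) before exec()ing tree or ps, call remove_rt_cgroup() after the child exits, and meanwhile toggle set_cpu_online() on a hotpluggable CPU from another process.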

Whenever we run these operations after a reboot, the warning is easy to
reproduce.

---
Regards
Zhao Wenhui
