[<prev] [next>] [day] [month] [year] [list]
Message-ID: <74cdf283-cc4f-40e7-9cee-a90939eba38d@I-love.SAKURA.ne.jp>
Date: Sun, 4 Aug 2024 18:48:10 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Peter Zijlstra <peterz@...radead.org>, Kees Cook <kees@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>
Subject: [PATCH] profiling: remove profile=sleep support
The kernel sleep profile is no longer working due to a recursive locking
bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan()
to keep task blocked"); either booting with
profile=sleep
kernel command line option added or executing
# echo -n sleep > /sys/kernel/profiling
after boot causes the system to lock up.
============================================
WARNING: possible recursive locking detected
5.15.0-rc4+ #104 Not tainted
--------------------------------------------
kthreadd/3 is trying to acquire lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70
but task is already holding lock:
ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&p->pi_lock);
lock(&p->pi_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kthreadd/3:
#0: ffffae5ac2833d68 (&x->wait){....}-{2:2}, at: complete+0x18/0x40
#1: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370
#2: ffff93ad29fe9458 (&rq->__lock){-...}-{2:2}, at: try_to_wake_up+0x19f/0x370
stack backtrace:
CPU: 0 PID: 3 Comm: kthreadd Not tainted 5.15.0-rc4+ #104
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
dump_stack_lvl+0x57/0x72
validate_chain.cold+0x122/0x127
__lock_acquire+0x4d2/0x930
lock_acquire+0xc8/0x2f0
? get_wchan+0x32/0x70
? rcu_read_lock_sched_held+0x3f/0x80
_raw_spin_lock_irq+0x47/0x90
? get_wchan+0x32/0x70
get_wchan+0x32/0x70
__update_stats_enqueue_sleeper+0x151/0x430
enqueue_entity+0x4b0/0x520
enqueue_task_fair+0x92/0x6b0
? lock_is_held_type+0xa7/0x120
ttwu_do_activate+0x73/0x140
try_to_wake_up+0x213/0x370
swake_up_locked+0x20/0x50
complete+0x2f/0x40
? __cancel_work_timer+0x1b0/0x1b0
kthread+0xfb/0x180
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
However, since nobody noticed this regression for more than 2 years, let's
remove profile=sleep support based on an assumption that nobody needs this
functionality.
Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked")
Cc: stable@...r.kernel.org # v5.16+
Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
---
Some distro kernel (e.g. 5.14.0-427.28.1.el9_4.x86_64) includes that commit.
I hope that distro agrees with removing this functionality...
Documentation/admin-guide/kernel-parameters.txt | 4 +---
include/linux/profile.h | 1 -
kernel/profile.c | 11 +----------
kernel/sched/stats.c | 10 ----------
4 files changed, 2 insertions(+), 24 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1384c7b59c9..09126bb8cc9f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4798,11 +4798,9 @@
profile= [KNL] Enable kernel profiling via /proc/profile
Format: [<profiletype>,]<number>
- Param: <profiletype>: "schedule", "sleep", or "kvm"
+ Param: <profiletype>: "schedule" or "kvm"
[defaults to kernel profiling]
Param: "schedule" - profile schedule points.
- Param: "sleep" - profile D-state sleeping (millisecs).
- Requires CONFIG_SCHEDSTATS
Param: "kvm" - profile VM exits.
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
diff --git a/include/linux/profile.h b/include/linux/profile.h
index 2fb487f61d12..3f53cdb0c27c 100644
--- a/include/linux/profile.h
+++ b/include/linux/profile.h
@@ -10,7 +10,6 @@
#define CPU_PROFILING 1
#define SCHED_PROFILING 2
-#define SLEEP_PROFILING 3
#define KVM_PROFILING 4
struct proc_dir_entry;
diff --git a/kernel/profile.c b/kernel/profile.c
index ff68d3816182..1fcf1adcf4eb 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -50,20 +50,11 @@ EXPORT_SYMBOL_GPL(prof_on);
int profile_setup(char *str)
{
static const char schedstr[] = "schedule";
- static const char sleepstr[] = "sleep";
static const char kvmstr[] = "kvm";
const char *select = NULL;
int par;
- if (!strncmp(str, sleepstr, strlen(sleepstr))) {
-#ifdef CONFIG_SCHEDSTATS
- force_schedstat_enabled();
- prof_on = SLEEP_PROFILING;
- select = sleepstr;
-#else
- pr_warn("kernel sleep profiling requires CONFIG_SCHEDSTATS\n");
-#endif /* CONFIG_SCHEDSTATS */
- } else if (!strncmp(str, schedstr, strlen(schedstr))) {
+ if (!strncmp(str, schedstr, strlen(schedstr))) {
prof_on = SCHED_PROFILING;
select = schedstr;
} else if (!strncmp(str, kvmstr, strlen(kvmstr))) {
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 78e48f5426ee..eb0cdcd4d921 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -92,16 +92,6 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
trace_sched_stat_blocked(p, delta);
- /*
- * Blocking time is in units of nanosecs, so shift by
- * 20 to get a milliseconds-range estimation of the
- * amount of time that the task spent sleeping:
- */
- if (unlikely(prof_on == SLEEP_PROFILING)) {
- profile_hits(SLEEP_PROFILING,
- (void *)get_wchan(p),
- delta >> 20);
- }
account_scheduler_latency(p, delta >> 10, 0);
}
}
--
2.43.5
Powered by blists - more mailing lists