Message-ID: <fc4de64d-abd6-2c50-a10c-5b901d604092@google.com>
Date: Thu, 9 Jan 2025 09:59:07 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Josh Don <joshdon@...gle.com>
cc: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, 
    Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
    Juri Lelli <juri.lelli@...hat.com>, 
    Vincent Guittot <vincent.guittot@...aro.org>, 
    Dietmar Eggemann <dietmar.eggemann@....com>, 
    Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, 
    Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, 
    linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for
 warnings

On Tue, 7 Jan 2025, Josh Don wrote:

> > Right, I think this should be entirely up to what the admin configures in
> > debugfs.  If they elect to disable latency_warn_once, we'll simply emit
> > the information as often as they specify in latency_warn_ms and not add
> > our own ratelimiting on top.  If they have a preference for lots of
> > logging, so be it, let's not hide that data.
> 
> Your change doesn't reset rq->last_seen_need_resched_ns, so now
> without the ratelimit I think we'll get a dump every single tick until
> we eventually reschedule.
> 
> Another potential benefit of the ratelimit is that if we have
> something wedging multiple cpus concurrently, we don't spam the log
> (if warn_once is disabled). Though that's probably an unlikely occurrence.
> 
> I think if you modify the patch to reset last_seen_need_resched_ns
> that'll give the behavior you're after.
> 

Thanks Josh for pointing this out!  I'm surprised by the implementation
here: even though it's only CONFIG_SCHED_DEBUG, we'd be taking the
function call every tick only for the ratelimit to make it a no-op :/

Is that worth improving as well?
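
For illustration only (untested, and it would mean moving the ratelimit
state out of resched_latency_warn() so it's visible at the call site),
one possible shape would be to consult the ratelimit before making the
call, so the tick path skips it when it would be suppressed anyway.
"resched_latency_rs" below is just a placeholder name:

	/*
	 * Hypothetical alternative to dropping the ratelimit entirely:
	 * keep it, but check it at the call site in sched_tick() so we
	 * never enter resched_latency_warn() just to return early.
	 */
	static DEFINE_RATELIMIT_STATE(resched_latency_rs, 60 * 60 * HZ, 1);

	if (sched_feat(LATENCY_WARN) && resched_latency &&
	    __ratelimit(&resched_latency_rs)) {
		resched_latency_warn(cpu, resched_latency);
		rq->last_seen_need_resched_ns = 0;
	}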

Otherwise, please take a look: is this what you had in mind?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5659,8 +5659,10 @@ void sched_tick(void)
 
 	rq_unlock(rq, &rf);
 
-	if (sched_feat(LATENCY_WARN) && resched_latency)
+	if (sched_feat(LATENCY_WARN) && resched_latency) {
 		resched_latency_warn(cpu, resched_latency);
+		rq->last_seen_need_resched_ns = 0;
+	}
 
 	perf_event_task_tick();
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
 
 void resched_latency_warn(int cpu, u64 latency)
 {
-	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
-
-	if (likely(!__ratelimit(&latency_check_ratelimit)))
-		return;
-
 	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
 	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
 	dump_stack();
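
For context on why the reset is sufficient: the latency is measured from
rq->last_seen_need_resched_ns, and a zero value makes the next tick restart
the measurement window rather than warn again immediately.  Roughly, as a
simplified paraphrase of the tick-side check (not the verbatim kernel code;
warn_ns stands in for the configured latency_warn_ms threshold in
nanoseconds):

	/* Simplified sketch of the per-tick latency check: */
	if (!rq->last_seen_need_resched_ns) {
		/* (re)start the measurement window */
		rq->last_seen_need_resched_ns = now;
		rq->ticks_without_resched = 0;
		return 0;
	}
	rq->ticks_without_resched++;
	resched_latency = now - rq->last_seen_need_resched_ns;
	if (resched_latency <= warn_ns)
		return 0;
	return resched_latency;	/* non-zero: sched_tick() warns and now also resets */

So while a CPU stays wedged with need_resched set, this should give roughly
one warning per latency_warn_ms interval instead of one per tick, which
matches the "as often as they specify in latency_warn_ms" behavior above.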
