Message-ID: <4952e9fd-0d85-4d4d-9bf4-ae127d612008@linux.ibm.com>
Date: Tue, 7 Jan 2025 23:44:53 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: David Rientjes <rientjes@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for
 warnings

Hi David Rientjes,

On 07/01/25 02:09, David Rientjes wrote:
> The need_resched warnings are controlled by two tunables in debugfs:
>  - latency_warn_ms
>  - latency_warn_once
> 
> By default, latency_warn_once is enabled.  Thus, a need_resched warning
> is only emitted once per boot.
> 
> If the user configures this to not be the case and changes the default,
> then allow the user to also control the threshold through latency_warn_ms
> that these warnings trigger.  Do not impose our own ratelimiting on top
> that may make it appear like there are no cases where need_resched is set
> for longer than the threshold.

Any idea why it was initially kept to one warning per hour?

The possible reasons that come to mind are to prevent excessive logging under
high CPU contention, which could itself hurt system performance, and to ensure
that a warning which keeps appearing hour after hour points to a persistent
problem rather than a short workload spike.
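
To make the effect of the existing limit concrete, here is a rough userspace
sketch of an interval/burst limiter with the same parameters as the removed
DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1). It is only a
simplified model, not the kernel's __ratelimit() implementation: the names
ratelimit_model, ratelimit_model_allow, count_emitted and MODEL_HZ are
hypothetical, and the tick rate of 250 is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 250 ticks per second; the real HZ depends on the kernel config. */
#define MODEL_HZ 250ULL

/* Hypothetical userspace model, not the kernel's struct ratelimit_state. */
struct ratelimit_model {
	uint64_t interval; /* window length in ticks */
	int burst;         /* events allowed per window */
	uint64_t begin;    /* tick at which the current window started */
	int emitted;       /* events allowed so far in the current window */
	bool started;      /* false until the first event opens a window */
};

/* Return true if an event at tick 'now' may be logged. */
static bool ratelimit_model_allow(struct ratelimit_model *rs, uint64_t now)
{
	/* Open a new window on the first event or once the interval has passed. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->emitted = 0;
		rs->started = true;
	}
	if (rs->emitted >= rs->burst)
		return false;
	rs->emitted++;
	return true;
}

/* Count how many events get through when one fires every 'step' ticks. */
static uint64_t count_emitted(uint64_t interval, int burst,
			      uint64_t step, uint64_t duration)
{
	struct ratelimit_model rs = { .interval = interval, .burst = burst };
	uint64_t emitted = 0;

	assert(step > 0);
	for (uint64_t now = 0; now < duration; now += step)
		if (ratelimit_model_allow(&rs, now))
			emitted++;
	return emitted;
}
```

In this model, count_emitted(60 * 60 * MODEL_HZ, 1, 1, 2 * 60 * 60 * MODEL_HZ)
returns 2: even if the condition triggers on every tick for two hours, only
two warnings get through, which is how the limit can hide repeated cases.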

However, if the user has disabled latency_warn_once (it is enabled by default),
it may be acceptable to bypass the rate limit, since that choice indicates a
preference for logging over performance.

Thoughts?

Thanks,
Madadi Vineeth Reddy

> 
> Signed-off-by: David Rientjes <rientjes@...gle.com>
> ---
>  kernel/sched/debug.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>  
>  void resched_latency_warn(int cpu, u64 latency)
>  {
> -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> -		return;
> -
>  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
>  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>  	dump_stack();