Message-ID: <4952e9fd-0d85-4d4d-9bf4-ae127d612008@linux.ibm.com>
Date: Tue, 7 Jan 2025 23:44:53 +0530
From: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To: David Rientjes <rientjes@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for
 warnings

Hi David Rientjes,

On 07/01/25 02:09, David Rientjes wrote:
> The need_resched warnings are controlled by two tunables in debugfs:
>  - latency_warn_ms
>  - latency_warn_once
> 
> By default, latency_warn_once is enabled.  Thus, a need_resched warning
> is only emitted once per boot.
> 
> If the user configures this to not be the case and changes the default,
> then allow the user to also control the threshold through latency_warn_ms
> that these warnings trigger.  Do not impose our own ratelimiting on top
> that may make it appear like there are no cases where need_resched is set
> for longer than the threshold.

Any idea why it was initially kept to one warning per hour?

The reasons that come to mind are to prevent excessive logging under high
CPU contention, and to make a warning that keeps recurring hour after hour a
sign of a persistent problem rather than a short workload spike. The rate
limit might also keep the logging itself from hurting system performance.

However, if the user disables latency_warn_once (it is enabled by default),
it may be acceptable to drop the rate limit, since that choice indicates a
preference for logging over performance.
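
For reference, the effect of the limiter being removed (an interval of
60 * 60 * HZ with a burst of 1) can be modelled in userspace roughly as below.
This is only a sketch under stated assumptions, not the kernel's
implementation: struct ratelimit_model, ratelimit_model_allow() and
count_logged() are hypothetical names, time is passed in as a plain tick
counter instead of jiffies, and there is no locking or accounting of
suppressed messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit_model {
	uint64_t interval;	/* window length, in ticks */
	int burst;		/* events allowed per window */
	uint64_t begin;		/* start of the current window */
	int printed;		/* events allowed so far in this window */
	bool started;		/* false until the first event is seen */
};

/* Return true if an event at time `now` may be logged. */
static bool ratelimit_model_allow(struct ratelimit_model *rs, uint64_t now)
{
	assert(rs != NULL);

	/* Open a new window on the first event or once the old one expired. */
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->started = true;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;
}

/* Count how many of `n` events, spaced `step` ticks apart, get logged. */
static int count_logged(struct ratelimit_model *rs, uint64_t start,
			uint64_t step, int n)
{
	int logged = 0;

	for (int i = 0; i < n; i++)
		if (ratelimit_model_allow(rs, start + (uint64_t)i * step))
			logged++;
	return logged;
}
```

With interval = 3600 ticks and burst = 1, sixty events spaced one minute apart
within the same window yield a single logged event in this model; the others
are silently dropped, which is the masking effect the patch description is
concerned about.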

Thoughts?

Thanks,
Madadi Vineeth Reddy

> 
> Signed-off-by: David Rientjes <rientjes@...gle.com>
> ---
>  kernel/sched/debug.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>  
>  void resched_latency_warn(int cpu, u64 latency)
>  {
> -	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> -	if (likely(!__ratelimit(&latency_check_ratelimit)))
> -		return;
> -
>  	pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
>  	       cpu, latency, cpu_rq(cpu)->ticks_without_resched);
>  	dump_stack();

