Message-ID: <CABk29Nvtje0WB=HC=4UStpk0x3Fo84FjkgAQExpotDUBpxVDog@mail.gmail.com>
Date: Thu, 9 Jan 2025 10:53:28 -0800
From: Josh Don <joshdon@...gle.com>
To: David Rientjes <rientjes@...gle.com>
Cc: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
On Thu, Jan 9, 2025 at 9:59 AM David Rientjes <rientjes@...gle.com> wrote:
>
> On Tue, 7 Jan 2025, Josh Don wrote:
>
> > > Right, I think this should be entirely up to what the admin configures in
> > > debugfs. If they elect to disable latency_warn_once, we'll simply emit
> > > the information as often as they specify in latency_warn_ms and not add
> > > our own ratelimiting on top. If they have a preference for lots of
> > > logging, so be it, let's not hide that data.
> >
> > Your change doesn't reset rq->last_seen_need_resched_ns, so now
> > without the ratelimit I think we'll get a dump every single tick until
> > we eventually reschedule.
> >
> > Another potential benefit to the ratelimit is that if we have
> > something wedging multiple cpus concurrently, we don't spam the log
> > (if warn_once is disabled). Though, probably an unlikely occurrence.
> >
> > I think if you modify the patch to reset last_seen_need_resched_ns
> > that'll give the behavior you're after.
> >
>
> Thanks Josh for pointing this out! I'm surprised by the implementation
> here where, even though it's only CONFIG_SCHED_DEBUG, we'd be taking the
> function call every tick only to find that the ratelimit makes it a no-op
> :/
>
> Is that worth improving as well?
I think your change takes care of it by removing the ratelimit entirely :)
> Otherwise, please take a look, is this what you had in mind?
I'm realizing now that we'll end up getting multiple splats for a
single very long stall (one per warning-threshold interval). We could
fix that by writing a magic value rather than 0 here (such as
U64_MAX), and then teaching resched_latency() to bail out when it sees
that value.
Additionally, while on the surface it might appear odd to write to the
rq field without holding the lock, we'll never have concurrent
read/write to a given rq's last_seen_need_resched, so that's fine; I
just wanted to mention it explicitly.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5659,8 +5659,10 @@ void sched_tick(void)
>
> rq_unlock(rq, &rf);
>
> - if (sched_feat(LATENCY_WARN) && resched_latency)
> + if (sched_feat(LATENCY_WARN) && resched_latency) {
> resched_latency_warn(cpu, resched_latency);
> + rq->last_seen_need_resched_ns = 0;
> + }
>
> perf_event_task_tick();
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1293,11 +1293,6 @@ void proc_sched_set_task(struct task_struct *p)
>
> void resched_latency_warn(int cpu, u64 latency)
> {
> - static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
> -
> - if (likely(!__ratelimit(&latency_check_ratelimit)))
> - return;
> -
I think it is possible some users would want a control to enact some
type of rate-limit even with warn_once disabled, but for now I think
this is perfectly reasonable. We can always add a separate knob later
on to control a minimum cooldown between splats in that case.
> pr_err("sched: CPU %d need_resched set for > %llu ns (%d ticks) without schedule\n",
> cpu, latency, cpu_rq(cpu)->ticks_without_resched);
> dump_stack();