Message-ID: <CABk29NsPH-xpTNSB5CcLOHZ-TPVgFa3Dj0O=VO_OL9v+BGMh0Q@mail.gmail.com>
Date: Tue, 7 Jan 2025 12:45:40 -0800
From: Josh Don <joshdon@...gle.com>
To: David Rientjes <rientjes@...gle.com>
Cc: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] sched/debug: Remove need_resched ratelimiting for warnings
On Tue, Jan 7, 2025 at 12:15 PM David Rientjes <rientjes@...gle.com> wrote:
>
> On Tue, 7 Jan 2025, Madadi Vineeth Reddy wrote:
>
> > Any idea why it was initially kept to one warning per hour?
> >
>
> Adding Josh Don who may have insight into this historically.

No idea on the hour default, unfortunately. Almost certainly arbitrary.

> > The possible reasons that come to mind are to prevent excessive logging under
> > high CPU contention, as well as to ensure that a warning logged once an hour
> > indicates the issue is not caused by a short workload spike. Additionally,
> > this rate limit might help avoid impacting system performance due to excessive
> > logging.
> >
> > However, if the default value of latency_warn_once is changed to disable it, it
> > may be acceptable to bypass the rate limit, as it would indicate a preference
> > for logging over performance.
> >
>
> Right, I think this should be entirely up to what the admin configures in
> debugfs. If they elect to disable latency_warn_once, we'll simply emit
> the information as often as they specify in latency_warn_ms and not add
> our own ratelimiting on top. If they have a preference for lots of
> logging, so be it, let's not hide that data.

Your change doesn't reset rq->last_seen_need_resched_ns, so without the
ratelimit I think we'll now get a dump on every single tick until we
eventually reschedule.

Another potential benefit of the ratelimit is that if something is
wedging multiple cpus concurrently, we don't spam the log (if warn_once
is disabled). Though that's probably an unlikely occurrence.

I think if you modify the patch to reset last_seen_need_resched_ns
that'll give the behavior you're after.
Best,
Josh