Message-ID: <20240425101629.GC21980@noisy.programming.kicks-ass.net>
Date: Thu, 25 Apr 2024 12:16:29 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Christian Loehle <christian.loehle@....com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-kernel@...r.kernel.org,
tglx@...utronix.de, "Rafael J. Wysocki" <rjw@...ysocki.net>,
linux-pm@...r.kernel.org, daniel.lezcano@...aro.org
Subject: Re: [PATCH 4/4] sched/core: split iowait state into two states
On Wed, Apr 24, 2024 at 11:08:42AM +0100, Christian Loehle wrote:
> On 24/04/2024 11:01, Peter Zijlstra wrote:
> > On Tue, Apr 16, 2024 at 06:11:21AM -0600, Jens Axboe wrote:
> >> iowait is a bogus metric, but it's helpful in the sense that it allows
> >> short waits to not enter sleep states that have a higher exit latency
> >> than would otherwise have been picked for iowait'ing tasks. However,
> >> it's harmful in that lots of applications and monitoring assume that
> >> iowait is busy time, or otherwise use it as a health metric.
> >> Particularly for async IO it's entirely nonsensical.
> >
> > Let me get this straight: all of this is about working around the
> > cpuidle menu governor's insanity?
> >
> > Rafael, how far along are we with fully deprecating that thing? Yes,
> > it still exists, but should people really still be using it?
> >
>
> Well, there is also the iowait boost handling in schedutil and
> intel_pstate, which, at least in synthetic benchmarks, does have an
> effect [1].
Those are cpufreq, not cpuidle, and at least they don't use nr_iowait.
The original Changelog mentioned idle states, and I hate on menu for
using nr_iowait.
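For context, this is roughly the shape of the iowait boost those cpufreq
governors apply -- a simplified, illustrative sketch only, not the
actual schedutil/intel_pstate code; the names and constants below are
made up:

/*
 * Sketch of the iowait-boost idea: a wakeup from iowait doubles a
 * per-CPU boost value, and any update without an iowait wakeup halves
 * it, so bursty synchronous I/O keeps the frequency request elevated
 * while the boost decays quickly once the I/O stops.
 */
#define BOOST_MIN	128		/* illustrative floor */
#define BOOST_MAX	1024		/* illustrative ceiling */

struct iowait_boost {
	unsigned int boost;		/* current boost, 0 when inactive */
};

static void boost_update(struct iowait_boost *b, int woke_from_iowait)
{
	if (woke_from_iowait) {
		/* ramp up: start at the floor, double per iowait wakeup */
		b->boost = b->boost ? 2 * b->boost : BOOST_MIN;
		if (b->boost > BOOST_MAX)
			b->boost = BOOST_MAX;
	} else {
		/* decay: halve until the boost disappears */
		b->boost /= 2;
		if (b->boost < BOOST_MIN)
			b->boost = 0;
	}
}

/* The frequency request then uses max(util, boost) as its input. */
static unsigned int boost_apply(struct iowait_boost *b, unsigned int util)
{
	return util > b->boost ? util : b->boost;
}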
> io_uring (the only user of iowait but not iowait_acct) works around both.
>
> See commit 8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
>
> [1]
> https://lore.kernel.org/lkml/20240304201625.100619-1-christian.loehle@arm.com/#t
So while I agree with most of the shortcomings listed in that set, that
patch is quite terrifying. And that whole step array, *shudder*.

I would prefer to start with something a *lot* simpler. How about a
tick-driven decay of a per-task iops count?
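Something along these lines, say -- a minimal sketch of the idea only;
the struct, field and hook names here are made up for illustration, not
proposed kernel interfaces:

/*
 * Per-task "recent iops" signal: bump a counter on every I/O
 * completion and let the tick halve it, so the value tracks recent I/O
 * activity and falls back to zero shortly after the task stops doing
 * I/O.  A governor could then use "recent_iops > threshold" instead of
 * the binary in_iowait flag.
 */
struct task_io_signal {
	unsigned int recent_iops;
};

/* Hypothetical hook, called when an I/O issued by the task completes. */
static inline void task_io_completion(struct task_io_signal *s)
{
	s->recent_iops++;
}

/* Hypothetical hook, called from the per-task scheduler tick. */
static inline void task_io_tick(struct task_io_signal *s)
{
	s->recent_iops -= s->recent_iops >> 1;	/* decay by half per tick */
}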