Date:   Fri, 28 Aug 2020 12:51:53 +0200
From:   peterz@...radead.org
To:     Jan Kara <jack@...e.cz>
Cc:     Xianting Tian <tian.xianting@....com>, viro@...iv.linux.org.uk,
        bcrl@...ck.org, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        linux-fsdevel@...r.kernel.org, linux-aio@...ck.org,
        linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
        hannes@...xchg.org
Subject: Re: [PATCH] aio: make aio wait path to account iowait time

On Fri, Aug 28, 2020 at 11:41:29AM +0200, Jan Kara wrote:
> On Fri 28-08-20 11:07:29, peterz@...radead.org wrote:
> > On Fri, Aug 28, 2020 at 02:07:12PM +0800, Xianting Tian wrote:
> > > As the normal aio wait path (read_events() ->
> > > wait_event_interruptible_hrtimeout()) doesn't account iowait time, use
> > > this patch to make it account iowait time, which can truly reflect
> > > the system IO situation when using a tool like 'top'.
> > 
> > Do be aware though that io_schedule() is potentially far more expensive
> > than regular schedule() and io-wait accounting as a whole is a
> > trainwreck.
> 
> Hum, I didn't know that io_schedule() is that much more expensive. Thanks
> for the info.

It's all relative, but it can add up under contention. And since these
storage thingies are getting faster every year, I'm assuming the
schedule rates are increasing along with them.
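
Roughly, io_schedule() is schedule() plus the in_iowait bookkeeping --
this is paraphrased from kernel/sched/core.c of that era rather than
quoted verbatim, so treat the details as approximate:

	/* schedule(), but flag the sleep as IO wait */
	int io_schedule_prepare(void)
	{
		int old_iowait = current->in_iowait;

		current->in_iowait = 1;
		blk_schedule_flush_plug(current); /* flush plugged IO first */

		return old_iowait;
	}

	void __sched io_schedule(void)
	{
		int token = io_schedule_prepare();

		schedule();
		io_schedule_finish(token);	/* restore current->in_iowait */
	}

and it's that in_iowait flag which makes the scheduler do extra work on
both ends of the sleep, again paraphrasing:

	/* in __schedule(), when the task goes to sleep */
	if (prev->in_iowait) {
		atomic_inc(&rq->nr_iowait);		/* extra atomic op */
		delayacct_blkio_start();
	}

	/* in the try_to_wake_up() path, when it wakes back up */
	if (p->in_iowait) {
		delayacct_blkio_end(p);
		atomic_dec(&task_rq(p)->nr_iowait);	/* extra atomic op */
	}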

> > When in_iowait is set, schedule() and ttwu() will have to do additional
> > atomic ops, and (much) worse, PSI will take additional locks.
> > 
> > And all that for a number that, IMO, is mostly useless, see the comment
> > with nr_iowait().
> 
> Well, I understand the limited usefulness of the system-wide or even per-CPU
> percentage spent in IO wait. However, whether a particular task is sleeping
> waiting for IO or not

So strict per-task state is not a problem, and we could easily change
get_task_state() to distinguish IO-wait from regular sleep, basically
duplicating the S/D states into IO-wait variants of the same. Although even
this has ABI implications :-(
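
For context, the per-task state letters userspace currently sees come from
the table in fs/proc/array.c -- reproduced here from memory, so treat it as
approximate -- and any IO-wait variant of S/D would have to show up as new
entries next to these, which is exactly the ABI exposure:

	static const char * const task_state_array[] = {
		"R (running)",
		"S (sleeping)",
		"D (disk sleep)",
		"T (stopped)",
		"t (tracing stop)",
		"X (dead)",
		"Z (zombie)",
		"P (parked)",
		"I (idle)",
	};

	/*
	 * Purely illustrative: an IO-wait flavour of S/D would need new
	 * letters here, and ps/top and every script that parses
	 * /proc/<pid>/stat or the STAT column would have to learn them.
	 */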

> is IMO useful diagnostic information, and there are
> several places in the kernel that take that into account (PSI, hangcheck
> timer, cpufreq, ...).

So PSI is the one I hate most. We spent an awful lot of effort on not
having to take the old rq->lock on wakeup, and PSI reintroduced it for
accounting purposes -- I hate accounting overhead. :/
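
For the archive: the wakeup-side hook in question is psi_ttwu_dequeue(),
which looks roughly like the below in a ~v5.9 tree -- paraphrased from
kernel/sched/stats.h, so exact fields may differ -- and the point is the
__task_rq_lock() on the *old* runqueue in the wakeup path:

	static inline void psi_ttwu_dequeue(struct task_struct *p)
	{
		if (static_branch_likely(&psi_disabled))
			return;

		/* task slept in iowait/memstall and may migrate on wakeup */
		if (unlikely(p->in_iowait || (p->flags & PF_MEMSTALL))) {
			struct rq_flags rf;
			struct rq *rq;
			int clear = 0;

			if (p->in_iowait)
				clear |= TSK_IOWAIT;
			if (p->flags & PF_MEMSTALL)
				clear |= TSK_MEMSTALL;

			rq = __task_rq_lock(p, &rf);	/* the old rq->lock */
			psi_task_change(p, clear, 0);
			p->sched_psi_wake_requeue = 1;
			__task_rq_unlock(rq, &rf);
		}
	}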

There are a number of high-frequency scheduling workloads where it really
adds up, which is the reason we got rid of it in the first place.

OTOH, PSI gives more sensible numbers, although it goes sideways when
you introduce affinity masks / cpusets.

The menu-cpufreq gov is known to be crazy and we're all working hard on
replacing it.

And the tick-sched usage is, iirc, the nohz case of iowait.

> So I don't see that properly accounting that a task
> is waiting for IO is just an "expensive random number generator" as you
> mention below :). But I'm open to being educated...

It's the userspace iowait, and in particular the per-CPU iowait numbers,
that I hate. Only on UP does any of that make sense.

But we can't remove them because ABI :-(
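
And for anyone wondering which ABI that is: among other things it's the
iowait column of /proc/stat (the 5th value on the "cpu"/"cpuN" lines), which
is where top and friends get %wa from. A minimal standalone reader, my own
sketch rather than anything from procps:

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char line[512];
		FILE *f = fopen("/proc/stat", "r");

		if (!f)
			return 1;

		/* "cpu" is the system-wide aggregate, "cpuN" the per-CPU lines */
		while (fgets(line, sizeof(line), f) && !strncmp(line, "cpu", 3)) {
			unsigned long long user, nice, system, idle, iowait;

			if (sscanf(line, "%*s %llu %llu %llu %llu %llu",
				   &user, &nice, &system, &idle, &iowait) == 5)
				printf("%.*s iowait=%llu ticks\n",
				       (int)strcspn(line, " \t"), line, iowait);
		}

		fclose(f);
		return 0;
	}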
