Date:	Wed, 7 May 2014 21:06:47 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	linux-kernel@...r.kernel.org,
	Frederic Weisbecker <fweisbec@...il.com>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	Fernando Luis Vazquez Cao <fernando_b1@....ntt.co.jp>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Arjan van de Ven <arjan@...ux.intel.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH 3/4 v2] nohz: Fix idle/iowait counts going backwards

On Wed, May 07, 2014 at 08:24:25PM +0200, Denys Vlasenko wrote:
> On 05/07/2014 06:56 PM, Peter Zijlstra wrote:
> > On Wed, May 07, 2014 at 06:49:47PM +0200, Denys Vlasenko wrote:
> >> On 05/07/2014 04:23 PM, Peter Zijlstra wrote:
> >>> On Wed, May 07, 2014 at 03:41:33PM +0200, Denys Vlasenko wrote:
> >>>> With this change, "iowait-ness" of every idle period is decided
> >>>> at the moment it starts:
> >>>> if this CPU's run-queue had tasks waiting on I/O, then this idle
> >>>> period's duration will be added to iowait_sleeptime.
> >>>>
> >>>> This fixes the bug where iowait and/or idle counts could go backwards,
> >>>> but iowait accounting is not precise (it can show more iowait
> >>>> than there really is).
> >>>>
> >>>
> >>> NAK on this, the thing going backwards is a symptom of the bug, not an
> >>> actual bug itself.
> >>
> >> This patch does fix that bug.
> > 
> > Which bug, there's two here:
> > 
> >  1) that NOHZ and !NOHZ iowait accounting aren't identical
> 
> They can hardly be identical, considering how different these modes are.

They can, we've managed it for pretty much everything else, although it's
not always easy.

And if you look at the patch I sent, that provides the exact moment the
task wakes up, so you can round that to the nearest jiffy boundary and
account appropriately as if it were accounted on the per-cpu timer tick.
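
A rough userspace sketch of that rounding, not kernel code: TICK_NSEC,
struct idle_split and split_idle() are made-up names, HZ=1000 is an
assumption, and the corner cases mentioned below are ignored.

```c
#include <stdint.h>

/* Assumed tick period: HZ=1000, i.e. 1 ms per jiffy (illustration only). */
#define TICK_NSEC 1000000ULL

/* Hypothetical result type: whole ticks charged to each counter. */
struct idle_split {
	uint64_t iowait_ticks;
	uint64_t idle_ticks;
};

/* Map a nanosecond timestamp to the index of the nearest tick boundary. */
static uint64_t ns_to_nearest_tick(uint64_t ns)
{
	return (ns + TICK_NSEC / 2) / TICK_NSEC;
}

/*
 * Split one idle period [idle_start, idle_end] at io_wake, the moment the
 * last I/O-waiting task woke up. Ticks before the rounded wake-up are
 * charged to iowait, the rest to idle, as a periodic tick would have done.
 * Assumes idle_start <= idle_end.
 */
static struct idle_split split_idle(uint64_t idle_start, uint64_t io_wake,
				    uint64_t idle_end)
{
	uint64_t start = ns_to_nearest_tick(idle_start);
	uint64_t wake = ns_to_nearest_tick(io_wake);
	uint64_t end = ns_to_nearest_tick(idle_end);
	struct idle_split s;

	/* Clamp the wake-up into the idle period. */
	if (wake < start)
		wake = start;
	if (wake > end)
		wake = end;

	s.iowait_ticks = wake - start;
	s.idle_ticks = end - wake;
	return s;
}
```

With these assumptions, an idle period from 0 to 5 ms with the wake-up at
2.4 ms is charged as 2 ticks of iowait and 3 ticks of idle.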

Now, there are likely fun corner cases which need more TLC, see
kernel/sched/proc.c for the fun times we had with the global load avg.

> And they don't have to be identical, in fact.

Yes they have to; by definition. CONFIG_NOHZ should have no user
visible difference (except of course the obvious one of fewer interrupts
and ideally lower energy usage).

> >  2) that iowait accounting in general is a steaming pile of crap
> 
> If you want to nuke iowait (for example, make its counter constant 0),
> I personally won't object. Can't guarantee others won't...

I won't object to a constant 0, but then we have to do it irrespective
of NOHZ. But we don't necessarily have to nuke it; I think we can have a
coherent definition of iowait, just most likely not per-cpu.

So for UP we have the very simple definition that any idle cycle while
there is a task waiting for io is accounted to iowait.
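
As a minimal sketch of that UP rule (plain userspace C; struct up_cpustat
and up_account_idle() are hypothetical names, not the kernel's accounting
code):

```c
#include <stdint.h>

/* Hypothetical counters for the single CPU of a UP system. */
struct up_cpustat {
	uint64_t idle;
	uint64_t iowait;
};

/*
 * Account some idle cycles on UP: they go to iowait if at least one task
 * is currently blocked on I/O, and to plain idle otherwise.
 */
static void up_account_idle(struct up_cpustat *st, unsigned int nr_iowait,
			    uint64_t cycles)
{
	if (nr_iowait > 0)
		st->iowait += cycles;
	else
		st->idle += cycles;
}
```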

This definition can be 'trivially' extended to a global iowait,
expensive to compute though.

However, one can argue it's not correct to do that trivial extension,
since if there's only 1 task waiting for io, it could at most tie up 1
CPU's worth of idle time (but very emphatically not a specific cpu).
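
To make the difference concrete, here is a sketch comparing the trivial
global extension with a capped one, where each waiting task accounts for
at most one CPU's worth of idle time. All names are hypothetical, and the
cap is only one possible reading of the argument above, not a settled
definition.

```c
/* Hypothetical result: how many idle CPUs count as iowait vs. idle. */
struct global_split {
	unsigned int iowait_cpus;
	unsigned int idle_cpus;
};

/* Trivial extension: any waiting task turns every idle CPU into iowait. */
static struct global_split split_trivial(unsigned int nr_idle_cpus,
					 unsigned int nr_iowait_tasks)
{
	struct global_split s = { 0, nr_idle_cpus };

	if (nr_iowait_tasks > 0) {
		s.iowait_cpus = nr_idle_cpus;
		s.idle_cpus = 0;
	}
	return s;
}

/* Capped variant: iowait CPUs = min(idle CPUs, waiting tasks). */
static struct global_split split_capped(unsigned int nr_idle_cpus,
					unsigned int nr_iowait_tasks)
{
	struct global_split s;

	s.iowait_cpus = nr_iowait_tasks < nr_idle_cpus ?
			nr_iowait_tasks : nr_idle_cpus;
	s.idle_cpus = nr_idle_cpus - s.iowait_cpus;
	return s;
}
```

With 4 idle CPUs and 1 waiting task, the trivial version reports 4 CPUs
in iowait while the capped one reports 1 in iowait and 3 idle. Neither
says which CPU is the iowait one, which is the point.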

So somewhere in that space is, I think, a viable way to account iowait,
but the straightforward implementation (in as far as the eventual
definition will be straightforward to begin with) will likely be
prohibitively expensive to compute.

Then again, if you look at kernel/sched/proc.c (again) and look at the
bloody mess we had to make for the global load avg accounting to work
for NOHZ there might (or might just not) be a shimmer of hope we can
pull this off in a scalable manner.

Like I said, 'fun' problem :-)
