Message-ID: <1342803425.2583.25.camel@twins>
Date:	Fri, 20 Jul 2012 18:57:05 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Rik van Riel <riel@...hat.com>
Cc:	Linux kernel Mailing List <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>, Avi Kivity <avi@...hat.com>,
	Gleb Natapov <gleb@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: CFS vs. cpufreq/cstates vs. latency

On Tue, 2012-07-17 at 10:23 -0400, Rik van Riel wrote:
> While tracking down a latency issue with communication between
> KVM guests, we ran into a very interesting issue, an interplay
> of CFS and power saving code.
> 
> About 3/4 of the 230us latency came from CPUs waking up out of
> C-states. Disabling C states reduced the latency to 60us...
> 
> The issue? The communication is between various threads and
> processes, each of which last ran on a CPU that is now in a
> deeper C state. The total latency from that is "CPU wakeup
> latency * NR CPUs woken".
> 
> This problem could be common to many different multi-threaded
> or multi-process applications. It looks like something that
> would be fixable at the scheduler + cpufreq level.
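
(Sanity check on those numbers: 230us - 60us is roughly 170us
attributable to C-state exits, which is indeed about 3/4 of 230us, so
the figures are consistent.)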

There's tons to be fixed there... we should pull most if not all of the
cpufreq load accounting into the scheduler; the scheduler already does
most of that accounting anyway.

Also, you want to do per-task policy tracking, something which isn't
possible with the current per-cpu cpufreq setup.

Sadly, some hardware makes this very difficult indeed, because changing
the CPU freq/volt etc. requires a schedulable context.
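
To illustrate that constraint with a hypothetical driver sketch (none of
this is existing code; hw_set_freq_blocking() is made up): when the
freq/volt change has to go over a slow bus (I2C, a firmware mailbox, ...),
the scheduler-side hot path can only record the request and kick a worker
that is allowed to sleep:

#include <linux/workqueue.h>
#include <linux/atomic.h>

static atomic_t requested_khz;

static void freq_work_fn(struct work_struct *work)
{
	/* Process context: sleeping on the bus transaction is fine here. */
	hw_set_freq_blocking(atomic_read(&requested_khz));
}
static DECLARE_WORK(freq_work, freq_work_fn);

/* Safe to call from atomic context, e.g. the scheduler tick. */
static void request_freq(unsigned int khz)
{
	atomic_set(&requested_khz, khz);
	schedule_work(&freq_work);
}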

> Specifically, waking up some process requires that the CPU
> which is running the wakeup is already in C0 state. If the
> CPU on which the to-be-woken task ran last is in a deep C
> state, it may make sense to simply run the woken up task
> on the local CPU, not the CPU where it was originally.

That's cpuidle, not cpufreq :-) Yay for more players, but yes, I know;
I've talked about this very issue with a number of people.

Same as for cpufreq, the accounting crap should move into the scheduler;
we want to use the idle-time guesstimator for other things as well.
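
Very roughly, the check Rik is describing above might look like this at
wakeup time (just a sketch, not existing code; idle_exit_latency_us()
and the cutoff are made up, only idle_cpu() actually exists):

#define LATENCY_CUTOFF_US	50

static int select_wake_cpu(int prev_cpu, int waking_cpu)
{
	/*
	 * If the previous CPU is sleeping deeply enough that its exit
	 * latency dominates, run the woken task on the CPU doing the
	 * wakeup; that CPU is already in C0.
	 */
	if (idle_cpu(prev_cpu) &&
	    idle_exit_latency_us(prev_cpu) > LATENCY_CUTOFF_US)
		return waking_cpu;

	/* Otherwise prefer the previous CPU; it is likely cache-hot. */
	return prev_cpu;
}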

> I seem to remember some scheduling code that (for power
> saving reasons) tried running all the tasks on one CPU,
> until that CPU got busy, and then spilled over onto other
> CPUs.
> 
> I do not seem to be able to find that code in recent kernels,
> but I have the feeling that a policy like that (related to
> WAKE_AFFINE scheduling?) could improve this issue.
> 
> As an additional benefit, it has the possibility of further
> improving power saving.

What power saving? I recently ripped all that stuff out because it was
terminally broken and the fixes I got were beyond ugly.

There were some people interested in writing a new power-aware balancer
infrastructure, but nothing has been forthcoming as yet, although it
could be that they're waiting for PJT's load-tracking patches to hit
mainline.

Anyway, you're conflating issues... you don't want a power-aware
balancer; you just don't want the balancer to be unaware of C-states,
irrespective of whatever balance policy we're using.

> What do the scheduler and cpufreq people think about this
> problem?
> 
> Any preferred ways to solve the "N * cpu wakeup latency"
> problem that is plaguing multi-process and multi-threaded
> workloads?

Yeah, unify all the various load-tracking and guesstimator logic in the
scheduler and go from there ;-)