linux-kernel - Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable timer

Message-ID: <555D3629.8080002@kernel.org>
Date:	Wed, 20 May 2015 18:34:33 -0700
From:	Andy Lutomirski <luto@...nel.org>
To:	Huang Rui <ray.huang@....com>, Borislav Petkov <bp@...e.de>,
	Len Brown <lenb@...nel.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Thomas Gleixner <tglx@...utronix.de>
CC:	x86@...nel.org, linux-kernel@...r.kernel.org,
	Fengguang Wu <fengguang.wu@...el.com>,
	Aaron Lu <aaron.lu@...el.com>, Tony Li <tony.li@....com>,
	Peter Zijlstra <peterz@...radead.org>,
	John Stultz <john.stultz@...aro.org>
Subject: Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable
 timer

On 05/19/2015 01:01 AM, Huang Rui wrote:
> MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> The cpu core still consumes less power while waiting, and has faster exit
> from waiting than "Halt". This patch implements an interface using the
> kernel parameter "idle=" to configure mwaitx type and timer value.
>
> If "idle=mwaitx", the timeout will be set as the maximum value
> ((2^64 - 1) * TSC cycle).
> If "idle=mwaitx,100", the timeout will be set as 100ns.
> If the processor doesn't support MWAITX, then halt is used.

I think this is the wrong way to do this...

> +		x86_idle = mwaitx_idle;

...this is a legacy thing.  The modern idle path is cpuidle_idle_call, I
believe, and that goes through the cpuidle subsystem, which has little
to do with any of this.

Where is the MWAITX documentation?  It seems that AMD has failed to 
update the obvious reference:

http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/

From my vague understanding, MWAITX accepts a 32-bit maximum number of 
TSC ticks to wait.  If that's correct, and it's not too late to change, 
then: AMD, you blew it.  The correct way to do this would be to accept a 
64-bit absolute TSC deadline.

The 32-bit relative timeout model utterly sucks.  Suppose we tried to
use it.  We'd have two major issues:

1. We can't sleep more than about 1.5 seconds because we'll overflow the 
deadline.

2. The relative timeout is annoying.  Imagine:

rdtsc
shove the computed timeout into ebx
<-- IRQ here
mwaitx

now we sleep too long.
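
A rough model of that race, as a sketch only: the function names and tick
values below are made up for illustration and are not from the patch or from
any AMD documentation.  It compares when the CPU would wake under a relative
timeout with when it would wake under an absolute TSC deadline, if arming the
wait is delayed after the TSC was read.

```c
#include <assert.h>
#include <stdint.h>

/* Relative model: the timeout is computed from a TSC value read at t_read,
 * but the wait is only armed at t_arm >= t_read (e.g. an IRQ ran in between).
 * The wait then lasts the full timeout counted from t_arm. */
static uint64_t wake_tsc_relative(uint64_t deadline, uint64_t t_read,
                                  uint64_t t_arm)
{
    assert(t_arm >= t_read);
    uint64_t timeout = deadline > t_read ? deadline - t_read : 0;
    return t_arm + timeout;
}

/* Absolute model (hypothetical hardware): the wait ends once the TSC reaches
 * the deadline, no matter how late the wait was armed. */
static uint64_t wake_tsc_absolute(uint64_t deadline, uint64_t t_arm)
{
    return t_arm > deadline ? t_arm : deadline;
}
```

With a deadline of 1000, a TSC read at 100 and the wait armed at 400, the
relative model wakes at 1300 (late by exactly the 300-tick delay), while the
absolute model still wakes at 1000.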

We can do:

cli
rdtsc
shove the computed timeout into ebx
mov $1,%ecx
mwaitx
sti

but that's annoying and isn't really correct wrt NMIs.

So this sucks.
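
For reference, the "about 1.5 seconds" in point 1 is just the largest 32-bit
tick count divided by the TSC frequency.  A small sketch, assuming a TSC
around 2.8 GHz (an illustrative figure, not one taken from the patch); the
helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Longest sleep, in seconds, that a 32-bit relative tick count can express
 * at the given TSC frequency. */
static double max_relative_sleep_seconds(double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return (double)UINT32_MAX / tsc_hz;
}

/* Clamp a desired sleep length in TSC ticks to what fits in 32 bits.
 * A caller that wants to sleep longer has to wake up and re-arm. */
static uint32_t clamp_relative_timeout(uint64_t ticks)
{
    return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
}
```

At 2.8 GHz this gives roughly 1.53 seconds; a faster TSC shortens the limit
further, and anything longer has to be split into repeated waits.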

In any event, I think this is barely useful.

That being said, it might be worth teaching the timer code about a 
magical ideal type of clock that is simultaneously a perfect invariant 
high-res clocksource *and* a very fast (in fact free) wakeup source that 
uses the same time base.  In fact, Sandy Bridge and newer Intel CPUs 
have such a thing: it's called the TSC deadline timer.  I think it's 
much faster to reprogram than other timers, and it ought to avoid a 
whole bunch of complicated messy code that handles the fact that 
crappier timers have their own crappy time bases.

If we did that *and* we had a non-crappy mwaitx, then we could apply an 
optimization: when going idle, we could turn off the TSC deadline timer 
and use mwaitx instead.  This would avoid an interrupt if the event that 
wakes us is our timer.

In the meantime, I don't really see the point.

John, Peter, Thomas: would it actually make sense to teach the core 
timer/clockevent code about perfect time sources like invariant TSC + 
TSC deadline?  AFAICT right now we're not doing anything particularly 
interesting with the TSC deadline timer.

--Andy