Message-ID: <555D3629.8080002@kernel.org>
Date: Wed, 20 May 2015 18:34:33 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Huang Rui <ray.huang@....com>, Borislav Petkov <bp@...e.de>,
Len Brown <lenb@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Thomas Gleixner <tglx@...utronix.de>
CC: x86@...nel.org, linux-kernel@...r.kernel.org,
Fengguang Wu <fengguang.wu@...el.com>,
Aaron Lu <aaron.lu@...el.com>, Tony Li <tony.li@....com>,
Peter Zijlstra <peterz@...radead.org>,
John Stultz <john.stultz@...aro.org>
Subject: Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a configurable
timer
On 05/19/2015 01:01 AM, Huang Rui wrote:
> MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> The cpu core still consumes less power while waiting, and has faster exit
> from waiting than "Halt". This patch implements an interface using the
> kernel parameter "idle=" to configure mwaitx type and timer value.
>
> If "idle=mwaitx", the timeout will be set as the maximum value
> ((2^64 - 1) * TSC cycle).
> If "idle=mwaitx,100", the timeout will be set as 100ns.
> If the processor doesn't support MWAITX, then halt is used.
I think this is the wrong way to do this...
> + x86_idle = mwaitx_idle;
...this is a legacy thing. The modern idle path is cpuidle_idle_call, I
believe, and that goes through the cpuidle subsystem, which has little
to do with any of this.
Where is the MWAITX documentation? It seems that AMD has failed to
update the obvious reference:
http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
From my vague understanding, MWAITX accepts a 32-bit maximum number of
TSC ticks to wait. If that's correct, and it's not too late to change,
then: AMD, you blew it. The correct way to do this would be to accept a
64-bit absolute TSC deadline.
The 32-bit relative timeout model utterly sucks. Suppose we tried to
use it. We'd have two major issues:
1. We can't sleep more than about 1.5 seconds because we'll overflow the
deadline.
2. The relative timeout is annoying. Imagine:
     rdtsc
     shove the computed timeout into ebx
     <-- IRQ here
     mwaitx
   now we sleep too long.
We can do:
     cli
     rdtsc
     shove the computed timeout into ebx
     mov $1,%ecx
     mwaitx
     sti
but that's annoying and isn't really correct wrt NMIs.
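To make the race concrete, here is a toy model in C. It is only a sketch, not kernel code: the function names and the tick values are made up for illustration, and it assumes the relative countdown starts at the moment the wait is armed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not kernel code. All times are in TSC ticks.
 *   read_tsc: TSC value when the timeout is computed
 *   delay:    ticks lost (e.g. to an IRQ) before the wait is armed
 *   deadline: absolute TSC value at which we want to wake
 * The names are hypothetical and exist only for this illustration.
 */

/* Relative timeout: the countdown starts when the wait is armed. */
static uint64_t wake_relative(uint64_t read_tsc, uint64_t delay,
			      uint64_t deadline)
{
	assert(deadline >= read_tsc);
	uint64_t timeout = deadline - read_tsc;	/* computed before the delay */
	uint64_t armed_at = read_tsc + delay;

	return armed_at + timeout;		/* equals deadline + delay */
}

/* Absolute deadline: wake at the deadline, or at once if it has passed. */
static uint64_t wake_absolute(uint64_t read_tsc, uint64_t delay,
			      uint64_t deadline)
{
	uint64_t armed_at = read_tsc + delay;

	return armed_at > deadline ? armed_at : deadline;
}
```

With read_tsc = 1000, delay = 300 and deadline = 5000, the relative model wakes at 5300 while the absolute model wakes at 5000. The overshoot is exactly the time lost between reading the TSC and arming the wait.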
So this sucks.
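To put a number on the first issue: the longest sleep a 32-bit tick count can express is (2^32 - 1) / f, where f is the TSC frequency. A minimal sketch of the arithmetic follows; the helper name is hypothetical and the frequencies are assumed values, not taken from any particular part.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Longest sleep, in nanoseconds, that a 32-bit relative timeout counted
 * in TSC ticks can express. tsc_hz is an assumed TSC frequency; the
 * real value depends on the hardware.
 */
static uint64_t max_relative_sleep_ns(uint64_t tsc_hz)
{
	assert(tsc_hz > 0);
	/* UINT32_MAX * 1e9 is about 4.3e18, which still fits in 64 bits. */
	return (uint64_t)UINT32_MAX * 1000000000ULL / tsc_hz;
}
```

At an assumed 2.8 GHz TSC this comes to roughly 1.53 seconds, and at 1 GHz to about 4.29 seconds, so the "about 1.5 seconds" figure corresponds to a TSC running a little under 3 GHz.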
In any event, I think this is barely useful.
That being said, it might be worth teaching the timer code about a
magical ideal type of clock that is simultaneously a perfect invariant
high-res clocksource *and* a very fast (in fact free) wakeup source that
uses the same time base. In fact, Sandy Bridge and newer Intel CPUs
have such a thing: it's called the TSC deadline timer. I think it's
much faster to reprogram than other timers, and it ought to avoid a
whole bunch of complicated messy code that handles the fact that
crappier timers have their own crappy time bases.
If we did that *and* we had a non-crappy mwaitx, then we could apply an
optimization: when going idle, we could turn off the TSC deadline timer
and use mwaitx instead. This would avoid an interrupt if the event that
wakes us is our timer.
In the meantime, I don't really see the point.
John, Peter, Thomas: would it actually make sense to teach the core
timer/clockevent code about perfect time sources like invariant TSC +
TSC deadline? AFAICT right now we're not doing anything particularly
interesting with the TSC deadline timer.
--Andy