linux-kernel - Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver

Message-ID: <20121114000259.GK2489@linux.vnet.ibm.com>
Date:	Tue, 13 Nov 2012 16:03:00 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Arjan van de Ven <arjan@...ux.intel.com>
Cc:	Jacob Pan <jacob.jun.pan@...ux.intel.com>,
	Linux PM <linux-pm@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Rafael Wysocki <rafael.j.wysocki@...el.com>,
	Len Brown <len.brown@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...e.hu>,
	Zhang Rui <rui.zhang@...el.com>, Rob Landley <rob@...dley.net>
Subject: Re: [PATCH 3/3] PM: Introduce Intel PowerClamp Driver

On Tue, Nov 13, 2012 at 02:45:11PM -0800, Arjan van de Ven wrote:
> On 11/13/2012 2:23 PM, Paul E. McKenney wrote:
> > On Tue, Nov 13, 2012 at 01:39:22PM -0800, Jacob Pan wrote:
> >> On Tue, 13 Nov 2012 13:16:02 -0800
> >> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> >>
> >>>> Please refer to Documentation/thermal/intel_powerclamp.txt for more
> >>>> details.  
> >>>
> >>> If I read this correctly, this forces a group of CPUs into idle for
> >>> about 600 milliseconds at a time.  This would indeed delay grace
> >>> periods, which could easily result in user complaints.  Also, given
> >>> the default RCU_BOOST_DELAY of 500 milliseconds in kernels enabling
> >>> RCU_BOOST, you would see needless RCU priority boosting.
> >>>
> >> the default idle injection duration is 6ms. we adjust the sleep
> >> interval to ensure idle ratio. So the idle duration stays the same once
> >> set. So would it be safe to delay grace period for this small amount in
> >> exchange for less overhead in each injection period?
> > 
> > Ah, 6ms of delay is much better than 600ms.  Should be OK (famous last
> > words!).
> 
> well... power clamping is not "free".
> You're going to lose performance as a trade off for dropping instantaneous power consumption....
> in the measurements we've done comparing various methods.. this one is doing remarkably well.

No argument here.  My concern is not performance in this situation, but
rather in-kernel confusion, particularly any such confusion involving RCU.

And understood, you can get similar effects from virtualization.
For all I know, the virtualization guys might leverage your experience
with power clamping to push for gang scheduling once more.  ;-)

> > For most kernel configuration options, it does use softirq.  And yes,
> > the kthread you are using would yield to softirqs -- but only as long
> > as softirq processing hasn't moved over to ksoftirqd.  Longer term,
> > RCU will be moving from softirq to kthreads, though, and these might be
> > preempted by your powerclamp kthread, depending on priorities.  It looks
> > like you use RT prio 50, which would usually preempt the RCU kthreads
> > (unless someone changed the priorities).
> 
> we tried to pick a "middle of the road" value, so that usages that really really
> want to run, still get to run, but things that are more loose about it, get put on hold.

Makes sense.

> >>> It looks like you could end up with part of the system powerclamped
> >>> in some situations, and with all of it powerclamped in other
> >>> situations. Is that the case, or am I confused?
> >>>
> >> could you explain the part that is partially powerclamped?
> > 
> > Suppose that a given system has two sockets.  Are the two sockets
> > powerclamped independently, or at the same time?  My guess was the
> > former, but looking at the code again, it looks like the latter.
> > So it is a good thing I asked, I guess.  ;-)
> 
> they are clamped together, and they have to.
> you don't get (on the systems where this driver works) any "package" C state unless
> all packages are idle completely.
> And it's these package C states where the real deep power savings happen, that's
> why they are such a juicy target for power clamping ;-)

OK, so the point of clamping all sockets simultaneously is to be able
to power down the electronics surrounding the sockets as well as the
sockets themselves?  If all you cared about was the individual sockets,
I don't see why you couldn't power the sockets down individually rather
than in sync with each other.

Just to make sure I am really understanding what is happening, let's
suppose we have a HZ=1000 system that has a few tasks that occasionally
run at prio 99.  These tasks would run during the clamp interval,
but would (for example) see the jiffies counter remaining at the
value at the beginning of the clamp interval until the end of that
interval, when the jiffies counter would suddenly jump by roughly
six counts, right?
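
To make that arithmetic concrete, here is a rough user-space sketch.
It is not code from the patch: clamp_ticks() and jiffies_after_clamp()
are names made up for illustration, and it assumes the jiffies counter
stays frozen for the whole clamp interval and then catches up in one step.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the powerclamp patch): number of timer
 * ticks that elapse during a clamp interval of clamp_ms milliseconds
 * on a kernel built with the given HZ.  Integer division truncates.
 */
unsigned long clamp_ticks(unsigned long clamp_ms, unsigned long hz)
{
	assert(hz > 0);
	return clamp_ms * hz / 1000;
}

/*
 * Jiffies value a task would observe once the clamp interval ends,
 * assuming the counter held at jiffies_at_start during the interval
 * and then jumped forward by the missed ticks all at once.
 */
unsigned long jiffies_after_clamp(unsigned long jiffies_at_start,
				  unsigned long clamp_ms,
				  unsigned long hz)
{
	return jiffies_at_start + clamp_ticks(clamp_ms, hz);
}
```

With HZ=1000 and the default 6ms injection, clamp_ticks(6, 1000) gives
the six-count jump described above.  At HZ=250 the same interval
truncates to a single tick.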

If so, this could cause some (minor) RCU issues, such as RCU
deciding to force quiescent states right at the end of a clamping
interval, even though none of the RCU readers would have had a
chance to do anything in the meantime.  Shouldn't result in a
bug though, just wasted motion.

I think I know, but I feel the need to ask anyway.  Why not tell
RCU about the clamping?

							Thanx, Paul
