linux-kernel - Re: [PATCH 8/15] sched: Add parameter sched_mn_power_savings to control MN domain sched policy

Message-ID: <20090825083828.GE20811@alberich.amd.com>
Date:	Tue, 25 Aug 2009 10:38:28 +0200
From:	Andreas Herrmann <andreas.herrmann3@....com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	Gautham Shenoy <ego@...ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Dipankar Sarma <dipankar@...ibm.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	"svaidy@...ux.vnet.ibm.com" <svaidy@...ux.vnet.ibm.com>,
	Arun R Bharadwaj <arun@...ux.vnet.ibm.com>
Subject: Re: [PATCH 8/15] sched: Add parameter sched_mn_power_savings to
	control MN domain sched policy

On Tue, Aug 25, 2009 at 08:41:36AM +0200, Peter Zijlstra wrote:
> On Tue, 2009-08-25 at 08:24 +0200, Andreas Herrmann wrote:
> > On Mon, Aug 24, 2009 at 04:56:18PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2009-08-20 at 15:39 +0200, Andreas Herrmann wrote:
> > > > Signed-off-by: Andreas Herrmann <andreas.herrmann3@....com>
> > > > ---
> > > 
> > > > +#ifdef CONFIG_SCHED_MN
> > > > +	if (!err && mc_capable())
> > > > +		err = sysfs_create_file(&cls->kset.kobj,
> > > > +					&attr_sched_mn_power_savings.attr);
> > > > +#endif
> > > 
> > > *sigh* another crappy sysfs file
> > > 
> > > Guys, can't we come up with anything better than sched_*_power_saving=n?
> > 
> > Thought this is a settled thing. At least there are already two
> > such parameters. So using the existing convention is an obvious
> > thing, no?
> 
> Well, yes its the obvious thing, but I'm questioning whether its the
> best thing ;-)

Ok.

> > > This configuration space is _way_ too large, and now it gets even
> > > crazier.
> > 
> > I don't fully agree.
> > 
> > Having one control interface for each domain level is just one
> > approach. It gives the user full control of scheduling policies.
> > It just might have to be properly documented.
> > 
> > In another mail Vaidy mentioned that
> > 
> >   "at some point we wanted to change the interface to
> >    sched_power_savings=N and set the flags according to system
> >    topology".
> > 
> > But how you'll decide at which domain level you have to do power
> > savings scheduling?
> 
> The user isn't interested in knowing about domains and cpu topology in
> 99% of the cases, all they want is the machine not burning power like
> there's no tomorrow.
> 
> Users (me including) have no interest exploring a 27-state power
> configuration space in order to find out what works best for them, I'd
> throw up my hands and not bother, really.

If we have only a single knob (with 0==performance, 1==power savings)
then the arch-specific code must properly set the required SD flags
after CPU/topology detection. Only this will allow the scheduler code
to do the right thing.

Imagine you have the following "virtual" CPU topology in a server:

- more than one thread per core (sharing cache, FPU, whatever)
- multiple cores per internal node (sharing cache, maybe same memory channels)
- multiple internal nodes per socket
- multiple sockets

For power savings scheduling you can choose one or more of the following options:

(a) You might save power when first utilizing all threads of one core, but
    degrade performance by not using other cores.

(b) You might save power when first utilizing all cores of an internal node,
    but you degrade performance by not using other internal nodes.

(c) You might save power when first utilizing all internal nodes of one socket
    before using another socket.

With only a single knob, would you switch on (a) and (b) and (c)?
Or do you decide to switch on only (c) because performance degradation
is too high with (a) and (b)?
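
To make the question concrete, here is a minimal sketch of the two candidate
mappings for a single knob. All names (PS_SMT, PS_MC, PS_MN, knob_mapping,
single_knob_mask) are hypothetical and used only for illustration; they are
not existing kernel symbols or SD flags.

```c
#include <assert.h>

/* Hypothetical bits, one per option in the list above. */
#define PS_SMT (1u << 0)	/* (a) fill threads of one core first */
#define PS_MC  (1u << 1)	/* (b) fill cores of one internal node first */
#define PS_MN  (1u << 2)	/* (c) fill internal nodes of one socket first */

/* The two ways a single 0/1 knob could be interpreted. */
enum knob_mapping {
	KNOB_ALL_LEVELS,	/* knob=1 switches on (a), (b) and (c) */
	KNOB_SOCKET_ONLY,	/* knob=1 switches on only (c) */
};

/* Translate the single knob into the set of options that get enabled. */
static unsigned int single_knob_mask(int knob, enum knob_mapping mapping)
{
	assert(knob == 0 || knob == 1);

	if (!knob)
		return 0;	/* 0 == performance: no power savings anywhere */

	return mapping == KNOB_ALL_LEVELS ? (PS_SMT | PS_MC | PS_MN) : PS_MN;
}
```

Whichever mapping is chosen would be hard-wired by the arch-specific code;
the user could no longer pick between the two.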

One solution could be to have
- two sysfs attributes:
  * sched_power_domain, value=one of {SMT, MC, MN}
  * sched_power_level, value=one of {0, 1, 2}
- and an implicit rule that (a) implies (b) and (b) implies (c).
- Note: this implies that it's impossible to switch on only (a).
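
The implicit rule can be sketched as follows. This is only an illustration
of the proposal above, not code from the patch set: the enum and the helper
effective_power_level() are hypothetical, and it assumes the domains are
ordered SMT < MC < MN so that selecting one domain also covers every domain
above it.

```c
#include <assert.h>

/* Hypothetical ordering of the domain levels, lowest first. */
enum power_domain { PD_SMT = 0, PD_MC = 1, PD_MN = 2 };

/*
 * Given the two attribute values (selected domain and power level),
 * return the level that applies to the domain being queried.
 * Selecting SMT covers SMT, MC and MN; selecting MC covers MC and MN;
 * selecting MN covers MN only. Domains below the selected one stay at 0.
 */
static int effective_power_level(int selected, int level, int query)
{
	assert(selected >= PD_SMT && selected <= PD_MN);
	assert(query >= PD_SMT && query <= PD_MN);
	assert(level >= 0 && level <= 2);

	return query >= selected ? level : 0;
}
```

With sched_power_domain=MC and sched_power_level=2, for example, the SMT
domain stays at 0 while MC and MN both get level 2.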

> > Using sched_mn_power_savings=1 is quite different from
> > sched_smt_power_savings=1. Probably, the most power you save if you
> > switch on power saving scheduling on each domain level. I.e. first
> > filling threads of one core, then filling all cores on one internal
> > node, then filling all internal nodes of one socket.
> > 
> > But for performance reasons a user might just want to use power
> > savings in the MN domain. How you'd allow the user to configure that
> > with just one interface? Passing the domain level to
> > sched_power_savings, e.g.  sched_power_savings=MC instead of the power
> > saving level?
> 
> Sure its different, it reduces the configuration space, that gives less
> choice, but does make it accessible.
> 
> Ask joe-admin what he prefers.
> 
> If you're really really worried people might miss the joy of fine tuning
> their power scheduling, then we can provide a dual interface, one for
> dumb people like me, and one for crazy people like you ;-)

> > Besides that, don't we have to keep the user-interface stable, i.e.
> > stick to sched_smt_power_savings and sched_mc_power_savings?
> 
> Don't ever defend crappy stuff with interface stability, that's just
> lame ;-)

Yep, I have no problem with changing interfaces if they are considered
crappy.

But we should have an appropriate replacement.


Thanks,

Andreas

-- 
Operating | Advanced Micro Devices GmbH
  System  | Karl-Hammerschmidt-Str. 34, 85609 Dornach b. München, Germany
 Research | Geschäftsführer: Thomas M. McCoy, Giuliano Meroni
  Center  | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
  (OSRC)  | Registergericht München, HRB Nr. 43632