Date:	Fri, 18 May 2012 17:24:45 +0100
From:	Morten Rasmussen <Morten.Rasmussen@....com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	"panto@...oniou-consulting.com" <panto@...oniou-consulting.com>,
	"smuckle@...cinc.com" <smuckle@...cinc.com>,
	Juri Lelli <juri.lelli@...il.com>,
	"mingo@...e.hu" <mingo@...e.hu>,
	"linaro-sched-sig@...ts.linaro.org" 
	<linaro-sched-sig@...ts.linaro.org>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"geoff@...radead.org" <geoff@...radead.org>,
	"efault@....de" <efault@....de>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Plumbers: Tweaking scheduler policy micro-conf RFP

On Fri, May 18, 2012 at 05:18:17PM +0100, Morten Rasmussen wrote:
> On Tue, May 15, 2012 at 04:35:41PM +0100, Peter Zijlstra wrote:
> > On Tue, 2012-05-15 at 17:05 +0200, Vincent Guittot wrote:
> > > On 15 May 2012 15:00, Peter Zijlstra <peterz@...radead.org> wrote:
> > > > On Tue, 2012-05-15 at 14:57 +0200, Vincent Guittot wrote:
> > > >>
> > >> It's not that nobody cares; it's more that the scheduler,
> > >> load_balance and sched_mc are sensitive enough that it's difficult
> > >> to ensure that a modification will not break everything for someone
> > >> else.
> > > >
> > > > Thing is, it's already broken, there's nothing else to break :-)
> > > >
> > > 
> > > sched_mc is the only power-aware knob in the current scheduler. It's
> > > far from perfect, but it seems to work on some ARM platforms at
> > > least. You mentioned at the scheduler mini-summit that we need a
> > > cleaner replacement, and everybody agreed on that point. Is anybody
> > > working on it yet?
> > 
> > Apparently not.. 
> > 
> > > and can we discuss at Plumbers what this replacement would look like?
> > 
> > one knob: sched_balance_policy with tri-state {performance, power, auto}
> 
> Interesting. What would the power policy look like? Would performance
> and power be the two extremes of the power/performance trade-off? In
> that case I would assume that most embedded systems would be using auto.
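
To make the question concrete, here is roughly what I imagine the knob
looking like. This is purely a sketch; the enum names, the single global
and the on_battery() helper are all made up for illustration, not
existing code:

  /* hypothetical tri-state policy, exposed as a single knob */
  enum sched_balance_policy {
          SCHED_BALANCE_PERFORMANCE,
          SCHED_BALANCE_POWER,
          SCHED_BALANCE_AUTO,
  };

  static enum sched_balance_policy sched_balance_policy =
                                          SCHED_BALANCE_AUTO;

  extern bool on_battery(void);   /* made-up helper */

  /* resolve 'auto' into one of the two real policies */
  static enum sched_balance_policy effective_balance_policy(void)
  {
          if (sched_balance_policy != SCHED_BALANCE_AUTO)
                  return sched_balance_policy;
          /* e.g. consult the power supply and cpufreq state here */
          return on_battery() ? SCHED_BALANCE_POWER :
                                SCHED_BALANCE_PERFORMANCE;
  }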
> 
> > 
> > Where auto should likely look at things like whether we're on battery
> > and co-ordinate with the cpufreq muck or whatever.
> > 
> > Per-domain knobs are insane, large multi-state knobs are insane, and the
> > existing scheme is therefore insane^2. Can you find a sysadmin who'd like
> > to explore 3^3=27 states for optimal power/perf for his workload on a
> > simple 2-socket hyper-threaded machine, and a 3^4=81 state space for 8
> > sockets, etc.?
> > 
> > As to the exact policy, I think the current two (load-balance + wakeup)
> > are the sensible ones..
> > 
> > Also, I still have this pending email from you asking about the topology
> > setup stuff that I really need to reply to.. but people keep sending me
> > bug reports :/
> > 
> > But really short, look at kernel/sched/core.c:default_topology[]
> > 
> > I'd like to collapse the sd_init_* functions into a single function like
> > sd_numa_init(); this would mean that all an arch needs to do is provide
> > a simple list of ever-increasing masks that match its topology.
> > 
> > To aid this we can add some SDTL_flags, initially I was thinking of:
> > 
> >  SDTL_SHARE_CORE	-- aka SMT
> >  SDTL_SHARE_CACHE	-- LLC cache domain (typically multi-core)
> >  SDTL_SHARE_MEMORY	-- NUMA-node (typically socket)
> > 
> > The 'performance' policy is typically to spread over shared resources so
> > as to minimize contention on these.
> >
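Just to check that I understand the proposal, an arch would then provide
something along these lines. This is a sketch only; the struct layout,
the SDTL_* flag values and the cpu_*_mask() helpers are assumptions on
my part, not existing code:

  /* the flags proposed above, values made up for illustration */
  #define SDTL_SHARE_CORE         0x01    /* aka SMT */
  #define SDTL_SHARE_CACHE        0x02    /* LLC domain */
  #define SDTL_SHARE_MEMORY       0x04    /* NUMA node */

  /* per-arch topology description, smallest domain first */
  struct sched_domain_topology_level {
          const struct cpumask *(*mask)(int cpu);
          int flags;
  };

  static struct sched_domain_topology_level arm_topology[] = {
          { cpu_smt_mask,       SDTL_SHARE_CORE },    /* SMT siblings */
          { cpu_coregroup_mask, SDTL_SHARE_CACHE },   /* shared LLC */
          { cpu_node_mask,      SDTL_SHARE_MEMORY },  /* NUMA node */
          { NULL, },
  };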
> 
> Would it be worth extending this architecture specification to contain
> more information, like CPU_POWER for each core? After having experimented
> a bit with scheduling on big.LITTLE, my experience is that more
> information about the platform is needed to make proper scheduling
> decisions. So if the topology definition is going to be more generic and
> be set up by the architecture, it would be worth adding all the bits of
> information that the scheduler needs to that data structure.
> 
> With such a data structure, the scheduler would only need one knob to
> adjust the power/performance trade-off. Any thoughts?
>  
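For instance (again hypothetical, just to illustrate the kind of
information I mean), the topology entries could carry a per-cpu compute
capacity callback so the scheduler can tell big cores from LITTLE ones:

  /* the sketch from above, extended with a relative compute capacity,
   * e.g. 1024 for a big core and something like 400 for a LITTLE one */
  struct sched_domain_topology_level {
          const struct cpumask *(*mask)(int cpu);
          int flags;
          unsigned long (*cpu_power)(int cpu);
  };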

One more thing: I have experimented with PJT's load-tracking patchset
and found it very useful for big.LITTLE scheduling. Are there any plans
to include it?

	Morten

> > If you want to add some power we need some extra flags, maybe something
> > like:
> > 
> >  SDTL_SHARE_POWERLINE	-- power domain (typically socket)
> > 
> > so you know where the boundaries are across which you can turn stuff
> > off, and therefore what/where to pack things.
> > 
> > Possibly we also add something like:
> > 
> >  SDTL_PERF_SPREAD	-- spread on performance mode
> >  SDTL_POWER_PACK	-- pack on power mode
> > 
> > To override the defaults. But ideally I'd leave those until after we've
> > got the basics working and there is a clear need for them (with
> > spread/pack as the defaults for performance/power mode respectively).
> 
> In my experience, power-optimized scheduling is quite tricky, especially
> if you still want some level of performance. For heterogeneous
> architectures, packing might not be the best solution. Some indication of
> the power/performance profile of each core could be useful.
> 
> Best regards,
> Morten
> 
> 

