linux-kernel - Re: [PATCH 0/18] sched: simplified fork, enable load average into LB and power awareness scheduling


Date:	Wed, 12 Dec 2012 21:55:52 +0800
From:	Alex Shi <lkml.alex@...il.com>
To:	Amit Kucheria <amit.kucheria@...aro.org>
Cc:	Arjan van de Ven <arjan@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>, Alex Shi <alex.shi@...el.com>,
	rob@...dley.net, mingo@...hat.com, peterz@...radead.org,
	gregkh@...uxfoundation.org, andre.przywara@....com, rjw@...k.pl,
	paul.gortmaker@...driver.com, akpm@...ux-foundation.org,
	paulmck@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	pjt@...gle.com, vincent.guittot@...aro.org,
	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
Subject: Re: [PATCH 0/18] sched: simplified fork, enable load average into LB
 and power awareness scheduling

>>>>
>>>>
>>>> well... it's not always beneficial to group or to spread out
>>>> it depends on cache behavior mostly which is best
>>>
>>>
>>> Let me try to understand what this means: so "performance" above with
>>> 8 threads means that those threads are spread out across more than one
>>> socket, no?
>>>
>>> If so, this would mean that you have a smaller amount of tasks on each
>>> socket, thus the smaller wattage.
>>>
>>> The "powersaving" method OTOH fills up the one socket up to the brim,
>>> thus the slightly higher consumption due to all threads being occupied.
>>>
>>> Is that it?
>>
>>
>> not sure.
>>
>> by and large, power efficiency is the same as performance efficiency, with
>> some twists.
>> or to reword that to be more clear
>> if you waste performance due to something that becomes inefficient, you're
>> wasting power as well.
>> now, you might have some hardware effects that can then save you power...
>> but those effects
>> then first need to overcome the waste from the performance inefficiency...
>> and that almost never happens.
>>
>> for example, if you have two workloads that each fit barely inside the last
>> level cache...
>> it's much more efficient to spread these over two sockets... where each has
>> its own full LLC
>> to use.
>> If you'd group these together, both would thrash the cache all the time and
>> run inefficiently --> bad for power.
>>
>> now, on the other hand, if you have two threads of a process that share a
>> bunch of data structures,
>> and you'd spread these over 2 sockets, you end up bouncing data between the
>> two sockets a lot,
>> running inefficiently --> bad for power.
>>
>
> Agree with all of the above. However..
>
>> having said all this, if you have two tasks that don't have such cache
>> effects, the most efficient way
>> of running things will be on 2 hyperthreading halves... it's very hard to
>> beat the power efficiency of that.
>
> .. there are alternatives to hyperthreading. On ARM's big.LITTLE
> architecture you could simply schedule them on the LITTLE cores. The
> big cores just can't beat the power efficiency of the LITTLE ones even
> with 'race to halt' that you allude to below. And usecases like mp3
> playback simply don't require the kind of performance that the big
> cores can offer.
>
>> But this assumes the tasks don't compete for resources much on the HT
>> level, and achieve good scaling.
>> and this still has to compete with "race to halt", because if you're done
>> quicker, you can put the memory
>> in self refresh quicker.
>>
>> none of this stuff is easy for humans or computer programs to determine
>> ahead of time... or sometimes even afterwards.
>> heck, even for just performance it's really really hard already, never mind
>> adding power.
>>
>> my personal gut feeling is that we should just optimize this scheduler stuff
>> for performance, and that
>> we're going to be doing quite well on power already if we achieve that.
>
> If Linux is to continue to work efficiently on heterogeneous
> multi-processing platforms, it needs to provide scheduling mechanisms
> that can be exploited as per the demands of the HW architecture.

Linus would definitely disagree with such ideas. :) So we need to distill
the logic that holds across all hardware.

> An example is the "small task packing (and spreading)" for which Vincent
> Guittot has posted a patchset[1] earlier and so has Alex now.

Sure. I think my patchset should already handle the 'small task
packing' scenario. Would you guys like to give it a try?
>
> [1] http://lwn.net/Articles/518834/
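
To make the pack-versus-spread reasoning quoted above concrete, here is a
toy sketch in Python. It is not the scheduler's logic and not code from
either patchset. The Task fields (working_set_kb, shares_data_with), the
choose_placement helper, and the 8192 KB LLC size are all hypothetical
names and numbers chosen only for illustration. The sketch simply encodes
the three cases from the thread: tasks that share data stay together,
tasks that would overflow one LLC are spread, and everything else is packed.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    working_set_kb: int  # hypothetical estimate of the task's cache footprint
    shares_data_with: frozenset = frozenset()  # names of tasks it shares data with


def choose_placement(a: Task, b: Task, llc_kb: int) -> str:
    """Return "pack" or "spread" for two runnable tasks (illustrative only)."""
    # Tasks sharing data structures: keep them on one socket so the data
    # does not bounce between sockets.
    if b.name in a.shares_data_with or a.name in b.shares_data_with:
        return "pack"
    # Independent tasks whose combined footprint overflows a single LLC:
    # give each its own LLC so they do not thrash each other.
    if a.working_set_kb + b.working_set_kb > llc_kb:
        return "spread"
    # No significant cache effects: pack, e.g. onto two hyperthread siblings.
    return "pack"


if __name__ == "__main__":
    llc = 8192  # assumed LLC size in KB
    print(choose_placement(Task("a", 7000), Task("b", 7000), llc))
    print(choose_placement(Task("t1", 7000, frozenset({"t2"})), Task("t2", 7000), llc))
    print(choose_placement(Task("x", 100), Task("y", 100), llc))
```

A real scheduler has no direct view of working-set size or data sharing,
which is exactly the point made above about how hard this is to determine
ahead of time; the sketch only shows how the three cases relate.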