linux-kernel - Re: [RFC v1] Tunable sched_mc_power_savings=n

Message-ID: <48649EBA.5010403@firstfloor.org>
Date:	Fri, 27 Jun 2008 10:03:06 +0200
From:	Andi Kleen <andi@...stfloor.org>
To:	dipankar@...ibm.com
CC:	balbir@...ux.vnet.ibm.com,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Suresh B Siddha <suresh.b.siddha@...el.com>,
	Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Vatsa <vatsa@...ux.vnet.ibm.com>,
	Gautham R Shenoy <ego@...ibm.com>
Subject: Re: [RFC v1] Tunable sched_mc_power_savings=n

Dipankar Sarma wrote:
> On Thu, Jun 26, 2008 at 11:37:08PM +0200, Andi Kleen wrote:
>> Dipankar Sarma wrote:
>>
>>> Some workload managers already do that - they provision cpu and memory
>>> resources based on request rates and response times. Such software is
>>> in a better position to make a decision whether they can live with
>>> reduced performance due to power saving mode or not. The point I am
>>> making is that the kernel doesn't have any notion of transactional
>>> performance.
>> The kernel definitely knows about burstiness vs. non-burstiness at least
>> (although it currently has no long-term memory for that). Does it need
>> more than that for this? Anyway, if nice levels were used, that is not
>> even needed, because it's OK to run niced processes slower.
>>
>> And your workload manager could just nice processes. It should probably
>> do that anyway to tell ondemand you don't need full frequency.
> 
> The current usage of this we are looking at requires system-wide
> settings. That means nicing every process running on the system.
> That seems a little messy.
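For reference, "just nice processes" from a workload manager boils down to a setpriority(2) call per process. The sketch below assumes a caller that has already collected the target pids (for example by scanning /proc); the helper names renice_process and renice_all are hypothetical. It shows why a system-wide policy built this way has to touch every process.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set the nice value of one process; pid 0 means the calling process.
 * Returns 0 on success, -1 on failure (errno is set by setpriority).
 * Raising a nice value is always allowed; lowering it needs CAP_SYS_NICE. */
static int renice_process(pid_t pid, int nice_value)
{
    assert(nice_value >= -20 && nice_value <= 19);
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_value);
}

/* A system-wide policy built on nice must visit every process.
 * Hypothetical helper: the caller supplies the pid list.
 * Returns how many renice attempts failed. */
static size_t renice_all(const pid_t *pids, size_t n, int nice_value)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; i++)
        if (renice_process(pids[i], nice_value) != 0)
            failures++;
    return failures;
}
```

A manager switching to off-peak mode would call renice_all() on its pid list with a positive nice value, and back again at peak time.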

Is it less messy than letting applications negotiate the best policy
among themselves, as someone else suggested in this thread?

> Secondly, even if you nice the processes
> they are still going to be spread all over the CPU packages
> running at lower frequencies due to nice. 

My point was that this could be fixed and you could use nice
(or another per process parameter if you prefer)
as an input to load balancer decisions.

> Using nice, you can force lowering of frequency, but you can do that
> using the userspace governor as well; no need to mess with process
> priorities.
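With the userspace governor, frequency is pinned per CPU through cpufreq sysfs attributes: write "userspace" to scaling_governor, then a frequency in kHz to scaling_setspeed. The sketch below assumes the standard /sys/devices/system/cpu/cpuN/cpufreq/ layout and root privileges on a real system; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of a per-CPU cpufreq attribute.
 * Returns 0 on success, -1 if the buffer is too small. */
static int cpufreq_attr_path(char *buf, size_t len, int cpu, const char *attr)
{
    assert(cpu >= 0);
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/cpufreq/%s", cpu, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one value plus newline to a sysfs-style file.
 * Returns 0 on success, -1 on any I/O error. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%s\n", value) >= 0;
    /* fclose flushes the buffer; sysfs rejects bad values at this point. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Pin one CPU to a fixed frequency (kHz) via the userspace governor. */
static int set_fixed_khz(int cpu, unsigned long khz)
{
    char path[128], val[32];
    if (cpufreq_attr_path(path, sizeof path, cpu, "scaling_governor") != 0 ||
        write_attr(path, "userspace") != 0)
        return -1;
    snprintf(val, sizeof val, "%lu", khz);
    if (cpufreq_attr_path(path, sizeof path, cpu, "scaling_setspeed") != 0)
        return -1;
    return write_attr(path, val);
}
```

Note that this only lowers frequency; it does nothing to keep tasks off other packages, which is the separate optimization discussed next.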


> We are talking about a different optimization here - something
> that will give more benefits in powersave mode when you have large
> systems.

Yes, it's a different optimization (although the overall theme, power saving,
is the same), but is there a real reason it cannot be driven by the
same per-process heuristics instead of your ugly global sysctl?
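To make the contrast concrete: the global knob is a single system-wide file, which a periodic cron job (the peak/off-peak approach Dipankar mentions) would rewrite. This sketch assumes the 2.6.2x-era path /sys/devices/system/cpu/sched_mc_power_savings, where 0 means no power-savings packing; the peak window and helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Global scheduler knob under discussion (2.6.2x-era kernels). */
#define MC_POWER_SAVINGS_PATH "/sys/devices/system/cpu/sched_mc_power_savings"

/* Hypothetical policy: performance (0) during peak hours
 * [peak_start, peak_end), the given power-saving level otherwise. */
static int mc_level_for_hour(int hour, int peak_start, int peak_end, int offpeak_level)
{
    assert(hour >= 0 && hour < 24);
    assert(peak_start >= 0 && peak_start <= peak_end && peak_end <= 24);
    return (hour >= peak_start && hour < peak_end) ? 0 : offpeak_level;
}

/* Write the level to the knob; requires root on a real system.
 * Returns 0 on success, -1 on any I/O error. */
static int set_mc_power_savings(const char *path, int level)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", level) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

An hourly cron job would call set_mc_power_savings(MC_POWER_SAVINGS_PATH, mc_level_for_hour(hour, 8, 20, 1)). The setting applies to every task on the machine at once, which is exactly what Andi objects to.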

>>>>> In small-scale datacenters, peak and off-peak hour settings could
>>>>> potentially be done through simple cron jobs.
>>>> Is there any real drawback from only controlling it through nice levels?
>>> In a system with more than a couple of sockets, it is more beneficial
>>> (power-wise) to pack all work into a small number of processors
>>> and let the other processors go into very low power sleep, compared
>>> to running tasks slowly and spreading them all over the processors.
>> You answered a different question?
> 
> The point is that grouping tasks into a small number of sockets is
> more effective than nicing, which may still spread the tasks all
> over the sockets.
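A toy model of packing vs. spreading, not the kernel's actual load balancer, shows why the socket count matters. It assumes CPU-bound tasks and identical sockets, and counts how many sockets have at least one busy core; idle sockets are the ones that can drop into deep sleep. The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Toy model only: sockets with at least one busy core when ntasks
 * CPU-bound tasks run on nsockets sockets of cores_per_socket cores. */
static int busy_sockets(int ntasks, int nsockets, int cores_per_socket, int pack)
{
    assert(ntasks >= 0 && nsockets > 0 && cores_per_socket > 0);
    if (ntasks == 0)
        return 0;
    int needed;
    if (pack)
        /* Fill one socket before touching the next: ceil(ntasks / cores). */
        needed = (ntasks + cores_per_socket - 1) / cores_per_socket;
    else
        /* Spread: one task per socket first, as a performance-oriented balancer would. */
        needed = ntasks;
    return needed < nsockets ? needed : nsockets;
}
```

For example, with 4 tasks on 4 sockets of 4 cores, packing keeps 1 socket busy while spreading keeps all 4 busy, regardless of what frequency each socket runs at.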

Sorry, you completely misunderstood me. I know the principle
behind the socket grouping. And yes, it's a different mechanism
from cpu frequency scaling.

My point was just that the heuristics
used by one power saving mechanism (ondemand) could be used
for the other (socket grouping) too, and it would certainly be
a far saner interface than a global sysctl!
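One way to read this suggestion is to derive the balancing mode from per-process state instead of a global knob. The hypothetical sketch below packs only when every runnable task is niced (nice > 0), echoing how ondemand, with its ignore_nice_load tunable set, discounts niced load when choosing a frequency. The all-or-nothing rule and all names are assumptions; a real balancer could weigh tasks per scheduling domain instead.

```c
#include <assert.h>
#include <stddef.h>

enum balance_policy { SPREAD_FOR_PERFORMANCE, PACK_FOR_POWER };

/* Hypothetical heuristic, not kernel code: pack tasks onto few sockets
 * only if every runnable task has signalled (via nice) that it can
 * tolerate running slower. */
static enum balance_policy choose_policy(const int *nice, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (nice[i] <= 0)
            return SPREAD_FOR_PERFORMANCE; /* someone wants full speed */
    return PACK_FOR_POWER; /* all runnable work is niced, or there is none */
}
```

The point of the design is that no administrator has to flip anything: the same nice values that already tell ondemand to slow down would also tell the balancer it may consolidate.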

-Andi