linux-kernel - Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost whenever newidle balance is attempted


Message-ID: <5359EEBE.2030808@linux.vnet.ibm.com>
Date:	Fri, 25 Apr 2014 10:42:30 +0530
From:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
To:	Jason Low <jason.low2@...com>
CC:	Peter Zijlstra <peterz@...radead.org>, mingo@...nel.org,
	linux-kernel@...r.kernel.org, daniel.lezcano@...aro.org,
	alex.shi@...aro.org, efault@....de, vincent.guittot@...aro.org,
	morten.rasmussen@....com, aswin@...com, chegu_vinod@...com
Subject: Re: [PATCH 1/3] sched, balancing: Update rq->max_idle_balance_cost
 whenever newidle balance is attempted

Hi Jason,

On 04/25/2014 03:48 AM, Jason Low wrote:
> On Thu, 2014-04-24 at 19:14 +0200, Peter Zijlstra wrote:
>> On Thu, Apr 24, 2014 at 09:53:37AM -0700, Jason Low wrote:
>>>
>>> So I thought that the original rationale (commit 1bd77f2d) behind
>>> updating rq->next_balance in idle_balance() is that, if we are going
>>> idle (!pulled_task), we want to ensure that the next_balance gets
>>> calculated without the busy_factor.
>>>
>>> If the rq is busy, then rq->next_balance gets updated based on
>>> sd->interval * busy_factor. However, when the rq goes from "busy"
>>> to idle, rq->next_balance might still have been calculated under
>>> the assumption that the rq is busy. Thus, if we are going idle, we
>>> would then properly update next_balance without the busy factor
>>> if we update when !pulled_task.
>>>
>>
>> It's late here and I'm confused!
>>
>> So the for_each_domain() loop calculates a new next_balance based on
>> ->balance_interval (which has that busy_factor on, right).
>>
>> But if it fails to pull anything, we'll (potentially) iterate the entire
>> tree up to the largest domain; and supposedly set next_balance to the
>> largest possible interval.
>>
>> So when we go from busy to idle (!pulled_task), we actually set
>> ->next_balance to the longest interval. Whereas the commit you
>> referenced says it sets it to a shorter one.
>>
>> Not seeing it.
> 
> So this is the way I understand that code:
> 
> In rebalance_domains(), next_balance is supposed to be set to the
> minimum of all sd->last_balance + interval so that we properly call
> into rebalance_domains() if one of the domains is due for a balance.
> 
> In the domain traversals:
> 
> 	if (time_after(next_balance, sd->last_balance + interval))
> 		next_balance = sd->last_balance + interval;
> 
> we update next_balance only if the current next_balance is later than
> sd->last_balance + interval, so next_balance can only move to an
> earlier value.
> 
> In rebalance_domains, we have code:
> 
> 	interval = sd->balance_interval;
> 	if (idle != CPU_IDLE)
> 		interval *= sd->busy_factor;
> 
> 	...
> 
> 	if (time_after(next_balance, sd->last_balance + interval)) {
> 		next_balance = sd->last_balance + interval;
> 
> 	...
> 
> 	rq->next_balance = next_balance;
> 
> In the CPU_IDLE case, interval would not include the busy factor,
> whereas in the !CPU_IDLE case, we multiply the interval by the
> sd->busy_factor.
> 
> So as an example, if a CPU is not idle and we run this:
> 
> rebalance_domains()
> 	interval = 1 ms;
> 	if (idle != CPU_IDLE)
> 		interval *= 64;
> 
> 	next_balance = sd->last_balance + 64 ms
> 
> 	rq->next_balance = next_balance
> 
> The rq->next_balance is set to a large value since the CPU is not idle.
> 
> Then, let's say the CPU goes idle 1 ms later. The
> rq->next_balance can be up to 63 ms away, because we computed
> it when the CPU was not idle. Now that we are going idle,
> we would have to wait a long time for the next balance.
> 
> So I believe that the initial reason why rq->next_balance was
> updated in idle_balance() is that if the CPU is in the process
> of going idle (!pulled_task in idle_balance()), we can reset the
> rq->next_balance based on the interval = 1 ms, as opposed to
> having it remain up to 64 ms later (in idle_balance(), interval
> doesn't get multiplied by sd->busy_factor).

I agree with this. However, I am concerned about an additional point that
I mentioned in my reply to Peter's mail on this thread.

Should we verify whether the rq->next_balance update should be independent
of pulled_task? sd->balance_interval is changed during load_balance(), and
rq->next_balance should perhaps take that into account?
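
For reference, the busy-to-idle example quoted above can be sketched in
user space roughly as follows. This is only an illustration, not the
kernel code: struct sketch_domain, sketch_time_after() and
sketch_next_balance() are hypothetical names, time is counted in plain
millisecond ticks instead of jiffies, and the far-future starting value
is an arbitrary assumption.

```c
#include <stdbool.h>

/* Wraparound-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the sched_domain fields discussed in this thread. */
struct sketch_domain {
	unsigned long last_balance;     /* tick at which this domain was last balanced */
	unsigned long balance_interval; /* base interval, in ms ticks (assumption) */
	unsigned int busy_factor;       /* multiplier applied when the CPU is not idle */
};

/*
 * Return the earliest tick at which any of the nr domains is due for a
 * balance. The busy factor is applied only when the CPU is not idle.
 */
static unsigned long sketch_next_balance(const struct sketch_domain *sd, int nr,
					 bool cpu_idle, unsigned long now)
{
	/* Arbitrary far-future starting point (assumption for this sketch). */
	unsigned long next_balance = now + 60000;
	int i;

	for (i = 0; i < nr; i++) {
		unsigned long interval = sd[i].balance_interval;

		if (!cpu_idle)
			interval *= sd[i].busy_factor;

		/* Only ever move next_balance to an earlier tick. */
		if (sketch_time_after(next_balance, sd[i].last_balance + interval))
			next_balance = sd[i].last_balance + interval;
	}

	return next_balance;
}
```

With a single domain that has last_balance = 1000, balance_interval = 1
and busy_factor = 64, the busy case yields 1064 and the idle case yields
1001, which is the 64 ms versus 1 ms gap described above.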

Regards
Preeti U Murthy
