linux-kernel - Re: [patch] sched: beef up wake_wide()

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <559FCBA3.70703@fb.com>
Date:	Fri, 10 Jul 2015 09:41:55 -0400
From:	Josef Bacik <jbacik@...com>
To:	Mike Galbraith <umgwanakikbuti@...il.com>,
	Peter Zijlstra <peterz@...radead.org>
CC:	<riel@...hat.com>, <mingo@...hat.com>,
	<linux-kernel@...r.kernel.org>, <morten.rasmussen@....com>,
	kernel-team <Kernel-team@...com>
Subject: Re: [patch] sched: beef up wake_wide()

On 07/10/2015 01:19 AM, Mike Galbraith wrote:
> On Thu, 2015-07-09 at 15:26 +0200, Peter Zijlstra wrote:
>> On Wed, Jul 08, 2015 at 08:13:46AM +0200, Mike Galbraith wrote:
>>>   static int wake_wide(struct task_struct *p)
>>>   {
>>> +	unsigned int waker_flips = current->wakee_flips;
>>> +	unsigned int wakee_flips = p->wakee_flips;
>>>   	int factor = this_cpu_read(sd_llc_size);
>>>
>>> +	if (waker_flips < wakee_flips)
>>> +		swap(waker_flips, wakee_flips);
>>
>> This makes the wakee/waker names useless, the end result is more like
>> wakee_flips := client_flips, waker_flips := server_flips.
>
> I settled on master/slave plus a hopefully improved comment block.
>
>>> +	if (wakee_flips < factor || waker_flips < wakee_flips * factor)
>>> +		return 0;
>>
>> I don't get the first condition... why would the client ever flip? It
>> only talks to that one server.
>
> (tightening the heuristic up a bit by one means or another would be good,
> but "if it ain't broke, don't fix it" applies for this patchlet)
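To make the quoted condition concrete, the patched check can be modeled in user space as a pure function. This is a minimal sketch, not the kernel code: `wake_wide_model()` is a hypothetical name, the two flip counts and the LLC size (what `this_cpu_read(sd_llc_size)` returns in the patch) are assumed to arrive as plain unsigned integers, and it adopts the master/slave naming Mike mentions.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the patched wake_wide() check.
 * waker_flips / wakee_flips stand in for current->wakee_flips and
 * p->wakee_flips; llc_size stands in for sd_llc_size.
 * Returns 1 for "wake wide" (skip the affine wakeup), 0 otherwise.
 */
static int wake_wide_model(unsigned int waker_flips, unsigned int wakee_flips,
			   unsigned int llc_size)
{
	unsigned int master = waker_flips;
	unsigned int slave = wakee_flips;

	/* An LLC domain always spans at least one CPU. */
	assert(llc_size > 0);

	/* Order the pair so the answer no longer depends on who wakes whom. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	/*
	 * Stay affine (return 0) unless the less active partner flips at
	 * least llc_size times and the busier one flips at least llc_size
	 * times as often as that.
	 */
	if (slave < llc_size || master < slave * llc_size)
		return 0;
	return 1;
}
```

With an LLC of 8 CPUs, counts of 100 and 10 give 1 in either order, while counts of 100 and 4 give 0 because the less active side flips fewer than 8 times. The numbers are illustrative, not measurements.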
>
>>> @@ -5021,14 +5015,17 @@ select_task_rq_fair(struct task_struct *
>>>   {
>>>   	struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
>>>   	int cpu = smp_processor_id();
>>> +	int new_cpu = prev_cpu;
>>>   	int want_affine = 0;
>>>   	int sync = wake_flags & WF_SYNC;
>>>
>>>   	rcu_read_lock();
>>> +	if (sd_flag & SD_BALANCE_WAKE) {
>>> +		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> +		if (!want_affine)
>>> +			goto select_idle;
>>> +	}
>>
>> So this preserves/makes worse the bug Morten spotted, even without
>> want_affine we should still attempt SD_BALANCE_WAKE if set.
>
> Fixed.  wake_wide() may override want_affine as before, want_affine may
> override other ->flags as before, but a surviving domain selection now
> results in a full balance instead of a select_idle_sibling() call.
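The precedence Mike describes can be sketched as a small decision function. This is a simplified model under stated assumptions, not `select_task_rq_fair()` itself: `choose_wake_path()` and `enum wake_path` are hypothetical, and the real per-domain walk is collapsed into two booleans (whether an affine domain spanning `prev_cpu` was found, and whether some domain carrying the requested `sd_flag` survived).

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes of the wakeup path. */
enum wake_path {
	PATH_KEEP_PREV,			/* no balancing requested: stay on prev_cpu */
	PATH_SELECT_IDLE_SIBLING,	/* cheap idle-sibling search near the target */
	PATH_FULL_BALANCE		/* walk the surviving domain for the idlest CPU */
};

/*
 * balance_wake:   sd_flag includes SD_BALANCE_WAKE
 * wide:           result of the wake_wide() heuristic
 * cpu_allowed:    the waking CPU is in p's allowed mask
 * affine_domain:  a domain with SD_WAKE_AFFINE spanning prev_cpu exists
 * balance_domain: some domain carries the requested sd_flag
 */
static enum wake_path choose_wake_path(bool balance_wake, bool wide,
				       bool cpu_allowed, bool affine_domain,
				       bool balance_domain)
{
	bool want_affine = false;

	/* wake_wide() may override want_affine. */
	if (balance_wake)
		want_affine = !wide && cpu_allowed;

	/* want_affine overrides the other domain flags. */
	if (want_affine && affine_domain)
		return PATH_SELECT_IDLE_SIBLING;

	/* A surviving domain selection now gets a full balance. */
	if (balance_domain)
		return PATH_FULL_BALANCE;

	return balance_wake ? PATH_SELECT_IDLE_SIBLING : PATH_KEEP_PREV;
}
```

Under this model, a wide wakeup that still finds a domain carrying `SD_BALANCE_WAKE` takes the full-balance path instead of jumping straight to `select_idle_sibling()`, which is the behavior restored for the bug Morten spotted.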
>
> sched: beef up wake_wide()
>
> Josef Bacik reported that Facebook sees better performance with their
> 1:N load (1 dispatch/node, N workers/node) when carrying an old patch
> to try very hard to wake to an idle CPU.  While looking at wake_wide(),
> I noticed that it doesn't pay attention to the wakeup of a many-partner
> waker, returning 1 only when waking one of its many partners.
>
> Correct that, letting explicit domain flags override the heuristic.
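The one-directional behavior the commit message describes can be illustrated with a model of the pre-patch check. This sketch assumes, as the message implies, that the old `wake_wide()` returned 1 only when the wakee had more than `llc_size` flips and the waker had more than `llc_size` times as many; `old_wake_wide_model()` is a hypothetical name and the exact thresholds are an assumption.

```c
/*
 * Hypothetical model of the pre-patch, one-directional check:
 * "wake wide" only when the waker is the many-partner side.
 */
static int old_wake_wide_model(unsigned int waker_flips,
			       unsigned int wakee_flips,
			       unsigned int llc_size)
{
	/* No swap: the roles of waker and wakee are fixed. */
	if (wakee_flips > llc_size && waker_flips > llc_size * wakee_flips)
		return 1;
	return 0;
}
```

With an LLC of 8 CPUs, a busy dispatcher (200 flips) waking a worker (10 flips) returns 1, but the same worker waking the dispatcher returns 0. The patch removes that dependence on direction by swapping the two counts before comparing, as in the quoted `swap(waker_flips, wakee_flips)` hunk.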
>
> While at it, adjust task_struct bits; we don't need a 64-bit counter.
>
> Signed-off-by: Mike Galbraith <umgwanakikbuti@...il.com>
> Tested-by: Josef Bacik <jbacik@...com>


I'll give this new one a whirl and let you know how it goes.  Thanks,

Josef
