Message-ID: <560AB648.8090009@odin.com>
Date:	Tue, 29 Sep 2015 19:03:20 +0300
From:	Kirill Tkhai <ktkhai@...n.com>
To:	Mike Galbraith <umgwanakikbuti@...il.com>
CC:	<linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH] sched/fair: Skip wake_affine() for core siblings



On 29.09.2015 19:00, Kirill Tkhai wrote:
> 
> 
> On 29.09.2015 17:55, Mike Galbraith wrote:
>> On Mon, 2015-09-28 at 18:36 +0300, Kirill Tkhai wrote:
>>
>>> ---
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 4df37a4..dfbe06b 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4930,8 +4930,13 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
>>>  	int want_affine = 0;
>>>  	int sync = wake_flags & WF_SYNC;
>>>  
>>> -	if (sd_flag & SD_BALANCE_WAKE)
>>> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
>>> +	if (sd_flag & SD_BALANCE_WAKE) {
>>> +		want_affine = 1;
>>> +		if (cpu == prev_cpu || !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
>>> +			goto want_affine;
>>> +		if (wake_wide(p))
>>> +			goto want_affine;
>>> +	}
>>
>> That blew wake_wide() right out of the water.
>>
>> It's not only about things like pgbench.  Drive multiple tasks in a Xen
>> guest (single event channel dom0 -> domu, and no select_idle_sibling()
>> to save the day) via network, and watch workers fail to be all they can
>> be because they keep being stacked up on the irq source.  Load balancing
>> yanks them apart, next irq stacks them right back up.  I met that in
>> enterprise land, thought wake_wide() should cure it, and indeed it did.
> 
> 1) Hm.. Doesn't the patch make select_task_rq_fair() prefer the old CPU
> over the current one? We now set affine_sd less often, so the part of the
> patch omitted from the quote selects prev_cpu.
> 
> 2) I had thought about wakeups from irq handlers, and was even going to ask
> why we apply the affine logic to such wakeups. Device handlers usually
> aren't bound, and timers may migrate now that the NO_HZ logic is in place.
> The only explanation I found is that unbound timers are a very unlikely
> case (I added a statistics printk to my local sched_debug to check that).
> But if situations like the one you described above occur, shouldn't we
> disable the affine logic for in_interrupt() cases?
> 
> 3) I'm asking only because, coming from outside the scheduler's history,
> I find it a little strange that we prefer smp_processor_id()'s sd_llc so
> much. The benefit for sync wakeups is more or less clear:
> smp_processor_id()'s sd_llc may hold data the wakee is interested in, and
> waking there minimizes cache misses. But we do the same in other cases
> too, and on every migration we lose itlb, dtlb... Of course, this calls
> for more careful patches than the one I posted

*** Typo: for "itlb, dtlb" read "instruction and data caches".

> (less crude ones).
> 
> Thanks,
> Kirill
> 