Message-ID: <0550623d-f8f5-ac73-7baa-7506636d952a@oracle.com>
Date:   Fri, 28 Jun 2019 15:14:18 -0700
From:   Subhra Mazumdar <subhra.mazumdar@...cle.com>
To:     Parth Shah <parth@...ux.ibm.com>, linux-kernel@...r.kernel.org
Cc:     peterz@...radead.org, mingo@...hat.com, tglx@...utronix.de,
        steven.sistare@...cle.com, dhaval.giani@...cle.com,
        daniel.lezcano@...aro.org, vincent.guittot@...aro.org,
        viresh.kumar@...aro.org, tim.c.chen@...ux.intel.com,
        mgorman@...hsingularity.net
Subject: Re: [PATCH v3 3/7] sched: rotate the cpu search window for better
 spread


On 6/28/19 11:36 AM, Parth Shah wrote:
> Hi Subhra,
>
> I ran your patch series on IBM POWER systems and this is what I have observed.
>
> On 6/27/19 6:59 AM, subhra mazumdar wrote:
>> Rotate the cpu search window for better spread of threads. This will ensure
>> an idle cpu will quickly be found if one exists.
>>
>> Signed-off-by: subhra mazumdar <subhra.mazumdar@...cle.com>
>> ---
>>   kernel/sched/fair.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index b58f08f..c1ca88e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6188,7 +6188,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>   	u64 avg_cost, avg_idle;
>>   	u64 time, cost;
>>   	s64 delta;
>> -	int cpu, limit, floor, nr = INT_MAX;
>> +	int cpu, limit, floor, target_tmp, nr = INT_MAX;
>>
>>   	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
>>   	if (!this_sd)
>> @@ -6219,9 +6219,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>   		}
>>   	}
>>
>> +	if (per_cpu(next_cpu, target) != -1)
>> +		target_tmp = per_cpu(next_cpu, target);
>> +	else
>> +		target_tmp = target;
>> +
>>   	time = local_clock();
>>
>> -	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
>> +	for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
>> +		per_cpu(next_cpu, target) = cpu;
> This leads to a problem of cache hotness.
> AFAIU, in most cases `target = prev_cpu` of the task being woken up, and
> selecting an idle CPU nearest to prev_cpu is favorable.
> But since this doesn't keep track of the last idle CPU per task, it fails to find
> the nearest possible idle CPU when the task is woken up after other tasks have
> already been scheduled.
>
I had tested hackbench on SPARC SMT8 (see the numbers in the cover letter) and
it showed improvement with this. Firstly, it's a tradeoff between cache effects
and the time spent searching for an idle CPU, and both the x86 and SPARC numbers
showed the tradeoff is worth it. Secondly, there is a lot of cache-affinity logic
at the beginning of select_idle_sibling(); if select_idle_cpu() is still called,
that means we are past that and want any idle CPU. I don't think waking up
close to the prev CPU is the intention of starting the search from there;
rather, it is to spread threads across all CPUs so that no single CPU gets
victimized, since there is no atomicity. The prev CPU just acts as a good seed
for the spreading.
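
For illustration only, here is a minimal user-space sketch of the rotation
mechanism (this is not the kernel code; NR_CPUS, the idle[] array and
search_idle() are made up for the example, with next_cpu[] playing the role
of the per-cpu next_cpu variable from the patch):

/*
 * Toy model of the rotating search window: each search that targets a
 * given CPU starts where the previous search for that target stopped,
 * wrapping around like for_each_cpu_wrap().
 */
#include <stdio.h>

#define NR_CPUS 8

static int next_cpu[NR_CPUS];	/* rotation seed per target, -1 = unset */

static int search_idle(int target, int idle[])
{
	int start = (next_cpu[target] != -1) ? next_cpu[target] : target;
	int i;

	for (i = 0; i < NR_CPUS; i++) {
		int cpu = (start + i) % NR_CPUS;

		next_cpu[target] = cpu;	/* remember where this scan got to */
		if (idle[cpu])
			return cpu;
	}
	return -1;
}

int main(void)
{
	int idle[NR_CPUS] = { 0, 0, 1, 0, 1, 0, 1, 1 };
	int i;

	for (i = 0; i < NR_CPUS; i++)
		next_cpu[i] = -1;

	/* Three successive wakeups that all target CPU 0. */
	for (i = 0; i < 3; i++) {
		int cpu = search_idle(0, idle);

		printf("wakeup %d placed on cpu %d\n", i, cpu);
		if (cpu >= 0)
			idle[cpu] = 0;	/* the chosen CPU is now busy */
	}
	return 0;
}

The three back-to-back wakeups land on CPUs 2, 4 and 6 because each scan
resumes from the rotated seed rather than repeatedly probing the CPUs right
after the target.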

Thanks,
Subhra
