linux-kernel - Re: [PATCH v2 4/4] numa: introduce numa cling feature


Message-ID: <b9a58067-194b-221d-111c-ed1f7661976b@linux.alibaba.com>
Date:   Tue, 9 Jul 2019 10:15:37 +0800
From:   王贇 <yun.wang@...ux.alibaba.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Peter Zijlstra <peterz@...radead.org>, hannes@...xchg.org,
        mhocko@...nel.org, vdavydov.dev@...il.com,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, mcgrof@...nel.org, keescook@...omium.org,
        linux-fsdevel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH v2 4/4] numa: introduce numa cling feature

On 2019/7/8 4:07 PM, Hillf Danton wrote:
> 
> On Mon, 8 Jul 2019 10:25:27 +0800 Michael Wang wrote:
>> /* Attempt to migrate a task to a CPU on the preferred node. */
>> static void numa_migrate_preferred(struct task_struct *p)
>> {
>> +	bool failed, target;
>> 	unsigned long interval = HZ;
>>
>> 	/* This task has no NUMA fault statistics yet */
>> @@ -1891,8 +2117,12 @@ static void numa_migrate_preferred(struct task_struct *p)
>> 	if (task_node(p) == p->numa_preferred_nid)
>> 		return;
>>
>> +	target = p->numa_preferred_nid;
>> +
> Something instead of bool can be used, too.

Thanks for pointing that out :-) It will be fixed in v3.
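
For illustration, here is a minimal user-space sketch of why bool is the wrong type for 'target': converting a node id to bool collapses every non-zero value to 1, so the node id is lost. The helpers store_as_bool() and store_as_int() are hypothetical names used only for this example; they are not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical helper: mimics "bool target = p->numa_preferred_nid;" */
static int store_as_bool(int nid)
{
	bool target = nid;	/* any non-zero nid becomes 1 */

	return target;
}

/* Hypothetical helper: keeps the node id in an int instead */
static int store_as_int(int nid)
{
	int target = nid;	/* node id is preserved */

	return target;
}
```

On a 2 node box the two variants happen to agree, since the only node ids are 0 and 1; they diverge as soon as a node id of 2 or higher shows up.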

> 
>> 	/* Otherwise, try migrate to a CPU on the preferred node */
>> -	task_numa_migrate(p);
>> +	failed = (task_numa_migrate(p) != 0);
>> +
>> +	update_migrate_stat(p, target, failed);
>> }
>>
>> static void
>> @@ -6195,6 +6447,13 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>> 	if ((unsigned)i < nr_cpumask_bits)
>> 		return i;
>>
>> +	/*
>> +	 * Failed to find an idle cpu, wake affine may want to pull but
>> +	 * try stay on prev-cpu when the task cling to it.
>> +	 */
>> +	if (task_numa_cling(p, cpu_to_node(prev), cpu_to_node(target)))
>> +		return prev;
>> +
> Curious to know what test figures would look like without the above line.

Without that check, the result depends on the wake affine condition: when
the waker considers the wakee suitable to pull, the wakee may be pulled off
its preferred node, or onto it, essentially at random.

In the mysql case, when there are many such wakeups and the system is very
busy, the observed workload can end up distributed 4:6 or 3:7 across the
two nodes.
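
The fallback order described above can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel code: pick_cpu() is a hypothetical name, 'idle_cpu' is assumed to be -1 when the idle search finds nothing (the kernel checks against nr_cpumask_bits instead), and 'clings' merely stands in for the result of task_numa_cling(), whose internal logic is not modelled here.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the tail of select_idle_sibling() with the
 * cling check applied.
 *
 * idle_cpu: CPU found by the idle search, or -1 if none (assumption)
 * prev:     CPU the task last ran on
 * target:   CPU suggested by wake affine
 * clings:   stand-in for the task_numa_cling() result
 */
static int pick_cpu(int idle_cpu, int prev, int target, bool clings)
{
	if (idle_cpu >= 0)
		return idle_cpu;	/* an idle CPU always wins */

	if (clings)
		return prev;		/* stay on prev instead of being pulled */

	return target;			/* behaviour without the cling check */
}
```

With the cling check dropped, the last two cases collapse into "return target", which is the behaviour the question asks about.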

Regards,
Michael Wang

> 
>> 	return target;
>> }
>>
>> Tested on a 2 node box with 96 cpus, do sysbench-mysql-oltp_read_write
>> testing, X mysqld instances created and attached to X cgroups, X sysbench
>> instances then created and attached to corresponding cgroup to test the
>> mysql with oltp_read_write script for 20 minutes, average eps show:
>>
>> 				origin		ng + cling	change
>> 4 instances each 24 threads	7545.28		7790.49		+3.25%
>> 4 instances each 48 threads	9359.36		9832.30		+5.05%
>> 4 instances each 72 threads	9602.88		10196.95	+6.19%
>>
>> 8 instances each 24 threads	4478.82		4508.82		+0.67%
>> 8 instances each 48 threads	5514.90		5689.93		+3.17%
>> 8 instances each 72 threads	5582.19		5741.33		+2.85%
>