Message-ID: <87wog145nn.fsf@yhuang-dev.intel.com>
Date:   Mon, 29 Jul 2019 16:16:28 +0800
From:   "Huang\, Ying" <ying.huang@...el.com>
To:     Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>, Rik van Riel <riel@...hat.com>,
        Mel Gorman <mgorman@...e.de>, <jhladky@...hat.com>,
        <lvenanci@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH RESEND] autonuma: Fix scan period updating

Srikar Dronamraju <srikar@...ux.vnet.ibm.com> writes:

>> >> 
>> >> if (lr_ratio >= NUMA_PERIOD_THRESHOLD)
>> >>     slow down scanning
>> >> else if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
>> >>     if (NUMA_PERIOD_SLOTS - lr_ratio >= NUMA_PERIOD_THRESHOLD)
>> >>         speed up scanning
>> 
>> Thought about this again.  For example, a multi-threads workload runs on
>> a 4-sockets machine, and most memory accesses are shared.  The optimal
>> situation will be pseudo-interleaving, that is, spreading memory
>> accesses evenly among 4 NUMA nodes.  Where "share" >> "private", and
>> "remote" > "local".  And we should slow down scanning to reduce the
>> overhead.
>> 
>> What do you think about this?
>
> If all 4 nodes have equal access, then all 4 nodes will be active nodes.
>
> From task_numa_fault()
>
> 	if (!priv && !local && ng && ng->active_nodes > 1 &&
> 				numa_is_active_node(cpu_node, ng) &&
> 				numa_is_active_node(mem_node, ng))
> 		local = 1;
>
> Hence all accesses will be accounted as local. Hence scanning would slow
> down.

Yes.  You are right!  Thanks a lot!

There may be another case.  For example, a workload with 9 threads runs
on a 2-socket machine, and most memory accesses are shared.  7 threads
run on node 0 and 2 threads run on node 1 because of CPU load
balancing.  Then the 2 threads on node 1 will have "share" >>
"private" and "remote" >> "local".  But speeding up scanning does not
help in this case.
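
To make the two cases concrete, here is a minimal user-space sketch of
the accounting rule quoted above from task_numa_fault().  It is not the
kernel code: struct toy_numa_group, toy_is_active_node() and
toy_account_local() are hypothetical stand-ins, and deriving the initial
"local" value from cpu_node == mem_node is my assumption for
illustration.  Only the if-condition mirrors the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Hypothetical stand-in for the kernel's struct numa_group. */
struct toy_numa_group {
	int active_nodes;            /* number of nodes marked active */
	bool node_active[MAX_NODES]; /* per-node "active" flag */
};

/* Hypothetical stand-in for numa_is_active_node(). */
static bool toy_is_active_node(int nid, const struct toy_numa_group *ng)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return ng->node_active[nid];
}

/*
 * Returns 1 if a fault is accounted as local, 0 if remote.
 * Assumption: the starting value is simply cpu_node == mem_node.
 */
static int toy_account_local(int priv, int cpu_node, int mem_node,
			     const struct toy_numa_group *ng)
{
	int local = (cpu_node == mem_node);

	/* Same condition as the snippet quoted from task_numa_fault(). */
	if (!priv && !local && ng && ng->active_nodes > 1 &&
	    toy_is_active_node(cpu_node, ng) &&
	    toy_is_active_node(mem_node, ng))
		local = 1;

	return local;
}
```

With every node active, a shared cross-node fault is counted as local,
which matches the pseudo-interleaving case.  In the 9-thread case, if
node 1 does not qualify as an active node (an assumption; nothing above
establishes it), the same fault from node 1 to memory on node 0 stays
remote in this sketch.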

Best Regards,
Huang, Ying
