Date:   Fri, 3 Sep 2021 10:13:59 +0530
From:   Bharata B Rao <bharata@....com>
To:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     akpm@...ux-foundation.org, kamezawa.hiroyu@...fujitsu.com,
        mgorman@...e.de, Krupa.Ramakrishnan@....com,
        Sadagopan.Srinivasan@....com
Subject: Re: [FIX PATCH 2/2] mm/page_alloc: Use accumulated load when building
 node fallback list


On 8/30/2021 5:46 PM, Bharata B Rao wrote:
> From: Krupa Ramakrishnan <krupa.ramakrishnan@....com>
> 
> In build_zonelists(), when the fallback list is built for the nodes,
> the node load gets reinitialized during each iteration. This results
> in nodes with the same distance occupying the same slot in different
> node fallback lists rather than appearing in the intended round-robin
> manner. As a result, one node gets picked for allocation more often
> than other nodes at the same distance.
> 
> As an example, consider a 4 node system with the following distance
> matrix.
> 
> Node 0  1  2  3
> ----------------
> 0    10 12 32 32
> 1    12 10 32 32
> 2    32 32 10 12
> 3    32 32 12 10
> 
> For this case, the node fallback list gets built like this:
> 
> Node	Fallback list
> ---------------------
> 0	0 1 2 3
> 1	1 0 3 2
> 2	2 3 0 1
> 3	3 2 0 1 <-- Unexpected fallback order
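
The effect is easy to reproduce outside the kernel. Below is a minimal
userspace sketch (my own simplification, not the actual mm/page_alloc.c
code) that mimics the node selection done by find_next_best_node() and
build_zonelists() for the 4-node matrix above: lowest distance wins, with
a small preference for the less loaded node, and the first node of each
new distance group gets penalized. The SCALE constant and the dropped
CPU/headless-node penalties are choices I made for the sketch, so treat
it as an illustration of the reinitialize-vs-accumulate difference only.

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NR_NODES 4
#define SCALE    1024   /* keep distance dominant over node_load */

static const int dist[NR_NODES][NR_NODES] = {
        { 10, 12, 32, 32 },
        { 12, 10, 32, 32 },
        { 32, 32, 10, 12 },
        { 32, 32, 12, 10 },
};

static int node_load[NR_NODES];

/* Pick the best unused node for @node: lowest distance first, then the
 * less loaded node (rough equivalent of find_next_best_node()). */
static int find_next_best_node(int node, bool *used)
{
        int n, val, best = -1, min_val = INT_MAX;

        if (!used[node]) {              /* the local node always goes first */
                used[node] = true;
                return node;
        }

        for (n = 0; n < NR_NODES; n++) {
                if (used[n])
                        continue;
                val = dist[node][n];
                val += (n < node);      /* "prefer the next node" tie-break */
                val *= SCALE;
                val += node_load[n];    /* slight preference for less loaded node */
                if (val < min_val) {
                        min_val = val;
                        best = n;
                }
        }
        if (best >= 0)
                used[best] = true;
        return best;
}

static void build_fallback(int local, bool accumulate)
{
        bool used[NR_NODES] = { false };
        int order[NR_NODES];
        int node, nr = 0, load = NR_NODES, prev = local;

        while ((node = find_next_best_node(local, used)) >= 0) {
                /* Penalize the first node of each new distance group */
                if (dist[local][node] != dist[local][prev]) {
                        if (accumulate)
                                node_load[node] += load;  /* accumulated, as per the subject */
                        else
                                node_load[node] = load;   /* reinitialized, as described above */
                }
                order[nr++] = node;
                prev = node;
                load--;
        }

        printf("Fallback order for Node %d:", local);
        for (node = 0; node < nr; node++)
                printf(" %d", order[node]);
        printf("\n");
}

int main(void)
{
        int pass, n;

        for (pass = 0; pass < 2; pass++) {
                printf("%s node_load:\n", pass ? "Accumulated" : "Reinitialized");
                memset(node_load, 0, sizeof(node_load));
                for (n = 0; n < NR_NODES; n++)
                        build_fallback(n, pass);
        }
        return 0;
}

The first pass (overwriting node_load) prints the same orders as the
quoted table, including the 3 2 0 1 order for Node 3; the second pass
(accumulating node_load) rotates the remote pair and prints 3 2 1 0 for
Node 3 instead.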

FWIW, for a dual-socket, 8-node system with the following distance matrix,

node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

the fallback lists look like this:

Before
=======
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 0 1 2 3
Fallback order for Node 6: 6 7 4 5 0 1 2 3
Fallback order for Node 7: 7 4 5 6 0 1 2 3

After the fix
==============
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 1 2 3 0
Fallback order for Node 6: 6 7 4 5 2 3 0 1
Fallback order for Node 7: 7 4 5 6 3 0 1 2

So the problem becomes more pronounced for bigger NUMA systems.
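
(Plugging this 8-node matrix into the userspace sketch above, with
NR_NODES set to 8, reproduces both the Before and After orders, so the
sketch seems to model the selection logic closely enough.)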

Regards,
Bharata.
