Date:	Tue, 02 Sep 2014 10:12:59 +0900
From:	Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>
CC:	Rik van Riel <riel@...hat.com>,
	Johannes Weiner <hannes@...xchg.org>,
	David Rientjes <rientjes@...gle.com>,
	Fengguang Wu <fengguang.wu@...el.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: page_alloc: Default to node-ordering on 64-bit NUMA
 machines

(2014/09/01 21:55), Mel Gorman wrote:
> Zones are allocated by the page allocator in either node or zone order.
> Node ordering is preferred in terms of locality and is applied automatically
> in one of three cases.
>
>    1. If a node has only low memory
>
>    2. If DMA/DMA32 is a high percentage of memory
>
>    3. If low memory on a single node is greater than 70% of the node size
>
> Otherwise zone ordering is used to preserve low memory. Unfortunately
> a consequence of this is that a machine with balanced NUMA nodes will
> experience different performance characteristics depending on which node
> they happen to start from.
>
> The point of zone ordering is to protect lower nodes for devices that require
> DMA/DMA32 memory. When NUMA was first introduced, this was critical as 32-bit
> NUMA machines commonly suffered from low memory exhaustion problems. On
> 64-bit machines the primary concern is devices that are 32-bit only which
> is less severe than the low memory exhaustion problem on 32-bit NUMA. It
> seems there are really few devices that depend on it.
>
> AGP -- I assume this is getting more rare but even then I think the allocations
> 	happen early in boot time where lowmem pressure is less of a problem
>
> DRM -- If the device is 32-bit only then there may be lowmem pressure. I didn't
> 	evaluate these in detail but it looks like some of these are mobile
> 	graphics cards. Not many NUMA laptops out there. DRM folk should know
> 	better though.
>
> Some TV cards -- Much demand for 32-bit capable TV cards on NUMA machines?
>
> B43 wireless card -- again not really a NUMA thing.
>
> I cannot find a good reason to incur a performance penalty on all 64-bit NUMA
> machines in case someone throws a brain damaged TV or graphics card in there.
> This patch defaults to node-ordering on 64-bit NUMA machines. I was tempted
> to make it default everywhere but I understand that some embedded arches may
> be using 32-bit NUMA where I cannot predict the consequences.
>
> The performance impact depends on the workload and the characteristics of the
> machine and the machine I tested on had a large Normal zone on node 0 so the
> impact is within the noise for the majority of tests. The allocation stats
> show more allocation requests were from DMA32 and local node. Running SpecJBB
> with multiple JVMs and automatic NUMA balancing disabled the results were
>
> specjbb
>                       3.17.0-rc2            3.17.0-rc2
>                          vanilla        nodeorder-v1r1
> Min    1      29534.00 (  0.00%)     30020.00 (  1.65%)
> Min    10    115717.00 (  0.00%)    134038.00 ( 15.83%)
> Min    19    109718.00 (  0.00%)    114186.00 (  4.07%)
> Min    28    104459.00 (  0.00%)    103639.00 ( -0.78%)
> Min    37     98245.00 (  0.00%)    103756.00 (  5.61%)
> Min    46     97198.00 (  0.00%)     96197.00 ( -1.03%)
> Mean   1      30953.25 (  0.00%)     31917.75 (  3.12%)
> Mean   10    124432.50 (  0.00%)    140904.00 ( 13.24%)
> Mean   19    116033.50 (  0.00%)    119294.75 (  2.81%)
> Mean   28    108365.25 (  0.00%)    106879.50 ( -1.37%)
> Mean   37    102984.75 (  0.00%)    106924.25 (  3.83%)
> Mean   46    100783.25 (  0.00%)    105368.50 (  4.55%)
> Stddev 1       1260.38 (  0.00%)      1109.66 ( 11.96%)
> Stddev 10      7434.03 (  0.00%)      5171.91 ( 30.43%)
> Stddev 19      8453.84 (  0.00%)      5309.59 ( 37.19%)
> Stddev 28      4184.55 (  0.00%)      2906.63 ( 30.54%)
> Stddev 37      5409.49 (  0.00%)      3192.12 ( 40.99%)
> Stddev 46      4521.95 (  0.00%)      7392.52 (-63.48%)
> Max    1      32738.00 (  0.00%)     32719.00 ( -0.06%)
> Max    10    136039.00 (  0.00%)    148614.00 (  9.24%)
> Max    19    130566.00 (  0.00%)    127418.00 ( -2.41%)
> Max    28    115404.00 (  0.00%)    111254.00 ( -3.60%)
> Max    37    112118.00 (  0.00%)    111732.00 ( -0.34%)
> Max    46    108541.00 (  0.00%)    116849.00 (  7.65%)
> TPut   1     123813.00 (  0.00%)    127671.00 (  3.12%)
> TPut   10    497730.00 (  0.00%)    563616.00 ( 13.24%)
> TPut   19    464134.00 (  0.00%)    477179.00 (  2.81%)
> TPut   28    433461.00 (  0.00%)    427518.00 ( -1.37%)
> TPut   37    411939.00 (  0.00%)    427697.00 (  3.83%)
> TPut   46    403133.00 (  0.00%)    421474.00 (  4.55%)
>
>                              3.17.0-rc2  3.17.0-rc2
>                                 vanilla nodeorder-v1r1
> DMA allocs                           0           0
> DMA32 allocs                        57     1491992
> Normal allocs                 32543566    30026383
> Movable allocs                       0           0
> Direct pages scanned                 0           0
> Kswapd pages scanned                 0           0
> Kswapd pages reclaimed               0           0
> Direct pages reclaimed               0           0
> Kswapd efficiency                 100%        100%
> Kswapd velocity                  0.000       0.000
> Direct efficiency                 100%        100%
> Direct velocity                  0.000       0.000
> Percentage direct scans             0%          0%
> Zone normal velocity             0.000       0.000
> Zone dma32 velocity              0.000       0.000
> Zone dma velocity                0.000       0.000
> THP fault alloc                  55164       52987
> THP collapse alloc                 139         147
> THP splits                          26          21
> NUMA alloc hit                 4169066     4250692
> NUMA alloc miss                      0           0
>
> Note that there were more DMA32 allocations with the patch applied.  In this
> particular case there was no difference in numa_hit and numa_miss. The
> expectation is that DMA32 was being used at the low watermark instead of
> falling into the slow path. kswapd was not woken but it's not woken for
> THP allocations.
>
> Signed-off-by: Mel Gorman <mgorman@...e.de>

I agree. When I introduced the option (during ia64 NUMA development), there
might have been some trouble. Recent NUMA machines tend to have enough memory
on node 0.
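
For reference, the automatic selection described in the quoted commit message
(node ordering in three cases, zone ordering otherwise) can be sketched in
userspace C roughly as below. This is only an illustration, not the code in
mm/page_alloc.c: struct node_mem, pick_default_order() and the high_low_pct
parameter are hypothetical names, and the threshold for "a high percentage"
is left to the caller because the commit message does not state it.

```c
#include <stddef.h>

enum zonelist_order { ORDER_NODE, ORDER_ZONE };

/* Hypothetical per-node summary, in pages. */
struct node_mem {
	unsigned long low_pages;    /* DMA + DMA32 */
	unsigned long normal_pages; /* Normal and above */
};

/*
 * Sketch of the three cases from the commit message.
 * high_low_pct is an assumed, caller-supplied threshold for case 2.
 */
enum zonelist_order
pick_default_order(const struct node_mem *nodes, size_t nr_nodes,
		   unsigned int high_low_pct)
{
	unsigned long total_low = 0, total = 0;
	size_t i;

	for (i = 0; i < nr_nodes; i++) {
		/* Case 1: a node has only low memory. */
		if (nodes[i].low_pages && !nodes[i].normal_pages)
			return ORDER_NODE;
		total_low += nodes[i].low_pages;
		total += nodes[i].low_pages + nodes[i].normal_pages;
	}

	/* Case 2: DMA/DMA32 is a high percentage of all memory. */
	if (total && total_low * 100 > total * high_low_pct)
		return ORDER_NODE;

	/* Case 3: low memory is more than 70% of a single node's size. */
	for (i = 0; i < nr_nodes; i++) {
		unsigned long size = nodes[i].low_pages + nodes[i].normal_pages;

		if (size && nodes[i].low_pages * 100 > size * 70)
			return ORDER_NODE;
	}

	/* Otherwise keep zone ordering to preserve low memory. */
	return ORDER_ZONE;
}
```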

And any problems can be avoided with the "numa_zonelist_order" boot option anyway.
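
As a rough illustration of how such an override interacts with the new
default, here is a small userspace sketch in C. It assumes the option takes
one of "default", "node" or "zone"; matching on the first letter only is a
simplification of this sketch, and the names order_choice,
parse_order_option() and resolve_order() are hypothetical, not kernel
symbols.

```c
/* Hypothetical model of the user's choice; not the kernel's constants. */
enum order_choice { CHOICE_DEFAULT, CHOICE_NODE, CHOICE_ZONE };

/*
 * Parse an option value such as "zone". Returns 0 and stores the choice,
 * or -1 if the value is not recognised.
 */
int parse_order_option(const char *s, enum order_choice *out)
{
	if (!s || !out)
		return -1;

	switch (s[0]) {
	case 'd': case 'D':
		*out = CHOICE_DEFAULT;
		return 0;
	case 'n': case 'N':
		*out = CHOICE_NODE;
		return 0;
	case 'z': case 'Z':
		*out = CHOICE_ZONE;
		return 0;
	default:
		return -1;
	}
}

/*
 * An explicit user choice always wins. With the patch, "default" resolves
 * to node ordering on 64-bit builds; elsewhere the older heuristic result
 * (passed in as legacy_default) is used.
 */
enum order_choice resolve_order(enum order_choice user, int is_64bit,
				enum order_choice legacy_default)
{
	if (user != CHOICE_DEFAULT)
		return user;
	return is_64bit ? CHOICE_NODE : legacy_default;
}
```

So a machine that really needs its low memory protected could still boot
with numa_zonelist_order=zone on the kernel command line.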

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>

> ---
>   mm/page_alloc.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 18cee0d..20059aa 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3579,6 +3579,17 @@ static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes)
>   	zonelist->_zonerefs[pos].zone_idx = 0;
>   }
>
> +#if defined(CONFIG_64BIT)
> +
> +/* Devices that require DMA32/DMA are relatively rare and do not justify a
> + * penalty to every machine in case the specialised case applies. Default
> + * to Node-ordering on 64-bit NUMA machines
> + */
> +static int default_zonelist_order(void)
> +{
> +	return ZONELIST_ORDER_NODE;
> +}
> +#else
>   static int default_zonelist_order(void)
>   {
>   	int nid, zone_type;
> @@ -3641,6 +3652,7 @@ static int default_zonelist_order(void)
>   	}
>   	return ZONELIST_ORDER_ZONE;
>   }
> +#endif /* CONFIG_64BIT */
>
>   static void set_zonelist_order(void)
>   {
>