Message-ID: <398e9887-6d6e-e1d3-abcf-43a6d7496bc8@intel.com>
Date: Mon, 17 Jul 2017 09:02:36 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: daniel.m.jordan@...cle.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 5/6] mm: parallelize clear_gigantic_page
On 07/14/2017 03:16 PM, daniel.m.jordan@...cle.com wrote:
> Machine: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz, 288 cpus, 1T memory
> Test: Clear a range of gigantic pages
> nthread  speedup  size (GiB)  min time (s)  stdev
>       1            100            41.13      0.03
>       2   2.03x    100            20.26      0.14
>       4   4.28x    100             9.62      0.09
>       8   8.39x    100             4.90      0.05
>      16  10.44x    100             3.94      0.03
...
>       1            800           434.91      1.81
>       2   2.54x    800           170.97      1.46
>       4   4.98x    800            87.38      1.91
>       8  10.15x    800            42.86      2.59
>      16  12.99x    800            33.48      0.83
What was the actual test here? Did you just use sysfs to allocate 800GB
of 1GB huge pages?
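For concreteness, here's a minimal sketch of the kind of test I'd expect
(purely illustrative, not taken from the patch): reserve the 1GB pages
up front via /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages,
mmap() with MAP_HUGETLB | MAP_HUGE_1GB, and time the first-touch faults,
each of which ends up in clear_gigantic_page():

#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB	(30 << 26)	/* 30 == log2(1GB), MAP_HUGE_SHIFT == 26 */
#endif

int main(void)
{
	size_t size = 100UL << 30;	/* 100 GiB worth of 1GB pages */
	struct timespec start, end;
	char *p;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
		 -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	/* touch one byte per 1GB page; each fault clears a gigantic page */
	for (size_t off = 0; off < size; off += 1UL << 30)
		p[off] = 1;
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("clear time: %.2f s\n",
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}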
This test should be entirely memory-bandwidth-limited, right? Are you
contending here that a single core can only use 1/10th of the memory
bandwidth when clearing a page?
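Back-of-the-envelope from the 100 GiB rows above (just arithmetic on the
quoted numbers, via a throwaway snippet like this):

#include <stdio.h>

int main(void)
{
	/* nthreads, min time (s), from the 100 GiB rows quoted above */
	const struct { int nthreads; double secs; } runs[] = {
		{ 1, 41.13 }, { 2, 20.26 }, { 4, 9.62 },
		{ 8, 4.90 }, { 16, 3.94 },
	};

	for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%2d threads: %5.1f GiB/s\n",
		       runs[i].nthreads, 100.0 / runs[i].secs);
	return 0;
}

That works out to ~2.4 GiB/s for one thread vs. ~25 GiB/s with 16
threads, which is where the ~1/10th figure comes from.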
Or does all the gain come because we're round-robin-allocating the pages
across all 8 NUMA nodes' memory controllers, so the speedup is really
from not doing the clearing across the interconnect?
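One (hypothetical) way to separate the two effects: bind the whole range
to a single node with mbind() before faulting it in, and see how much of
the multi-threaded speedup survives when every clear is local.
Something like (link with -lnuma):

#include <stddef.h>
#include <numaif.h>	/* mbind(), MPOL_BIND */

/* Call before the first touch so the policy applies at allocation time. */
static int bind_to_node0(void *addr, size_t len)
{
	unsigned long nodemask = 1UL << 0;	/* node 0 only; arbitrary choice */

	return mbind(addr, len, MPOL_BIND, &nodemask,
		     sizeof(nodemask) * 8, 0);
}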