Message-ID: <398e9887-6d6e-e1d3-abcf-43a6d7496bc8@intel.com>
Date: Mon, 17 Jul 2017 09:02:36 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: daniel.m.jordan@...cle.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC PATCH v1 5/6] mm: parallelize clear_gigantic_page
On 07/14/2017 03:16 PM, daniel.m.jordan@...cle.com wrote:
> Machine: Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz, 288 cpus, 1T memory
> Test: Clear a range of gigantic pages
> nthread  speedup  size (GiB)  min time (s)  stdev
>       1            100            41.13      0.03
>       2   2.03x    100            20.26      0.14
>       4   4.28x    100             9.62      0.09
>       8   8.39x    100             4.90      0.05
>      16  10.44x    100             3.94      0.03
...
>       1            800           434.91      1.81
>       2   2.54x    800           170.97      1.46
>       4   4.98x    800            87.38      1.91
>       8  10.15x    800            42.86      2.59
>      16  12.99x    800            33.48      0.83
What was the actual test here? Did you just use sysfs to allocate 800GB
of 1GB huge pages?
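For concreteness, here's a minimal sketch of the kind of test I'd expect
(purely illustrative, not taken from the patch): reserve the 1GB pages
up front via /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages,
mmap() with MAP_HUGETLB | MAP_HUGE_1GB, and time the first-touch faults,
each of which ends up in clear_gigantic_page():

#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB	(30 << 26)	/* 30 == log2(1GB), MAP_HUGE_SHIFT == 26 */
#endif

int main(void)
{
	size_t size = 100UL << 30;	/* 100 GiB worth of 1GB pages */
	struct timespec start, end;
	char *p;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
		 -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &start);
	/* touch one byte per 1GB page; each fault clears a gigantic page */
	for (size_t off = 0; off < size; off += 1UL << 30)
		p[off] = 1;
	clock_gettime(CLOCK_MONOTONIC, &end);

	printf("clear time: %.2f s\n",
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9);
	return 0;
}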
This test should be entirely memory-bandwidth-limited, right? Are you
contending here that a single core can only use 1/10th of the memory
bandwidth when clearing a page?
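Back-of-the-envelope from the 100 GiB rows above (just arithmetic on the
quoted numbers, via a throwaway snippet like this):

#include <stdio.h>

int main(void)
{
	/* nthreads, min time (s), from the 100 GiB rows quoted above */
	const struct { int nthreads; double secs; } runs[] = {
		{ 1, 41.13 }, { 2, 20.26 }, { 4, 9.62 },
		{ 8, 4.90 }, { 16, 3.94 },
	};

	for (size_t i = 0; i < sizeof(runs) / sizeof(runs[0]); i++)
		printf("%2d threads: %5.1f GiB/s\n",
		       runs[i].nthreads, 100.0 / runs[i].secs);
	return 0;
}

That works out to ~2.4 GiB/s for one thread vs. ~25 GiB/s with 16
threads, which is where the ~1/10th figure comes from.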
Or does all the gain come because we're round-robin-allocating the pages
across all 8 NUMA nodes' memory controllers, so the speedup is really
from not doing the clearing across the interconnect?
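One (hypothetical) way to separate the two effects: bind the whole range
to a single node with mbind() before faulting it in, and see how much of
the multi-threaded speedup survives when every clear is local.
Something like (link with -lnuma):

#include <stddef.h>
#include <numaif.h>	/* mbind(), MPOL_BIND */

/* Call before the first touch so the policy applies at allocation time. */
static int bind_to_node0(void *addr, size_t len)
{
	unsigned long nodemask = 1UL << 0;	/* node 0 only; arbitrary choice */

	return mbind(addr, len, MPOL_BIND, &nodemask,
		     sizeof(nodemask) * 8, 0);
}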