lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wjVuLjZ1Wr52W=hNqh=_8gbzuKA+YpsVb4NBHCJsE6cyA@mail.gmail.com>
Date:   Thu, 6 Dec 2018 20:01:46 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Andrea Arcangeli <aarcange@...hat.com>,
        mgorman@...hsingularity.net, Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>, ying.huang@...el.com,
        s.priebe@...fihost.ag,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        alex.williamson@...hat.com, lkp@...org, kirill@...temov.name,
        Andrew Morton <akpm@...ux-foundation.org>,
        zi.yan@...rutgers.edu
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

On Thu, Dec 6, 2018 at 3:43 PM David Rientjes <rientjes@...gle.com> wrote:
>
> On Broadwell, the access latency to local small pages was +5.6%, remote
> hugepages +16.4%, and remote small pages +19.9%.
>
> On Naples, the access latency to local small pages was +4.9%, intrasocket
> hugepages +10.5%, intrasocket small pages +19.6%, intersocket small pages
> +26.6%, and intersocket hugepages +29.2%

Are those two last numbers transposed?

Or why would small page accesses be *faster* than hugepages for the
intersocket case?

Of course, depending on testing, maybe the page itself was remote, but
the page tables were random, and you happened to get a remote page
table for the hugepage case?

> The results on Murano were similar, which is why I suspect Aneesh
> introduced the __GFP_THISNODE requirement for thp in 4.0, which preferred,
> in order, local small pages, remote 1-hop hugepages, remote 2-hop
> hugepages, remote 1-hop small pages, remote 2-hop small pages.

it sounds like on the whole the TLB advantage of hugepages is smaller
than the locality advantage.

Which doesn't surprise me on x86, because TLB costs really are fairly
low. Very good TLB fills, relatively to what I've seen elsewhere.

> So it *appears* from the x86 platforms that NUMA matters much more
> significantly than hugeness, but remote hugepages are a slight win over
> remote small pages.  PPC appeared the same wrt the local node but then
> prefers hugeness over affinity when it comes to remote pages.

I do think POWER at least historically has much weaker TLB fills, but
also very costly page table creation/teardown. Constant-time O(1)
arguments about hash lookups are only worth so much when the constant
time is pretty big. They've been working on it.

So at least on POWER, afaik one issue is literally that hugepages made
the hash setup and teardown situation much better.

One thing that might be worth looking at is whether the process itself
is all that node-local. Maybe we could aim for a policy that says
"prefer local memory, but if we notice that the accesses to this vma
aren't all that local, then who cares?".

IOW, the default could be something more dynamic than just "always use
__GFP_THISNODE". It could be more along the lines of "start off using
__GFP_THISNODE, but for longer-lived processes that bounce around
across nodes, maybe relax it?"

I don't think we have that kind of information right now, though, do we?

Honestly, I think things like vm_policy etc should not be the solution
- yes, some people may know *exactly* what access patterns they want,
but for most situations, I think the policy should be that defaults
"just work".

In fact, I wish even MADV_HUGEPAGE itself were to approach being a
no-op with THP.

We already have TRANSPARENT_HUGEPAGE_ALWAYS being the default kconfig
option (but I think it's a bit detbatable, because I'm not sure
everybody always agrees about memory use), so on the whole
MADV_HUGEPAGE shouldn't really *do* anything.

                     Linus

Download attachment "smime.p7s" of type "application/pkcs7-signature" (5120 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ