linux-kernel - Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wjm9V843eg0uesMrxKnCCq7UfWn8VJ+z-cNztb_0fVW6A@mail.gmail.com>
Date:   Wed, 5 Dec 2018 16:58:02 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     mgorman@...hsingularity.net, Vlastimil Babka <vbabka@...e.cz>,
        mhocko@...nel.org, ying.huang@...el.com, s.priebe@...fihost.ag,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        alex.williamson@...hat.com, lkp@...org,
        David Rientjes <rientjes@...gle.com>, kirill@...temov.name,
        Andrew Morton <akpm@...ux-foundation.org>,
        zi.yan@...rutgers.edu
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

On Wed, Dec 5, 2018 at 3:51 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Ok, I've applied David's latest patch.
>
> I'm not at all objecting to tweaking this further, I just didn't want
> to have this regression stand.

Hmm. Can somebody (David?) also perhaps try to state what the
different latency impacts end up being? I suspect it's been mentioned
several times during the argument, but it would be nice to have a
"going forward, this is what I care about" kind of setup for good
default behavior.

How much of the problem ends up being about the cost of compaction vs
the cost of getting a remote node bigpage?

That would seem to be a fairly major issue, but __GFP_THISNODE affects
both. It limits compaction to just this now, in addition to obviously
limiting the allocation result.

I realize that we probably do want to just have explicit policies that
do not exist right now, but what are (a) sane defaults, and (b) sane
policies?

For example, if we cannot get a hugepage on this node, but we *do* get
a node-local small page, is the local memory advantage simply better
than the possible TLB advantage?

Because if that's the case (at least commonly), then that in itself is
a fairly good argument for "hugepage allocations should always be
THISNODE".

But David also did mention the actual allocation overhead itself in
the commit, and maybe the math is more "try to get a local hugepage,
but if no such thing exists, see if you can get a remote hugepage
_cheaply_".

So another model can be "do local-only compaction, but allow non-local
allocation if the local node doesn't have anything". IOW, if other
nodes have hugepages available, pick them up, but don't try to compact
other nodes to do so?

And yet another model might be "do a least-effort thing, give me a
local hugepage if it exists, otherwise fall back to small pages".

So there are different combinations of "try compaction" vs "local-remote".

                      Linus