Message-ID: <20181127181727.GD6923@dhcp22.suse.cz>
Date: Tue, 27 Nov 2018 19:17:27 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: rong.a.chen@...el.com, Andrea Arcangeli <aarcange@...hat.com>,
s.priebe@...fihost.ag, alex.williamson@...hat.com,
mgorman@...hsingularity.net, zi.yan@...rutgers.edu,
Vlastimil Babka <vbabka@...e.cz>, rientjes@...gle.com,
kirill@...temov.name, Andrew Morton <akpm@...ux-foundation.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
lkp@...org
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

On Tue 27-11-18 09:08:50, Linus Torvalds wrote:
> On Mon, Nov 26, 2018 at 10:24 PM kernel test robot
> <rong.a.chen@...el.com> wrote:
> >
> > FYI, we noticed a -61.3% regression of vm-scalability.throughput due
> > to commit ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> > MADV_HUGEPAGE mappings")
>
> Well, that's certainly noticeable and not good.
>
> Andrea, I suspect it might be causing fights with auto numa migration..
>
> Lots more system time, but also look at this:
>
> > 1122389 ± 9% +17.2% 1315380 ± 4% proc-vmstat.numa_hit
> > 214722 ± 5% +21.6% 261076 ± 3% proc-vmstat.numa_huge_pte_updates
> > 1108142 ± 9% +17.4% 1300857 ± 4% proc-vmstat.numa_local
> > 145368 ± 48% +63.1% 237050 ± 17% proc-vmstat.numa_miss
> > 159615 ± 44% +57.6% 251573 ± 16% proc-vmstat.numa_other
> > 185.50 ± 81% +8278.6% 15542 ± 40% proc-vmstat.numa_pages_migrated
>
> Should the commit be reverted? Or perhaps at least modified?
Well, the commit is trying to restore the behavior from before
5265047ac301, because there are real usecases that suffered from that
change and generated bug reports as a result.
vm-scalability is certainly worth considering, but it is an artificial
testcase. A higher NUMA miss rate is an expected side effect of the
patch, because the fallback to a different NUMA node becomes more
likely. The __GFP_THISNODE side effect essentially introduces
node-reclaim-like behavior for THP pages: rather than falling back to a
remote node, the allocator keeps reclaiming and compacting on the local
one. The toy model below illustrates the difference.
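To make that concrete, here is a self-contained toy model (an
illustration only; the names and the two-node setup are assumptions,
not actual kernel code):

#include <stdbool.h>
#include <stdio.h>

enum { NR_NODES = 2 };

/* Does @node have a free huge page immediately available? */
static bool node_has_free_huge(int node, const int free_huge[NR_NODES])
{
	return free_huge[node] > 0;
}

/* Returns the node the huge page ends up on. */
static int alloc_huge(int local, bool thisnode, int free_huge[NR_NODES])
{
	if (node_has_free_huge(local, free_huge))
		return local;
	if (thisnode) {
		/* __GFP_THISNODE: remote fallback is forbidden, so the
		 * only way forward is to reclaim/compact locally, which
		 * is where the extra system time comes from. */
		printf("reclaiming/compacting on node %d\n", local);
		free_huge[local]++;
		return local;
	}
	/* Relaxed mask: fall back to any node with a free huge page.
	 * This is cheap but shows up as numa_miss in /proc/vmstat. */
	for (int n = 0; n < NR_NODES; n++)
		if (n != local && node_has_free_huge(n, free_huge))
			return n;
	return -1;
}

int main(void)
{
	int a[NR_NODES] = { 0, 8 };	/* local node 0 is fragmented */
	int b[NR_NODES] = { 0, 8 };

	printf("__GFP_THISNODE: node %d\n", alloc_huge(0, true, a));
	printf("relaxed:        node %d\n", alloc_huge(0, false, b));
	return 0;
}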
Another thing is that there is no behavior that is good for everybody.
Whether to reclaim locally or to place the THP on a remote node is hard
to decide by default. We have discussed that at length and there were
some conclusions. One of them is that we need a NUMA memory policy to
tell whether a potentially expensive local allocation is preferred over
a remote one (a userspace sketch of how such a preference is expressed
today follows below). Also, we definitely need better pro-active
defragmentation to make larger pages available on the local node. This
is a work in progress and this patch is a stopgap fix.
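For reference, this is how an application can express that locality
preference from userspace today; a minimal sketch, assuming a two-node
machine and the libnuma headers (build with -lnuma):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <numaif.h>	/* mbind(), MPOL_PREFERRED */

int main(void)
{
	size_t len = 256UL << 20;	/* 256MB, arbitrary for the example */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Opt this range in to THP, as the affected workloads do. */
	if (madvise(p, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* Prefer node 0 but allow remote fallback. A strict MPOL_BIND
	 * here would be closer to the __GFP_THISNODE behavior. */
	unsigned long nodemask = 1UL << 0;	/* node 0 */
	if (mbind(p, len, MPOL_PREFERRED, &nodemask,
		  8 * sizeof(nodemask), 0))
		perror("mbind");

	memset(p, 0, len);	/* fault the pages in */
	munmap(p, len);
	return 0;
}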
--
Michal Hocko
SUSE Labs