lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 20 Oct 2022 11:28:16 -0400
From:   Rik van Riel <riel@...riel.com>
To:     "Huang, Ying" <ying.huang@...el.com>,
        Nathan Chancellor <nathan@...nel.org>
Cc:     kernel test robot <yujie.liu@...el.com>, lkp@...ts.01.org,
        lkp@...el.com, Andrew Morton <akpm@...ux-foundation.org>,
        Yang Shi <shy828301@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        feng.tang@...el.com, zhengjun.xing@...ux.intel.com,
        fengwei.yin@...el.com
Subject: Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression

On Thu, 2022-10-20 at 13:07 +0800, Huang, Ying wrote:
> 
> Nathan Chancellor <nathan@...nel.org> writes:
> > 
> > For what it's worth, I just bisected a massive and visible
> > performance
> > regression on my Threadripper 3990X workstation to commit
> > f35b5d7d676e
> > ("mm: align larger anonymous mappings on THP boundaries"), which
> > seems
> > directly related to this report/analysis. I initially noticed this
> > because my full set of kernel builds against mainline went from 2
> > hours
> > and 20 minutes or so to over 3 hours. Zeroing in on x86_64
> > allmodconfig,
> > which I used for the bisect:
> > 
> > @ 7b5a0b664ebe ("mm/page_ext: remove unused variable in
> > offline_page_ext"):
> > 
> > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> >   Time (mean ± σ):     318.172 s ±  0.730 s    [User: 31750.902 s,
> > System: 4564.246 s]
> >   Range (min … max):   317.332 s … 318.662 s    3 runs
> > 
> > @ f35b5d7d676e ("mm: align larger anonymous mappings on THP
> > boundaries"):
> > 
> > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> >   Time (mean ± σ):     406.688 s ±  0.676 s    [User: 31819.526 s,
System: 16327.022 s]
> >   Range (min … max):   405.954 s … 407.284 s    3 run
> 
> Have you tried to build with gcc?  Want to check whether is this
> clang
> specific issue or not.

This may indeed be something LLVM specific. In previous tests,
GCC has generally seen a benefit from increased THP usage.
Many other applications also benefit from getting more THPs.

LLVM showing 10% system time before this change, and a whopping
30% system time after that change, suggests that LLVM is behaving
quite differently from GCC in some ways.

If we can figure out what these differences are, maybe we can
just fine tune the code to avoid this issue.

I'll try to play around with LLVM compilation a little bit next
week, to see if I can figure out what might be going on. I wonder
if LLVM is doing lots of mremap calls or something...

-- 
All Rights Reversed.

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ