linux-kernel - Re: [lkp-robot] [x86/mm] c4c3c3c2d0: will-it-scale.per_process

Message-ID: <CALCETrW3G_RfMNKvw0katFuV6dB7k4AfUdVRZ603HNjY=bD4GQ@mail.gmail.com>
Date:   Mon, 16 Oct 2017 18:06:25 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     kernel test robot <xiaolong.ye@...el.com>,
        Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Markus Trippelsdorf <markus@...ppelsdorf.de>,
        Adam Borowski <kilobyte@...band.pl>,
        Brian Gerst <brgerst@...il.com>,
        Johannes Hirte <johannes.hirte@...enkhaos.de>, LKP <lkp@...org>
Subject: Re: [lkp-robot] [x86/mm] c4c3c3c2d0: will-it-scale.per_process_ops
 -61.0% regression

On Mon, Oct 16, 2017 at 3:15 AM, Borislav Petkov <bp@...en8.de> wrote:
> On Mon, Oct 16, 2017 at 10:39:17AM +0800, kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed a -61.0% regression of will-it-scale.per_process_ops due to commit:
>>
>>
>> commit: c4c3c3c2d00826c88b5c02c20e80704664424b9b ("x86/mm: Flush more aggressively in lazy TLB mode")
>> url: https://github.com/0day-ci/linux/commits/Borislav-Petkov/x86-mm-Flush-more-aggressively-in-lazy-TLB-mode/20171011-115901
>>
>>
>> in testcase: will-it-scale
>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>
> Say what now?
>
> This is actually what got applied upstream:
>
> b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode")
>
> and AFAICT, that machine is BDW and it should have PCID, right?
>
> Or wait, that's a guest so PCID is probably not even usable for guests.
> Or should we disable it in VMs?

PCID works on new versions of KVM, at least, depending on configuration.

On a PCID machine with this patch applied, we still switch CR3 when we
go idle (which is presumably what we're hitting here); we just don't
flush anything.  The main cost seems to come from serialization, since
a CR3 write is a serializing instruction.  On my laptop, I think I
measured about 80 ns per non-flushing CR3 load when doing it in a loop.
I assume the cost is higher outside a loop because the pipeline is
fuller.  We also take a bit of a hit because switch_mm is fairly
complex.  I have a patch here that tries to optimize it:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=1caf24d080dac8b9f952600d1e91879aa782131c

On a non-PCID machine, this patch increases TLB-flush IPIs instead,
which doesn't seem to match what we're seeing.

The test in question is essentially the same as one I ran myself, which
showed very little visible regression.  I'm wondering whether the real
problem is some NUMA oddity.

Xiaolong, can you send us /proc/cpuinfo from the test machine that's
seeing this problem, booted into this kernel?

--Andy