Message-ID: <0d6ea030-ec3b-d649-bad7-89ff54094e25@linux.intel.com>
Date:   Wed, 28 Mar 2018 17:17:56 -0700
From:   Dave Hansen <dave.hansen@...ux.intel.com>
To:     Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...gle.com>,
        Hugh Dickins <hughd@...gle.com>,
        Jürgen Groß <jgross@...e.com>,
        the arch/x86 maintainers <x86@...nel.org>, namit@...are.com
Subject: Re: [PATCH 00/11] Use global pages with PTI

On 03/27/2018 01:07 PM, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@...utronix.de> wrote:
>>> systems.  Atoms are going to be the easiest thing to get my hands on,
>>> but I tend to shy away from them for performance work.
>> What I have in mind is that I wonder whether the whole circus is worth it
>> when there is no performance advantage on PCID systems.

I was holding off until I could find a relatively recent Atom system
(they actually come in reasonably sized servers [1]), but I've hit a
snag there, so I figured I'd just share a kernel compile measured with
Ingo's perf-based methodology on a Skylake desktop system with PCIDs.
Here's the kernel compile:

No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
                             -1.195 seconds (-0.64%)

Lower is better here, obviously.

I also re-checked everything using will-it-scale's lseek1 test [2],
which is basically a microbenchmark of a halfway reasonable syscall.
Higher is better here.

No Global pages (baseline): 15783951 lseeks/sec
28 Global pages (this set): 16054688 lseeks/sec
                             +270737 lseeks/sec (+1.71%)

So, both the kernel compile and the microbenchmark got measurably faster.
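
For anyone who wants to re-derive the deltas quoted above, here is a
minimal sketch in Python.  It is not part of the measurement setup; the
helper name delta_and_percent is made up for illustration, and the
inputs are just the four numbers reported in this mail.

```python
def delta_and_percent(baseline, patched):
    """Return (absolute delta, delta as a percentage of the baseline)."""
    delta = patched - baseline
    return delta, delta / baseline * 100.0


if __name__ == "__main__":
    # Numbers copied from the results above.
    runs = [
        ("kernel compile (seconds, lower is better)", 186.951, 185.756),
        ("lseek1 (lseeks/sec, higher is better)", 15783951, 16054688),
    ]
    for name, baseline, patched in runs:
        delta, pct = delta_and_percent(baseline, patched)
        print(f"{name}: {delta:+.3f} ({pct:+.2f}%)")
```

The percentage is taken relative to the baseline run, so a negative
value is an improvement for the compile time and a positive value is an
improvement for the lseek rate.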

[1] https://ark.intel.com/products/97933/Intel-Atom-Processor-C3955-16M-Cache-up-to-2_40-GHz
[2] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c