linux-kernel - Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200929141216.GO2628@hirez.programming.kicks-ass.net>
Date:   Tue, 29 Sep 2020 16:12:16 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mike Rapoport <rppt@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andy Lutomirski <luto@...nel.org>,
        Arnd Bergmann <arnd@...db.de>, Borislav Petkov <bp@...en8.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Christopher Lameter <cl@...ux.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        Elena Reshetova <elena.reshetova@...el.com>,
        "H. Peter Anvin" <hpa@...or.com>, Idan Yaniv <idan.yaniv@....com>,
        Ingo Molnar <mingo@...hat.com>,
        James Bottomley <jejb@...ux.ibm.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Matthew Wilcox <willy@...radead.org>,
        Mark Rutland <mark.rutland@....com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Shuah Khan <shuah@...nel.org>, Tycho Andersen <tycho@...ho.ws>,
        Will Deacon <will@...nel.org>, linux-api@...r.kernel.org,
        linux-arch@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-nvdimm@...ts.01.org, linux-riscv@...ts.infradead.org,
        x86@...nel.org
Subject: Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize
 direct map fragmentation

On Tue, Sep 29, 2020 at 04:05:29PM +0300, Mike Rapoport wrote:
> On Fri, Sep 25, 2020 at 09:41:25AM +0200, Peter Zijlstra wrote:
> > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@...ux.ibm.com>
> > > 
> > > Removing a PAGE_SIZE page from the direct map every time such page is
> > > allocated for a secret memory mapping will cause severe fragmentation of
> > > the direct map. This fragmentation can be reduced by using PMD-size pages
> > > as a pool for small pages for secret memory mappings.
> > > 
> > > Add a gen_pool per secretmem inode and lazily populate this pool with
> > > PMD-size pages.
> > 
> > What's the actual efficacy of this? Since the pmd is per inode, all I
> > need is a lot of inodes and we're in business to destroy the directmap,
> > no?
> > 
> > Afaict there's no privs needed to use this, all a process needs is to
> > stay below the mlock limit, so a 'fork-bomb' that maps a single secret
> > page will utterly destroy the direct map.
> 
> This indeed will cause 1G pages in the direct map to be split into 2M
> chunks, but I disagree with 'destroy' term here. Citing the cover letter
> of an earlier version of this series:

It will drop them down to 4k pages. Given enough inodes, and allocating
only a single sekrit page per pmd, we'll shatter the directmap into 4k.

>   I've tried to find some numbers that show the benefit of using larger
>   pages in the direct map, but I couldn't find anything so I've run a
>   couple of benchmarks from phoronix-test-suite on my laptop (i7-8650U
>   with 32G RAM).

Existing benchmarks suck at this, but FB had a load that had a
deterministic enough performance regression to bisect to a directmap
issue, fixed by:

  7af0145067bc ("x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text")

>   I've tested three variants: the default with 28G of the physical
>   memory covered with 1G pages, then I disabled 1G pages using
>   "nogbpages" in the kernel command line and at last I've forced the
>   entire direct map to use 4K pages using a simple patch to
>   arch/x86/mm/init.c.  I've made runs of the benchmarks with SSD and
>   tmpfs.
>   
>   Surprisingly, the results does not show huge advantage for large
>   pages. For instance, here the results for kernel build with
>   'make -j8', in seconds:

Your benchmark should stress the TLB of your uarch, such that additional
pressure added by the shattered directmap shows up.

And no, I don't have one either.

>                         |  1G    |  2M    |  4K
>   ----------------------+--------+--------+---------
>   ssd, mitigations=on	| 308.75 | 317.37 | 314.9
>   ssd, mitigations=off	| 305.25 | 295.32 | 304.92
>   ram, mitigations=on	| 301.58 | 322.49 | 306.54
>   ram, mitigations=off	| 299.32 | 288.44 | 310.65

These results lack error data, but assuming the reults are significant,
then this very much makes a case for 1G mappings. 5s on a kernel builds
is pretty good.