Date:   Fri, 9 Dec 2016 06:01:30 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>, x86@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Arnd Bergmann <arnd@...db.de>,
        "H. Peter Anvin" <hpa@...or.com>, Andi Kleen <ak@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Andy Lutomirski <luto@...capital.net>,
        linux-arch@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC, PATCHv1 00/28] 5-level paging


* Kirill A. Shutemov <kirill.shutemov@...ux.intel.com> wrote:

> x86-64 is currently limited to 256 TiB of virtual address space and 64 TiB
> of physical address space. We are already bumping into this limit: some
> vendors offer servers with 64 TiB of memory today.
> 
> To overcome the limitation, upcoming hardware will introduce support for
> 5-level paging[1]. It is a straightforward extension of the current page
> table structure, adding one more layer of translation.
> 
> It bumps the limits to 128 PiB of virtual address space and 4 PiB of
> physical address space. This "ought to be enough for anybody" ©.
> 
> This patchset is still very early. There are a number of things missing
> that we have to do before asking anyone to merge it (listed below).
> It would be great if folks can start testing applications now (in QEMU) to
> look for breakage.
> Any early comments on the design or the patches would be appreciated as
> well.
> 
> More details on the design and what’s left to implement are below.
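
For reference, the limits quoted above follow from address width alone. The sketch below assumes the usual x86-64 layout of a 12-bit page offset plus 9 index bits per table level; the helper names (va_bits, addr_space_bytes) are made up for illustration and are not from the patchset. The physical figures correspond to 46 and 52 address bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4 KiB pages */
#define LEVEL_BITS 9    /* 512 entries per table */

/* Virtual address width covered by `levels` levels of page tables. */
unsigned va_bits(unsigned levels)
{
    return PAGE_SHIFT + LEVEL_BITS * levels;
}

/* Size in bytes of an address space that is `bits` wide. */
uint64_t addr_space_bytes(unsigned bits)
{
    assert(bits < 64);  /* a 64-bit shift would be undefined */
    return UINT64_C(1) << bits;
}
```

With these, va_bits(4) is 48 (2^48 bytes = 256 TiB) and va_bits(5) is 57 (2^57 bytes = 128 PiB); addr_space_bytes(46) is 64 TiB and addr_space_bytes(52) is 4 PiB.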

The patches don't look too painful, so no big complaints from me - kudos!

> There is still work to do:
> 
>   - Boot-time switch between 4- and 5-level paging.
> 
>     We assume that distributions will be keen to avoid returning to the
>     i386 days where we shipped one kernel binary for each page table
>     layout.

Absolutely.

>     As the page table format is the same for 4- and 5-level paging, it
>     should be possible to have a single kernel binary and switch between
>     them at boot time without too much hassle.
> 
>     For now I have only implemented a compile-time switch.
> 
>     I was hoping to bring this feature in a separate patchset once basic
>     enabling is upstream.
> 
>     Is it okay?

LGTM, but we would eventually want to convert this kind of crazy open coding:

        pgd_t *pgd, *pgd_ref;
        p4d_t *p4d, *p4d_ref;
        pud_t *pud, *pud_ref;
        pmd_t *pmd, *pmd_ref;
        pte_t *pte, *pte_ref;

to something saner that iterates and navigates the page table hierarchy in an
extensible fashion. That would also make it (much) easier to make the paging depth
switchable at boot time.
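
To illustrate the idea, here is a minimal userspace sketch of a walk where the depth is a runtime parameter instead of a fixed set of pgd/p4d/pud/pmd/pte variables. It is a toy model, not kernel code: struct pt_table, pt_index(), pt_lookup() and pt_map() are hypothetical names, and it assumes 9 index bits per level and 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define ENTRIES    (1u << LEVEL_BITS)

/* One table at any level: each slot points at the next-lower table or,
 * at level 0, at the mapped page. NULL means "not present". */
struct pt_table {
    void *slot[ENTRIES];
};

/* Index into the table at `level` (0 = lowest level) for `addr`. */
unsigned pt_index(uint64_t addr, unsigned level)
{
    return (addr >> (PAGE_SHIFT + LEVEL_BITS * level)) & (ENTRIES - 1);
}

/* Walk from the top table down; `levels` is chosen at run time.
 * Returns the mapped page, or NULL if any level is not present. */
void *pt_lookup(struct pt_table *top, unsigned levels, uint64_t addr)
{
    void *cur = top;

    for (int level = (int)levels - 1; level >= 0 && cur; level--)
        cur = ((struct pt_table *)cur)->slot[pt_index(addr, (unsigned)level)];
    return cur;
}

/* Map addr -> page, allocating intermediate tables on demand.
 * Returns 0 on success, -1 if an allocation fails. */
int pt_map(struct pt_table *top, unsigned levels, uint64_t addr, void *page)
{
    struct pt_table *tbl = top;

    assert(levels >= 1);
    for (unsigned level = levels - 1; level > 0; level--) {
        void **slot = &tbl->slot[pt_index(addr, level)];

        if (!*slot) {
            *slot = calloc(1, sizeof(struct pt_table));
            if (!*slot)
                return -1;
        }
        tbl = *slot;
    }
    tbl->slot[pt_index(addr, 0)] = page;
    return 0;
}
```

The same two functions serve a 4-level and a 5-level tree; only the `levels` argument differs, which is the property that would make a boot-time switch easier.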

Somehow I'm quite certain we'll see requests for more than 4 PiB memory in our 
lifetimes.

In a decade or two once global warming really gets going, especially after Trump & 
Republicans & Old Energy implement their billionaire welfare policies to mine, 
sell and burn even more coal & oil without paying for the damage caused, the U.S. 
meteorology clusters tracking Category 6 hurricanes in the Atlantic (capable of 1+ 
trillion dollars damage) in near real time at 1 meter resolution will have to run 
on something capable, right?

>   - Handle opt-in wider address space for userspace.
> 
>     Not all userspace is ready to handle addresses wider than the current
>     47 bits. At least some JIT compilers make use of the upper bits to
>     encode their info.
> 
>     We need to have an interface to opt in to wider addresses from
>     userspace to avoid regressions.
> 
>     For now, I've included a testing-only patch which bumps TASK_SIZE to
>     56 bits. This can be handy for testing to see what breaks if we max out
>     the size of the virtual address space.
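
As a concrete picture of the kind of breakage described above, the sketch below shows a made-up pointer-tagging scheme that assumes user addresses fit in 47 bits and stores a tag in the bits above. The names (tag_pack, tag_addr, tag_get) and the 17-bit tag layout are assumptions for illustration, not taken from any particular JIT. Addresses are modelled as uint64_t so the example does not depend on the machine it runs on.

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_VA_BITS 47
#define ADDR_MASK ((UINT64_C(1) << ASSUMED_VA_BITS) - 1)

/* Pack a tag (up to 17 bits) above an address assumed to fit in 47 bits. */
uint64_t tag_pack(uint64_t addr, unsigned tag)
{
    assert(tag < (1u << (64 - ASSUMED_VA_BITS)));
    return ((uint64_t)tag << ASSUMED_VA_BITS) | (addr & ADDR_MASK);
}

/* Recover the address: everything above bit 46 is treated as tag. */
uint64_t tag_addr(uint64_t boxed)
{
    return boxed & ADDR_MASK;
}

/* Recover the tag stored in the upper bits. */
unsigned tag_get(uint64_t boxed)
{
    return (unsigned)(boxed >> ASSUMED_VA_BITS);
}
```

An address such as 0x00007f0012345000 round-trips unchanged, while a 56-bit address such as 0x00ff000012345000 comes back as 0x12345000: the upper address bits are silently dropped, which is the regression an unconditional wider TASK_SIZE would expose.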

So this is just a detail - but it sounds a bit limiting to me to provide an
'opt-in' flag for something that will work just fine on the vast majority of
64-bit software.

Please make this an opt-out compatibility flag instead, similar to how we handle
other address space layout limitations and ABI quirks, such as ADDR_LIMIT_32BIT,
ADDR_LIMIT_3GB, ADDR_COMPAT_LAYOUT, READ_IMPLIES_EXEC, etc.
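
For readers unfamiliar with how those existing flags are used, here is a rough userspace sketch built on personality(2) from glibc's <sys/personality.h>. The helper names are hypothetical, and no flag for the wider address space exists here; the sketch only shows the existing opt-out mechanism with flags such as ADDR_COMPAT_LAYOUT.

```c
#include <assert.h>
#include <sys/personality.h>

/* Return `persona` with `flag` added; pure helper, no syscall. */
unsigned long persona_with_flag(unsigned long persona, unsigned long flag)
{
    return persona | flag;
}

/* Test whether `flag` is set in `persona`. */
int persona_has_flag(unsigned long persona, unsigned long flag)
{
    return (persona & flag) != 0;
}

/* Add `flag` to the calling process's persona.
 * Passing 0xffffffff to personality() queries without changing anything.
 * Returns 0 on success, -1 on failure (e.g. blocked by a sandbox). */
int persona_set_flag(unsigned long flag)
{
    int cur = personality(0xffffffff);

    if (cur == -1)
        return -1;
    if (personality(persona_with_flag((unsigned long)cur, flag)) == -1)
        return -1;
    return 0;
}
```

A legacy program (or a wrapper in the style of setarch) would call something like persona_set_flag(ADDR_COMPAT_LAYOUT) before exec, and the new layout would apply to everything that does not ask for the old one.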

Thanks,

	Ingo
