Message-ID: <Z7zAhSAzpU_MCGnO@gmail.com>
Date: Mon, 24 Feb 2025 19:55:01 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org, tglx@...utronix.de,
	bp@...en8.de, joro@...tes.org, luto@...nel.org,
	peterz@...radead.org, kirill.shutemov@...ux.intel.com,
	rick.p.edgecombe@...el.com, jgross@...e.com
Subject: Re: [RFC][PATCH 0/8] x86/mm: Simplify PAE page table handling


* Dave Hansen <dave.hansen@...ux.intel.com> wrote:

> tl;dr: 32-bit PAE page table handling is a bit different when PTI
> is on and off. Making the handling uniform removes a good amount
> of code at the cost of not sharing kernel PMDs. The downside of
> this simplification is bloating non-PTI PAE kernels by ~2 pages
> per process.
> 
> Anyone who cares about security on 32-bit is running with PTI and
> PAE because PAE has the No-eXecute page table bit. They are already
> paying the 2-page penalty. Anyone who cares more about memory
> footprint than security is probably already running a !PAE kernel
> and will not be affected by this.
> 
> --
> 
> There are two 32-bit x86 hardware page table formats. A 2-level one
> with 32-bit pte_t's and a 3-level one with 64-bit pte_t's called PAE.
> But the PAE one is wonky. It effectively loses a bit of addressing
> radix per level since its PTEs are twice as large. It makes up for
> that by adding the third level, but with only 4 entries in the level.
> 
> This leads to all kinds of fun because this level only needs 32 bytes
> instead of a whole page. Also, since it has only 4 entries in the top
> level, the hardware just always caches the entire thing aggressively.
> Modifying a PAE pgd_t ends up needing different rules than the
> other x86 paging modes and probably every other architecture too.
> 
> PAE support got even weirder when Xen came along. Xen wants to trap
> into the hypervisor on page table writes and so it protects the guest
> page tables with paging protections. It can't protect a 32 byte
> object with paging protections so it bloats the 32-byte object out
> to a page. Xen also didn't support sharing kernel PMD pages. This
> is mostly moot now because support for running as a 32-bit Xen guest
> was ripped out, but there are still remnants around.
> 
> PAE also interacts with PTI in fun and exciting ways. Since pgd
> updates are so fraught, the PTI PAE implementation just chose to
> avoid pgd updates by preallocating all the PMDs up front since
> there are only 4 instead of 512 or 1024 in the other x86 paging
> modes.
> 
> Make PAE less weird:
>  * Always allocate a page for PAE PGDs. This brings them in line
>    with the other 2 paging modes. It was done for Xen and for
>    PTI already and nobody screamed, so just do it everywhere.
>  * Never share kernel PMD pages. This brings PAE in line with
>    32-bit !PAE and 64-bit.
>  * Always preallocate all PAE PMD pages. This basically makes
>    all PAE kernels behave like PTI ones. It might waste a page
>    of memory, but all 4 pages probably get allocated in the common
>    case anyway.
> 
> --
> 
>  include/asm/pgtable-2level_types.h |    2
>  include/asm/pgtable-3level_types.h |    4 -
>  include/asm/pgtable_64_types.h     |    2
>  mm/pat/set_memory.c                |    2
>  mm/pgtable.c                       |  104 +++++--------------------------------
>  5 files changed, 18 insertions(+), 96 deletions(-)

The diffstat alone is pretty nice, so I'd suggest we pursue this series 
even if continued work on 32-bit kernel features is being questioned. 
As long as the code exists and isn't explicitly marked as obsolete, such 
changes are legit.

Thanks,

	Ingo
