[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZAYsWadAkdZBUGDb@zn.tnic>
Date: Mon, 6 Mar 2023 19:09:29 +0100
From: Borislav Petkov <bp@...en8.de>
To: Rick Edgecombe <rick.p.edgecombe@...el.com>
Cc: x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
Arnd Bergmann <arnd@...db.de>,
Andy Lutomirski <luto@...nel.org>,
Balbir Singh <bsingharora@...il.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Eugene Syromiatnikov <esyr@...hat.com>,
Florian Weimer <fweimer@...hat.com>,
"H . J . Lu" <hjl.tools@...il.com>, Jann Horn <jannh@...gle.com>,
Jonathan Corbet <corbet@....net>,
Kees Cook <keescook@...omium.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Nadav Amit <nadav.amit@...il.com>,
Oleg Nesterov <oleg@...hat.com>, Pavel Machek <pavel@....cz>,
Peter Zijlstra <peterz@...radead.org>,
Randy Dunlap <rdunlap@...radead.org>,
Weijiang Yang <weijiang.yang@...el.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
John Allen <john.allen@....com>, kcc@...gle.com,
eranian@...gle.com, rppt@...nel.org, jamorris@...ux.microsoft.com,
dethoma@...rosoft.com, akpm@...ux-foundation.org,
Andrew.Cooper3@...rix.com, christina.schimpe@...el.com,
david@...hat.com, debug@...osinc.com
Subject: Re: [PATCH v7 25/41] x86/mm: Introduce MAP_ABOVE4G
On Mon, Feb 27, 2023 at 02:29:41PM -0800, Rick Edgecombe wrote:
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which require some core mm changes to function
> properly.
>
> One of the properties is that the shadow stack pointer (SSP), which is a
> CPU register that points to the shadow stack like the stack pointer points
> to the stack, can't be pointing outside of the 32 bit address space when
> the CPU is executing in 32 bit mode. It is desirable to prevent executing
> in 32 bit mode when shadow stack is enabled because the kernel can't easily
> support 32 bit signals.
>
> On x86 it is possible to transition to 32 bit mode without any special
> interaction with the kernel, by doing a "far call" to a 32 bit segment.
> So the shadow stack implementation can use this address space behavior
> as a feature, by enforcing that shadow stack memory is always crated
^^^^^^^
"created"
and I'd say "mapped" or "allocated" here. "Created" sounds weird.
> outside of the 32 bit address space. This way userspace will trigger a
> general protection fault which will in turn trigger a segfault if it
> tries to transition to 32 bit mode with shadow stack enabled.
>
> This provides a clean error generating border for the user if they try
> attempt to do 32 bit mode shadow stack, rather than leave the kernel in a
> half working state for userspace to be surprised by.
>
> So to allow future shadow stack enabling patches to map shadow stacks
> out of the 32 bit address space, introduce MAP_ABOVE4G. The behavior
I guess this needs to be documented in the mmap() manpage too.
> is pretty much like MAP_32BIT, except that it has the opposite address
> range. The are a few differences though.
>
> If both MAP_32BIT and MAP_ABOVE4G are provided, the kernel will use the
> MAP_ABOVE4G behavior. Like MAP_32BIT, MAP_ABOVE4G is ignored in a 32 bit
> syscall.
>
> Since the default search behavior is top down, the normal kaslr base can
> be used for MAP_ABOVE4G. This is unlike MAP_32BIT which has to add it's
^^^^
"its"
> own randomization in the bottom up case.
...
> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index 8cc653ffdccd..06378b5682c1 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -193,7 +193,11 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>
> info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> info.length = len;
> - info.low_limit = PAGE_SIZE;
> + if (!in_32bit_syscall() && (flags & MAP_ABOVE4G))
> + info.low_limit = 0x100000000;
We have a human readable define for that: SZ_4G
> + else
> + info.low_limit = PAGE_SIZE;
> +
> info.high_limit = get_mmap_base(0);
>
> /*
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists