linux-kernel - Re: [PATCH v3 1/1] x86/ia32: Leave NULL selector values 0~3 as is

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <029284c6-f103-46e0-9acd-9e9e816e7ffb@citrix.com>
Date: Thu, 21 Nov 2024 19:43:38 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: "Xin Li (Intel)" <xin@...or.com>, linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
 brgerst@...il.com, ebiederm@...ssion.com
Subject: Re: [PATCH v3 1/1] x86/ia32: Leave NULL selector values 0~3 as is

On 21/11/2024 5:54 pm, Xin Li (Intel) wrote:
> The first GDT descriptor is reserved as 'NULL descriptor'.  As bits 0
> and 1 of a segment selector, i.e., the RPL bits, are NOT used to index
> GDT, selector values 0~3 all point to the NULL descriptor, thus values
> 0, 1, 2 and 3 are all valid NULL selector values.
>
> When a NULL selector value is to be loaded into a segment register,
> reload_segments() sets its RPL bits.  Later IRET zeros ES, FS, GS, and
> DS segment registers if any of them is found to have any nonzero NULL
> selector value.  The two operations offset each other to actually effect
> a nop.
>
> Besides, zeroing of RPL in NULL selector values is an information leak
> in pre-FRED systems as userspace can spot any interrupt/exception by
> loading a nonzero NULL selector, and waiting for it to drop to zero.
> But there is nothing software can do to prevent it before FRED.
>
> ERETU, the only legit instruction to return to userspace from kernel
> under FRED, by design does NOT zero any segment register to avoid this
> problem behavior.
>
> As such, leave NULL selector values 0~3 as is.
>
> Do the same on 32-bit kernel as well.
>
> Signed-off-by: Xin Li (Intel) <xin@...or.com>

As far as fixing up RPL goes, I think the patch is fine, and probably
wants to be taken in roughly this form (new minor points below).

However, the pre-existing code is doing something entirely bizarre,
which warrants further investigation, and maybe fixes.

> diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
> index ef654530bf5a..23879f14aa51 100644
> --- a/arch/x86/kernel/signal_32.c
> +++ b/arch/x86/kernel/signal_32.c
> @@ -33,6 +33,29 @@
>  #include <asm/smap.h>
>  #include <asm/gsseg.h>
>  
> +/*
> + * The first GDT descriptor is reserved as 'NULL descriptor'.  As bits 0
> + * and 1 of a segment selector, i.e., the RPL bits, are NOT used to index
> + * GDT, selector values 0~3 all point to the NULL descriptor, thus values
> + * 0, 1, 2 and 3 are all valid NULL selector values.
> + *
> + * However IRET zeros ES, FS, GS, and DS segment registers if any of them
> + * is found to have any nonzero NULL selector value, which can be used by
> + * userspace in pre-FRED systems to spot any interrupt/exception by loading
> + * a nonzero NULL selector and waiting for it to drop to zero.

I know I wrote "drop to zero", but in hindsight, I think "become zero"
would be better.

>   Before FRED
> + * there is nothing we can do to prevent such an information leak.
> + *
> + * ERETU, the only legit instruction to return to userspace from kernel
> + * under FRED, by design does NOT zero any segment register to avoid this
> + * problem behavior.
> + *
> + * As such, leave NULL selector values 0~3 as is.
> + */
> +static inline u16 usrseg(u16 sel)

I would suggest naming this fixup_rpl() which is a bit clearer as to its
intent.

However, I would also recommend u32 (or at least, unsigned int).

It's absolutely marginal, but you do get better code generation by
avoiding u16 specifically where possible.

https://godbolt.org/z/MnnvW461f

> +{
> +	return sel <= 3 ? sel : sel | 3;
> +}
> +
>  #ifdef CONFIG_IA32_EMULATION
>  #include <asm/unistd_32_ia32.h>
>  
> @@ -41,17 +64,17 @@ static inline void reload_segments(struct sigcontext_32 *sc)
>  	unsigned int cur;
>  
>  	savesegment(gs, cur);
> -	if ((sc->gs | 0x03) != cur)
> -		load_gs_index(sc->gs | 0x03);
> +	if (usrseg(sc->gs) != cur)
> +		load_gs_index(usrseg(sc->gs));
>  	savesegment(fs, cur);
> -	if ((sc->fs | 0x03) != cur)
> -		loadsegment(fs, sc->fs | 0x03);
> +	if (usrseg(sc->fs) != cur)
> +		loadsegment(fs, usrseg(sc->fs));
>  	savesegment(ds, cur);
> -	if ((sc->ds | 0x03) != cur)
> -		loadsegment(ds, sc->ds | 0x03);
> +	if (usrseg(sc->ds) != cur)
> +		loadsegment(ds, usrseg(sc->ds));
>  	savesegment(es, cur);
> -	if ((sc->es | 0x03) != cur)
> -		loadsegment(es, sc->es | 0x03);
> +	if (usrseg(sc->es) != cur)
> +		loadsegment(es, usrseg(sc->es));
>  }
>  
>  #define sigset32_t			compat_sigset_t
> @@ -113,10 +136,10 @@ static bool ia32_restore_sigcontext(struct pt_regs *regs,
>  	 */
>  	reload_segments(&sc);

This is the singular caller of reload_segments(), and the comment out of
context does not match the implementation.

It probably wants inlining just so all the segment juggling is in one place.

>  #else
> -	loadsegment(gs, sc.gs);
> -	regs->fs = sc.fs;
> -	regs->es = sc.es;
> -	regs->ds = sc.ds;
> +	loadsegment(gs, usrseg(sc.gs));
> +	regs->fs = usrseg(sc.fs);
> +	regs->es = usrseg(sc.es);
> +	regs->ds = usrseg(sc.ds);
>  #endif

Why is GS handled specially?

Both, 1) Why is regs->gs the only value that doesn't an RPL-adjusted
value, and 2) why do we need to reload it here?  We need to keep it as
the per_cpu pointer anyway, and we're going to reload on exit-to-user,
aren't we?

Also, why do we have such wildly-different behaviours depending on
IA32_EMULATION or not?

~Andrew