[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <029284c6-f103-46e0-9acd-9e9e816e7ffb@citrix.com>
Date: Thu, 21 Nov 2024 19:43:38 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: "Xin Li (Intel)" <xin@...or.com>, linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
brgerst@...il.com, ebiederm@...ssion.com
Subject: Re: [PATCH v3 1/1] x86/ia32: Leave NULL selector values 0~3 as is
On 21/11/2024 5:54 pm, Xin Li (Intel) wrote:
> The first GDT descriptor is reserved as 'NULL descriptor'. As bits 0
> and 1 of a segment selector, i.e., the RPL bits, are NOT used to index
> GDT, selector values 0~3 all point to the NULL descriptor, thus values
> 0, 1, 2 and 3 are all valid NULL selector values.
>
> When a NULL selector value is to be loaded into a segment register,
> reload_segments() sets its RPL bits. Later IRET zeros ES, FS, GS, and
> DS segment registers if any of them is found to have any nonzero NULL
> selector value. The two operations offset each other to actually effect
> a nop.
>
> Besides, zeroing of RPL in NULL selector values is an information leak
> in pre-FRED systems as userspace can spot any interrupt/exception by
> loading a nonzero NULL selector, and waiting for it to drop to zero.
> But there is nothing software can do to prevent it before FRED.
>
> ERETU, the only legit instruction to return to userspace from kernel
> under FRED, by design does NOT zero any segment register to avoid this
> problem behavior.
>
> As such, leave NULL selector values 0~3 as is.
>
> Do the same on 32-bit kernel as well.
>
> Signed-off-by: Xin Li (Intel) <xin@...or.com>
As far as fixing up RPL goes, I think the patch is fine, and probably
wants to be taken in roughly this form (new minor points below).
However, the pre-existing code is doing something entirely bizarre,
which warrants further investigation, and maybe fixes.
> diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
> index ef654530bf5a..23879f14aa51 100644
> --- a/arch/x86/kernel/signal_32.c
> +++ b/arch/x86/kernel/signal_32.c
> @@ -33,6 +33,29 @@
> #include <asm/smap.h>
> #include <asm/gsseg.h>
>
> +/*
> + * The first GDT descriptor is reserved as 'NULL descriptor'. As bits 0
> + * and 1 of a segment selector, i.e., the RPL bits, are NOT used to index
> + * GDT, selector values 0~3 all point to the NULL descriptor, thus values
> + * 0, 1, 2 and 3 are all valid NULL selector values.
> + *
> + * However IRET zeros ES, FS, GS, and DS segment registers if any of them
> + * is found to have any nonzero NULL selector value, which can be used by
> + * userspace in pre-FRED systems to spot any interrupt/exception by loading
> + * a nonzero NULL selector and waiting for it to drop to zero.
I know I wrote "drop to zero", but in hindsight, I think "become zero"
would be better.
> Before FRED
> + * there is nothing we can do to prevent such an information leak.
> + *
> + * ERETU, the only legit instruction to return to userspace from kernel
> + * under FRED, by design does NOT zero any segment register to avoid this
> + * problem behavior.
> + *
> + * As such, leave NULL selector values 0~3 as is.
> + */
> +static inline u16 usrseg(u16 sel)
I would suggest naming this fixup_rpl() which is a bit clearer as to its
intent.
However, I would also recommend u32 (or at least, unsigned int).
It's absolutely marginal, but you do get better code generation by
avoiding u16 specifically where possible.
https://godbolt.org/z/MnnvW461f
> +{
> + return sel <= 3 ? sel : sel | 3;
> +}
> +
> #ifdef CONFIG_IA32_EMULATION
> #include <asm/unistd_32_ia32.h>
>
> @@ -41,17 +64,17 @@ static inline void reload_segments(struct sigcontext_32 *sc)
> unsigned int cur;
>
> savesegment(gs, cur);
> - if ((sc->gs | 0x03) != cur)
> - load_gs_index(sc->gs | 0x03);
> + if (usrseg(sc->gs) != cur)
> + load_gs_index(usrseg(sc->gs));
> savesegment(fs, cur);
> - if ((sc->fs | 0x03) != cur)
> - loadsegment(fs, sc->fs | 0x03);
> + if (usrseg(sc->fs) != cur)
> + loadsegment(fs, usrseg(sc->fs));
> savesegment(ds, cur);
> - if ((sc->ds | 0x03) != cur)
> - loadsegment(ds, sc->ds | 0x03);
> + if (usrseg(sc->ds) != cur)
> + loadsegment(ds, usrseg(sc->ds));
> savesegment(es, cur);
> - if ((sc->es | 0x03) != cur)
> - loadsegment(es, sc->es | 0x03);
> + if (usrseg(sc->es) != cur)
> + loadsegment(es, usrseg(sc->es));
> }
>
> #define sigset32_t compat_sigset_t
> @@ -113,10 +136,10 @@ static bool ia32_restore_sigcontext(struct pt_regs *regs,
> */
> reload_segments(&sc);
This is the singular caller of reload_segments(), and the comment out of
context does not match the implementation.
It probably wants inlining just so all the segment juggling is in one place.
> #else
> - loadsegment(gs, sc.gs);
> - regs->fs = sc.fs;
> - regs->es = sc.es;
> - regs->ds = sc.ds;
> + loadsegment(gs, usrseg(sc.gs));
> + regs->fs = usrseg(sc.fs);
> + regs->es = usrseg(sc.es);
> + regs->ds = usrseg(sc.ds);
> #endif
Why is GS handled specially?
Both, 1) Why is regs->gs the only value that doesn't an RPL-adjusted
value, and 2) why do we need to reload it here? We need to keep it as
the per_cpu pointer anyway, and we're going to reload on exit-to-user,
aren't we?
Also, why do we have such wildly-different behaviours depending on
IA32_EMULATION or not?
~Andrew
Powered by blists - more mailing lists