Message-ID: <20160209120617.GC4119@pd.tnic>
Date: Tue, 9 Feb 2016 13:06:17 +0100
From: Borislav Petkov <bp@...en8.de>
To: Andy Lutomirski <luto@...nel.org>
Cc: X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Stas Sergeev <stsp@...t.ru>,
Cyrill Gorcunov <gorcunov@...il.com>,
Pavel Emelyanov <xemul@...allels.com>
Subject: Re: [PATCH v3 2/4] x86/signal/64: Fix SS if needed when delivering a
64-bit signal
On Mon, Jan 25, 2016 at 01:34:13PM -0800, Andy Lutomirski wrote:
> Signals are always delivered to 64-bit tasks with CS set to a long
> mode segment. In long mode, SS doesn't matter as long as it's a
> present writable segment.
>
> If SS starts out invalid (this can happen if the signal was caused
> by an IRET fault or was delivered on the way out of set_thread_area
> or modify_ldt), then IRET to the signal handler can fail, eventually
> killing the task.
>
> The straightforward fix would be to simply reset SS when delivering
> a signal. That breaks DOSEMU, though: 64-bit builds of DOSEMU rely
> on SS being set to the faulting SS when signals are delivered.
>
> As a compromise, this patch leaves SS alone so long as it's valid.
>
> The net effect should be that the behavior of successfully delivered
> signals is unchanged. Some signals that would previously have
> failed to be delivered will now be delivered successfully.
>
> This has no effect for x32 or 32-bit tasks: their signal handlers
> were already called with SS == __USER_DS.
>
> (On Xen, there's a slight hole: if a task sets SS to a writable
> *kernel* data segment, then we will fail to identify it as invalid
> and we'll still kill the task. If anyone cares, this could be fixed
> with a new paravirt hook.)
>
> Signed-off-by: Andy Lutomirski <luto@...nel.org>
> ---
> arch/x86/include/asm/desc_defs.h | 23 ++++++++++++++++++
> arch/x86/kernel/signal.c | 51 ++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 72 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/desc_defs.h b/arch/x86/include/asm/desc_defs.h
> index 278441f39856..00971705a16d 100644
> --- a/arch/x86/include/asm/desc_defs.h
> +++ b/arch/x86/include/asm/desc_defs.h
> @@ -98,4 +98,27 @@ struct desc_ptr {
>
> #endif /* !__ASSEMBLY__ */
>
> +/* Access rights as returned by LAR */
> +#define AR_TYPE_RODATA (0 * (1 << 9))
> +#define AR_TYPE_RWDATA (1 * (1 << 9))
> +#define AR_TYPE_RODATA_EXPDOWN (2 * (1 << 9))
> +#define AR_TYPE_RWDATA_EXPDOWN (3 * (1 << 9))
> +#define AR_TYPE_XOCODE (4 * (1 << 9))
> +#define AR_TYPE_XRCODE (5 * (1 << 9))
> +#define AR_TYPE_XOCODE_CONF (6 * (1 << 9))
> +#define AR_TYPE_XRCODE_CONF (7 * (1 << 9))
> +#define AR_TYPE_MASK (7 * (1 << 9))
> +
> +#define AR_DPL0 (0 * (1 << 13))
> +#define AR_DPL3 (3 * (1 << 13))
> +#define AR_DPL_MASK (3 * (1 << 13))
> +
> +#define AR_A (1 << 8) /* A means "accessed" */
> +#define AR_S (1 << 12) /* S means "not system" */
Ah, by "not system" you mean that S=0b makes it a system descriptor
and S=1b a user (code or data) descriptor. I think the SDM calls it,
more descriptively, the "S (descriptor type) flag", while the APM calls
it simply the S-field or S-bit.
I like "S (descriptor type) flag" more than "not system". :)
> +#define AR_P (1 << 15) /* P means "present" */
> +#define AR_AVL (1 << 20) /* AVL does nothing */
AVL = AVaiLable to software
> +#define AR_L (1 << 21) /* L means "long mode" */
> +#define AR_DB (1 << 22) /* D or B, depending on type */
> +#define AR_G (1 << 23) /* G means "limit in pages" */
Please use the names from the processor manuals. G is the Granularity
bit. "limit in pages" is only clear to the people who have already read
the Granularity bit description. :-)
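As an illustration of the naming point above, here is a minimal standalone sketch (userspace C, not kernel code) of the same masks with comments that follow the processor manuals' names. The bit positions are copied from the quoted hunk; the helpers ar_dpl(), ar_is_code_or_data() and ar_names_selftest() are hypothetical names introduced only for this example, and 0x00cff300 is an assumed example value for a flat, present, DPL-3 read/write data segment.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the quoted hunk; comments use SDM/APM names. */
#define AR_A    (1u << 8)   /* A: accessed */
#define AR_S    (1u << 12)  /* S: descriptor type (0 = system, 1 = code or data) */
#define AR_P    (1u << 15)  /* P: segment present */
#define AR_AVL  (1u << 20)  /* AVL: available for use by system software */
#define AR_L    (1u << 21)  /* L: 64-bit code segment */
#define AR_DB   (1u << 22)  /* D/B: default operation size / big flag */
#define AR_G    (1u << 23)  /* G: granularity (0 = byte units, 1 = 4 KiB units) */

#define AR_DPL_SHIFT 13
#define AR_DPL_MASK  (3u << AR_DPL_SHIFT)

/* Hypothetical helper: extract the descriptor privilege level (0..3). */
static inline unsigned int ar_dpl(uint32_t ar)
{
	return (ar & AR_DPL_MASK) >> AR_DPL_SHIFT;
}

/* Hypothetical helper: nonzero if S=1, i.e. a code or data descriptor. */
static inline int ar_is_code_or_data(uint32_t ar)
{
	return (ar & AR_S) != 0;
}

/* Hypothetical self-test decoding one assumed example value. */
void ar_names_selftest(void)
{
	uint32_t ar = 0x00cff300;	/* assumed: flat DPL-3 r/w data segment */

	assert(ar_dpl(ar) == 3);
	assert(ar_is_code_or_data(ar));
	assert(ar & AR_P);
	assert(ar & AR_G);
	assert(!(ar & AR_L));
}
```

Calling ar_names_selftest() from a small test program exercises the decoding; none of this is meant as the final wording for desc_defs.h.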
> #endif /* _ASM_X86_DESC_DEFS_H */
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index cb6282c3638f..bb3e4208d90d 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -61,6 +61,35 @@
> regs->seg = GET_SEG(seg) | 3; \
> } while (0)
>
> +#ifdef CONFIG_X86_64
You already have an
#else /* !CONFIG_X86_32 */
block above the 64-bit version of __setup_rt_frame(). Just put
force_valid_ss() there without that additional ifdef. That file's
ifdeffery is beyond any readability anyway.
> +/*
> + * If regs->ss will cause an IRET fault, change it. Otherwise leave it
> + * alone. Using this generally makes no sense unless
> + * user_64bit_mode(regs) would return true.
> + */
> +static void force_valid_ss(struct pt_regs *regs)
> +{
> + u32 ar;
> + asm volatile ("lar %[old_ss], %[ar]\n\t"
> + "jz 1f\n\t" /* If invalid: */
> + "xorl %[ar], %[ar]\n\t" /* set ar = 0 */
> + "1:"
> + : [ar] "=r" (ar)
> + : [old_ss] "rm" ((u16)regs->ss));
> +
> + /*
> + * For a valid 64-bit user context, we need DPL 3, type
> + * read-write data or read-write exp-down data, and S and P
> + * set. We can't use VERW because VERW doesn't check the
> + * P bit.
> + */
> + ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;
> + if (ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) &&
> + ar != (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN))
> + regs->ss = __USER_DS;
> +}
> +#endif
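To make the mask-and-compare in force_valid_ss() easier to follow, here is a minimal userspace sketch of just the predicate, without the LAR instruction itself. The AR_* values are copied from the quoted hunk; ar_ok_for_user_ss() and ar_ok_selftest() are hypothetical names, and the sample access-rights values are assumed examples, not output captured from a real machine. As in the patch, ar == 0 stands for "LAR failed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AR_TYPE_RWDATA          (1u << 9)
#define AR_TYPE_RWDATA_EXPDOWN  (3u << 9)
#define AR_TYPE_MASK            (7u << 9)
#define AR_DPL3                 (3u << 13)
#define AR_DPL_MASK             (3u << 13)
#define AR_S                    (1u << 12)
#define AR_P                    (1u << 15)

/* Hypothetical helper: true if SS with these access rights can be kept. */
static bool ar_ok_for_user_ss(uint32_t ar)
{
	/* Ignore A, AVL, L, D/B, G and the limit bits; they don't matter here. */
	ar &= AR_DPL_MASK | AR_S | AR_P | AR_TYPE_MASK;

	return ar == (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA) ||
	       ar == (AR_DPL3 | AR_S | AR_P | AR_TYPE_RWDATA_EXPDOWN);
}

/* Hypothetical self-test over assumed example values. */
void ar_ok_selftest(void)
{
	assert(ar_ok_for_user_ss(0x00cff300));	/* DPL 3, present, r/w data */
	assert(ar_ok_for_user_ss(0x00cff700));	/* same, expand-down */
	assert(!ar_ok_for_user_ss(0));		/* LAR failed */
	assert(!ar_ok_for_user_ss(0x00cf9300));	/* DPL 0 */
	assert(!ar_ok_for_user_ss(0x00cf7300));	/* P clear */
	assert(!ar_ok_for_user_ss(0x00cff100));	/* read-only data */
	assert(!ar_ok_for_user_ss(0x00affb00));	/* code segment */
}
```

In the patch, a false result corresponds to the branch that sets regs->ss = __USER_DS.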
> +
> int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
> {
> unsigned long buf_val;
> @@ -459,10 +488,28 @@ static int __setup_rt_frame(int sig, struct ksignal *ksig,
>
> regs->sp = (unsigned long)frame;
>
> - /* Set up the CS register to run signal handlers in 64-bit mode,
> - even if the handler happens to be interrupting 32-bit code. */
> + /*
> + * Set up the CS and SS registers to run signal handlers in
> + * 64-bit mode, even if the handler happens to be interrupting
> + * 32-bit or 16-bit code.
> + *
> + * SS is subtle. In 64-bit mode, we don't need any particular
> + * SS descriptor, but we do need SS to be valid. It's possible
> + * that the old SS is entirely bogus -- this can happen if the
> + * signal we're trying to deliver is #GP or #SS caused by a bad
> + * SS value. We also have a compatibility issue here: DOSEMU
> + * relies on the contents of the SS register indicating the
> + * SS value at the time of the signal, even though that code in
> + * DOSEMU predates sigreturn's ability to restore SS. (DOSEMU
> + * avoids relying on sigreturn to restore SS; instead it uses
> + * a trampoline.) So we do our best: if the old SS was valid,
> + * we keep it. Otherwise we replace it.
> + */
> regs->cs = __USER_CS;
>
> + if (unlikely(regs->ss != __USER_DS))
So this is the fast path, AFAICT from reading the code and from adding a
gdb breakpoint here.
I guess we can't do the opt-in behavior and patch it out when users
don't want to run dosemu.
Or maybe we could add a CONFIG_CHECK_OLD_SS which is default y and
people can disable it... so an opt-out behavior :)
Hmmm...
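A rough userspace sketch of what such an opt-out could look like follows. Everything in it is an assumption made for illustration: CHECK_OLD_SS stands in for the CONFIG_CHECK_OLD_SS idea, pick_handler_ss() is a hypothetical helper, the old_ss_valid argument replaces the real LAR-based check, and "opted out" is taken to mean "always reset SS", which the mail above does not spell out. USER_DS is hard-coded to 0x2b, the value of __USER_DS on x86-64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opt-out switch; defaults to enabled, like "default y". */
#ifndef CHECK_OLD_SS
#define CHECK_OLD_SS 1
#endif

#define USER_DS 0x2b	/* __USER_DS on x86-64, hard-coded for this sketch */

/*
 * Hypothetical helper: choose the SS value the signal handler runs with.
 * old_ss_valid stands in for the LAR-based validity check.
 */
static uint16_t pick_handler_ss(uint16_t old_ss, bool old_ss_valid)
{
#if CHECK_OLD_SS
	/* Keep a valid old SS (what DOSEMU relies on); replace a bad one. */
	if (old_ss != USER_DS && !old_ss_valid)
		return USER_DS;
	return old_ss;
#else
	/* Assumed opt-out behavior: always reset SS, no check at all. */
	(void)old_ss;
	(void)old_ss_valid;
	return USER_DS;
#endif
}
```

With the default setting, pick_handler_ss(0x0f, true) keeps 0x0f and pick_handler_ss(0x0f, false) returns USER_DS; building with -DCHECK_OLD_SS=0 makes both return USER_DS.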
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.