Date:	Fri, 24 Apr 2015 08:00:22 -0400
From:	Brian Gerst <brgerst@...il.com>
To:	Denys Vlasenko <dvlasenk@...hat.com>
Cc:	Andy Lutomirski <luto@...nel.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andy Lutomirski <luto@...capital.net>,
	Borislav Petkov <bp@...en8.de>,
	Denys Vlasenko <vda.linux@...glemail.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Will Drewry <wad@...omium.org>,
	Kees Cook <keescook@...omium.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86_64, asm: Work around AMD SYSRET SS descriptor
 attribute issue

On Fri, Apr 24, 2015 at 7:27 AM, Denys Vlasenko <dvlasenk@...hat.com> wrote:
> On 04/24/2015 04:15 AM, Andy Lutomirski wrote:
>> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
>> with SS == 0 results in an invalid usermode state in which SS is
>> apparently equal to __USER_DS but causes #SS if used.
>>
>> Work around the issue by replacing NULL SS values with __KERNEL_DS
>> in __switch_to, thus ensuring that SYSRET never happens with SS set
>> to NULL.
>>
>> This was exposed by a recent vDSO cleanup.
>>
>> Fixes: e7d6eefaaa44 ("x86/vdso32/syscall.S: Do not load __USER32_DS to %ss")
>> Signed-off-by: Andy Lutomirski <luto@...nel.org>
>> ---
>>
>> Tested only on Intel, which isn't very interesting.  I'll tidy up
>> and send a test case, too, once Borislav confirms that it works.
>>
>> Please don't actually apply this until we're sure we understand the
>> scope of the issue.  If this doesn't affect SYSRETQ, then we might
>> want to fix it only before SYSRETL to avoid impacting 64-bit
>> processes at all.
>>
>>  arch/x86/ia32/ia32entry.S         |  5 +++++
>>  arch/x86/include/asm/cpufeature.h |  1 +
>>  arch/x86/kernel/cpu/amd.c         |  3 +++
>>  arch/x86/kernel/entry_64.S        |  6 ++++++
>>  arch/x86/kernel/process_64.c      | 25 +++++++++++++++++++++++++
>>  5 files changed, 40 insertions(+)
>>
>> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> index 3c9fadf8b95c..ff519ea8b054 100644
>> --- a/arch/x86/ia32/ia32entry.S
>> +++ b/arch/x86/ia32/ia32entry.S
>> @@ -421,6 +421,11 @@ sysretl_from_sys_call:
>>        * cs and ss are loaded from MSRs.
>>        * (Note: 32bit->32bit SYSRET is different: since r11
>>        * does not exist, it merely sets eflags.IF=1).
>> +      *
>> +      * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
>> +      * descriptor is not reinitialized.  This means that we must
>> +      * avoid SYSRET with SS == NULL.  We prevent that from happening
>> +      * by reloading SS in __switch_to.
>>        */
>>       USERGS_SYSRET32
>>
>> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
>> index 854c04b3c9c2..7e244f626301 100644
>> --- a/arch/x86/include/asm/cpufeature.h
>> +++ b/arch/x86/include/asm/cpufeature.h
>> @@ -257,6 +257,7 @@
>>  #define X86_BUG_11AP         X86_BUG(5) /* Bad local APIC aka 11AP */
>>  #define X86_BUG_FXSAVE_LEAK  X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */
>>  #define X86_BUG_CLFLUSH_MONITOR      X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
>> +#define X86_BUG_SYSRET_SS_ATTRS      X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
>>
>>  #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
>>
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index fd470ebf924e..e4cf63301ff4 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -720,6 +720,9 @@ static void init_amd(struct cpuinfo_x86 *c)
>>       if (!cpu_has(c, X86_FEATURE_3DNOWPREFETCH))
>>               if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))
>>                       set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
>> +
>> +     /* AMD CPUs don't reset SS attributes on SYSRET */
>> +     set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>>  }
>>
>>  #ifdef CONFIG_X86_32
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 0034012d04ea..ffeaed0534d8 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -295,6 +295,12 @@ system_call_fastpath:
>>        * rflags from r11 (but RF and VM bits are forced to 0),
>>        * cs and ss are loaded from MSRs.
>>        * Restoration of rflags re-enables interrupts.
>> +      *
>> +      * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
>> +      * descriptor is not reinitialized.  This means that we should
>> +      * avoid SYSRET with SS == NULL.  We prevent that from happening
>> +      * by reloading SS in __switch_to.  (Actually detecting the failure
>> +      * in 64-bit userspace is tricky but can be done.)
>>        */
>>       USERGS_SYSRET64
>>
>> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
>> index 4baaa972f52a..919d6c2abded 100644
>> --- a/arch/x86/kernel/process_64.c
>> +++ b/arch/x86/kernel/process_64.c
>> @@ -419,6 +419,31 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>>                    task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
>>               __switch_to_xtra(prev_p, next_p, tss);
>>
>> +     if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
>> +             /*
>> +              * AMD CPUs have a misfeature: SYSRET sets the SS selector
>> +              * but does not update the cached descriptor.  As a result,
>> +              * if we do SYSRET while SS is NULL, we'll end up in user
>> +              * mode with SS apparently equal to __USER_DS but actually
>> +              * unusable.
>
> Perhaps it makes sense to mention here in the comment _how_ SS can
> become NULL: namely, that interrupts/exceptions taken from !CPL0 load
> SS with the null selector.  Before this discussion, I had only the
> vaguest idea that "I read something about that somewhere in the docs".
>
> Which also suggests that this if() is unnecessary if interrupts never
> cause a task switch, i.e. when they always return to the same task
> they interrupted.  They return with IRET, so the bad cached SS
> descriptor is not leaked to userspace.
>
> IOW: wrap this if() in "#ifdef CONFIG_PREEMPT".
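
For reference, here is a minimal sketch of the __switch_to() hunk that
if() belongs to; the quoted diff above is cut off at the comment, so
this is reconstructed from the patch description rather than copied
from the posted patch (it relies on savesegment()/loadsegment() from
<asm/segment.h> and static_cpu_has_bug() from <asm/cpufeature.h>):

	if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
		unsigned short ss_sel;

		/* Read the live SS selector; an interrupt/exception that
		 * changed CPL may have left it NULL. */
		savesegment(ss, ss_sel);

		/* Reload a known-good descriptor so a later SYSRET never
		 * executes with SS == NULL on affected AMD CPUs. */
		if (ss_sel != __KERNEL_DS)
			loadsegment(ss, __KERNEL_DS);
	}

The ss_sel != __KERNEL_DS test just avoids the segment reload on the
common path where SS already has a usable descriptor.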

Intel's docs say that SS will be set to NULL if the CPL changes or an
IST stack is used.  This implies that SS is not changed if the CPL
doesn't change and IST=0.

AMD's docs also say SS is set to NULL on a CPL change, but do not
mention IST stack switches.

So actually this isn't a preemption issue, as the NULL SS is coming
from an interrupt from userspace (timer tick, etc.).

Another alternative to consider is setting SS=__KERNEL_DS on interrupt
entry if it's NULL.
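
A rough sketch of that alternative, written as a hypothetical C helper
(the real interrupt entry path is assembly in entry_64.S, so the name
fixup_null_ss and its placement are purely illustrative, not proposed
code):

	/* Hypothetical: call early on interrupt entry.  Needs
	 * savesegment()/loadsegment() from <asm/segment.h>. */
	static inline void fixup_null_ss(void)
	{
		unsigned short ss_sel;

		savesegment(ss, ss_sel);	/* read the live SS selector */
		if (ss_sel == 0)		/* CPL-change entry left SS NULL */
			loadsegment(ss, __KERNEL_DS);
	}

That would keep SS non-NULL for the whole time we're in the kernel, at
the cost of a check on every interrupt rather than only on context
switch.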

--
Brian Gerst
