linux-kernel - Re: [PATCH] x86 : Ensure X86_FLAGS

Message-ID: <CALCETrUVYY8E+EcfZp6xjN60zwvNM9jPgi5bYjeUxE-Vhne6ow@mail.gmail.com>
Date:	Mon, 29 Sep 2014 11:43:38 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Sebastian Lackner <sebastian@...-team.de>
Cc:	Anish Bhatt <anish@...lsio.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	X86 ML <x86@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86 : Ensure X86_FLAGS_NT is cleared on syscall entry

On Mon, Sep 29, 2014 at 11:30 AM, Sebastian Lackner
<sebastian@...-team.de> wrote:
> On 29.09.2014 19:40, Andy Lutomirski wrote:
>> On 09/25/2014 12:42 PM, Anish Bhatt wrote:
>>> The MSR_SYSCALL_MASK, which is responsible for clearing specific EFLAGS on
>>>  syscall entry, should also clear the nested task (NT) flag to be safe from
>>>  userspace injection. Without this fix the application segmentation
>>>  faults on syscall return because of the changed meaning of the IRET
>>>  instruction.
>>>
>>> Further details can be seen here https://bugs.winehq.org/show_bug.cgi?id=33275
>>>
>>> Signed-off-by: Anish Bhatt <anish@...lsio.com>
>>> Signed-off-by: Sebastian Lackner <sebastian@...-team.de>
>>> ---
>>>  arch/x86/kernel/cpu/common.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>>> index e4ab2b4..3126558 100644
>>> --- a/arch/x86/kernel/cpu/common.c
>>> +++ b/arch/x86/kernel/cpu/common.c
>>> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>>>      /* Flags to clear on syscall */
>>>      wrmsrl(MSR_SYSCALL_MASK,
>>>             X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
>>> -           X86_EFLAGS_IOPL|X86_EFLAGS_AC);
>>> +           X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>>
>> Something's weird here, and at the very least the changelog is
>> insufficiently informative.
>>
>> The Intel SDM says:
>>
>> If the NT flag is set and the processor is in IA-32e mode, the IRET
>> instruction causes a general protection exception.
>>
>> Presumably interrupt delivery clears NT.  I haven't spotted where that's
>> documented yet.
>
> Well, the best documentation I've found is something like
> http://www.fermimn.gov.it/linux/quarta/x86/int.htm
>
> which states:
>
> --- snip ---
> INTERRUPT-TO-INNER-PRIVILEGE:
>    [...]
>    TF := 0;
>    NT := 0;
> --- snip ---
> (Doesn't say anything about HW interrupts though)
>
> This also makes sense in my opinion, since the interrupt handler has to know whether it should return
> to the previous task (when NT=1) or to the same task (when NT=0).
>
>>
>> sysret doesn't appear to care about NT at all.
>>
>> So: the test code doesn't appear to do anything interesting *unless* it
>> goes through syscall followed by the iret exit path.  Then it receives
>> #GP on return, which turns into a signal.
>
> Yep, that's also my interpretation of this issue. If the processor were in 32-bit/protected mode, the
> NT flag would be interpreted as a task return, and it would probably cause a different exception,
> because the kernel never uses the task link property of the TSS.
>
>>
>> On the premise that the slow and fast return paths ought to be
>> indistinguishable from userspace, I think we should fix this.  But I
>> want to understand it better first.
>
> A reliable way to force the slow return path is to use ptrace, see:
> http://lxr.free-electrons.com/source/arch/x86/kernel/entry_64.S#L544
>
> This also matches the experience: The test application only crashes with a small probability,
> unless you use strace, in which case it always crashes (because the kernel forces the slow return path).
>
> Two additional remarks:
>
> * A reliable way to make it crash without strace is to run the fork()/clone() syscall afterwards and
>   compile as 32-bit.
>
> * When you run exec*() afterwards, the crash will happen at the entry of the new executable. It doesn't
>   matter whether the target process is SUID or not. I don't see a way to exploit this issue, but
>   probably some more people should take a look at it...
>
>>
>> Also, 32-bit may need more care here.
>
> That might be possible. It probably makes sense to review other parts of the code for similar issues.

sysenter probably has the same problem.

--Andy
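
As an illustration of what the one-line mask change in the quoted patch does,
here is a minimal userspace sketch in C. It only models the documented
SYSCALL behaviour (RFLAGS := RFLAGS AND NOT mask) on plain integers; it is
not kernel code and does not touch any MSR. The SKETCH_* names and the
sketch_* helpers are made up for this example; the bit positions are the
architectural EFLAGS ones.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS/RFLAGS bits. The SKETCH_ prefix marks these as
 * local to this illustration; they are not the kernel's definitions. */
#define SKETCH_EFLAGS_TF   (1ULL << 8)   /* trap flag */
#define SKETCH_EFLAGS_IF   (1ULL << 9)   /* interrupt enable */
#define SKETCH_EFLAGS_DF   (1ULL << 10)  /* direction flag */
#define SKETCH_EFLAGS_IOPL (3ULL << 12)  /* I/O privilege level, 2 bits */
#define SKETCH_EFLAGS_NT   (1ULL << 14)  /* nested task */
#define SKETCH_EFLAGS_AC   (1ULL << 18)  /* alignment check */

/* Mask written to MSR_SYSCALL_MASK before the patch. */
static uint64_t sketch_old_mask(void)
{
    return SKETCH_EFLAGS_TF | SKETCH_EFLAGS_DF | SKETCH_EFLAGS_IF |
           SKETCH_EFLAGS_IOPL | SKETCH_EFLAGS_AC;
}

/* Mask written after the patch: the same bits plus NT. */
static uint64_t sketch_new_mask(void)
{
    return sketch_old_mask() | SKETCH_EFLAGS_NT;
}

/* Model of SYSCALL entry: every bit set in the mask is cleared. */
static uint64_t sketch_flags_after_syscall(uint64_t rflags, uint64_t mask)
{
    return rflags & ~mask;
}

/* Hypothetical self-check: userspace has set NT before the syscall. */
static void sketch_self_check(void)
{
    uint64_t user_flags = SKETCH_EFLAGS_NT | SKETCH_EFLAGS_IF | 0x2;

    /* Old mask: NT is still set in kernel mode. */
    assert(sketch_flags_after_syscall(user_flags, sketch_old_mask())
           & SKETCH_EFLAGS_NT);
    /* New mask: NT is cleared on entry. */
    assert(!(sketch_flags_after_syscall(user_flags, sketch_new_mask())
             & SKETCH_EFLAGS_NT));
}
```

Calling sketch_self_check() from a small main() exercises both cases. With
the old mask NT survives into the kernel, which is the precondition for the
#GP on the iret exit path discussed above; with the new mask it does not.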