Message-ID: <54934487.3010608@mit.edu>
Date: Thu, 18 Dec 2014 13:17:59 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Jones <davej@...hat.com>, Chris Mason <clm@...com>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dâniel Fraga <fragabr@...il.com>,
Sasha Levin <sasha.levin@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
CC: Suresh Siddha <sbsiddha@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Peter Anvin <hpa@...ux.intel.com>
Subject: save_xstate_sig (Re: frequent lockups in 3.18rc4)
On 12/14/2014 09:47 PM, Linus Torvalds wrote:
> On Sun, Dec 14, 2014 at 4:38 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> Can anybody make sense of that backtrace, keeping in mind that we're
>> looking for some kind of endless loop where we don't make progress?
>
> So looking at all the backtraces, which is kind of messy because
> there's some missing data (presumably buffers overflowed from all the
> CPU's printing at the same time), it looks like:
>
> - CPU 0 is missing. No idea why.
> - CPU's 1-3 all have the same trace for
>
> int_signal ->
> do_notify_resume ->
> do_signal ->
> ....
> page_fault ->
> do_page_fault
>
> and "save_xstate_sig+0x81" shows up on all stacks, although only on
> CPU1 does it show up as a "guaranteed" part of the stack chain (ie it
> matches frame pointer data too). CPU1 also has that __clear_user show
> up (which is called from save_xstate_sig), but not other CPU's. CPU2
> and CPU3 have "save_xstate_sig+0x98" in addition to that +0x81 thing.
>
> My guess is that "save_xstate_sig+0x81" is the instruction after the
> __clear_user call, and that CPU1 took the fault in __clear_user(),
> while CPU2 and CPU3 took the fault at "save_xstate_sig+0x98" instead,
> which I'd guess is the
>
> xsave64 (%rdi)
I admit that my understanding of the disaster that is x86's FPU handling
is limited, but I'm moderately confident that save_xstate_sig is broken.
The code is:
	if (user_has_fpu()) {
		/* Save the live register state to the user directly. */
		if (save_user_xstate(buf_fx))
			return -1;
		/* Update the thread's fxstate to save the fsave header. */
		if (ia32_fxstate)
			fpu_fxsave(&tsk->thread.fpu);
	} else {
		sanitize_i387_state(tsk);
		if (__copy_to_user(buf_fx, xsave, xstate_size))
			return -1;
	}
Suppose that user_has_fpu() returns true, we call save_user_xstate, and
the xsave instruction (or anything else in there, for that matter)
causes a page fault.
The page fault handler is well within its rights to schedule. At that
point, *we might not own the FPU any more*, depending on the vagaries of
eager vs lazy mode. So, when we schedule back in and resume from the
page fault, we are in the wrong branch of the if statement.
At this point, we're going to write garbage (possibly sensitive garbage)
to the userspace signal frame. I don't see why this would cause an
infinite loop, but I don't think it's healthy.
FWIW, if xsave traps with a bogus cr2 value, then there would indeed be an
infinite loop in here. It seems to work right on my machine. Dave,
want to run the attached little test?
--Andy
View attachment "xsave_cr2.c" of type "text/plain" (2841 bytes)