linux-kernel - Re: BUG: unable to handle kernel paging request from pty


Message-ID: <56CF756E.5000704@hurleysoftware.com>
Date:	Thu, 25 Feb 2016 13:43:10 -0800
From:	Peter Hurley <peter@...leysoftware.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Jiri Slaby <jslaby@...e.cz>, Greg KH <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	stable <stable@...r.kernel.org>, lwn@....net,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: BUG: unable to handle kernel paging request from pty_write [was:
 Linux 4.4.2]

On 02/25/2016 12:51 PM, Linus Torvalds wrote:
> On Thu, Feb 25, 2016 at 12:32 PM, Peter Hurley <peter@...leysoftware.com> wrote:
>>> But yes, the call trace looks accurate and makes sense, we have:
>>>
>>>   tty_flip_buffer_push ->
>>>     (queue_work is inline) ->
>>>     queue_work_on ->
>>>       __queue_work ->
>>>         insert_work ->
>>>           (wake_up_worker is inlined)
>>>           wake_up_process ->
>>
>>               try_to_wake_up ->
>>
>>>             *insane non-code address*
> 
> The thing is, we don't actually have that try_to_wake_up() on the
> stack in the oops report.

I know, but the last execution prior to things going sideways
was definitely in try_to_wake_up().

> There are other things on the stack, but the
> first stack entry that is dumped that is a text address is that
> "ffffffff810a5585" which is wake_up_process.
> 
> That's why I said it might be stack corruption: we might be returning
> from try_to_wake_up(), but with a corrupt stack entry, and returning
> to garbage.
> 
> If it was one of the calls _in_ try_to_wake_up() that called to insane
> code, I would have expected to see try_to_wake_up on the stack.

Agreed; how execution got from try_to_wake_up() to a mysterious
percpu address without a call is the question.

> That's particularly true on modern machines, where things like the
> percpu area are hopefully marked NX, so that we shouldn't be executing
> random instructions. Which is the fault that actually triggers
> ("kernel tried to execute NX-protected page"), so the "we corrupted
> the stack by running random code at the original target of the jump"
> scenario sounds much less likely.
> 
> So the whole oops looks odd. If it really was one of the calls from
> try_to_wake_up(), why isn't that return address on the stack?

I don't think it's anything from code flow.

> Since this is under qemu, I'm wondering if this is a qemu bug, where
> the NX fault processing of a call instruction happens before the stack
> is pushed, but when the instruction pointer already points to the new
> address.

Or any fault processing, really; an iret to the bogus address
would then trigger the NX fault without leaving a trace of the broken
exception handling.


> Another alternative *might* be that gcc has turned an indirect
> tail call into a "jmp *", but I certainly don't see that when I
> compile the file myself. I've seen it in the past in some (very
> unusual) cases, so it's possible - gcc definitely knows about
> tail-call jmp conversion (even if it makes debugging sometimes a
> pain).
> 
> Jiri, can you check your try_to_wake_up() disassembly for some
> indirect "jmp" instructions?
> 
>                         Linus
>
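
For the disassembly check requested above, here is a minimal sketch of one
way to scan a listing for indirect jumps. It assumes AT&T-syntax text of the
kind "objdump -d" prints; the function name find_indirect_jumps and the
sample listing are hypothetical and are not taken from the affected kernel.

```python
import re

# Matches AT&T-syntax indirect jumps, e.g. "jmpq *%rax" or "jmp *0x8(%rdx)".
# Direct jumps such as "jmp 20 <example+0x20>" carry no '*' and are skipped.
INDIRECT_JMP = re.compile(r"\bjmpq?\s+\*")


def find_indirect_jumps(disassembly):
    """Return the lines of a disassembly listing that hold an indirect jmp."""
    return [line for line in disassembly.splitlines()
            if INDIRECT_JMP.search(line)]


# Hypothetical listing for illustration only (assumption, not real output
# from the kernel under discussion).
SAMPLE = """\
  10:   e8 00 00 00 00          callq  15 <example+0x15>
  15:   ff e0                   jmpq   *%rax
  17:   eb 07                   jmp    20 <example+0x20>
  19:   ff 62 08                jmpq   *0x8(%rdx)
"""

if __name__ == "__main__":
    for line in find_indirect_jumps(SAMPLE):
        print(line)
```

In practice the input would be the disassembly of try_to_wake_up() from the
actual build; any line this reports would be a candidate for a tail call
that leaves no return address on the stack.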