linux-kernel - Re: BUG: unable to handle kernel paging request from pty

Message-ID: <CAP145phdemqK0hCSw9m5E2TZbrBojnzqAVZ97O5APtLA6Abr7g@mail.gmail.com>
Date:	Fri, 26 Feb 2016 20:59:59 +0100
From:	Robert Święcki <robert@...ecki.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Peter Hurley <peter@...leysoftware.com>,
	Jiri Slaby <jslaby@...e.cz>,
	Greg KH <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	stable <stable@...r.kernel.org>, lwn@....net,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: BUG: unable to handle kernel paging request from pty_write [was:
 Linux 4.4.2]

2016-02-26 20:44 GMT+01:00 Linus Torvalds <torvalds@...ux-foundation.org>:

>> I've contacted Robert Święcki (who found the microcode problem) in
>> case he wants to weigh in on this thread. He was talking to some AMD
>> people, but I don't know exactly who.
>
> And since it's looking increasingly likely that it's the same issue,
> I'm adding Robert here explicitly to the cc so that he sees the
> thread...

Thx,

Some data I was able to gather:

It happens only with the 0x06000832 ucode on Piledriver-based CPUs, i.e.
newer AMD FX and Opteron x300 series (4300, 6300, etc.).
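
A minimal sketch of how one might check a Linux box for this
combination, assuming the x86 /proc/cpuinfo layout with "vendor_id",
"cpu family" and "microcode" fields. The function name affected_cpus
is hypothetical, and the check on family 21 (15h) is a coarse
assumption: it does not distinguish Piledriver from other 15h parts,
so the microcode revision is the deciding test.

```python
BAD_UCODE = 0x06000832   # revision reported in this thread
AMD_FAMILY_15H = 21      # family 15h in decimal, as printed by /proc/cpuinfo


def affected_cpus(cpuinfo_text):
    """Return the logical CPU numbers that report the bad ucode revision."""
    hits = []
    # /proc/cpuinfo separates logical CPUs with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields.get("vendor_id") != "AuthenticAMD":
            continue
        try:
            cpu = int(fields["processor"])
            family = int(fields["cpu family"])
            ucode = int(fields["microcode"], 16)  # printed as e.g. 0x6000832
        except (KeyError, ValueError):
            continue  # field missing or unparsable: skip this block
        if family == AMD_FAMILY_15H and ucode == BAD_UCODE:
            hits.append(cpu)
    return hits
```

Calling affected_cpus(open("/proc/cpuinfo").read()) on the host would
list the logical CPUs currently running the suspect revision; an empty
list means none were found by this check.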

In ~80% of cases the visible effect is an incorrect RSP, leading to bad
'rets' into kernel data/bss or to stack-protector faults. But there are
also more elusive effects, such as registers being cleared just before
they are used in indirect memory fetches.

I can trigger it from within a qemu guest (non-root), causing a bad RIP
in the host kernel. When testing, a couple of times (maybe 2) out of
maybe 30 oopses seen, I was able to set it to user-space addresses
mapped in the guest. It depends greatly on timing, but I think that with
some more effort, and by populating the kernel stack with guest
addresses, it'd be possible to create a more reliable qemu-guest to host
ring0 escape.

I CC'd some AMD engineers from this list, and one of them replied with
"We are working on the final testing of a new microcode patch to
replace 0x06000832."
but without specifying any erratum number or an ETA for the new ucode.

For now I can only suggest not using 0x06000832 if possible (i.e. if
it's not embedded in the BIOS). I tested a few versions from
http://www.amd64.org/microcode.html and only this one seemed
vulnerable.
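
To see whether a given microcode file ships that revision, something
like the following sketch could list the patch IDs it contains. It
assumes the container layout that the Linux AMD microcode loader
parses (12-byte header with magic 0x00414d44 and an equivalence-table
section, followed by type-1 patch sections whose second 32-bit word is
the patch ID), and it only handles a single, non-merged container. The
function name patch_revisions is hypothetical.

```python
import struct

UCODE_MAGIC = 0x00414D44   # container magic, little-endian (assumed layout)
SECTION_EQUIV_TABLE = 0    # first section: CPU equivalence table
SECTION_PATCH = 1          # following sections: one microcode patch each


def patch_revisions(blob):
    """Return the patch IDs found in one AMD microcode container."""
    if len(blob) < 12:
        raise ValueError("too short for an AMD microcode container")
    magic, sec_type, size = struct.unpack_from("<III", blob, 0)
    if magic != UCODE_MAGIC or sec_type != SECTION_EQUIV_TABLE:
        raise ValueError("not an AMD microcode container")
    offset = 12 + size  # skip the header and the equivalence table
    revisions = []
    while offset + 8 <= len(blob):
        sec_type, size = struct.unpack_from("<II", blob, offset)
        offset += 8
        # Stop on anything that is not a complete patch section.
        if sec_type != SECTION_PATCH or size < 8 or offset + size > len(blob):
            break
        _data_code, patch_id = struct.unpack_from("<II", blob, offset)
        revisions.append(patch_id)
        offset += size
    return revisions
```

Checking 0x06000832 in patch_revisions(data), with data read from the
file in binary mode, would then tell whether the file carries the
suspect patch. It says nothing about a copy embedded in the BIOS.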

PS. There's a bug described on the VMware pages -
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2061211
- which looks very similar to this problem (it affects Opteron 6300,
which is Piledriver-based), and it was "somehow" patched by VMware in
their kernel. It points to AMD erratum #815 -
http://support.amd.com/TechDocs/48063_15h_Mod_00h-0Fh_Rev_Guide.pdf -
but I cannot tell whether it's really the same problem, or whether it
can be worked around on the kernel side.

-- 
Robert Święcki