linux-kernel - Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

Message-ID: <7B91910F-C8B5-4E8B-BE1C-15B1E45FA345@zytor.com>
Date:	Tue, 29 Sep 2015 11:18:37 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Andy Lutomirski <luto@...capital.net>,
	Ingo Molnar <mingo@...nel.org>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Vrabel <david.vrabel@...rix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Brian Gerst <brgerst@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	Gleb Natapov <gleb@...nel.org>,
	Kees Cook <keescook@...omium.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Joerg Roedel <joro@...tes.org>, X86 ML <x86@...nel.org>,
	kvm list <kvm@...r.kernel.org>
Subject: Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

SGDT would be easy to use, and it is logical that it is faster, since it reads an internal register.  SIDT does too, but unlike the GDT, the IDT has a secondary limit (it can never be larger than 4096 bytes: 256 vectors of 16 bytes each), so all limit values in the range 4095-65535 are exactly equivalent.
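To make the limit arithmetic concrete: in 64-bit mode SGDT and SIDT store a 10-byte image, a 16-bit limit followed by a 64-bit base, and a 64-bit IDT holds at most 256 gates of 16 bytes, so any IDT limit of 4095 or more covers every vector. The following is a minimal C sketch under those assumptions; `struct dt_reg`, `dt_reg_decode()` and `idt_effective_limit()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the 10-byte image stored by SGDT/SIDT
 * in 64-bit mode: bytes 0-1 hold the limit, bytes 2-9 the base. */
struct dt_reg {
	uint16_t limit;
	uint64_t base;
};

/* Hypothetical helper: decode the little-endian image written by SGDT/SIDT. */
static struct dt_reg dt_reg_decode(const unsigned char img[10])
{
	struct dt_reg r;
	size_t i;

	assert(img != NULL);
	r.limit = (uint16_t)(img[0] | (img[1] << 8));
	r.base = 0;
	for (i = 0; i < 8; i++)
		r.base |= (uint64_t)img[2 + i] << (8 * i);
	return r;
}

/* 256 vectors x 16-byte gates = 4096 bytes, i.e. a limit of 4095. */
#define IDT64_MAX_LIMIT 4095u

/* Hypothetical helper: any IDT limit from 4095 to 65535 behaves the same. */
static unsigned int idt_effective_limit(uint16_t limit)
{
	return limit < IDT64_MAX_LIMIT ? limit : IDT64_MAX_LIMIT;
}
```

Fed the bytes behind the `SGDT: ffff88303fd89000 / 007f` line quoted further down, `dt_reg_decode()` yields base 0xffff88303fd89000 and limit 0x7f, i.e. a 128-byte GDT (16 eight-byte slots).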

Anything that causes a write to the GDT will #PF if the GDT is mapped read-only.  So yes, we need to force the accessed bit to be set.  This shouldn't be a problem, and in fact ought to be a performance improvement.
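In a legacy 8-byte code or data descriptor (S flag, bit 44, set), bit 40 is the accessed bit, which the CPU writes back into the descriptor table when the segment is loaded while that bit is still clear. A minimal sketch of forcing it on before the descriptor is installed, so that no such write ever hits a read-only GDT; `desc_force_accessed()` is a hypothetical helper, not the kernel's set_thread_area code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions within an 8-byte legacy segment descriptor. */
#define DESC_ACCESSED (1ULL << 40) /* type bit 0 for code/data segments */
#define DESC_S        (1ULL << 44) /* 1 = code/data, 0 = system descriptor */

/*
 * Hypothetical helper: set the accessed bit in a code/data descriptor so the
 * CPU never needs to write it back into a read-only GDT. System descriptors
 * (TSS, LDT, gates) are returned unchanged, because there bit 40 is part of
 * the type encoding rather than an accessed flag.
 */
static uint64_t desc_force_accessed(uint64_t desc)
{
	if (desc & DESC_S)
		desc |= DESC_ACCESSED;
	return desc;
}
```

For example, a flat ring-3 data descriptor 0x00cff2000000ffff becomes 0x00cff3000000ffff, while a 64-bit TSS descriptor low word such as 0x0000890000000067 (S clear, type 9) passes through unchanged.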

On September 29, 2015 10:35:38 AM PDT, Andy Lutomirski <luto@...capital.net> wrote:
>On Sep 29, 2015 2:01 AM, "Ingo Molnar" <mingo@...nel.org> wrote:
>>
>>
>> * Denys Vlasenko <dvlasenk@...hat.com> wrote:
>>
>> > On 09/28/2015 09:58 AM, Ingo Molnar wrote:
>> > >
>> > > * Denys Vlasenko <dvlasenk@...hat.com> wrote:
>> > >
>> > >> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
>> > >>> NAK.  We really should map the GDT read-only on all 64-bit systems,
>> > >>> since we can't hide the address from SGDT.  Same with the IDT.
>> > >>
>> > >> Sorry, I don't understand your point.
>> > >
>> > > So the problem is that right now the SGDT instruction (which is unprivileged)
>> > > leaks the real address of the kernel image:
>> > >
>> > >  fomalhaut:~> ./sgdt
>> > >  SGDT: ffff88303fd89000 / 007f
>> > >
>> > > that 'ffff88303fd89000' is a kernel address.
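The source of the `./sgdt` test program is not shown in the thread. A minimal reconstruction, assuming x86-64 and GCC/Clang inline assembly, might look like the sketch below; `read_gdtr()` and `format_sgdt_line()` are hypothetical names. On CPUs with UMIP (User-Mode Instruction Prevention) enabled, newer kernels may block or emulate SGDT in user mode, so the value printed may not be the real GDT base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one line in the shape of the ./sgdt output. */
static int format_sgdt_line(char *buf, size_t len, uint64_t base,
			    uint16_t limit)
{
	return snprintf(buf, len, "SGDT: %016llx / %04x",
			(unsigned long long)base, (unsigned int)limit);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/*
 * Hypothetical helper: execute SGDT from user mode. No privilege is needed
 * unless UMIP is enabled, in which case the kernel may kill the process or
 * hand back a spoofed value.
 */
void read_gdtr(uint64_t *base, uint16_t *limit)
{
	struct __attribute__((packed)) {
		uint16_t limit;
		uint64_t base;
	} gdtr;

	__asm__ volatile("sgdt %0" : "=m"(gdtr));
	*base = gdtr.base;
	*limit = gdtr.limit;
}
#endif
```

Calling `read_gdtr(&base, &limit)` and passing the results to `format_sgdt_line()` yields a line in the same shape as the output quoted above.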
>> >
>> > Thank you.
>> > I do know that SGDT and friends are unprivileged on x86
>> > and thus they allow userspace (and guest kernels in paravirt)
>> > to learn things they don't need to know.
>> >
>> > I don't see how making the GDT page-aligned and page-sized
>> > changes anything in this regard. SGDT will still work,
>> > and still leak the GDT address.
>>
>> Well, as I tried to explain in the other part of my mail, doing so enables us to
>> remap the GDT to a less security-sensitive virtual address that does not leak the
>> kernel's randomized address:
>>
>> > > Your observation in the changelog and your patch:
>> > >
>> > >>>> It is page-sized because of paravirt. [...]
>> > >
>> > > ... conflicts with the intention to mark (remap) the primary GDT address read-only
>> > > on native kernels as well.
>> > >
>> > > So what we should do instead is to use the page alignment properly and remap the
>> > > GDT to a read-only location, and load that one.
>> >
>> > If we have a small GDT (i.e. what my patch does), we can still remap the
>> > entire page which contains the small GDT, and simply not care that some other data
>> > is also visible through that RO page.
>>
>> That's generally considered fragile: suppose an attacker has a limited information
>> leak that can read absolute addresses with system privilege but he doesn't know
>> the kernel's randomized base offset. With a 'partial page' mapping there could be
>> function pointers near the GDT, part of the page the GDT happens to be on, that
>> leak this information.
>>
>> (Same goes for crypto keys or other critical information (like canary information,
>> salts, etc.) accidentally ending up nearby.)
>>
>> Arguably it's a bit tenuous, but when playing remapping games it's generally
>> considered good to be page aligned and page sized, with zero padding.
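The "page aligned, page sized, zero padded" rule can be illustrated in userspace, with mmap() and mprotect() standing in for the kernel's remapping of the GDT: the table gets a page of its own, the rest of that page is zero, and the page is then made read-only, so a read-only view of it exposes nothing but the table. This is an analogy rather than kernel code, and `make_ro_table_page()` is a hypothetical name.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy a small table into its own page-aligned,
 * page-sized, zero-filled page and make that page read-only.
 * Returns NULL on failure; the caller munmap()s one page when done.
 */
static void *make_ro_table_page(const void *table, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0 || len > (size_t)page)
		return NULL;

	/* Anonymous mappings come back page-aligned and zero-filled,
	 * so everything after the table is zero padding. */
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	memcpy(p, table, len);

	/* From here on, any write to the page faults (SIGSEGV in userspace). */
	if (mprotect(p, (size_t)page, PROT_READ) != 0) {
		munmap(p, (size_t)page);
		return NULL;
	}
	return p;
}
```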
>>
>> > > This would have a couple of advantages:
>> > >
>> > >  - This would give kernel address randomization more teeth on x86.
>> > >
>> > >  - An additional advantage would be that rootkits overwriting the GDT would have
>> > >    a bit more work to do.
>> > >
>> > >  - A third advantage would be that for NUMA systems we could 'mirror' the GDT into
>> > >    node-local memory and load those. This makes GDT load cache-misses a bit less
>> > >    expensive.
>> >
>> > GDT is per-cpu. Isn't per-cpu memory already NUMA-local?
>>
>> Indeed it is:
>>
>> fomalhaut:~> for ((cpu=1; cpu<9; cpu++)); do taskset $cpu ./sgdt ; done
>> SGDT: ffff88103fa09000 / 007f
>> SGDT: ffff88103fa29000 / 007f
>> SGDT: ffff88103fa29000 / 007f
>> SGDT: ffff88103fa49000 / 007f
>> SGDT: ffff88103fa49000 / 007f
>> SGDT: ffff88103fa49000 / 007f
>> SGDT: ffff88103fa29000 / 007f
>> SGDT: ffff88103fa69000 / 007f
>>
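`taskset` interprets its first argument as a hexadecimal affinity mask rather than a CPU number (`taskset -c N` takes a CPU list), so the loop above runs `./sgdt` under masks 1 through 8. Masks 3, 5, 6 and 7 each allow more than one CPU, which is consistent with some addresses repeating; the four distinct addresses, 0x20000 apart, presumably belong to CPUs 0-3 (masks 1, 2, 4 and 8). A small helper, hypothetical and for illustration only, that expands such a mask:

```c
#include <assert.h>

/*
 * Hypothetical helper: expand a taskset-style CPU mask into CPU numbers.
 * Bit N set means CPU N is allowed, so mask 5 (binary 101) means CPUs 0
 * and 2. Writes at most max entries to out[] and returns how many it wrote.
 */
static int mask_to_cpus(unsigned long mask, int *out, int max)
{
	int n = 0;
	int cpu;

	for (cpu = 0; cpu < (int)(8 * sizeof mask) && n < max; cpu++)
		if (mask & (1UL << cpu))
			out[n++] = cpu;
	return n;
}
```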
>> I confused it with the IDT, which is still global.
>>
>> This also means that the GDT in itself does not leak kernel addresses at the
>> moment, except it leaks the layout of the percpu area.
>>
>> So my suggestion would be to:
>>
>>  - make the GDT unconditionally page aligned and sized, then remap it to a
>>    read-only address unconditionally as well, like we do it for the IDT.
>
>Does anyone know what happens if you stick a non-accessed segment in
>the GDT, map the GDT RO, and access it?  The docs are extremely vague
>on the interplay between segmentation and paging on the segmentation
>structures themselves.  My guess is that it causes #PF.  This might
>break set_thread_area users unless we change set_thread_area to force
>the accessed bit on.
>
>There's a possible worse failure mode: if someone pokes an un-accessed
>segment into SS or CS using sigreturn, then it's within the realm of
>possibility that IRET would generate #PF (hey Intel and AMD, please
>document this!).  I don't think that would be rootable, but at the
>very least we'd want to make sure it doesn't OOPS by either making it
>impossible or adding an explicit test to sigreturn.c.
>
>hpa pointed out in another thread that the GDT *must* be writable on
>32-bit kernels because we use a task gate for NMI and jumping through
>a task gate writes to the GDT.
>
>On another note, SGDT is considerably faster than LSL, at least on
>Sandy Bridge.  The vdso might be able to take advantage of that for
>getcpu.
>
>--Andy
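For context on Andy's getcpu remark: the vDSO getcpu of that era read a per-CPU GDT entry with LSL and, as I understand it, unpacked the returned segment limit as the CPU number in the low 12 bits and the NUMA node above them. The sketch below shows only that decoding step under that assumption; the names are hypothetical, and the thread does not say how SGDT would replace LSL. Ordinary userspace code would simply call glibc's `sched_getcpu()`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed layout (hypothetical names): the per-CPU segment limit read by
 * LSL carries the CPU number in bits 0-11 and the NUMA node in the bits
 * above them.
 */
#define GETCPU_CPU_BITS 12u
#define GETCPU_CPU_MASK ((1u << GETCPU_CPU_BITS) - 1u) /* 0xfff */

/* Hypothetical helper: split an LSL result into CPU and node; either
 * output pointer may be NULL, as with getcpu(). */
static void decode_getcpu_limit(unsigned int p, unsigned int *cpu,
				unsigned int *node)
{
	if (cpu)
		*cpu = p & GETCPU_CPU_MASK;
	if (node)
		*node = p >> GETCPU_CPU_BITS;
}
```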

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.