Message-ID: <CALCETrU=YL8yWpp29xO0N7TEVogX1j5Fyk5M_FpJTa9ZOS21Zw@mail.gmail.com>
Date: Wed, 14 Oct 2015 14:39:58 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Matt Fleming <matt@...eblueprint.co.uk>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
stable <stable@...r.kernel.org>,
Laszlo Ersek <lersek@...hat.com>,
Matt Fleming <matt.fleming@...el.com>,
Borislav Petkov <bp@...e.de>,
"linux-efi@...r.kernel.org" <linux-efi@...r.kernel.org>
Subject: Re: [PATCH] x86: setup: extend low identity map to cover whole kernel range

On Wed, Oct 14, 2015 at 2:00 PM, Matt Fleming <matt@...eblueprint.co.uk> wrote:
> On Wed, 14 Oct, at 09:22:03AM, Andy Lutomirski wrote:
>> On Wed, Oct 14, 2015 at 6:52 AM, Matt Fleming <matt@...eblueprint.co.uk> wrote:
>> > (Pulling in luto for low-level x86 fu)
>> >
>> > On Wed, 14 Oct, at 01:30:45PM, Paolo Bonzini wrote:
>> >> On 32-bit systems, the initial_page_table is reused by
>> >> efi_call_phys_prolog as an identity map to call
>> >> SetVirtualAddressMap. efi_call_phys_prolog takes care of
>> >> converting the current CPU's GDT to a physical address too.
>> >>
>> >> For PAE kernels the identity mapping is achieved by aliasing the
>> >> first PDPE for the kernel memory mapping into the first PDPE
>> >> of initial_page_table. This makes the EFI stub's trick "just work".
>> >>
>> >> However, for non-PAE kernels there is no guarantee that the identity
>> >> mapping in the initial_page_table extends as far as the GDT; in this
>> >> case, accesses to the GDT will cause a page fault (which quickly becomes
>> >> a triple fault). Fix this by copying the kernel mappings from
>> >> swapper_pg_dir to initial_page_table twice, both at PAGE_OFFSET and at
>> >> identity mapping.
>> >
>> > Oops, good catch guys. This is clearly a bug, but...
>> >
>> >> For some reason, this is only reproducible with QEMU's dynamic translation
>> >> mode, and not for example with KVM. However, even under KVM one can clearly
>> >> see that the page table is bogus:
>>
>> I haven't looked at the code, but it wouldn't surprise me if this is
>> some kind of TLB issue. With the hardware TLB (which is in use on
>> KVM), it seems quite likely that the GDT is pretty much always in the
>> TLB and, if nothing flushes global mappings, then it'll probably stick
>> around.
>
> From some quick experiments it appears that you can skate past this
> issue if you don't receive any interrupts while the bogus GDT pointer
> is loaded, or if you avoid reloading the segment registers in general.
> Which is interesting because I assumed that writing to GDTR took
> immediate effect.

Trivia for your amusement:

AFAICT it's entirely permissible for the GDTR and/or LDT descriptor to
point to unmapped memory. Any attempt to use them (segment loads,
interrupts, IRET, etc.) will try to access that memory as if the access
came from CPL 0 and, if the access fails, will generate a valid page
fault with CR2 pointing into the GDT or LDT.

Xen is nuts^Wclever and actually uses this.
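
To make the "CR2 pointing into the GDT or LDT" part concrete, here is a
rough sketch of which linear address a descriptor fetch touches. It is a
user-space model, not kernel code: struct desc_table_reg and the helper
names are made up for illustration, and it assumes legacy 8-byte
descriptors. Only the selector layout (index in bits 15..3, table
indicator in bit 2, RPL in bits 1..0) comes from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of GDTR (or the cached LDT base/limit). */
struct desc_table_reg {
    uint32_t base;   /* linear address of the table */
    uint16_t limit;  /* offset of the last valid byte in the table */
};

#define DESC_SIZE 8u /* assumption: legacy 8-byte segment descriptors */

/* Selector layout: bits 15..3 index, bit 2 table indicator, bits 1..0 RPL. */
static unsigned selector_index(uint16_t sel)
{
    return sel >> 3;
}

static bool selector_is_ldt(uint16_t sel)
{
    return (sel & 0x4) != 0;
}

/*
 * Compute the linear address the CPU would read for this selector.
 * Returns false if the descriptor lies beyond the table limit; that case
 * is a limit violation (#GP), not a page fault. If this returns true and
 * the address is unmapped, the resulting page fault reports an address
 * inside [*addr, *addr + DESC_SIZE) in CR2.
 */
static bool descriptor_linear_addr(const struct desc_table_reg *tr,
                                   uint16_t sel, uint32_t *addr)
{
    uint32_t off = selector_index(sel) * DESC_SIZE;

    if (off + DESC_SIZE - 1 > tr->limit)
        return false;

    *addr = tr->base + off;
    return true;
}
```

For example, with a table base of 0x01234000, a limit of 255 and selector
0x60 (index 12, GDT, RPL 0), the fetch touches 0x01234060. Nothing in
that computation cares whether the page is mapped; that is only found
out when the descriptor is actually read.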

Of course, if your #PF vector references a GDT or LDT descriptor and
trying to load that descriptor results in a page fault, you get a
double fault.
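
The escalation can be written down as a toy decision function. This is
only a sketch of the rule above plus the usual next step (a fault while
delivering #DF shuts the CPU down, i.e. a triple fault); the enum and
function names are invented for illustration and nothing here talks to
real hardware.

```c
#include <stdbool.h>

/* Hypothetical outcomes of trying to deliver a page fault. */
enum pf_outcome {
    PF_HANDLER_RUNS, /* #PF handler entered normally */
    PF_DOUBLE_FAULT, /* #PF delivery faulted, #DF handler entered */
    PF_TRIPLE_FAULT  /* #DF delivery faulted too: shutdown/reset */
};

/*
 * pf_desc_mapped: the descriptor referenced by the #PF gate can be read.
 * df_desc_mapped: the descriptor referenced by the #DF gate can be read.
 *
 * If both gates reference descriptors in the same unmapped table, the
 * first page fault goes straight through to a triple fault.
 */
static enum pf_outcome deliver_page_fault(bool pf_desc_mapped,
                                          bool df_desc_mapped)
{
    if (pf_desc_mapped)
        return PF_HANDLER_RUNS;
    if (df_desc_mapped)
        return PF_DOUBLE_FAULT;
    return PF_TRIPLE_FAULT;
}
```

With the whole GDT unmapped, both arguments are false, which would match
the "page fault (which quickly becomes a triple fault)" behaviour
described in the patch.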

I learned this while trying to puzzle out why v1 of my LDT
synchronization patch caused random faults on Xen.

--Andy