Message-ID: <8a62de3e-defa-6681-c853-cd5140c5f0b5@intel.com>
Date:   Tue, 4 Sep 2018 14:21:35 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Andy Lutomirski <luto@...nel.org>,
        Sean Christopherson <sean.j.christopherson@...el.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
        Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH] x86/pkeys: Explicitly treat PK #PF on kernel address as a
 bad area
On 09/04/2018 12:56 PM, Andy Lutomirski wrote:
> I have no objection to this patch.
> 
> Dave, why did you think that we could get a PK fault on the vsyscall
> page, even on kernels that still marked it executable?  Sure, you
> could get an instruction in the vsyscall page to get a PK fault, but
> CR2 wouldn't point to the vsyscall page, right?
I'm inferring the CR2 value from the page fault trace point.  I see
entries like this:
 protection_keys-4313  [002] d... 420257.094541: page_fault_user:
address=_end ip=_end error_code=0x15
But, that's not a PK fault, and it triggers the "misaligned vsyscall
(exploit attempt or buggy program)" stuff in dmesg.  It's just the
symptom of trying to execute the non-executable vsyscall page.
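
To make the "not a PK fault" reading concrete, here is a minimal userspace sketch that decodes the error code from the trace above. The bit positions are the architectural x86 page fault error code bits; the two helper names are made up for illustration and are not kernel functions.

```c
#include <stdbool.h>

/* Architectural x86 page fault error code bits. */
#define X86_PF_PROT	(1UL << 0)	/* 1: protection violation, 0: not present */
#define X86_PF_WRITE	(1UL << 1)	/* 1: write access */
#define X86_PF_USER	(1UL << 2)	/* 1: access came from user mode */
#define X86_PF_RSVD	(1UL << 3)	/* 1: reserved bit set in a paging entry */
#define X86_PF_INSTR	(1UL << 4)	/* 1: instruction fetch */
#define X86_PF_PK	(1UL << 5)	/* 1: protection key violation */

/* Hypothetical helper: does the error code flag a protection key fault? */
static bool pf_is_pkey_fault(unsigned long error_code)
{
	return (error_code & X86_PF_PK) != 0;
}

/*
 * Hypothetical helper: is this a user-mode instruction fetch that hit a
 * protection violation on a present page?
 */
static bool pf_is_user_exec_violation(unsigned long error_code)
{
	unsigned long want = X86_PF_PROT | X86_PF_USER | X86_PF_INSTR;

	return (error_code & want) == want;
}
```

With error_code=0x15, pf_is_pkey_fault() returns false and pf_is_user_exec_violation() returns true: bits 0, 2 and 4 are set and bit 5 is clear, which matches a user-mode attempt to execute a present but non-executable page.
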
I'm not a super big fan of this particular patch, though.  The
fault_in_kernel_space() check is really presuming two things:
1. pkey faults (PF_PK=1) only occur on user pages (_PAGE_USER=1)
2. fault_in_kernel_space()==1 addresses are never user pages
#1 is a hardware expectation.  We *can* look for that directly by just
making sure that X86_PF_PK is only set when it also comes with
X86_PF_USER in the hardware page fault error code.
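
A rough sketch of what checking expectation #1 directly could look like, written as a standalone userspace function so it can be compiled and poked at. The function name and the hw_error_code parameter are assumptions for illustration only; in the kernel this would more likely be a WARN_ON_ONCE() on the unmodified hardware error code, and this sketch does not settle whether the expectation holds for every kind of access.

```c
#include <stdbool.h>

/* Architectural x86 page fault error code bits used by this check. */
#define X86_PF_USER	(1UL << 2)	/* 1: access came from user mode */
#define X86_PF_PK	(1UL << 5)	/* 1: protection key violation */

/*
 * Hypothetical check: returns true when the hardware error code is
 * consistent with "X86_PF_PK only ever shows up together with
 * X86_PF_USER".  Faults without X86_PF_PK pass trivially.
 */
static bool pk_fault_comes_with_user(unsigned long hw_error_code)
{
	if (!(hw_error_code & X86_PF_PK))
		return true;	/* not a pkey fault, nothing to check */

	return (hw_error_code & X86_PF_USER) != 0;
}
```

The point of taking the hardware error code rather than the software-munged one is that the check then reflects what the CPU reported, not what the fault handler ORed in afterwards.
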
(...
	Aside: We should probably explicitly separate out the hardware
	error code from the software-munged version, like we do here:
	>         if (user_mode(regs)) {
	>                 local_irq_enable();
	>                 error_code |= X86_PF_USER;
)
But, #2 is a somewhat looser check.  It wasn't true for the recent
vsyscall, and I've also seen goofy drivers map memory in the kernel
address space out to userspace quite a few times.
So, I'd much rather see a X86_PF_USER check than a
fault_in_kernel_space() check.
But, as for pkeys...
The original intent here was to relay: "protection key faults can never
be spurious".  The reason in my silly comment was that we don't do lazy
flushing, but that's imprecise: the real reasoning is that we don't ever
have kernel pages on which we can take protection key faults.
IOW, I think the check here should be for "protection key faults only
occur on user pages", and all the *spurious* checking should be looking
at *just* user vs. kernel pages, like:
static int spurious_fault_check(unsigned long error_code, pte_t *pte)
{
	/* Only expect spurious faults on kernel pages: */
	WARN_ON_ONCE(pte_flags(*pte) & _PAGE_USER);
	/* Only expect spurious faults originating from kernel code: */
	WARN_ON_ONCE(error_code & X86_PF_USER);
	...