linux-kernel - Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <70e08bd5-187a-daee-2822-1d9a437a9cff@intel.com>
Date:   Fri, 11 Mar 2022 12:59:17 -0800
From:   Dave Hansen <dave.hansen@...el.com>
To:     Nadav Amit <namit@...are.com>
Cc:     Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Xu <peterx@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Nick Piggin <npiggin@...il.com>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault

On 3/11/22 12:38, Nadav Amit wrote:
>> On Mar 11, 2022, at 11:41 AM, Dave Hansen <dave.hansen@...el.com> wrote:
...
>> Can any sane code trigger this?
> 
> Well, regarding this question and the previous one: I do not think that
> this scenario is possible today since mprotect() holds the mmap_lock
> for write. There is no other code that I am aware of that toggles
> the NX bit on a present entry.
> 
> But I will not bet my life on it. That’s the reason for the somewhat
> vague phrasing that I used.

>From the userspace perspective, mmap(MAP_FIXED) can do this too.  But,
sane userspace can't rely on the syscall to have done any work and the
TLB flushing is currently done before the syscall returns.

I'd put it this way:

	Today, it is possible for a thread to end up in access_error()
	for a PF_INSN fault and observe a VM_EXEC VMA.  If you are
	generous, this could be considered a spurious fault.

	However, the faulting thread would have had to race with the
	thread which was changing the PTE and the VMA and is currently
	*in* mprotect() (or some other syscall).  In other words, the
	faulting thread can encounter this situation, but it never had
	any assurance from the kernel that it wouldn't fault.  This is
	because the faulting thread never had a chance to observe the
	syscall return.

	There is no evidence that the existing behavior can cause any
	issues with sane userspace.

>>> index d0074c6ed31a..ad0ef0a6087a 100644
>>> --- a/arch/x86/mm/fault.c
>>> +++ b/arch/x86/mm/fault.c
>>> @@ -1107,10 +1107,28 @@ access_error(unsigned long error_code, struct vm_area_struct *vma)
>>> 				       (error_code & X86_PF_INSTR), foreign))
>>> 		return 1;
>>>
>>> -	if (error_code & X86_PF_WRITE) {
>>> +	if (error_code & (X86_PF_WRITE | X86_PF_INSTR)) {
>>> +		/*
>>> +		 * CPUs are not expected to set the two error code bits
>>> +		 * together, but to ensure that hypervisors do not misbehave,
>>> +		 * run an additional sanity check.
>>> +		 */
>>> +		if ((error_code & (X86_PF_WRITE|X86_PF_INSTR)) ==
>>> +					(X86_PF_WRITE|X86_PF_INSTR)) {
>>> +			WARN_ON_ONCE(1);
>>> +			return 1;
>>> +		}
>>
>> access_error() is only used on the do_user_addr_fault() side of things.
>> Can we stick this check somewhere that also works for kernel address
>> faults?  This is a generic sanity check.  It can also be in a separate
>> patch.
> 
> I can wrap it in a different function and also call it from
> do_kern_addr_fault() or spurious_kernel_fault().
> 
> Anyhow, spurious_kernel_fault() should handle spurious faults on
> executable code correctly. 

This is really about checking the sanity of the "hardware"-provided
error code.  Let's just do it in  handle_page_fault(), maybe hidden in a
function like:

void check_error_code_sanity(unsigned long error_code)
{
	WARN_ON_ONCE(...);
}

You can leave the X86_PF_PK check in place for now.  It's probably going
away soon anyway.

>> Also, we should *probably* stop talking about CPUs here.  If there's
>> ever something wonky with error code bits, I'd put my money on a weird
>> hypervisor before any kind of CPU issue.
> 
> I thought I manage to convey exactly that in the comment. Can you provide
> a better phrasing?

Maybe:

	/*
	 * X86_PF_INSTR for instruction _fetches_.  Fetches never write.
	 * X86_PF_WRITE should never be set with X86_PF_INSTR.
	 *
	 * This is most likely due to a buggy hypervisor.
	 */