[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <00C2DC4B-A77D-4B32-B7F7-2291830BC2D2@gmail.com>
Date: Mon, 25 Oct 2021 09:19:35 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Linux-MM <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Peter Xu <peterx@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Nick Piggin <npiggin@...il.com>, x86@...nel.org
Subject: Re: [PATCH v2 3/5] x86/mm: check exec permissions on fault
> On Oct 25, 2021, at 7:20 AM, Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 10/21/21 5:21 AM, Nadav Amit wrote:
>> access_error() currently does not check for execution permission
>> violation.
> Ye
>
>> As a result, spurious page-faults due to execution permission
>> violation cause SIGSEGV.
>
> While I could totally believe that something is goofy when VMAs are
> being changed underneath a page fault, I'm having trouble figuring out
> why the "if (error_code & X86_PF_WRITE)" code is being modified.
In the scenario I mentioned the VMAs are not changed underneath the
page-fault. They change *before* the page-fault, but there are
residues of the old PTE in the TLB.
>
>> It appears not to be an issue so far, but the next patches avoid TLB
>> flushes on permission promotion, which can lead to this scenario. nodejs
>> for instance crashes when TLB flush is avoided on permission promotion.
>
> Just to be clear, "promotion" is going from something like:
>
> W=0->W=1
> or
> NX=1->NX=0
>
> right? I tend to call that "relaxing" permissions.
I specifically talk about NX=1>NX=0.
I can change the language to “relaxing”.
>
> Currently, X86_PF_WRITE faults are considered an access error unless the
> VMA to which the write occurred allows writes. Returning "no access
> error" permits continuing and handling the copy-on-write.
>
> It sounds like you want to expand that. You want to add a whole class
> of new faults that can be ignored: not just that some COW handling might
> be necessary, but that the PTE itself might be out of date. Just like
> a "COW fault" may just result in setting the PTE.W=1 and moving on with
> our day, an instruction fault might now just end up with setting
> PTE.NX=0 and also moving on with our day.
You raise an interesting idea (which can easily be implemented with uffd),
but no - I had none of that in my mind.
My only purpose is to deal with actual spurious page-faults that I
encountered when I removed the TLB flush the happens after NX=1->NX=0.
I am actually surprised that the kernel makes such a strong assumption
that every change of NX=1->NX=0 would be followed by a TLB flush, and
that during these changes the mm is locked for write. But that is the
case. If you do not have this change and a PTE is changed from
NX=1->NX=0 and *later* you access the page, you can have a page-fault
due to stale PTE, and get a SIGSEGV since access_error() is wrong to
assume that this is an invalid access.
I did not change and there are no changes to the VMA during the
page-fault. The page-fault handler would do pretty much nothing and
return to user-space which would retry the instruction. [ page-fault
triggers an implicit TLB flush of the offending PTE ]
>
> I'm really confused why the "error_code & X86_PF_WRITE" case is getting
> modified. I would have expected it to be something like just adding:
>
> /* read, instruction fetch */
> if (error_code & X86_PF_INSN) {
> /* Avoid enforcing access error if spurious: */
> if (unlikely(!(vma->vm_flags & VM_EXEC)))
> return 1;
> return 0;
> }
>
> I'm really confused what X86_PF_WRITE and X86_PF_INSN have in common
> other than both being able to (now) be generated spuriously.
That was my first version, but I was concerned that perhaps there is
some strange scenario in which both X86_PF_WRITE and X86_PF_INSN can
be set. That is the reason that Peter asked you whether this is
something that might happen.
If you confirm they cannot be both set, I would the version you just
mentioned.
Powered by blists - more mailing lists