Message-ID: <6e1201f5-da25-6040-8230-c84856221838@redhat.com>
Date: Tue, 21 Feb 2023 09:38:26 +0100
From: David Hildenbrand <david@...hat.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
"bsingharora@...il.com" <bsingharora@...il.com>,
"hpa@...or.com" <hpa@...or.com>,
"Syromiatnikov, Eugene" <esyr@...hat.com>,
"peterz@...radead.org" <peterz@...radead.org>,
"rdunlap@...radead.org" <rdunlap@...radead.org>,
"keescook@...omium.org" <keescook@...omium.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"Eranian, Stephane" <eranian@...gle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"fweimer@...hat.com" <fweimer@...hat.com>,
"nadav.amit@...il.com" <nadav.amit@...il.com>,
"jannh@...gle.com" <jannh@...gle.com>,
"dethoma@...rosoft.com" <dethoma@...rosoft.com>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"kcc@...gle.com" <kcc@...gle.com>, "pavel@....cz" <pavel@....cz>,
"oleg@...hat.com" <oleg@...hat.com>,
"hjl.tools@...il.com" <hjl.tools@...il.com>,
"bp@...en8.de" <bp@...en8.de>,
"Lutomirski, Andy" <luto@...nel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"arnd@...db.de" <arnd@...db.de>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"Schimpe, Christina" <christina.schimpe@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"mike.kravetz@...cle.com" <mike.kravetz@...cle.com>,
"Yang, Weijiang" <weijiang.yang@...el.com>,
"debug@...osinc.com" <debug@...osinc.com>,
"jamorris@...ux.microsoft.com" <jamorris@...ux.microsoft.com>,
"john.allen@....com" <john.allen@....com>,
"rppt@...nel.org" <rppt@...nel.org>,
"andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"corbet@....net" <corbet@....net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"gorcunov@...il.com" <gorcunov@...il.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Cc: "Yu, Yu-cheng" <yu-cheng.yu@...el.com>
Subject: Re: [PATCH v6 14/41] x86/mm: Introduce _PAGE_SAVED_DIRTY
On 20.02.23 22:38, Edgecombe, Rick P wrote:
> On Mon, 2023-02-20 at 12:32 +0100, David Hildenbrand wrote:
>> On 18.02.23 22:14, Rick Edgecombe wrote:
>>> Some OSes have a greater dependence on software available bits in
>>> PTEs than
>>> Linux. That left the hardware architects looking for a way to
>>> represent a
>>> new memory type (shadow stack) within the existing bits. They chose
>>> to
>>> repurpose a lightly-used state: Write=0,Dirty=1. So in order to
>>> support
>>> shadow stack memory, Linux should avoid creating memory with this
>>> PTE bit
>>> combination unless it intends for it to be shadow stack.
>>>
>>> The reason it's lightly used is that Dirty=1 is normally set by HW
>>> _before_ a write. A write with a Write=0 PTE would typically only
>>> generate
>>> a fault, not set Dirty=1. Hardware can (rarely) both set Dirty=1
>>> *and*
>>> generate the fault, resulting in a Write=0,Dirty=1 PTE. Hardware
>>> which
>>> supports shadow stacks will no longer exhibit this oddity.
>>>
>>> So that leaves Write=0,Dirty=1 PTEs created in software. To achieve
>>> this,
>>> in places where Linux normally creates Write=0,Dirty=1, it can use
>>> the
>>> software-defined _PAGE_SAVED_DIRTY in place of the hardware
>>> _PAGE_DIRTY.
>>> In other words, whenever Linux needs to create Write=0,Dirty=1, it
>>> instead
>>> creates Write=0,SavedDirty=1 except for shadow stack, which is
>>> Write=0,Dirty=1. Further differentiated by VMA flags, these PTE bit
>>> combinations would be set as follows for various types of memory:
>>
>> I would simplify (see below) and not repeat what the patch contains
>> as
>> comments already that detailed.
>
> This verbiage has had quite a bit of x86 maintainer attention already.
> I hear what you are saying, but I'm a bit hesitant to take style
> suggestions at this point for fear of the situation where people ask
> for changes back and forth across different versions. Unless any x86
> maintainers want to chime in again? More responses below.
Sure, for my taste this is (1) too repetitive, (2) too verbose, and (3) too
specialized. But whatever the x86 maintainers prefer.
[...]
>> "
>> However, there are valid cases where the kernel might create read-
>> only
>> PTEs that are dirty (e.g., fork(), mprotect(), uffd-wp(), soft-dirty
>> tracking). In this case, the _PAGE_SAVED_DIRTY bit is used instead
>> of
>> the HW-dirty bit, to avoid creating wrong "shadow stack" PTEs.
>> Such
>> PTEs have (Write=0,SavedDirty=1,Dirty=0) set.
>>
>> Note that on processors without shadow stack support, the
>> _PAGE_SAVED_DIRTY remains unused.
>> "
>>
>> Then I would simply drop the below (which is also too COW-specific I
>> think).
>
> COW is the main situation where shadow stacks become read-only. So, as
> an example it is nice in that COW covers all the scenarios discussed.
> Again, do any x86 maintainers want to weigh in here?
Again, I'd not specialize on COW in all patches too much (IMHO, it
creates more confusion than it actually helps for understanding what's
happening) and just call it a read-only PTE that is dirty. Simple as
that. And it's easy to see why that's problematic, because read-only
PTEs that are dirty would be identified as shadow stack PTEs, which we
want to work around.
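
As a rough illustration only (not taken from the patch), the point can be
sketched in a few lines of standalone C. Write (bit 1) and Dirty (bit 6)
are the architectural x86 PTE bits; the PTE_SAVED_DIRTY position and all
helper names below are assumptions made up for this sketch, and the real
kernel helpers operate on pte_t rather than a bare integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits. */
#define PTE_WRITE       (UINT64_C(1) << 1)
#define PTE_DIRTY       (UINT64_C(1) << 6)
/* Assumption: placeholder software bit; the patch may use another one. */
#define PTE_SAVED_DIRTY (UINT64_C(1) << 58)

/* Shadow-stack capable hardware treats Write=0,Dirty=1 as shadow stack. */
static bool pte_is_shadow_stack(uint64_t pte)
{
    return (pte & (PTE_WRITE | PTE_DIRTY)) == PTE_DIRTY;
}

/* For the kernel's bookkeeping, either dirty bit means "dirty". */
static bool pte_is_dirty(uint64_t pte)
{
    return (pte & (PTE_DIRTY | PTE_SAVED_DIRTY)) != 0;
}

/* Write-protect a PTE without ever producing Write=0,Dirty=1. */
static uint64_t pte_wrprotect_sketch(uint64_t pte)
{
    pte &= ~PTE_WRITE;
    if (pte & PTE_DIRTY) {
        /* Move the dirty state from the HW bit to the software bit. */
        pte &= ~PTE_DIRTY;
        pte |= PTE_SAVED_DIRTY;
    }
    /* Invariant: the result is never identified as a shadow stack PTE. */
    assert(!pte_is_shadow_stack(pte));
    return pte;
}
```

Simply clearing PTE_WRITE on a dirty PTE yields a value for which
pte_is_shadow_stack() is true, which is exactly the read-only-but-dirty
case described above. pte_wrprotect_sketch() keeps the dirty information
(pte_is_dirty() still returns true) while leaving the HW Dirty bit clear.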
Again, just my 2 cents. I'm not an x86 maintainer ;)
--
Thanks,
David / dhildenb