Message-ID: <819b6d6a-64ea-d908-76ad-0a6366ed0d53@intel.com>
Date: Wed, 10 Feb 2021 12:28:39 -0800
From: "Yu, Yu-cheng" <yu-cheng.yu@...el.com>
To: Kees Cook <keescook@...omium.org>
Cc: x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
Arnd Bergmann <arnd@...db.de>,
Andy Lutomirski <luto@...nel.org>,
Balbir Singh <bsingharora@...il.com>,
Borislav Petkov <bp@...en8.de>,
Cyrill Gorcunov <gorcunov@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Eugene Syromiatnikov <esyr@...hat.com>,
Florian Weimer <fweimer@...hat.com>,
"H.J. Lu" <hjl.tools@...il.com>, Jann Horn <jannh@...gle.com>,
Jonathan Corbet <corbet@....net>,
Mike Kravetz <mike.kravetz@...cle.com>,
Nadav Amit <nadav.amit@...il.com>,
Oleg Nesterov <oleg@...hat.com>, Pavel Machek <pavel@....cz>,
Peter Zijlstra <peterz@...radead.org>,
Randy Dunlap <rdunlap@...radead.org>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Vedvyas Shanbhogue <vedvyas.shanbhogue@...el.com>,
Dave Martin <Dave.Martin@....com>,
Weijiang Yang <weijiang.yang@...el.com>,
Pengfei Xu <pengfei.xu@...el.com>, haitao.huang@...el.com
Subject: Re: [PATCH v20 08/25] x86/mm: Introduce _PAGE_COW
On 2/10/2021 11:42 AM, Kees Cook wrote:
> On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote:
>> There is essentially no room left in the x86 hardware PTEs on some OSes
>> (not Linux). That left the hardware architects looking for a way to
>> represent a new memory type (shadow stack) within the existing bits.
>> They chose to repurpose a lightly-used state: Write=0, Dirty=1.
>>
>> The reason it's lightly used is that Dirty=1 is normally set by hardware
>> and cannot normally be set by hardware on a Write=0 PTE. Software must
>> normally be involved to create one of these PTEs, so software can simply
>> opt to not create them.
>>
>> In places where Linux normally creates Write=0, Dirty=1, it can use the
>> software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In other
>> words, whenever Linux needs to create Write=0, Dirty=1, it instead creates
>> Write=0, Cow=1, except for shadow stack, which is Write=0, Dirty=1. This
>> clearly separates shadow stack from other data, and results in the
>> following:
>>
>> (a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)
>> (b) A R/O page that has been COW'ed: (Write=0, Cow=1)
>> The user page is in a R/O VMA, and get_user_pages() needs a writable
>> copy. The page fault handler creates a copy of the page and sets
>> the new copy's PTE as Write=0 and Cow=1.
>> (c) A shadow stack PTE: (Write=0, Dirty=1)
>> (d) A shared shadow stack PTE: (Write=0, Cow=1)
>> When a shadow stack page is being shared among processes (this happens
>> at fork()), its PTE is made Dirty=0, so the next shadow stack access
>> causes a fault, and the page is duplicated and Dirty=1 is set again.
>> This is the COW equivalent for shadow stack pages, even though it's
>> copy-on-access rather than copy-on-write.
>> (e) A page where the processor observed a Write=1 PTE, started a write, set
>> Dirty=1, but then observed a Write=0 PTE. That's possible today, but
>> will not happen on processors that support shadow stack.
>>
>> Define _PAGE_COW and update pte_*() helpers and apply the same changes to
>> pmd and pud.
>
> I still find this commit confusing mostly due to _PAGE_COW being 0
> without CET enabled. Shouldn't this just get changed universally? Why
> should this change depend on CET?
>
For example, in...
static inline int pte_write(pte_t pte)
{
	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
		return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY);
	else
		return pte_flags(pte) & _PAGE_RW;
}
There are four cases:
(a) RW=1, Dirty=1 -> writable
(b) RW=1, Dirty=0 -> writable
(c) RW=0, Dirty=0 -> not writable
(d) RW=0, Dirty=1 -> shadow stack, or not-writable if !X86_FEATURE_SHSTK
Case (d) is true (a shadow stack PTE, which pte_write() reports as
writable) only when shadow stack is enabled; otherwise it is not
writable. With the shadow stack feature, the usual dirty, copy-on-write
PTE becomes RW=0, Cow=1.
We can get this changed universally, but then all usual dirty,
copy-on-write PTEs would always need the Dirty/Cow swapping, even on
systems without shadow stack. Is that desirable?
--
Yu-cheng
[...]