[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <05242256-4B5F-4AD6-B7DA-46A583335E5C@gmail.com>
Date: Mon, 23 Aug 2021 22:34:42 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Mike Rapoport <rppt@...nel.org>
Cc: Linux-MM <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ira Weiny <ira.weiny@...el.com>,
Kees Cook <keescook@...omium.org>,
Mike Rapoport <rppt@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Vlastimil Babka <vbabka@...e.cz>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 4/4] x86/mm: write protect (most) page tables
Sorry for sending twice. The mail app decided to use HTML for some
reason.
On Aug 23, 2021, at 10:32 PM, Nadav Amit <nadav.amit@...il.com> wrote:
>
> On Aug 23, 2021, at 6:25 AM, Mike Rapoport <rppt@...nel.org> wrote:
>
> From: Mike Rapoport <rppt@...ux.ibm.com>
>
> Allocate page table using __GFP_PTE_MAPPED so that they will have 4K PTEs
> in the direct map. This allows to switch _PAGE_RW bit each time a page
> table page needs to be made writable or read-only.
>
> The writability of the page tables is toggled only in the lowest level page
> table modifiction functions and immediately switched off.
>
> The page tables created early in the boot (including the direct map page
> table) are not write protected.
>
>
[ snip ]
> +static void pgtable_write_set(void *pg_table, bool set)
> +{
> + int level = 0;
> + pte_t *pte;
> +
> + /*
> + * Skip the page tables allocated from pgt_buf break area and from
> + * memblock
> + */
> + if (!after_bootmem)
> + return;
> + if (!PageTable(virt_to_page(pg_table)))
> + return;
> +
> + pte = lookup_address((unsigned long)pg_table, &level);
> + if (!pte || level != PG_LEVEL_4K)
> + return;
> +
> + if (set) {
> + if (pte_write(*pte))
> + return;
> +
> + WRITE_ONCE(*pte, pte_mkwrite(*pte));
I think that the pte_write() test (and the following one) might hide
latent bugs. Either you know whether the PTE is write-protected or you
need to protect against nested/concurrent calls to pgtable_write_set()
by disabling preemption/IRQs.
Otherwise, you risk in having someone else write-protecting the PTE
after it is write-unprotected and before it is written - causing a crash,
or write-unprotecting it after it is protected - which circumvents the
protection.
Therefore, I would think that instead you should have:
VM_BUG_ON(pte_write(*pte)); // (or WARN_ON_ONCE())
In addition, if there are assumptions on the preemptability of the code,
it would be nice to have some assertions. I think that the code assumes
that all calls to pgtable_write_set() are done while holding the
page-table lock. If that is the case, perhaps adding some lockdep
assertion would also help to confirm the correctness.
[ I put aside the lack of TLB flushes, which make the whole matter of
delivered protection questionable. I presume that once PKS is used,
this is not an issue. ]
Powered by blists - more mailing lists