[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YST1zQ0FugifJnfi@kernel.org>
Date: Tue, 24 Aug 2021 16:36:13 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Nadav Amit <nadav.amit@...il.com>
Cc: Linux-MM <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ira Weiny <ira.weiny@...el.com>,
Kees Cook <keescook@...omium.org>,
Mike Rapoport <rppt@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
Vlastimil Babka <vbabka@...e.cz>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 4/4] x86/mm: write protect (most) page tables
On Mon, Aug 23, 2021 at 10:34:42PM -0700, Nadav Amit wrote:
> On Aug 23, 2021, at 10:32 PM, Nadav Amit <nadav.amit@...il.com> wrote:
> >
> > On Aug 23, 2021, at 6:25 AM, Mike Rapoport <rppt@...nel.org> wrote:
> >
> > From: Mike Rapoport <rppt@...ux.ibm.com>
> >
> > Allocate page table using __GFP_PTE_MAPPED so that they will have 4K PTEs
> > in the direct map. This allows to switch _PAGE_RW bit each time a page
> > table page needs to be made writable or read-only.
> >
> > The writability of the page tables is toggled only in the lowest level page
> > table modifiction functions and immediately switched off.
> >
> > The page tables created early in the boot (including the direct map page
> > table) are not write protected.
> >
> >
>
> [ snip ]
>
> > +static void pgtable_write_set(void *pg_table, bool set)
> > +{
> > + int level = 0;
> > + pte_t *pte;
> > +
> > + /*
> > + * Skip the page tables allocated from pgt_buf break area and from
> > + * memblock
> > + */
> > + if (!after_bootmem)
> > + return;
> > + if (!PageTable(virt_to_page(pg_table)))
> > + return;
> > +
> > + pte = lookup_address((unsigned long)pg_table, &level);
> > + if (!pte || level != PG_LEVEL_4K)
> > + return;
> > +
> > + if (set) {
> > + if (pte_write(*pte))
> > + return;
> > +
> > + WRITE_ONCE(*pte, pte_mkwrite(*pte));
>
> I think that the pte_write() test (and the following one) might hide
> latent bugs. Either you know whether the PTE is write-protected or you
> need to protect against nested/concurrent calls to pgtable_write_set()
> by disabling preemption/IRQs.
>
> Otherwise, you risk in having someone else write-protecting the PTE
> after it is write-unprotected and before it is written - causing a crash,
> or write-unprotecting it after it is protected - which circumvents the
> protection.
>
> Therefore, I would think that instead you should have:
>
> VM_BUG_ON(pte_write(*pte)); // (or WARN_ON_ONCE())
>
> In addition, if there are assumptions on the preemptability of the code,
> it would be nice to have some assertions. I think that the code assumes
> that all calls to pgtable_write_set() are done while holding the
> page-table lock. If that is the case, perhaps adding some lockdep
> assertion would also help to confirm the correctness.
>
> [ I put aside the lack of TLB flushes, which make the whole matter of
> delivered protection questionable. I presume that once PKS is used,
> this is not an issue. ]
As I said in another reply, the actual page table protection is merely to
exercise the allocator. I'll consider to actually use PKS for the next
versions (unless Rick beats me to it).
--
Sincerely yours,
Mike.
Powered by blists - more mailing lists