lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <YST1zQ0FugifJnfi@kernel.org>
Date:   Tue, 24 Aug 2021 16:36:13 +0300
From:   Mike Rapoport <rppt@...nel.org>
To:     Nadav Amit <nadav.amit@...il.com>
Cc:     Linux-MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Kees Cook <keescook@...omium.org>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Vlastimil Babka <vbabka@...e.cz>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 4/4] x86/mm: write protect (most) page tables

On Mon, Aug 23, 2021 at 10:34:42PM -0700, Nadav Amit wrote:
> On Aug 23, 2021, at 10:32 PM, Nadav Amit <nadav.amit@...il.com> wrote:
> > 
> > On Aug 23, 2021, at 6:25 AM, Mike Rapoport <rppt@...nel.org> wrote:
> > 
> > From: Mike Rapoport <rppt@...ux.ibm.com>
> > 
> > Allocate page table using __GFP_PTE_MAPPED so that they will have 4K PTEs
> > in the direct map. This allows to switch _PAGE_RW bit each time a page
> > table page needs to be made writable or read-only.
> > 
> > The writability of the page tables is toggled only in the lowest level page
> > table modifiction functions and immediately switched off.
> > 
> > The page tables created early in the boot (including the direct map page
> > table) are not write protected.
> > 
> > 
> 
> [ snip ]
> 
> > +static void pgtable_write_set(void *pg_table, bool set)
> > +{
> > +	int level = 0;
> > +	pte_t *pte;
> > +
> > +	/*
> > +	 * Skip the page tables allocated from pgt_buf break area and from
> > +	 * memblock
> > +	 */
> > +	if (!after_bootmem)
> > +		return;
> > +	if (!PageTable(virt_to_page(pg_table)))
> > +		return;
> > +
> > +	pte = lookup_address((unsigned long)pg_table, &level);
> > +	if (!pte || level != PG_LEVEL_4K)
> > +		return;
> > +
> > +	if (set) {
> > +		if (pte_write(*pte))
> > +			return;
> > +
> > +		WRITE_ONCE(*pte, pte_mkwrite(*pte));
> 
> I think that the pte_write() test (and the following one) might hide
> latent bugs. Either you know whether the PTE is write-protected or you
> need to protect against nested/concurrent calls to pgtable_write_set()
> by disabling preemption/IRQs.
> 
> Otherwise, you risk in having someone else write-protecting the PTE
> after it is write-unprotected and before it is written - causing a crash,
> or write-unprotecting it after it is protected - which circumvents the
> protection.
> 
> Therefore, I would think that instead you should have:
> 
> 	VM_BUG_ON(pte_write(*pte));  // (or WARN_ON_ONCE())
> 
> In addition, if there are assumptions on the preemptability of the code,
> it would be nice to have some assertions. I think that the code assumes
> that all calls to pgtable_write_set() are done while holding the
> page-table lock. If that is the case, perhaps adding some lockdep
> assertion would also help to confirm the correctness.
> 
> [ I put aside the lack of TLB flushes, which make the whole matter of
> delivered protection questionable. I presume that once PKS is used, 
> this is not an issue. ]

As I said in another reply, the actual page table protection is merely to
exercise the allocator. I'll consider to actually use PKS for the next
versions (unless Rick beats me to it).

-- 
Sincerely yours,
Mike.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ