lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADrL8HVNBD=5akoTi3e0d6w=162Wak2cB2bc7jwQ8-DCAC4N1Q@mail.gmail.com>
Date:   Tue, 5 Dec 2023 09:54:44 -0800
From:   James Houghton <jthoughton@...gle.com>
To:     Ryan Roberts <ryan.roberts@....com>
Cc:     Steve Capper <steve.capper@....com>, Will Deacon <will@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Catalin Marinas <catalin.marinas@....com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] arm64: hugetlb: Fix page fault loop for
 sw-dirty/hw-clean contiguous PTEs

On Tue, Dec 5, 2023 at 6:43 AM Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 04/12/2023 17:26, James Houghton wrote:
> > It is currently possible for a userspace application to enter a page
> > fault loop when using HugeTLB pages implemented with contiguous PTEs
> > when HAFDBS is not available. This happens because:
> > 1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
> >    (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
>
> Hi James,
>
> Do you know how this happens?

Hi Ryan,

Thanks for taking a look! I do understand why this is happening. There
is an explanation in the reproducer[1] and also in this cover letter
(though I realize I could have been a little clearer). See below.

> AFAIK, this is the set of valid bit combinations, and
> PTE_RDONLY|PTE_WRITE|PTE_DIRTY is not one of them. Perhaps the real solution is
> to understand how this is happening and prevent it?
>
> /*
>  * PTE bits configuration in the presence of hardware Dirty Bit Management
>  * (PTE_WRITE == PTE_DBM):
>  *
>  * Dirty  Writable | PTE_RDONLY  PTE_WRITE  PTE_DIRTY (sw)
>  *   0      0      |   1           0          0
>  *   0      1      |   1           1          0
>  *   1      0      |   1           0          1
>  *   1      1      |   0           1          x
>  *
>  * When hardware DBM is not present, the sofware PTE_DIRTY bit is updated via
>  * the page fault mechanism. Checking the dirty status of a pte becomes:
>  *
>  *   PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
>  */

Thanks for pointing this out. So (1) is definitely a bug. The second
patch in this series makes it impossible to create such a PTE via
pte_modify (by forcing sw-dirty PTEs to be hw-dirty as well).

> > The second patch in this series makes step (1) less likely to occur.

It makes it impossible to create this invalid set of bits via
pte_modify(). Assuming all PTE pgprot updates are done via the proper
interfaces, patch #2 might actually make this invalid bit combination
impossible to produce (that's certainly the goal). So perhaps language
stronger than "less likely" is appropriate.

Here's the sequence of events to trigger this bug, via mprotect():

> > Without this patch, we can get the kernel to write a sw-dirty, hw-clean
> > PTE with the following steps (showing the relevant VMA flags and pgprot
> > bits):
> > i.   Create a valid, writable contiguous PTE.
> >        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
> >        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >        PTE pgprot bits: PTE_DIRTY | PTE_WRITE
> > ii.  mprotect the VMA to PROT_NONE.
> >        VMA vmflags:     VM_SHARED
> >        VMA pgprot bits: PTE_RDONLY
> >        PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
> > iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
> >        VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
> >        VMA pgprot bits: PTE_RDONLY | PTE_WRITE
> >        PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

With patch #2, the PTE pgprot bits in step iii become PTE_DIRTY |
PTE_WRITE (hw-dirtiness is set, as the PTE is sw-dirty).

Thanks!

> > [1]: https://gist.github.com/48ca/11d1e466deee032cb35aa8c2280f93b0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ