Date:   Thu, 6 Oct 2022 17:23:59 +0200
From:   Jann Horn <jannh@...gle.com>
To:     Linux-MM <linux-mm@...ck.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Christoph Hellwig <hch@....de>
Cc:     kernel list <linux-kernel@...r.kernel.org>,
        David Hildenbrand <david@...hat.com>,
        Jason Gunthorpe <jgg@...lanox.com>
Subject: ptep_get_lockless() on 32-bit x86/mips/sh looks wrong

ptep_get_lockless() does the following under CONFIG_GUP_GET_PTE_LOW_HIGH:

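pte_t pte;
do {
  pte.pte_low = ptep->pte_low;     /* load the low 32 bits */
  smp_rmb();                       /* order it before the high load */
  pte.pte_high = ptep->pte_high;   /* load the high 32 bits */
  smp_rmb();                       /* order it before the recheck */
} while (unlikely(pte.pte_low != ptep->pte_low)); /* retry if low changed */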

It has a comment above it that argues that this is correct because:
1. A present PTE can't become non-present and then become a present
PTE pointing to another page without a TLB flush in between.
2. TLB flushes involve IPIs.
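A minimal, single-threaded C sketch of why those two premises would make the loop safe. Everything in it is hypothetical: `slot`, `writer_clear()`, `writer_replace()`, `read_low_high()`, the `hook` callback and the PTE values are illustration names and made-up numbers, and the write orders are assumed to mirror x86 PAE (low half cleared first, high half set first). The hook scripts a writer that zaps the entry right after the reader has loaded `pte_low`, then tries to point it at a different page. Because that needs a flush (premise 1) and the flush is an IPI that cannot be serviced while the reader runs with interrupts off (premise 2), the install is deferred until the reader is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit PTE stored as two 32-bit halves. */
typedef struct { uint32_t pte_low, pte_high; } pte_t;
static_assert(sizeof(pte_t) == 8, "two 32-bit halves");

static pte_t slot;            /* the PTE as it sits in memory */
static bool reader_irqs_off;  /* models the reader running with IRQs disabled */
static bool flush_pending;    /* writer is waiting for its flush IPI */
static pte_t pending_new;     /* PTE the writer installs once the flush is done */

static void writer_clear(void)
{
    slot.pte_low = 0;   /* assumed order: low half (present bit) first */
    slot.pte_high = 0;
}

/* Pointing the PTE at another page needs a TLB flush first (premise 1),
 * and that flush is an IPI the reader can't service with IRQs off
 * (premise 2), so the install is deferred until the reader is done. */
static void writer_replace(pte_t v)
{
    if (reader_irqs_off) {
        flush_pending = true;
        pending_new = v;
        return;
    }
    slot.pte_high = v.pte_high; /* assumed order: high half first */
    slot.pte_low = v.pte_low;
}

/* The low/high/recheck loop; hook() injects writer steps between loads. */
static pte_t read_low_high(void (*hook)(int step))
{
    pte_t pte;
    int step = 0;

    reader_irqs_off = true;
    do {
        pte.pte_low = slot.pte_low;
        hook(step++);
        pte.pte_high = slot.pte_high;
        hook(step++);
    } while (pte.pte_low != slot.pte_low);
    reader_irqs_off = false;

    if (flush_pending) {      /* IPI finally handled: the writer completes */
        flush_pending = false;
        writer_replace(pending_new);
    }
    return pte;
}

static const pte_t old_pte = { 0x12345067u, 0x1u };
static const pte_t new_pte = { 0x12345067u, 0x2u };

static void race(int step)
{
    if (step == 0)
        writer_clear();          /* zap right after the reader loaded pte_low */
    else if (step == 1)
        writer_replace(new_pte); /* refault with a different page */
}

static pte_t run_scenario(void)
{
    slot = old_pte;
    return read_low_high(race);
}
```

`run_scenario()` returns `{0, 0}`, the cleared (non-present) entry, which is a value that really was in memory; the new PTE only lands after the reader has re-enabled interrupts.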

As far as I can tell, in particular on x86, _both_ of those
assumptions are false; perhaps on mips and sh only one of them is?

Number 2 is straightforward: x86 can run under hypervisors, and when
it does, the MMU paravirtualization code (including the KVM version)
can implement remote TLB flushes without IPIs, so a flush no longer
has to wait for a GUP-fast walker that has interrupts disabled.

Number 1 is gnarlier, because breaking that assumption implies that
there can be a situation where different threads see different memory
at the same virtual address because their TLBs are incoherent. But as
far as I know, that can happen when MADV_DONTNEED races with an
anonymous page fault, because zap_pte_range() does not always flush
stale TLB entries before dropping the page table lock. I think the
incoherence itself is probably fine, since it's a "garbage in, garbage
out" kind of situation. But if a concurrent GUP-fast can then, in
theory, end up returning a completely unrelated page (by combining the
low half of one PTE with the high half of another), that's bad.
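A scripted, single-threaded C sketch of one interleaving that could produce such a page once the writer no longer has to wait for the reader. It is hypothetical, not kernel code: the names and PTE values are invented, and the write orders are assumed to mirror x86 PAE. It relies on the new PTE having the same low word as the old one (same flags, same physical address bits 12..31, i.e. the two pages are a multiple of 4 GiB apart), so the recheck of `pte_low` passes even though `pte_high` was read from the cleared entry in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit PTE stored as two 32-bit halves. */
typedef struct { uint32_t pte_low, pte_high; } pte_t;
static_assert(sizeof(pte_t) == 8, "two 32-bit halves");

static pte_t slot; /* the PTE as it sits in memory */

static void writer_clear(void)
{
    slot.pte_low = 0;  /* assumed order: low half (present bit) first */
    slot.pte_high = 0;
}

/* No IPI to wait for: the new PTE lands immediately. */
static void writer_set(pte_t v)
{
    slot.pte_high = v.pte_high; /* assumed order: high half first */
    slot.pte_low = v.pte_low;
}

/* The low/high/recheck loop; hook() injects writer steps between loads. */
static pte_t read_low_high(void (*hook)(int step))
{
    pte_t pte;
    int step = 0;

    do {
        pte.pte_low = slot.pte_low;
        hook(step++);
        pte.pte_high = slot.pte_high;
        hook(step++);
    } while (pte.pte_low != slot.pte_low);
    return pte;
}

/* Same low word (flags 0x067, address bits 12..31), different high word. */
static const pte_t old_pte = { 0x12345067u, 0x1u }; /* phys 0x112345000 */
static const pte_t new_pte = { 0x12345067u, 0x2u }; /* phys 0x212345000 */

static void race(int step)
{
    if (step == 0)
        writer_clear();      /* zap right after the reader loaded pte_low */
    else if (step == 1)
        writer_set(new_pte); /* refault at once with a different page */
}

/* Physical address of a PAE-style PTE: bits 12..51, flag bits masked off. */
static uint64_t pte_phys(pte_t p)
{
    return ((uint64_t)(p.pte_high & 0xfffffu) << 32) |
           (p.pte_low & 0xfffff000u);
}

static pte_t run_race(void)
{
    slot = old_pte;
    return read_low_high(race);
}
```

`run_race()` returns `{0x12345067, 0}`, a present PTE for physical address 0x12345000: neither the old page (0x112345000) nor the new one (0x212345000).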


Sadly, mips and sh don't define arch_cmpxchg_double(), so we can't
just change ptep_get_lockless() to use arch_cmpxchg_double() and be
done with it...
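Where a double-word compare-and-exchange does exist, the idea is roughly the following, sketched with C11 `<stdatomic.h>` rather than the kernel's `arch_cmpxchg_double()`; the `atomic_pte_t` type and `pte_read_cmpxchg()` are hypothetical names. A compare-exchange whose expected and desired values are equal either stores back the value already there or fails and reports the current contents, so in both cases the caller gets one untorn 64-bit snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: the whole 64-bit PTE kept in one atomic object. */
typedef _Atomic uint64_t atomic_pte_t;
static_assert(sizeof(atomic_pte_t) == 8, "one 64-bit PTE");

/* Snapshot the PTE with one 64-bit compare-and-exchange. If *ptep is 0,
 * 0 is stored back (no change); otherwise the exchange fails and
 * 'expected' is overwritten with the current value. Either way it holds
 * a single untorn 64-bit value. */
static uint64_t pte_read_cmpxchg(atomic_pte_t *ptep)
{
    uint64_t expected = 0;
    atomic_compare_exchange_strong(ptep, &expected, 0);
    return expected;
}
```

The cost is that a plain load becomes a locked read-modify-write of the PTE's cache line; on 32-bit x86, compilers typically implement it with `cmpxchg8b`.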
