lists.openwall.net
Open Source and information security mailing list archives
 
Date:   Tue, 22 Aug 2023 11:54:39 -0700 (PDT)
From:   Hugh Dickins <hughd@...gle.com>
To:     Jann Horn <jannh@...gle.com>
cc:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...nel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Qi Zheng <zhengqi.arch@...edance.com>,
        Yang Shi <shy828301@...il.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Peter Xu <peterx@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Alistair Popple <apopple@...dia.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Steven Price <steven.price@....com>,
        SeongJae Park <sj@...nel.org>,
        Lorenzo Stoakes <lstoakes@...il.com>,
        Huang Ying <ying.huang@...el.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Zack Rusin <zackr@...are.com>, Jason Gunthorpe <jgg@...pe.ca>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Minchan Kim <minchan@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Song Liu <song@...nel.org>,
        Thomas Hellstrom <thomas.hellstrom@...ux.intel.com>,
        Russell King <linux@...linux.org.uk>,
        "David S. Miller" <davem@...emloft.net>,
        Michael Ellerman <mpe@...erman.id.au>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Heiko Carstens <hca@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Claudio Imbrenda <imbrenda@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Vishal Moola <vishal.moola@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>, Zi Yan <ziy@...dia.com>,
        Zach O'Keefe <zokeefe@...gle.com>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        sparclinux@...r.kernel.org,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        linux-s390 <linux-s390@...r.kernel.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH mm-unstable] mm/khugepaged: fix collapse_pte_mapped_thp()
 versus uffd

On Tue, 22 Aug 2023, Jann Horn wrote:
> On Tue, Aug 22, 2023 at 4:51 AM Hugh Dickins <hughd@...gle.com> wrote:
> > On Mon, 21 Aug 2023, Jann Horn wrote:
> > > On Mon, Aug 21, 2023 at 9:51 PM Hugh Dickins <hughd@...gle.com> wrote:
> > > > Just for this case, take the pmd_lock() two steps earlier: not because
> > > > it gives any protection against this case itself, but because ptlock
> > > > nests inside it, and it's the dropping of ptlock which let the bug in.
> > > > In other cases, continue to minimize the pmd_lock() hold time.
> > >
> > > Special-casing userfaultfd like this makes me a bit uncomfortable; but
> > > I also can't find anything other than userfaultfd that would insert
> > > pages into regions that are khugepaged-compatible, so I guess this
> > > works?
> >
> > I'm as sure as I can be that it's solely because userfaultfd breaks
> > the usual rules here (and in fairness, IIRC Andrea did ask my permission
> > before making it behave that way on shmem, COWing without a source page).
> >
> > Perhaps something else will want that same behaviour in future (it's
> > tempting, but difficult to guarantee correctness); for now, it is just
> > userfaultfd (but by saying "_armed" rather than "_missing", I'm
> > half-expecting uffd to add more such exceptional modes in future).
> 
> Hm, yeah, sounds okay. (I guess we'd also run into this if we ever
> wanted to make it possible to reliably install PTE markers with
> madvise() or something like that, which might be nice for allowing
> userspace to create guard pages without unnecessary extra VMAs...)

I see the mail thread has taken inspiration from your comment there,
and veered off in that direction: but I'll ignore those futures.
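
To make the nesting rule in the quoted patch description concrete (pmd_lock() is the outer lock and ptlock nests inside it, so code already holding ptlock must drop it before blocking on pmd_lock()), here is a toy userspace model of lock ordering. It is only an illustration: the ranks, struct held_locks and acquire_blocking() are hypothetical names, not kernel APIs, and no real locking happens.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: pmd_lock() is outer, ptlock nests inside it. */
enum lock_rank { RANK_PMD = 1, RANK_PTL = 2, NR_RANKS = 3 };

struct held_locks {
    bool held[NR_RANKS];        /* held[r]: is the rank-r lock held? */
};

/* A blocking acquire respects the nesting order only if no lock of
 * equal or inner rank is already held (otherwise ABBA deadlock). */
static bool may_block_on(const struct held_locks *h, enum lock_rank want)
{
    for (int r = want; r < NR_RANKS; r++)
        if (h->held[r])
            return false;
    return true;
}

/* Block on 'want'. If ptlock is held while pmd_lock is wanted, ptlock
 * must be dropped first: that is the window in which userfaultfd could
 * insert a page. Returns true if ptlock had to be dropped. */
static bool acquire_blocking(struct held_locks *h, enum lock_rank want)
{
    bool dropped = false;

    if (!may_block_on(h, want) && want == RANK_PMD && h->held[RANK_PTL]) {
        h->held[RANK_PTL] = false;      /* race window opens here */
        dropped = true;
    }
    assert(may_block_on(h, want));
    h->held[want] = true;
    return dropped;
}
```

Taking pmd_lock() first, as v1 does in the userfaultfd_armed() case, means ptlock is taken inside it and never has to be dropped, so the window never opens.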

> 
> > > I guess an alternative would be to use a spin_trylock() instead of the
> > > current pmd_lock(), and if that fails, temporarily drop the page table
> > > lock and then restart from step 2 with both locks held - and at that
> > > point the page table scan should be fast since we expect it to usually
> > > be empty.
> >
> > That's certainly a good idea, if collapse on userfaultfd_armed private
> > is anything of a common case (I doubt it, but I don't know).  It may be a
> > better idea anyway (saving a drop and retake of ptlock).
> 
> I was thinking it also has the advantage that it would still perform
> okay if we got rid of the userfaultfd_armed() condition at some point
> - though I realize that designing too much for hypothetical future
> features is an antipattern.
> 
> > I gave it a try, expecting to end up with something that would lead
> > me to say "I tried it, but it didn't work out well"; but actually it
> > looks okay to me.  I wouldn't say I prefer it, but it seems reasonable,
> > and no more complicated (as Peter rightly observes) than the original.
> >
> > It's up to you and Peter, and whoever has strong feelings about it,
> > to choose between them: I don't mind (but I shall be sad if someone
> > demands that I indent that comment deeper - I'm not a fan of long
> > multi-line comments near column 80).
> 
> I prefer this version because it would make it easier to remove the
> "userfaultfd_armed()" check in the future if we have to, but I guess
> we could also always change it later if that becomes necessary, so I
> don't really have strong feelings on it at this point.
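
The trylock alternative quoted above (Jann's suggestion, as tried in v2) can be sketched in userspace with C11 atomic_flag spinlocks. This is a hypothetical model, not the patch: toy_lock(), toy_trylock(), toy_unlock(), count_scan() and scan_then_lock_pmd() are illustrative names, and the "scan" callback stands in for the page table walk of step 2.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel spinlock. */
struct toy_spin {
    atomic_flag flag;
};

static void toy_lock(struct toy_spin *s)
{
    while (atomic_flag_test_and_set_explicit(&s->flag, memory_order_acquire))
        ;                       /* spin until the holder releases it */
}

static bool toy_trylock(struct toy_spin *s)
{
    /* test_and_set returns the previous value: false means we got it */
    return !atomic_flag_test_and_set_explicit(&s->flag, memory_order_acquire);
}

static void toy_unlock(struct toy_spin *s)
{
    atomic_flag_clear_explicit(&s->flag, memory_order_release);
}

/* Hypothetical page table scan ("step 2"); here it just counts calls. */
static void count_scan(void *ctx)
{
    ++*(int *)ctx;
}

/* Scan with ptl held, then trylock the outer pmd lock. On failure,
 * drop ptl, take both locks in nesting order (pmd outer, ptl inner)
 * and scan again, since the table may have changed while ptl was
 * dropped; that rescan is expected to be cheap. Returns scan passes. */
static int scan_then_lock_pmd(struct toy_spin *pml, struct toy_spin *ptl,
                              void (*scan)(void *), void *ctx)
{
    int passes = 1;

    toy_lock(ptl);
    scan(ctx);
    if (!toy_trylock(pml)) {
        toy_unlock(ptl);
        toy_lock(pml);
        toy_lock(ptl);
        scan(ctx);
        passes++;
    }
    /* Both locks held: the pmd-level update would happen here. */
    toy_unlock(ptl);
    toy_unlock(pml);
    return passes;
}
```

A trylock never waits, so it cannot deadlock even though it is attempted out of nesting order; only the blocking slow path has to drop ptl first. In a single-threaded run the trylock always succeeds and one pass is made.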

Thanks for considering them both, Jann.  I do think your trylock way,
as in v2, is in principle superior, and we may well have good reason
to switch over to it in future; but I find it slightly more confusing,
so will follow your and Peter's "no strong feelings" for now, and ask
Andrew please to take the original (implicit v1).

Overriding reason: I realized overnight that v2 is not quite correct.
I was clever enough to realize that nr_ptes needed to be reset to 0
to get the accounting right with a recheck pass, but not clever enough
to realize that resetting it to 0 there would likely skip the abort
path's flush_tlb_mm(mm) when we had in fact cleared entries on the
first pass.  It needs a separate bool to decide whether to call
flush_tlb_mm(mm), or it needs that (ridiculously minor!) step 3 to be
moved down.
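
The v2 flaw described above comes down to a few lines of counter bookkeeping. This is a hypothetical model, not the khugepaged code: struct collapse_state, the helper names and the cleared_any flag are assumptions, and it assumes that step 3's accounting ran before the recheck pass, which is why nr_ptes was reset and no longer records that the first pass cleared entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters discussed above; not kernel code. */
struct collapse_state {
    int nr_ptes;        /* entries cleared but not yet accounted for */
    int accounted;      /* refcount/mm_counter adjustments already made */
    bool cleared_any;   /* set once any entry is cleared; never reset */
};

/* A pass clears n page table entries, leaving the TLB flush for later. */
static void clear_entries(struct collapse_state *s, int n)
{
    s->nr_ptes += n;
    if (n > 0)
        s->cleared_any = true;
}

/* Step 3 ran before the recheck: account for the cleared entries,
 * then reset nr_ptes so the abort path does not account them twice. */
static void account_before_recheck(struct collapse_state *s)
{
    s->accounted += s->nr_ptes;
    s->nr_ptes = 0;
}

/* v2-style abort decision: flush only if nr_ptes is nonzero. */
static bool abort_flushes_buggy(const struct collapse_state *s)
{
    return s->nr_ptes != 0;
}

/* With a separate bool, entries cleared on the first pass still
 * force the flush even after nr_ptes was reset. */
static bool abort_flushes_fixed(const struct collapse_state *s)
{
    return s->cleared_any;
}
```

If the recheck pass aborts without clearing anything, the buggy decision skips flush_tlb_mm(mm) although first-pass entries were cleared, leaving stale TLB entries; the separate bool keeps the flush. The other fix mentioned, moving step 3 down, is not modelled here.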

But rather than reworking it, please let's just go with v1 for now.

Thanks,
Hugh
