lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y+LbEY62MT/3yo7I@x1n>
Date:   Tue, 7 Feb 2023 18:13:21 -0500
From:   Peter Xu <peterx@...hat.com>
To:     James Houghton <jthoughton@...gle.com>
Cc:     Mike Kravetz <mike.kravetz@...cle.com>,
        David Hildenbrand <david@...hat.com>,
        Muchun Song <songmuchun@...edance.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Zach O'Keefe <zokeefe@...gle.com>,
        Manish Mishra <manish.mishra@...anix.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for
 walk_hugetlb_range

James,

On Tue, Feb 07, 2023 at 02:46:04PM -0800, James Houghton wrote:
> > Here is the result: [1] (sorry it took a little while heh). The

Thanks.  From what I can tell, that number shows that it'll be great we
start with your rfcv1 mapcount approach, which mimics what's proposed by
Matthew for generic folio.

> > implementation of the "RFC v1" way is pretty horrible[2] (and this

Any more information on why it's horrible? :)

A quick comment is I'm wondering whether that "whether we should boost the
mapcount" value can be hidden in hugetlb_pte* so you don't need to pass
over a lot of bool* deep into the hgm walk routines.

> > implementation probably has bugs anyway; it doesn't account for the
> > folio_referenced() problem).

I thought we reached a consensus on the resolution, by a proposal to remove
folio_referenced_arg.mapcount.  Is it not working for some reason?

> >
> > Matthew is trying to solve the same problem with THPs right now: [3].
> > I haven't figured out how we can apply Matthews's approach to HGM
> > right now, but there probably is a way. (If we left the mapcount
> > increment bits in the same place, we couldn't just check the
> > hstate-level PTE; it would have already been made present.)

I'm just worried that (1) this may add yet another dependency to your work
which is still during discussion phase, and (2) whether the folio approach
is easily applicable here, e.g., we may not want to populate all the ptes
for hugetlb HGMs by default.

> >
> > We could:
> > - use the THP-like way and tolerate ~1 second collapses
> 
> Another thought here. We don't necessarily *need* to collapse the page
> table mappings in between mmu_notifier_invalidate_range_start() and
> mmu_notifier_invalidate_range_end(), as the pfns aren't changing,
> we aren't punching any holes, and we aren't changing permission bits.
> If we had an MMU notifier that simply informed KVM that we collapsed
> the page tables *after* we finished collapsing, then it would be ok
> for hugetlb_collapse() to be slow.

That's a great point!  It'll definitely apply to either approach.

> 
> If this MMU notifier is something that makes sense, it probably
> applies to MADV_COLLAPSE for THPs as well.

THPs are definitely different, mmu notifiers should be required there,
afaict.  Isn't that what the current code does?

See collapse_and_free_pmd() for shmem and collapse_huge_page() for anon.

> 
> 
> > - use the (non-RFC) v1 way and tolerate the migration/smaps differences
> > - use the RFC v1 way and tolerate the complicated mapcount accounting
> > - flesh out [3] and see if it can be applied to HGM nicely
> >
> > I'm happy to go with any of these approaches.
> >
> > [1]: https://pastebin.com/raw/hJzFJHiD
> > [2]: https://github.com/48ca/linux/commit/4495f16a09b660aff44b3edcc125aa3a3df85976
> > [3]: https://lore.kernel.org/linux-mm/Y+FkV4fBxHlp6FTH@casper.infradead.org/
> 
> - James
> 

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ