lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADrL8HX997CguZWkH3vB4+cYjwLc0mYV4GkroE41bCLRqFiPpg@mail.gmail.com>
Date:   Wed, 21 Dec 2022 19:02:53 -0500
From:   James Houghton <jthoughton@...gle.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Peter Xu <peterx@...hat.com>,
        Muchun Song <songmuchun@...edance.com>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        "Zach O'Keefe" <zokeefe@...gle.com>,
        Manish Mishra <manish.mishra@...anix.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM

On Wed, Dec 21, 2022 at 5:32 PM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
> On 12/21/22 17:10, Peter Xu wrote:
> > On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > > On 12/21/22 15:21, James Houghton wrote:
> > > > Thanks for bringing this up, Peter. I think the main reason was:
> > > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > > and is not supported.
> > >
> > > IIRC, I think we wanted to initially limit the usage to the very
> > > specific use case (live migration).  The idea is that we could then
> > > expand usage as more use cases came to light.
> > >
> > > Another good thing is that userfaultfd has versioning built into the
> > > API.  Thus a user can determine if HGM is enabled in their running
> > > kernel.
> >
> > I don't worry much on this one, afaiu if we have any way to enable hgm then
> > the user can just try enabling it on a test vma, just like when an app
> > wants to detect whether a new madvise() is present on the current host OS.

That would be enough to test if HGM was merely present, but if
specific features like 4K UFFDIO_CONTINUEs or 4K UFFDIO_WRITEPROTECTs
were available. You could always check these by making a HugeTLB VMA
and setting it up correctly for userfaultfd/etc., but that's a little
messy.

> >
> > Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> > would work too.

I'm not opposed to this.

> >
> > >
> > > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > > allowed as of this patch series, but it could be allowed in the
> > > > future. To add support in the same way as this series, we would add
> > > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > > having to add another feature isn't great; is this what you're
> > > > concerned about?
> > > >
> > > > Considering MADV_ENABLE_HUGETLB...
> > > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > > high-granularities, provided the support exists", but it becomes
> > > > unclear to userspace to know what's supported and what isn't.
> > > > 2. We would then need to keep track if a user explicitly enabled it,
> > > > or if it got enabled automatically in response to memory poison, for
> > > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > > got enabled for poison, suddenly userspace would be allowed to do
> > > > things it wasn't allowed to do before.)
> >
> > We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> > hgm_enabled.  (a) always sets (b) but not vice versa.  We can limit poison
> > to set (b) only.  For this patchset, it can be all about (a).

My thoughts exactly. :)

> >
> > > > 3. This API makes sense for enabling HGM for something outside of
> > > > userfaultfd, like MADV_DONTNEED.
> > >
> > > I think #3 is key here.  Once we start applying HGM to things outside
> > > userfaultfd, then more thought will be required on APIs.  The API is
> > > somewhat limited by design until the basic functionality is in place.
> >
> > Mike, could you elaborate what's the major concern of having hgm used
> > outside uffd and live migration use cases?
> >
> > I feel like I miss something here.  I can understand we want to limit the
> > usage only when the user specifies using hgm because we want to keep the
> > old behavior intact.  However if we want another way to enable hgm it'll
> > still need one knob anyway even outside uffd, and I thought that'll service
> > the same purpose, or maybe not?
>
> I am not opposed to using hgm outside the use cases targeted by this series.
>
> It seems that when we were previously discussing the API we spent a bunch of
> time going around in circles trying to get the API correct.  That is expected
> as it is more difficult to take all users/uses/abuses of the API into account.
>
> Since the initial use case was fairly limited, it seemed like a good idea to
> limit the API to userfaultfd.  In this way we could focus on the underlying
> code/implementation and then expand as needed.  Of course, with an eye on
> anything that may be a limiting factor in the future.
>
> I was not aware of the uffd-wp use case, and am more than happy to discuss
> expanding the API.

So considering two API choices:

1. What we have now: UFFD_FEATURE_MINOR_HUGETLBFS_HGM for
UFFDIO_CONTINUE, and later UFFD_FEATURE_WP_HUGETLBFS_HGM for
UFFDIO_WRITEPROTECT. For MADV_DONTNEED, we could just suddenly start
allowing high-granularity choices (not sure if this is bad; we started
allowing it for HugeTLB recently with no other API change, AFAIA).

2. MADV_ENABLE_HGM or something similar. The changes to
UFFDIO_CONTINUE/UFFDIO_WRITEPROTECT/MADV_DONTNEED come automatically,
provided they are implemented.

I don't mind one way or the other. Peter, I assume you prefer #2.
Mike, what about you? If we decide on something other than #1, I'll
make the change before sending v1 out.

- James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ