Message-ID: <Y6OJUtVkvdptEgW7@monkey>
Date:   Wed, 21 Dec 2022 14:31:46 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     James Houghton <jthoughton@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Zach O'Keefe <zokeefe@...gle.com>,
        Manish Mishra <manish.mishra@...anix.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Yang Shi <shy828301@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v2 33/47] userfaultfd: add
 UFFD_FEATURE_MINOR_HUGETLBFS_HGM

On 12/21/22 17:10, Peter Xu wrote:
> On Wed, Dec 21, 2022 at 01:39:39PM -0800, Mike Kravetz wrote:
> > On 12/21/22 15:21, James Houghton wrote:
> > > On Wed, Dec 21, 2022 at 2:23 PM Peter Xu <peterx@...hat.com> wrote:
> > > >
> > > > James,
> > > >
> > > > On Wed, Nov 16, 2022 at 03:30:00PM -0800, James Houghton wrote:
> > > > > On Wed, Nov 16, 2022 at 2:28 PM Peter Xu <peterx@...hat.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 21, 2022 at 04:36:49PM +0000, James Houghton wrote:
> > > > > > > Userspace must provide this new feature when it calls UFFDIO_API to
> > > > > > > enable HGM. Userspace can check if the feature exists in
> > > > > > > uffdio_api.features, and if it does not exist, the kernel does not
> > > > > > > support and therefore did not enable HGM.
> > > > > > >
> > > > > > > Signed-off-by: James Houghton <jthoughton@...gle.com>
> > > > > >
> > > > > > It's still slightly a pity that this can only be enabled by an uffd context
> > > > > > plus a minor fault, so generic hugetlb users cannot directly leverage this.
> > > > >
> > > > > The idea here is that, for applications that can conceivably benefit
> > > > > from HGM, we have a mechanism for enabling it for that application. So
> > > > > this patch creates that mechanism for userfaultfd/UFFDIO_CONTINUE. I
> > > > > prefer this approach over something more general like MADV_ENABLE_HGM
> > > > > or something.
> > > >
> > > > Sorry to get back to this very late - I know this has been discussed since
> > > > the very early stage of the feature, but is there any reasoning behind?
> > > >
> > > > When I start to think seriously on applying this to process snapshot with
> > > > uffd-wp I found that the minor mode trick won't easily play - normally
> > > > that's a case where all the pages were there mapped huge, but when the app
> > > > wants UFFDIO_WRITEPROTECT it may want to remap the huge pages into smaller
> > > > pages, probably some size that the user can specify.  It'll be non-trivial
> > > > to enable HGM during that phase using MINOR mode because in that case the
> > > > pages are all mapped.
> > > >
> > > > For the long term, I am just still worried the current interface is still
> > > > not as flexible.
> > > 
> > > Thanks for bringing this up, Peter. I think the main reason was:
> > > having separate UFFD_FEATUREs clearly indicates to userspace what is
> > > and is not supported.
> > 
> > IIRC, I think we wanted to initially limit the usage to the very
> > specific use case (live migration).  The idea is that we could then
> > expand usage as more use cases came to light.
> > 
> > Another good thing is that userfaultfd has versioning built into the
> > API.  Thus a user can determine if HGM is enabled in their running
> > kernel.
> 
> I don't worry much on this one, afaiu if we have any way to enable hgm then
> the user can just try enabling it on a test vma, just like when an app
> wants to detect whether a new madvise() is present on the current host OS.
> 
> Besides, I'm wondering whether something like /sys/kernel/vm/hugepages/hgm
> would work too.
> 
> > 
> > > For UFFDIO_WRITEPROTECT, a user could remap huge pages into smaller
> > > pages by issuing a high-granularity UFFDIO_WRITEPROTECT. That isn't
> > > allowed as of this patch series, but it could be allowed in the
> > > future. To add support in the same way as this series, we would add
> > > another feature, say UFFD_FEATURE_WP_HUGETLBFS_HGM. I agree that
> > > having to add another feature isn't great; is this what you're
> > > concerned about?
> > > 
> > > Considering MADV_ENABLE_HUGETLB...
> > > 1. If a user provides this, then the contract becomes: "the kernel may
> > > allow UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT for HugeTLB at
> > > high-granularities, provided the support exists", but it becomes
> > > unclear to userspace to know what's supported and what isn't.
> > > 2. We would then need to keep track if a user explicitly enabled it,
> > > or if it got enabled automatically in response to memory poison, for
> > > example. Not a big problem, just a complication. (Otherwise, if HGM
> > > got enabled for poison, suddenly userspace would be allowed to do
> > > things it wasn't allowed to do before.)
> 
> We could alternatively have two flags for each vma: (a) hgm_advised and (b)
> hgm_enabled.  (a) always sets (b) but not vice versa.  We can limit poison
> to set (b) only.  For this patchset, it can be all about (a).
> 
> > > 3. This API makes sense for enabling HGM for something outside of
> > > userfaultfd, like MADV_DONTNEED.
> > 
> > I think #3 is key here.  Once we start applying HGM to things outside
> > userfaultfd, then more thought will be required on APIs.  The API is
> > somewhat limited by design until the basic functionality is in place.
> 
> Mike, could you elaborate what's the major concern of having hgm used
> outside uffd and live migration use cases?
> 
> I feel like I miss something here.  I can understand we want to limit the
> usage only when the user specifies using hgm because we want to keep the
> old behavior intact.  However if we want another way to enable hgm it'll
> still need one knob anyway even outside uffd, and I thought that'll service
> the same purpose, or maybe not?

I am not opposed to using hgm outside the use cases targeted by this series.

When we previously discussed the API, we spent a bunch of time going around
in circles trying to get it right.  That is expected, as it is difficult to
take all users/uses/abuses of an API into account up front.

Since the initial use case was fairly limited, it seemed like a good idea to
limit the API to userfaultfd.  In this way we could focus on the underlying
code/implementation and then expand as needed, while keeping an eye on
anything that may be a limiting factor in the future.
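
To make the userfaultfd side concrete, here is a rough sketch of how
userspace might probe for the feature before relying on it.  The numeric
value given to UFFD_FEATURE_MINOR_HUGETLBFS_HGM below is a placeholder
assumption (the real bit would come from a patched <linux/userfaultfd.h>),
and the helper names are made up for illustration.  It assumes the usual
handshake, where UFFDIO_API with features == 0 reports what the kernel
supports.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * ASSUMPTION: placeholder bit for illustration only.  The flag is
 * proposed by this RFC series; its real value is whatever the patched
 * uapi header defines.
 */
#ifndef UFFD_FEATURE_MINOR_HUGETLBFS_HGM
#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM (1ULL << 62)
#endif

/* True if every bit in 'wanted' is present in the reported mask. */
static bool uffd_features_supported(uint64_t reported, uint64_t wanted)
{
	return (reported & wanted) == wanted;
}

/*
 * Hypothetical helper: open a throwaway userfaultfd, ask the kernel
 * which features it supports, and close it again.  Returns 0 and fills
 * *features on success, -1 if userfaultfd is unavailable.
 */
static int uffd_probe_features(uint64_t *features)
{
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int fd, ret;

	assert(features != NULL);

	fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, UFFDIO_API, &api);
	close(fd);
	if (ret < 0)
		return -1;

	*features = api.features;
	return 0;
}
```

A caller would run uffd_probe_features() once, test the result with
uffd_features_supported(), and only then open the real userfaultfd and
pass the feature bit to UFFDIO_API.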

I was not aware of the uffd-wp use case, and am more than happy to discuss
expanding the API.
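
As a starting point for that discussion, the two-flag idea quoted above
could be modeled roughly as follows.  This is only a userspace sketch of
the state rules as I read them: the struct and function names are
hypothetical and do not correspond to fields in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VMA state; not actual kernel fields. */
struct hgm_vma_state {
	bool hgm_advised;	/* (a) userspace explicitly asked for HGM */
	bool hgm_enabled;	/* (b) high-granularity mappings may exist */
};

/* Explicit opt-in: (a) always sets (b). */
static void hgm_advise(struct hgm_vma_state *s)
{
	assert(s != NULL);
	s->hgm_advised = true;
	s->hgm_enabled = true;
}

/* Memory poison sets (b) only, so it grants userspace nothing new. */
static void hgm_enable_for_poison(struct hgm_vma_state *s)
{
	assert(s != NULL);
	s->hgm_enabled = true;
}

/* User-visible high-granularity operations are gated on (a). */
static bool hgm_user_ops_allowed(const struct hgm_vma_state *s)
{
	assert(s != NULL);
	return s->hgm_advised;
}
```

With this split, a VMA that got HGM only because of poison keeps its old
user-visible behavior until userspace explicitly opts in.
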
-- 
Mike Kravetz
