Message-ID: <20201005171632.GB2990415@carbon.DHCP.thefacebook.com>
Date:   Mon, 5 Oct 2020 10:16:32 -0700
From:   Roman Gushchin <guro@...com>
To:     Zi Yan <ziy@...dia.com>
CC:     David Hildenbrand <david@...hat.com>,
        Michal Hocko <mhocko@...e.com>, <linux-mm@...ck.org>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Rik van Riel <riel@...riel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Yang Shi <shy828301@...il.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        William Kucharski <william.kucharski@...cle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Nellans <dnellans@...dia.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64

On Mon, Oct 05, 2020 at 11:03:56AM -0400, Zi Yan wrote:
> On 2 Oct 2020, at 4:30, David Hildenbrand wrote:
> 
> > On 02.10.20 10:10, Michal Hocko wrote:
> >> On Fri 02-10-20 09:50:02, David Hildenbrand wrote:
> >>>>>> - huge page sizes controllable by the userspace?
> >>>>>
> >>>>> It might be good to allow advanced users to choose the page sizes, so they
> >>>>> have better control of their applications.
> >>>>
> >>>> Could you elaborate more? Those advanced users can use hugetlb, right?
> >>>> They get a very good control over page size and pool preallocation etc.
> >>>> So they can get what they need - assuming there is enough memory.
> >>>>
> >>>
> >>> I am still not convinced that 1G THP (TGP :) ) are really what we want
> >>> to support. I can understand that there are some use cases that might
> >>> benefit from it, especially:
> >>
> >> Well, I would say that internal support for larger huge pages (e.g. 1GB)
> >> that can transparently split under memory pressure is a useful
> >> functionality. I cannot really judge how complex that would be
> >
> > Right, but that's then something different than serving (scarce,
> > unmovable) gigantic pages from CMA / reserved hugetlbfs pool. Nothing
> > wrong about *real* THP support, meaning, e.g., grouping consecutive
> > pages and converting them back and forth on demand. (E.g., 1GB ->
> > multiple 2MB -> multiple single pages), for example, when having to
> > migrate such a gigantic page. But that's very different from our
> > existing gigantic page code as far as I can tell.
> 
> Serving 1GB PUD THPs from CMA is a compromise, since we do not want to
> bump MAX_ORDER to 20 to enable 1GB page allocation in the buddy allocator,
> which would require a section size increase. In addition, unmovable pages
> cannot be allocated in CMA, so allocating 1GB pages has much higher chance
> from it than from ZONE_NORMAL.

s/higher chances/non-zero chances
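
To spell out the arithmetic behind the MAX_ORDER point (a quick sketch,
assuming x86_64 with 4KB base pages and the usual default values):

	enum {
		page_shift = 12,			/* 4KB base pages */
		pud_shift  = 30,			/* 1GB PUD pages */
		gb_order   = pud_shift - page_shift,	/* = 18 */
		max_order  = 11,			/* default MAX_ORDER */
	};
	/*
	 * The largest buddy allocation is order (max_order - 1) = 10,
	 * i.e. 2^10 * 4KB = 4MB. A 1GB page is an order-18 allocation,
	 * so serving it from the buddy allocator means raising MAX_ORDER
	 * (and, with it, the memory section size).
	 */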

Currently we have nothing that prevents the fragmentation of memory with
unmovable pages at the 1GB scale. This means that in the common case it's
highly unlikely to find a contiguous GB without any unmovable page. As of
now, CMA seems to be the only working option.
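
For reference, grabbing a gigantic page from CMA boils down to something
like the sketch below ("hugepage_cma" is a hypothetical per-node CMA area
reserved at boot, e.g. via cma_declare_contiguous(); error handling
omitted):

	#include <linux/cma.h>
	#include <linux/gfp.h>
	#include <linux/mm.h>

	/* Sketch only: allocate one 1GB page (order-18 on x86_64) from CMA. */
	static struct page *alloc_1gb_thp(struct cma *hugepage_cma)
	{
		const unsigned long nr_pages = 1UL << (PUD_SHIFT - PAGE_SHIFT);

		/*
		 * cma_alloc() migrates movable pages out of the target range.
		 * Because CMA rejects unmovable allocations in the first place,
		 * this has a real chance to succeed -- unlike searching for a
		 * free 1GB range in ZONE_NORMAL, which is usually riddled with
		 * unmovable pages.
		 */
		return cma_alloc(hugepage_cma, nr_pages, PUD_SHIFT - PAGE_SHIFT,
				 false);
	}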

However, it seems there are other use cases for the allocation of contiguous
1GB pages: e.g. secretfd ( https://lwn.net/Articles/831628/ ), where using
1GB pages can reduce the fragmentation of the direct mapping.

So I wonder if we need a new mechanism to avoid fragmentation at the 1GB/PUD
scale, e.g. something like a second level of pageblocks. That would allow us
to group all unmovable memory in a few 1GB blocks and keep more 1GB regions
available for gigantic THPs and other use cases. I'm looking now into how it
can be done; a rough sketch follows. If anybody has any ideas here, I'd
appreciate them a lot.
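
To make it a bit more concrete, a very rough sketch of the direction I'm
thinking of (all names are made up; none of this exists):

	/*
	 * A second-level "region type", kept per 1GB block, analogous to
	 * the existing per-pageblock migratetype (pageblock_order is 9,
	 * i.e. 2MB, on x86_64).
	 */
	#define REGION_ORDER	(PUD_SHIFT - PAGE_SHIFT)	/* 18 on x86_64 */

	enum region_type {
		REGION_MOVABLE,		/* only movable pageblocks inside */
		REGION_UNMOVABLE,	/* may contain unmovable pageblocks */
	};

	static enum region_type *region_types;	/* one entry per 1GB region */

	static inline unsigned long page_to_region(struct page *page)
	{
		return page_to_pfn(page) >> REGION_ORDER;
	}

	/*
	 * When an unmovable allocation has to steal a pageblock, prefer
	 * pageblocks in regions already marked REGION_UNMOVABLE, so
	 * unmovable memory stays grouped in a few 1GB regions and the
	 * rest remain usable for gigantic THPs.
	 */
	static bool region_allows_unmovable(struct page *page)
	{
		return region_types[page_to_region(page)] == REGION_UNMOVABLE;
	}

The fallback path in the page allocator would then consult
region_allows_unmovable() before picking a pageblock for an unmovable
allocation.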

Thanks!
