linux-kernel - Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86


Message-ID: <021D3552-4C75-4B82-BDE5-AFA6E0315051@nvidia.com>
Date:   Mon, 5 Oct 2020 11:34:02 -0400
From:   Zi Yan <ziy@...dia.com>
To:     David Hildenbrand <david@...hat.com>, Roman Gushchin <guro@...com>
CC:     Michal Hocko <mhocko@...e.com>, <linux-mm@...ck.org>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Rik van Riel <riel@...riel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Yang Shi <shy828301@...il.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        "Mike Kravetz" <mike.kravetz@...cle.com>,
        William Kucharski <william.kucharski@...cle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "John Hubbard" <jhubbard@...dia.com>,
        David Nellans <dnellans@...dia.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH v2 00/30] 1GB PUD THP support on x86_64

On 2 Oct 2020, at 3:50, David Hildenbrand wrote:

>>>> - huge page sizes controllable by the userspace?
>>>
>>> It might be good to allow advanced users to choose the page sizes, so they
>>> have better control of their applications.
>>
>> Could you elaborate more? Those advanced users can use hugetlb, right?
>> They get very good control over page size, pool preallocation, etc.
>> So they can get what they need - assuming there is enough memory.
>>
>
> I am still not convinced that 1G THP (TGP :) ) are really what we want
> to support. I can understand that there are some use cases that might
> benefit from it, especially:
>
> "I want a lot of memory, give me memory in any granularity you have, I
> absolutely don't care - but of course, more TGP might be good for
> performance." Say, you want a 5GB region, but only have a single 1GB
> hugepage lying around. hugetlbfs allocation will fail.
>
>
> But then, do we really want to optimize for such (very special?) use
> cases via " 58 files changed, 2396 insertions(+), 460 deletions(-)" ?
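
To make the 5GB example above concrete, here is a toy model in Python. It is
only a sketch of the accounting, not kernel code: the function names are
hypothetical, it assumes 2MB and 4KB pages are always available, and it
ignores alignment and fault-time behavior. It contrasts an all-or-nothing
allocation from a 1GB hugetlb pool with a best-effort allocation that takes
1GB pages when they exist and falls back to smaller pages otherwise.

```python
GIB = 1 << 30
MIB = 1 << 20
KIB = 1 << 10


def hugetlb_alloc(request_bytes, free_1g_pages):
    """All-or-nothing: the whole mapping must be backed by the 1GB pool."""
    needed = -(-request_bytes // GIB)  # ceiling division
    if needed > free_1g_pages:
        return None  # models the allocation failing outright
    return {"1G": needed}


def best_effort_alloc(request_bytes, free_1g_pages):
    """Use 1GB pages while available, then 2MB pages, then 4KB pages.

    Assumption: smaller pages never run out in this model.
    """
    n_1g = min(request_bytes // GIB, free_1g_pages)
    rest = request_bytes - n_1g * GIB
    n_2m = rest // (2 * MIB)
    rest -= n_2m * 2 * MIB
    n_4k = -(-rest // (4 * KIB))  # ceiling division
    return {"1G": n_1g, "2M": n_2m, "4K": n_4k}


if __name__ == "__main__":
    # 5GB request with a single free 1GB page, as in the example above.
    print(hugetlb_alloc(5 * GIB, 1))
    print(best_effort_alloc(5 * GIB, 1))
```

In this model the hugetlb-style request fails because five 1GB pages are
needed and only one is free, while the best-effort request is satisfied with
one 1GB page plus smaller pages for the remaining 4GB.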

I am planning to refactor my code further to reduce its size and make
it general enough to support THPs of any size. As Matthew’s patchset[1]
is removing the kernel’s THP size assumption, it might be a good time to
make THP support more general.

>
> I think gigantic pages are a scarce resource. Only selected applications
> *really* depend on them and benefit from them. Let these special
> applications handle it explicitly.
>
> Can we have a summary of use cases that would really benefit from this
> change?

For large machine learning applications, 1GB pages give a good performance
boost[2]. An NVIDIA DGX A100 box now has 1TB of memory, which means 1GB pages
are not that scarce in GPU-equipped infrastructure[3].

In addition, @Roman Gushchin should be able to provide a more concrete
story from his side.


[1] https://lore.kernel.org/linux-mm/20200908195539.25896-1-willy@infradead.org/
[2] http://learningsys.org/neurips19/assets/papers/18_CameraReadySubmission_MLSys_NeurIPS_2019.pdf
[3] https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/nvidia-dgx-a100-datasheet.pdf

—
Best Regards,
Yan Zi
