linux-kernel - Re: [RFC PATCH 00/16] 1GB THP support on x86

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3CDAD67E-23A1-4D84-BF19-FFE1CF956779@nvidia.com>
Date:   Tue, 8 Sep 2020 11:09:25 -0400
From:   Zi Yan <ziy@...dia.com>
To:     Michal Hocko <mhocko@...e.com>
CC:     Roman Gushchin <guro@...com>, <linux-mm@...ck.org>,
        Rik van Riel <riel@...riel.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        David Nellans <dnellans@...dia.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64

On 7 Sep 2020, at 3:20, Michal Hocko wrote:

> On Fri 04-09-20 14:10:45, Roman Gushchin wrote:
>> On Fri, Sep 04, 2020 at 09:42:07AM +0200, Michal Hocko wrote:
> [...]
>>> An explicit opt-in sounds much more appropriate to me as well. If we go
>>> with a specific API then I would not make it 1GB pages specific. Why
>>> cannot we have an explicit interface to "defragment" address space
>>> range into large pages and the kernel would use large pages where
>>> appropriate? Or is the additional copying prohibitively expensive?
>>
>> Can you, please, elaborate a bit more here? It seems like madvise(MADV_HUGEPAGE)
>> provides something similar to what you're describing, but there are lot
>> of details here, so I'm probably missing something.
>
> MADV_HUGEPAGE is controlling a preference for THP to be used for a
> particular address range. So it looks similar but the historical
> behavior is to control page faults as well and the behavior depends on
> the global setup.
>
> I've had in mind something much simpler. Effectively an API to invoke
> khugepaged (like) functionality synchronously from the calling context
> on the specific address range. It could be more aggressive than the
> regular khugepaged and create even 1G pages (or as large THPs as page
> tables can handle on the particular arch for that matter).
>
> As this would be an explicit call we do not have to be worried about
> the resulting latency because it would be an explicit call by the
> userspace.  The default khugepaged has a harder position there because
> has no understanding of the target address space and cannot make any
> cost/benefit evaluation so it has to be more conservative.

Something like MADV_HUGEPAGE_SYNC? It would be useful, since users have
better and clearer control of getting huge pages from the kernel and
know when they will pay the cost of getting the huge pages.

I would think the suggestion is more about the huge page control options
currently provided by the kernel do not have predictable performance
outcome, since MADV_HUGEPAGE is a best-effort option and does not tell
users whether the marked virtual address range is backed by huge pages
or not when the madvise returns. MADV_HUGEPAGE_SYNC would provide a
deterministic result to users on whether the huge page(s) are formed
or not.

—
Best Regards,
Yan Zi

Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)