lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 8 Sep 2020 16:22:11 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Zi Yan <ziy@...dia.com>
Cc:     Roman Gushchin <guro@...com>,
        "Kirill A. Shutemov" <kirill@...temov.name>, linux-mm@...ck.org,
        Rik van Riel <riel@...riel.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        David Nellans <dnellans@...dia.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 00/16] 1GB THP support on x86_64

On 08.09.20 16:05, Zi Yan wrote:
> On 8 Sep 2020, at 7:57, David Hildenbrand wrote:
> 
>> On 03.09.20 18:30, Roman Gushchin wrote:
>>> On Thu, Sep 03, 2020 at 05:23:00PM +0300, Kirill A. Shutemov wrote:
>>>> On Wed, Sep 02, 2020 at 02:06:12PM -0400, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@...dia.com>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> This patchset adds support for 1GB THP on x86_64. It is on top of
>>>>> v5.9-rc2-mmots-2020-08-25-21-13.
>>>>>
>>>>> 1GB THP is more flexible for reducing translation overhead and increasing the
>>>>> performance of applications with large memory footprint without application
>>>>> changes compared to hugetlb.
>>>>
>>>> This statement needs a lot of justification. I don't see 1GB THP as viable
>>>> for any workload. Opportunistic 1GB allocation is very questionable
>>>> strategy.
>>>
>>> Hello, Kirill!
>>>
>>> I share your skepticism about opportunistic 1 GB allocations, however it might be useful
>>> if backed by an madvise() annotations from userspace application. In this case,
>>> 1 GB THPs might be an alternative to 1 GB hugetlbfs pages, but with a more convenient
>>> interface.
>>
>> I have concerns if we would silently use 1~GB THPs in most scenarios
>> where be would have used 2~MB THP. I'd appreciate a trigger to
>> explicitly enable that - MADV_HUGEPAGE is not sufficient because some
>> applications relying on that assume that the THP size will be 2~MB
>> (especially, if you want sparse, large VMAs).
> 
> This patchset is not intended to silently use 1GB THP in place of 2MB THP.
> First of all, there is a knob /sys/kernel/mm/transparent_hugepage/enable_1GB
> to enable 1GB THP explicitly. Also, 1GB THP is allocated from a reserved CMA
> region (although I had alloc_contig_pages as a fallback, which can be removed
> in next version), so users need to add hugepage_cma=nG kernel parameter to
> enable 1GB THP allocation. If a finer control is necessary, we can add
> a new MADV_HUGEPAGE_1GB for 1GB THP.

Thanks for the information - I would have loved to see important
information like that (esp. how to use) in the cover letter.

So what you propose is (excluding alloc_contig_pages()) really just
automatically using (previously reserved) 1GB huge pages as 1GB THP
instead of explicitly using them in an application using hugetlbfs.
Still, not convinced how helpful that actually is - most certainly you
really want a mechanism to control this per application (+ maybe make
the application indicate actual ranges where it makes sense - but then
you can directly modify the application to use hugetlbfs).

I guess the interesting thing of this approach is that we can
mix-and-match THP of differing granularity within a single mapping -
whereby a hugetlbfs allocation would fail in case there isn't sufficient
1GB pages available. However, there are no guarantees for applications
anymore (thinking about RT KVM and similar, we really want gigantic
pages and cannot tolerate falling back to smaller granularity).

What are intended use cases/applications that could benefit? I doubt
databases and virtualization are really a good fit - they know how to
handle hugetlbfs just fine.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ