Date:   Mon, 9 Mar 2020 11:01:56 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Michal Hocko <mhocko@...nel.org>,
        "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Cannon Matthews <cannonmatthews@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        David Rientjes <rientjes@...gle.com>,
        Greg Thelen <gthelen@...gle.com>,
        Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, ak@...ux.intel.com, x86@...nel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86

On 3/9/20 5:26 AM, Michal Hocko wrote:
> On Mon 09-03-20 14:36:58, Kirill A. Shutemov wrote:
>> On Mon, Mar 09, 2020 at 10:06:30AM +0100, Michal Hocko wrote:
>>> On Mon 09-03-20 03:08:20, Kirill A. Shutemov wrote:
>>>> On Fri, Mar 06, 2020 at 05:03:53PM -0800, Cannon Matthews wrote:
>>>>> Reimplement clear_gigantic_page() to clear gigantic pages using the
>>>>> non-temporal streaming store instructions that bypass the cache
>>>>> (movnti), since an entire 1GiB region will not fit in the cache anyway.
>>>>>
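For anyone following along, here is a minimal userspace sketch of the
movnti approach, not the actual patch. clear_region_nt() is just an
illustrative name, and the code assumes x86-64 with SSE2, where the
_mm_stream_si64 intrinsic compiles to a 64-bit movnti:

  #include <emmintrin.h>  /* _mm_stream_si64, _mm_sfence */
  #include <stddef.h>

  /* Clear a large buffer with 8-byte non-temporal stores that bypass
   * the cache.  Assumes dst is 8-byte aligned and len is a multiple
   * of 8. */
  static void clear_region_nt(void *dst, size_t len)
  {
          long long *p = dst;
          size_t i;

          for (i = 0; i < len / sizeof(*p); i++)
                  _mm_stream_si64(&p[i], 0);

          /* NT stores are weakly ordered; fence them before the
           * memory is handed out for use. */
          _mm_sfence();
  }

The trailing sfence matters: without it, later accesses are not
guaranteed to observe the non-temporal stores.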
>>> Gigantic huge pages are a bit different. They are much less dynamic from
>>> the usage POV in my experience. Micro-optimizations for the first access
>>> tend not to matter at all, as it is usually a pre-allocation scenario.
>>
>> The page gets cleared not on reservation but on allocation, including at
>> page fault time. Keeping the area around the fault address in cache can
>> still be beneficial.
> 
> You are right of course. What I meant to say is that the GB-page-backed
> workloads I have seen tend to pre-allocate during startup, so they do
> not rely on lazy initialization during #PF. This is slightly easier to
> handle for a resource that is essentially impossible to get on demand,
> so an early failure is easier to handle.
> 
> If there are workloads which can benefit from page fault
> micro-optimizations then all good, but this can be done on top and
> demonstrated by numbers. It is much easier to demonstrate the speed-up
> on pre-initialization workloads. That's all I wanted to say here.

I tend to agree that use of GB pages is generally going to be in workloads
that do a one-time setup/allocation/initialization of all the pages they
will use.  These improvements will help most in those workloads.

Workloads which include random faults should still benefit.  And, yes,
the code could be made even better by taking the faulting address into
account.
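A minimal sketch of that idea, with hypothetical names, reusing the
illustrative clear_region_nt() helper above: clear every 4KiB subpage
with non-temporal stores except the one containing the faulting
address, and clear that one last with ordinary cached stores so it is
still hot when the fault returns.  (The kernel's clear_huge_page()
does something similar with its addr_hint argument.)

  #include <string.h>

  #define SUBPAGE_SIZE 4096UL

  /* Sketch only: assumes fault_addr lies in [page, page + size). */
  static void clear_huge_page_sketch(void *page, size_t size,
                                     unsigned long fault_addr)
  {
          size_t target = (fault_addr - (unsigned long)page) / SUBPAGE_SIZE;
          size_t i, n = size / SUBPAGE_SIZE;

          for (i = 0; i < n; i++) {
                  if (i == target)
                          continue;       /* save the hot subpage for last */
                  clear_region_nt((char *)page + i * SUBPAGE_SIZE,
                                  SUBPAGE_SIZE);
          }
          /* cached stores leave the faulting subpage in cache */
          memset((char *)page + target * SUBPAGE_SIZE, 0, SUBPAGE_SIZE);
  }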

I would be happy to see this go forward once the few nits mentioned by
Andrew are addressed.  However, I myself cannot comment with authority
on the low-level x86-specific code.
-- 
Mike Kravetz
