linux-kernel - Re: [PATCH] mm: clear 1G pages with streaming stores on x86

Message-ID: <20200311183240.GA3880414@rani.riverdale.lan>
Date:   Wed, 11 Mar 2020 14:32:41 -0400
From:   Arvind Sankar <nivedita@...m.mit.edu>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Arvind Sankar <nivedita@...m.mit.edu>,
        Cannon Matthews <cannonmatthews@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Greg Thelen <gthelen@...gle.com>,
        Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86

On Wed, Mar 11, 2020 at 11:16:07AM +0300, Kirill A. Shutemov wrote:
> On Tue, Mar 10, 2020 at 11:35:54PM -0400, Arvind Sankar wrote:
> > 
> > The rationale for the MOVNTI instruction is supposed to be that it avoids
> > cache pollution. Aside from the benchmark that shows MOVNTI to be faster for
> > the move itself, shouldn't it have an additional benefit in not trashing
> > the CPU caches?
> > 
> > As string instructions improve, why wouldn't the same improvements be
> > applied to MOVNTI?
> 
> String instructions are inherently more flexible. The implementation can
> choose a caching strategy depending on the operation size (cx) and other
> factors. For example, if the operation is large enough and the cache is full
> of dirty cache lines that are expensive to free up, it can choose to bypass
> the cache. MOVNTI is stricter in its semantics and more opaque to the CPU.

But with today's processors, wouldn't writing 1G via the string
operations empty out almost the whole cache? Or are there already
optimizations to prevent one thread from hogging the L3?

If we do want to just use the string operations, it seems like the
clear_page routines should just call memset instead of duplicating it.
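
For reference, the three approaches under discussion can be sketched in
userspace C roughly as follows. This is only an illustration, not the
kernel's clear_page code and not the patch being reviewed: the function
names are hypothetical, the buffer is assumed to be 8-byte aligned with a
length that is a multiple of 8, and non-x86-64 builds simply fall back to
memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64 (MOVNTI) and _mm_sfence */
#endif

/* Hypothetical helper: zero a buffer with non-temporal (streaming) stores. */
static void clear_nontemporal(void *dst, size_t len)
{
	assert(((uintptr_t)dst % 8) == 0 && (len % 8) == 0);
#if defined(__x86_64__)
	long long *p = dst;

	for (size_t i = 0; i < len / 8; i++)
		_mm_stream_si64(&p[i], 0);	/* compiles to MOVNTI */
	_mm_sfence();	/* order the streaming stores before later stores */
#else
	memset(dst, 0, len);	/* fallback: no MOVNTI on this target */
#endif
}

/* Hypothetical helper: zero a buffer with the REP STOSQ string instruction. */
static void clear_rep_stos(void *dst, size_t len)
{
	assert(((uintptr_t)dst % 8) == 0 && (len % 8) == 0);
#if defined(__x86_64__)
	size_t count = len / 8;	/* number of 8-byte stores, passed in RCX */

	__asm__ volatile("rep stosq"
			 : "+D"(dst), "+c"(count)
			 : "a"(0UL)
			 : "memory");
#else
	memset(dst, 0, len);	/* fallback: no string instructions here */
#endif
}

/* The alternative suggested above: just call memset. */
static void clear_memset(void *dst, size_t len)
{
	memset(dst, 0, len);
}
```

All three leave the buffer zeroed; the difference being debated is whether
the stores are allocated in the cache hierarchy on the way, which a sketch
like this cannot show without measuring it on real hardware.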

> 
> And more importantly, string instructions, unlike MOVNTI, are something that
> is often generated by compilers and used a lot in standard libraries. They
> are and will be a focus of optimization for CPU architects.
> 
> -- 
>  Kirill A. Shutemov