linux-kernel - Re: [PATCH] mm: clear 1G pages with streaming stores on x86

Message-ID: <20200311203246.GA3971914@rani.riverdale.lan>
Date:   Wed, 11 Mar 2020 16:32:47 -0400
From:   Arvind Sankar <nivedita@...m.mit.edu>
To:     Arvind Sankar <nivedita@...m.mit.edu>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        Cannon Matthews <cannonmatthews@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Greg Thelen <gthelen@...gle.com>,
        Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86

On Wed, Mar 11, 2020 at 02:32:41PM -0400, Arvind Sankar wrote:
> On Wed, Mar 11, 2020 at 11:16:07AM +0300, Kirill A. Shutemov wrote:
> > On Tue, Mar 10, 2020 at 11:35:54PM -0400, Arvind Sankar wrote:
> > > 
> > > The rationale for the MOVNTI instruction is supposed to be that it avoids
> > > cache pollution. Aside from the bench that shows MOVNTI to be faster for
> > > the move itself, shouldn't it have an additional benefit in not trashing
> > > the CPU caches?
> > > 
> > > As string instructions improve, why wouldn't the same improvements be
> > > applied to MOVNTI?
> > 
> > String instructions are inherently more flexible. The implementation can
> > choose a caching strategy depending on the operation size (cx) and other
> > factors. For example, if the operation is large enough and the cache is
> > full of dirty cache lines that are expensive to free up, it can choose to
> > bypass the cache. MOVNTI is stricter in its semantics and more opaque to
> > the CPU.
> 
> But with today's processors, wouldn't writing 1G via the string
> operations empty out almost the whole cache? Or are there already
> optimizations to prevent one thread from hogging the L3?

Also, currently the stringop is only done 4k at a time, so it would
likely not trigger any future cache-bypassing optimizations in any case.
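
To make the comparison concrete, here is a rough userspace sketch of the two
approaches. It is not the kernel's clear_page code and not the patch under
discussion: the names clear_chunked, clear_streaming and CHUNK_SIZE are made
up for illustration, and memset stands in for the string operation (whether
the C library actually emits "rep stos" for it is an assumption that will not
hold everywhere). The point is only that the chunked path never sees a size
larger than 4k per call, while the streaming path issues MOVNTI stores over
the whole range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_stream_si64 (MOVNTI) and _mm_sfence */
#endif

/* Hypothetical chunk size: one 4k base page per call. */
#define CHUNK_SIZE 4096u

/*
 * Cached path: clear the range one 4k chunk at a time. Each call only
 * sees CHUNK_SIZE bytes, so a size-based heuristic inside the
 * implementation has no way to know the total is much larger.
 */
static void clear_chunked(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(len % CHUNK_SIZE == 0);
	for (size_t off = 0; off < len; off += CHUNK_SIZE)
		memset(p + off, 0, CHUNK_SIZE);
}

/*
 * Streaming path: 8-byte non-temporal stores over the whole range,
 * followed by a store fence so the weakly ordered stores are globally
 * visible before returning. Falls back to memset on non-x86-64 builds.
 */
static void clear_streaming(void *buf, size_t len)
{
	assert((uintptr_t)buf % 8 == 0 && len % 8 == 0);
#if defined(__x86_64__)
	long long *p = buf;

	for (size_t i = 0; i < len / 8; i++)
		_mm_stream_si64(&p[i], 0);
	_mm_sfence();
#else
	memset(buf, 0, len);
#endif
}
```

Both functions leave the buffer zeroed; any difference would show up only in
timing and in how much of the cache the previous contents survive, which this
sketch does not measure.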

> 
> If we do want to just use the string operations, it seems like the
> clear_page routines should just call memset instead of duplicating it.
> 
> > 
> > And more importantly, string instructions, unlike MOVNTI, are something
> > that compilers generate often and that standard libraries use a lot. They
> > are and will remain a focus of optimization for CPU architects.
> > 
> > -- 
> >  Kirill A. Shutemov