Message-ID: <20200311203246.GA3971914@rani.riverdale.lan>
Date: Wed, 11 Mar 2020 16:32:47 -0400
From: Arvind Sankar <nivedita@...m.mit.edu>
To: Arvind Sankar <nivedita@...m.mit.edu>
Cc: "Kirill A. Shutemov" <kirill@...temov.name>,
Cannon Matthews <cannonmatthews@...gle.com>,
Matthew Wilcox <willy@...radead.org>,
Andi Kleen <ak@...ux.intel.com>,
Michal Hocko <mhocko@...nel.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Greg Thelen <gthelen@...gle.com>,
Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86
On Wed, Mar 11, 2020 at 02:32:41PM -0400, Arvind Sankar wrote:
> On Wed, Mar 11, 2020 at 11:16:07AM +0300, Kirill A. Shutemov wrote:
> > On Tue, Mar 10, 2020 at 11:35:54PM -0400, Arvind Sankar wrote:
> > >
> > > The rationale for the MOVNTI instruction is supposed to be that it
> > > avoids cache pollution. Aside from the benchmark that shows MOVNTI to be
> > > faster for the move itself, shouldn't it have an additional benefit in
> > > not trashing the CPU caches?
> > >
> > > As string instructions improve, why wouldn't the same improvements be
> > > applied to MOVNTI?
> >
> > String instructions are inherently more flexible. The implementation can
> > choose a caching strategy depending on the operation size (cx) and other
> > factors. For example, if the operation is large enough and the cache is
> > full of dirty cache lines that are expensive to free up, it can choose to
> > bypass the cache. MOVNTI is stricter in its semantics and more opaque to
> > the CPU.
>
> But with today's processors, wouldn't writing 1G via the string
> operations empty out almost the whole cache? Or are there already
> optimizations to prevent one thread from hogging the L3?
Also, currently the stringop is only done 4k at a time, so it would
likely not trigger any future cache-bypassing optimizations in any case.
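
To illustrate the point, here is a rough userspace sketch of what "one
stringop per 4k" looks like. CHUNK_SIZE, clear_chunk() and clear_in_chunks()
are made-up names, and memset() only stands in for the kernel's per-page
clear routine; this is not the actual kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096u	/* assumed base page size */

/* Stand-in for a per-page clear: one string operation per 4k chunk. */
void clear_chunk(void *page)
{
	memset(page, 0, CHUNK_SIZE);
}

/*
 * Clear len bytes one chunk at a time and return how many per-chunk
 * calls were issued. len must be a multiple of CHUNK_SIZE.
 */
size_t clear_in_chunks(void *buf, size_t len)
{
	unsigned char *p = buf;
	size_t calls = 0;

	assert(len % CHUNK_SIZE == 0);
	for (size_t off = 0; off < len; off += CHUNK_SIZE) {
		clear_chunk(p + off);
		calls++;
	}
	return calls;
}
```

Each call is handed a length of only 4k, so a heuristic keyed on the
operation size never gets to see that the overall clear is 1G.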
>
> If we do want to just use the string operations, it seems like the
> clear_page routines should just call memset instead of duplicating it.
>
> >
> > And more importantly, string instructions, unlike MOVNTI, are something
> > that is often generated by compilers and used a lot in standard
> > libraries. They are and will be a focus of optimization for CPU
> > architects.
> >
> > --
> > Kirill A. Shutemov
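
For reference, a minimal userspace sketch of the streaming-store approach
being discussed. clear_streaming() is a hypothetical name and this is not the
code from the patch. It assumes x86-64, where _mm_stream_si64() compiles to
MOVNTI, and falls back to memset() on other architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64(), _mm_sfence() */
#endif

/*
 * Hypothetical helper: zero len bytes at dst with non-temporal stores.
 * dst must be 8-byte aligned and len a multiple of 8.
 */
void clear_streaming(void *dst, size_t len)
{
	assert((uintptr_t)dst % 8 == 0 && len % 8 == 0);
#if defined(__x86_64__)
	long long *p = dst;

	for (size_t i = 0; i < len / 8; i++)
		_mm_stream_si64(p + i, 0);	/* MOVNTI: store that bypasses the cache */

	/* Non-temporal stores are weakly ordered; fence before the data is relied on. */
	_mm_sfence();
#else
	memset(dst, 0, len);	/* portable fallback, goes through the cache */
#endif
}
```

Unlike the string operation, the caching behaviour here is requested by the
instruction itself rather than left for the CPU to decide.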