Message-ID: <20200309183704.GA1573@bombadil.infradead.org>
Date: Mon, 9 Mar 2020 11:37:04 -0700
From: Matthew Wilcox <willy@...radead.org>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Michal Hocko <mhocko@...nel.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Cannon Matthews <cannonmatthews@...gle.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Greg Thelen <gthelen@...gle.com>,
Salman Qazi <sqazi@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH] mm: clear 1G pages with streaming stores on x86
On Mon, Mar 09, 2020 at 08:38:31AM -0700, Andi Kleen wrote:
> > Gigantic huge pages are a bit different. They are much less dynamic from
> > the usage POV in my experience. Micro-optimizations for the first access
> > tend not to matter at all, as it is usually a pre-allocation scenario. On
> > the other hand, speeding up the initialization sounds like a good thing
> > in general. It will be a single time benefit but if the additional code
> > is not hard to maintain then I would be inclined to take it even with
> > "artificial" numbers stated above. There really shouldn't be other downsides
> > except for the code maintenance, right?
>
> There's a cautionary tale of the old crappy RAID5 XOR assembler functions which
> were optimized a long time ago for the Pentium1, and stayed around,
> even though the compiler could actually do a better job.
>
> String instructions are constantly improving in performance (Broadwell is
> very old at this point). Most likely over time (and maybe even today
> on newer CPUs) you would need much more sophisticated unrolled MOVNTI variants
> (or maybe even AVX-*) to be competitive.
Presumably you have access to current and maybe even some unreleased
CPUs ... I mean, he's posted the patches, so you can test this hypothesis.
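
For anyone who wants to try the comparison in userspace first, here is a
minimal sketch of an unrolled MOVNTI clearing loop of the kind mentioned in
the quoted text. It is not the code from the patch under discussion. The
helper name clear_region_nt is hypothetical, and the sketch assumes x86-64
with SSE2 and a compiler that ships <emmintrin.h>; on anything else it falls
back to memset so it still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si64 (MOVNTI) and _mm_sfence */
#define HAVE_MOVNTI 1
#else
#define HAVE_MOVNTI 0
#endif

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * zero `len` bytes at `dst` using non-temporal 64-bit stores, unrolled
 * to cover one 64-byte cache line per loop iteration.
 *
 * Preconditions: dst is 8-byte aligned and len is a multiple of 64.
 */
static void clear_region_nt(void *dst, size_t len)
{
    assert(((uintptr_t)dst % 8) == 0);
    assert(len % 64 == 0);

#if HAVE_MOVNTI
    long long *p = (long long *)dst;
    long long *end = p + len / sizeof(long long);

    for (; p < end; p += 8) {
        /* Eight MOVNTI stores: 8 x 8 bytes = one 64-byte cache line. */
        _mm_stream_si64(p + 0, 0);
        _mm_stream_si64(p + 1, 0);
        _mm_stream_si64(p + 2, 0);
        _mm_stream_si64(p + 3, 0);
        _mm_stream_si64(p + 4, 0);
        _mm_stream_si64(p + 5, 0);
        _mm_stream_si64(p + 6, 0);
        _mm_stream_si64(p + 7, 0);
    }

    /* Non-temporal stores are weakly ordered; fence before others read. */
    _mm_sfence();
#else
    /* Portable fallback so the sketch compiles on non-x86-64 targets. */
    memset(dst, 0, len);
#endif
}
```

Timing this against a plain memset over a large, page-aligned buffer on
several CPU generations would give a rough userspace feel for the hypothesis,
though it says nothing definitive about in-kernel behaviour.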