Message-ID: <CAA1CXcBnFrDWsGcGfPj-kjkDXnUN=ZBY1Fgbk9FDSNYx9B_QhQ@mail.gmail.com>
Date: Thu, 15 Dec 2022 15:02:44 -0700
From: Nico Pache <npache@...hat.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
muchun.song@...ux.dev, mike.kravetz@...cle.com,
akpm@...ux-foundation.org, gerald.schaefer@...ux.ibm.com,
Waiman Long <llong@...hat.com>
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order
On Thu, Dec 15, 2022 at 2:47 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Thu, Dec 15, 2022 at 02:38:28PM -0700, Nico Pache wrote:
> > To expand a little more on the analysis:
> > I computed the latency/throughput between <+24> and <+27> using
> > Intel's manual (Appendix D):
> >
> > The bitmath solution shows a total latency of 2.5 and a throughput of 0.5.
> > The branch solution shows a total latency of 4 and a throughput of 1.5.
> >
> > Given that this is not a tight loop, and the next instruction requires
> > the computed data, lower latency is what matters most here.
> >
> > Just wanted to add that little piece :)
>
> I appreciate how hard you're working on this, but it really is straining
> at gnats ;-) For a modern CPU, the most important thing is cache misses
> and avoiding dirtying cachelines. Cycle counting isn't that important
> when an L3 cache miss takes 2000 (or more) cycles.
Haha yeah, I suspected as much once I saw the results, but I figured I'd share.
We have HPC systems with TiBs of memory, so sometimes gnats matter ;p
The 2-3 extra cycles per THP could add up to roughly 2-3 million extra
cycles on a 2 TiB system full of THPs. I guess that's not a significant
number of cycles either in the grand scheme of things.
Cheers,
-- Nico
>