Message-ID: <Y5uV3sS/nb+J+Akx@casper.infradead.org>
Date: Thu, 15 Dec 2022 21:47:10 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Nico Pache <npache@...hat.com>
Cc: Sidhartha Kumar <sidhartha.kumar@...cle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
muchun.song@...ux.dev, mike.kravetz@...cle.com,
akpm@...ux-foundation.org, gerald.schaefer@...ux.ibm.com,
Waiman Long <llong@...hat.com>
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in
set_compound_order

On Thu, Dec 15, 2022 at 02:38:28PM -0700, Nico Pache wrote:
> To expand a little more on the analysis:
> I computed the latency/throughput between <+24> and <+27> using
> Intel's manual (Appendix D):
>
> The bitmath solution shows a total latency of 2.5 and a throughput of 0.5.
> The branch solution shows a total latency of 4 and a throughput of 1.5.
>
> Given that this is not a tight loop, and the next instruction requires
> the data computed, lower latency matters more than throughput here.
>
> Just wanted to add that little piece :)
I appreciate how hard you're working on this, but it really is straining
at gnats ;-)  For a modern CPU, the most important thing is cache misses
and avoiding dirtying cachelines. Cycle counting isn't that important
when an L3 cache miss takes 2000 (or more) cycles.
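
For readers without the patch at hand, the two variants being compared
can be sketched roughly as below. This is only an illustration: the
helper names are hypothetical, the branchless expression is one possible
form and is not quoted from the patch, and it assumes order < 32. Both
store 0 for order 0 and 1 << order otherwise, so any difference is a
handful of cycles on the same cacheline.

```c
#include <stdint.h>

/* Hypothetical helper, branch form: order 0 is the "zero case". */
static inline uint32_t compound_nr_branch(unsigned int order)
{
	return order ? (UINT32_C(1) << order) : 0;
}

/*
 * Hypothetical helper, branchless form (an assumption, not the patch's
 * code): 1 << 0 sets only bit 0, so clearing bit 0 yields 0 for order 0
 * and leaves every other power of two unchanged.
 */
static inline uint32_t compound_nr_bitmath(unsigned int order)
{
	return (UINT32_C(1) << order) & ~UINT32_C(1);
}
```

A compiler is free to turn either form into the other, so the generated
code may not differ as much as the source does.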