linux-kernel - Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y5uV3sS/nb+J+Akx@casper.infradead.org>
Date:   Thu, 15 Dec 2022 21:47:10 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Nico Pache <npache@...hat.com>
Cc:     Sidhartha Kumar <sidhartha.kumar@...cle.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        muchun.song@...ux.dev, mike.kravetz@...cle.com,
        akpm@...ux-foundation.org, gerald.schaefer@...ux.ibm.com,
        Waiman Long <llong@...hat.com>
Subject: Re: [RFC V2] mm: add the zero case to page[1].compound_nr in
 set_compound_order

On Thu, Dec 15, 2022 at 02:38:28PM -0700, Nico Pache wrote:
> To expand a little more on the analysis:
> I computed the latency/throughput between <+24> and <+27> using
> intel's manual (APPENDIX D):
> 
> The bitmath solutions shows a total latency of 2.5 with a Throughput of 0.5.
> The branch solution show a total latency of 4 and throughput of 1.5.
> 
> Given this is not a tight loop, and the next instruction is requiring
> the data computed, better (lower) latency is the more ideal situation.
> 
> Just wanted to add that little piece :)

I appreciate how hard you're working on this, but it really is straining
at gnats ;-)  For a modern cpu, the most important thing is cache misses
and avoiding dirtying cachelines.  Cycle counting isn't that important
when an L3 cache miss takes 2000 (or more) cycles.