Message-ID: <CAFUsyfJ33cKFQdUagHQ_b4N80CfBtGQZhyA4CN_JLgEmXEX=DA@mail.gmail.com>
Date:   Fri, 26 Nov 2021 12:50:28 -0600
From:   Noah Goldstein <goldstein.w.n@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     tglx@...utronix.de, mingo@...hat.com,
        Borislav Petkov <bp@...en8.de>, dave.hansen@...ux.intel.com,
        X86 ML <x86@...nel.org>, hpa@...or.com, peterz@...radead.org,
        alexanderduyck@...com, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] x86/lib: Optimize 8x loop and memory clobbers in csum_partial.c

On Fri, Nov 26, 2021 at 12:27 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Nov 26, 2021 at 10:17 AM Noah Goldstein <goldstein.w.n@...il.com> wrote:
> >
>
> >
> > Makes sense. Although if you inline I think you definitely will want a more
> > conservative clobber than just "memory". Also I think with 40 you also will
> > get some value from two counters.
> >
> > Did you see the number/question I posted about two accumulators for 32
> > byte case?
> > Its a judgement call about latency vs throughput that I don't really have an
> > answer for.
> >
>
> The thing I do not know is if using more units would slow down the
> hyper thread ?

There are more uops in the two-accumulator version, so it could be a concern
iff the other hyperthread is bottlenecked on p06 throughput. My general
understanding is that this is not the common case, and that the very premise of
hyperthreads is that most bottlenecks are related to memory fetch or resolving
control flow.
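
To make the one- versus two-accumulator trade-off concrete, here is a rough
portable C sketch for the 32-byte case. It is only an illustration and not the
csum_partial.c code, which uses inline asm. The names add64_carry, load64,
csum32_one_acc and csum32_two_acc are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* 64-bit add with end-around carry (ones' complement add).
 * Hypothetical helper, standing in for an add/adc pair. */
static inline uint64_t add64_carry(uint64_t a, uint64_t b)
{
    uint64_t s = a + b;
    return s + (s < a);   /* fold the carry-out back into the sum */
}

/* Unaligned-safe 8-byte load. */
static inline uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* One accumulator: a single dependency chain of four adds.
 * Fewer operations, but each add waits on the previous one. */
uint64_t csum32_one_acc(const unsigned char *buf, uint64_t sum)
{
    sum = add64_carry(sum, load64(buf));
    sum = add64_carry(sum, load64(buf + 8));
    sum = add64_carry(sum, load64(buf + 16));
    sum = add64_carry(sum, load64(buf + 24));
    return sum;
}

/* Two accumulators: two shorter, independent chains plus a merge.
 * Same number of adds here, but the asm equivalent needs extra
 * instructions to set up and combine the second chain. */
uint64_t csum32_two_acc(const unsigned char *buf, uint64_t sum)
{
    uint64_t a = add64_carry(sum, load64(buf));
    uint64_t b = load64(buf + 8);

    a = add64_carry(a, load64(buf + 16));
    b = add64_carry(b, load64(buf + 24));
    return add64_carry(a, b);   /* merge the two chains */
}
```

Both functions return the same value for the same input, because ones'
complement addition is associative and commutative. The difference is only in
how the work is scheduled: the second form shortens the critical path at the
cost of more work issued to the adder ports.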

>
> Would using ADCX/ADOX would be better in this respect ?

What would code using those instructions look like? Having trouble
seeing how to use them here.
