Date:   Fri, 7 Oct 2016 09:44:15 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     doug@...yco.com
Cc:     Shaohua Li <shli@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-raid <linux-raid@...r.kernel.org>,
        Neil Brown <neilb@...e.de>
Subject: Re: [GIT PULL] MD update for 4.9

On Thu, Oct 6, 2016 at 10:39 PM, Doug Dumitru <doug@...yco.com> wrote:
>
> There is another thread in [linux-raid] discussing pre-fetches in the
> raid-6 AVX2 code.  My testing implies that the prefetch distance is
> too short.  In your new AVX512 code, it looks like there are 24
> instructions, each with latencies of 1, between the prefetch and the
> actual memory load.  I don't have an AVX512 CPU to try this on, but the
> prefetch might do better at a bigger distance.  If I am not mistaken,
> it takes a lot longer than 24 clocks to fetch 4 cache lines.

We have basically never had a case where prefetches were actually a good idea.

If the hardware doesn't do prefetching on its own (partly with just
physical memory patterns in the memory controller, partly just with
aggressive OoO), software isn't going to be able to improve on the
situation in general.

SW prefetching is a broken concept. You can make big differences for
very specific microarchitectures (usually the broken shit ones are the
ones that show it best), but in the general case it's pretty much
always a lost cause. We've had real cases where prefetching just then
made things worse on other hardware.

So just don't do it. It's benchmarketing for specific hardware, it's
not worth worrying about in the bigger picture. You'll find people
spend a lot of time tuning things for their particular hardware, and
it not helping at all on anything else.

Waste of time. Life is too short (and software is too complex) to try
to work around broken microarchitectures with sw prefetching.

              Linus
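
For readers unfamiliar with the term, the "prefetch distance" debated in
this thread is how far ahead of the current load a software prefetch hint
is issued. The following is a minimal sketch, not code from the raid-6
implementation: xor_block, PREFETCH_DISTANCE and the use_prefetch flag are
hypothetical names chosen for illustration, and __builtin_prefetch is the
GCC/Clang builtin. The hint never changes the result, only (possibly) the
timing, which is why any benefit depends on the microarchitecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many bytes ahead of the current load
 * the prefetch hint is issued. The "right" value, if any, differs
 * from one microarchitecture to the next. */
#ifndef PREFETCH_DISTANCE
#define PREFETCH_DISTANCE 256
#endif

/* XOR `words` 64-bit words of src into dst. When use_prefetch is
 * non-zero, hint the cache about data PREFETCH_DISTANCE bytes ahead. */
static void xor_block(uint64_t *dst, const uint64_t *src,
                      size_t words, int use_prefetch)
{
    const size_t ahead = PREFETCH_DISTANCE / sizeof(uint64_t);

    for (size_t i = 0; i < words; i++) {
        /* Read-only hint (rw = 0), no temporal locality (locality = 0).
         * A prefetch is only a hint; the hardware may ignore it. */
        if (use_prefetch && i + ahead < words)
            __builtin_prefetch(&src[i + ahead], 0, 0);

        dst[i] ^= src[i];
    }
}
```

Whether a loop like this runs faster with use_prefetch set, at any given
PREFETCH_DISTANCE, can only be settled by measuring on the hardware in
question, and a value tuned for one CPU may not carry over to another.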
