linux-kernel - Re: [GIT PULL] MD update for 4.9

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFxocN1pqucTfOSGe5RLsB0QsRnwr9TANJQkYAiibMty-w@mail.gmail.com>
Date:   Fri, 7 Oct 2016 09:44:15 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     doug@...yco.com
Cc:     Shaohua Li <shli@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-raid <linux-raid@...r.kernel.org>,
        Neil Brown <neilb@...e.de>
Subject: Re: [GIT PULL] MD update for 4.9

On Thu, Oct 6, 2016 at 10:39 PM, Doug Dumitru <doug@...yco.com> wrote:
>
> There is another thread in [linux-raid] discussing pre-fetches in the
> raid-6 AVX2 code.  My testing implies that the prefetch distance is
> too short.  In your new AVX512 code, it looks like there are 24
> instructions, each with latencies of 1, between the prefetch and the
> actual memory load.  I don't have a AVX512 CPU to try this on, but the
> prefetch might do better at a bigger distance.  If I am not mistaken,
> it takes a lot longer than 24 clocks to fetch 4 cache lines.

We have basically never had a case where prefetches were actually a good idea.

If the hardware doesn't do prefetching on its own (partly with just
physical memory patterns in the memory controller, partly just with
aggressive OoO), software isn't going to be able to improve on the
situation in general.

SW prefetching is a broken concept. You can make big differences for
very specific microarchitectures (usually the broken shit ones are the
ones that show it best), but in the general case it's pretty much
always a lost cause. We've had real cases where prefetching just then
made things worse on other hardware.

So just don't do it. It's benchmarketing for specific hardware, it's
not worth worrying about in the bigger picture. You'll find people
spend a lot of time tuning things for their particular hardware, and
it not helping at all on anything else.

Waste of time. Life is too short (and software is too complex) to try
to work around broken microarchitectures with sw prefetching.

              Linus