lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 21 Jun 2019 18:29:27 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andrey Konovalov <andreyknvl@...gle.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Rich Felker <dalias@...c.org>,
        "David S. Miller" <davem@...emloft.net>,
        Christoph Hellwig <hch@....de>,
        James Hogan <jhogan@...nel.org>,
        Khalid Aziz <khalid.aziz@...cle.com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        linux-mips@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
        linuxppc-dev@...ts.ozlabs.org,
        Linux-sh list <linux-sh@...r.kernel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Paul Burton <paul.burton@...s.com>,
        Paul Mackerras <paulus@...ba.org>, sparclinux@...r.kernel.org,
        the arch/x86 maintainers <x86@...nel.org>,
        Yoshinori Sato <ysato@...rs.sourceforge.jp>
Subject: Re: [PATCH 16/16] mm: pass get_user_pages_fast iterator arguments in
 a structure

Linus Torvalds's on June 21, 2019 3:21 am:
> On Thu, Jun 20, 2019 at 5:19 AM Nicholas Piggin <npiggin@...il.com> wrote:
>>
>> The processor aliasing problem happens because the struct will
>> be initialised with stores using one base register (e.g., stack
>> register), and then same memory is loaded using a different
>> register (e.g., parameter register).
> 
> Hmm. Honestly, I've never seen anything like that in any kernel profiles.
> 
> Compared to the problems I _do_ see (which is usually the obvious
> cache misses, and locking), it must either be in the noise or it's
> some problem specific to whatever CPU you are doing performance work
> on?

No you're right, the performance hit from these flushes is not a
big hit that stands out in cycle counts. I just look at kernel code
for various flushes. Branches not surprisingly are usually the main
culprit, but they're normally not so interesting.

Static alias prediction seems to work well outside this case. It's
interesting, you need both a store ; load sequence that does not
predict well (e.g., using a different base register), and you also
need that load to be executed ahead of the store.

The small stack structure for arguments is the perfect case. Bad
pattern, and load executed right after store. Even then you also need
a reason to delay the store (e.g., source not ready or store queue
full), but those hazards do show up.

Now, even when all that goes wrong, there are dynamic heuristics that
can take over. So if you run a repetitive microbenchmark you won't
see it.

Some CPUs seem to be quite aggressive about giving up and turning off
the alias prediction globally if you take misses (Intel x86 used to do
that IIRC, not sure if they still do). So in that case you wouldn't
even see it show up in one place, everything will just run slightly
slower.

What I worry about is high rate direct IO workloads that see single
flushes in these paths as significant. Or if this thing creeps in to
the kernel too much and just slightly raises global misses enough,
then it will cause disambiguation to be significantly shut down.

Thanks,
Nick

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ