Message-Id: <1561104674.cxm7sn77rx.astroid@bobo.none>
Date: Fri, 21 Jun 2019 18:29:27 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@...gle.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Rich Felker <dalias@...c.org>,
"David S. Miller" <davem@...emloft.net>,
Christoph Hellwig <hch@....de>,
James Hogan <jhogan@...nel.org>,
Khalid Aziz <khalid.aziz@...cle.com>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
linux-mips@...r.kernel.org, Linux-MM <linux-mm@...ck.org>,
linuxppc-dev@...ts.ozlabs.org,
Linux-sh list <linux-sh@...r.kernel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Paul Burton <paul.burton@...s.com>,
Paul Mackerras <paulus@...ba.org>, sparclinux@...r.kernel.org,
the arch/x86 maintainers <x86@...nel.org>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>
Subject: Re: [PATCH 16/16] mm: pass get_user_pages_fast iterator arguments in
a structure

Linus Torvalds wrote on June 21, 2019 3:21 am:
> On Thu, Jun 20, 2019 at 5:19 AM Nicholas Piggin <npiggin@...il.com> wrote:
>>
>> The processor aliasing problem happens because the struct will
>> be initialised with stores using one base register (e.g., stack
>> register), and then same memory is loaded using a different
>> register (e.g., parameter register).
>
> Hmm. Honestly, I've never seen anything like that in any kernel profiles.
>
> Compared to the problems I _do_ see (which is usually the obvious
> cache misses, and locking), it must either be in the noise or it's
> some problem specific to whatever CPU you are doing performance work
> on?

No, you're right: the performance hit from these flushes is not big
enough to stand out in cycle counts. I just look at kernel code
for various flushes. Branches, not surprisingly, are usually the main
culprit, but they're normally not so interesting.

Static alias prediction seems to work well outside this case. It's
interesting: you need a store ; load sequence that does not
predict well (e.g., using a different base register), and you also
need that load to be executed ahead of the store.

The small stack structure for arguments is the perfect case: a bad
pattern, and the load executed right after the store. Even then you
also need a reason to delay the store (e.g., source not ready or store
queue full), but those hazards do show up.
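
To make the pattern concrete, here is a minimal sketch of the kind of
code I mean. All names (gup_args_example, walk_example, caller_example)
are made up for illustration and are not taken from the patch, and the
noinline attribute is the GCC/clang spelling, assumed here only so the
callee really receives a pointer. The caller initialises the struct with
stores based on the stack pointer, and the callee immediately loads the
same memory through its parameter register:

```c
/* Hypothetical argument bundle, for illustration only. */
struct gup_args_example {
	unsigned long addr;
	unsigned long end;
	int nr;
};

/*
 * noinline (GCC/clang attribute) keeps this a real call, so the loads
 * below use the pointer parameter as their base register.
 */
__attribute__((noinline))
static unsigned long walk_example(const struct gup_args_example *args)
{
	/* Loads: base register is the parameter register holding 'args'. */
	return (args->end - args->addr) + (unsigned long)args->nr;
}

unsigned long caller_example(unsigned long addr, unsigned long len, int nr)
{
	/* Stores: typically addressed relative to the stack pointer. */
	struct gup_args_example args = {
		.addr = addr,
		.end = addr + len,
		.nr = nr,
	};

	/* The callee reads 'args' right after it was written. */
	return walk_example(&args);
}
```

Whether a given compiler and CPU actually hit the hazard here depends on
code generation and on whether the store gets delayed, so treat this as
a picture of the shape of the code rather than a reproducer.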

Now, even when all that goes wrong, there are dynamic heuristics that
can take over. So if you run a repetitive microbenchmark you won't
see it.

Some CPUs seem to be quite aggressive about giving up and turning off
the alias prediction globally if you take misses (Intel x86 used to do
that IIRC, not sure if they still do). So in that case you wouldn't
even see it show up in one place; everything would just run slightly
slower.

What I worry about is high-rate direct IO workloads that see single
flushes in these paths as significant. Or, if this thing creeps into
the kernel too much and slightly raises global misses enough, it
will cause disambiguation to be significantly shut down.

Thanks,
Nick