Date: Tue, 4 Jun 2024 20:27:38 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Andrew Lunn <andrew@...n.ch>
Cc: Jason Gunthorpe <jgg@...pe.ca>, Paolo Abeni <pabeni@...hat.com>, Mina
 Almasry <almasrymina@...gle.com>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-alpha@...r.kernel.org, linux-mips@...r.kernel.org,
 linux-parisc@...r.kernel.org, sparclinux@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
 bpf@...r.kernel.org, linux-kselftest@...r.kernel.org,
 linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org, "David S.
 Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>, Donald Hunter <donald.hunter@...il.com>,
 Jonathan Corbet <corbet@....net>, Richard Henderson
 <richard.henderson@...aro.org>, Ivan Kokshaysky <ink@...assic.park.msu.ru>,
 Matt Turner <mattst88@...il.com>, Thomas Bogendoerfer
 <tsbogend@...ha.franken.de>, "James E.J. Bottomley"
 <James.Bottomley@...senpartnership.com>, Helge Deller <deller@....de>,
 Andreas Larsson <andreas@...sler.com>, Jesper Dangaard Brouer
 <hawk@...nel.org>, Ilias Apalodimas <ilias.apalodimas@...aro.org>, Masami
 Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>, Arnd Bergmann <arnd@...db.de>, Alexei
 Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
 <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu
 <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>, John Fastabend
 <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, Stanislav
 Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, Jiri Olsa
 <jolsa@...nel.org>, Steffen Klassert <steffen.klassert@...unet.com>,
 Herbert Xu <herbert@...dor.apana.org.au>, David Ahern <dsahern@...nel.org>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>, Shuah Khan
 <shuah@...nel.org>, Sumit Semwal <sumit.semwal@...aro.org>, Christian
 König <christian.koenig@....com>, Pavel Begunkov
 <asml.silence@...il.com>, David Wei <dw@...idwei.uk>, Yunsheng Lin
 <linyunsheng@...wei.com>, Shailend Chand <shailend@...gle.com>, Harshitha
 Ramamurthy <hramamurthy@...gle.com>, Shakeel Butt <shakeel.butt@...ux.dev>,
 Jeroen de Borst <jeroendb@...gle.com>, Praveen Kaligineedi
 <pkaligineedi@...gle.com>, Willem de Bruijn <willemb@...gle.com>, Kaiyuan
 Zhang <kaiyuanz@...gle.com>
Subject: Re: [PATCH net-next v10 05/14] netdev: netdevice devmem allocator

On Wed, 5 Jun 2024 01:44:37 +0200
Andrew Lunn <andrew@...n.ch> wrote:

> > Interesting, as I sped up the ftrace ring buffer by a substantial amount by
> > adding strategic __always_inline, noinline, likely() and unlikely()
> > throughout the code. It had to do with what was considered the fast path
> > and slow path, and not actually the size of the function. gcc got it
> > horribly wrong.  
> 
> And what did the compiler people say when you reported gcc was getting
> it wrong?
> 
> Our assumption is, the compiler is better than a human at deciding
> this. Or at least, a human who does not spend a long time profiling
> and tuning. If this assumption is not true, we probably should be
> trying to figure out why, and improving the compiler when
> possible. That will benefit everybody.
> 

How is the compiler going to know which path is going to be taken the most?
There are two main paths in the ring buffer logic: one when an event stays on
the sub-buffer, the other when the event crosses over to a new sub-buffer.
As there are 100s of events that happen on the same sub-buffer for every one
time there's a crossover, I optimized the paths that stayed on the
sub-buffer, which caused the time for those events to go from 250ns down to
150ns. That's 40% less time per event.

I added likely()/unlikely() and the __always_inline and noinline annotations to
make sure the "staying on the buffer" path was always the hot path, and to
keep it tight in cache.

How is a compiler going to know that?
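
To make the idea concrete, here is a minimal userspace sketch of that kind of
annotation. It is not the actual ring buffer code: the toy_* names, the sizes,
and the local macros are all made up for illustration (the kernel has its own
likely(), unlikely(), __always_inline and noinline), and it assumes gcc or
clang for __builtin_expect and the attributes.

```c
#include <stddef.h>

/* Local stand-ins for the kernel helpers (assumed gcc/clang builtins). */
#define likely(x)     __builtin_expect(!!(x), 1)
#define unlikely(x)   __builtin_expect(!!(x), 0)
#define ALWAYS_INLINE inline __attribute__((__always_inline__))
#define NOINLINE      __attribute__((__noinline__))

#define TOY_SUB_BUF_SIZE 4096	/* hypothetical sub-buffer size */
#define TOY_NR_SUB_BUFS  4	/* hypothetical number of sub-buffers */

struct toy_ring {
	char data[TOY_NR_SUB_BUFS][TOY_SUB_BUF_SIZE];
	size_t cur;			/* index of the current sub-buffer */
	size_t write;			/* write offset inside it */
	unsigned long crossings;	/* how often the slow path ran */
};

/*
 * Slow path: the event does not fit, so move to the next sub-buffer.
 * Kept out of line so it does not bloat the hot path in every caller.
 */
static NOINLINE void *toy_move_to_next(struct toy_ring *rb, size_t len)
{
	rb->cur = (rb->cur + 1) % TOY_NR_SUB_BUFS;
	rb->write = len;
	rb->crossings++;
	return rb->data[rb->cur];
}

/*
 * Fast path: the event stays on the current sub-buffer. Forced inline,
 * with the crossover branch marked unlikely so the common case falls
 * straight through.
 */
static ALWAYS_INLINE void *toy_reserve(struct toy_ring *rb, size_t len)
{
	size_t start = rb->write;

	if (unlikely(start + len > TOY_SUB_BUF_SIZE))
		return toy_move_to_next(rb, len);

	rb->write = start + len;
	return &rb->data[rb->cur][start];
}
```

With 32-byte events, 128 calls to toy_reserve() stay on one sub-buffer before
a single call takes the out-of-line toy_move_to_next() path. That ratio is
knowledge about the workload, not something visible in the source.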

-- Steve
