lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZXHQcyZbXwesy0MV@shell.armlinux.org.uk>
Date: Thu, 7 Dec 2023 14:02:27 +0000
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Serge Semin <fancer.lancer@...il.com>
Cc: Andrew Lunn <andrew@...n.ch>,
	Maxime Chevallier <maxime.chevallier@...tlin.com>,
	Heiner Kallweit <hkallweit1@...il.com>,
	Alexandre Torgue <alexandre.torgue@...s.st.com>,
	Jose Abreu <joabreu@...opsys.com>,
	Jose Abreu <Jose.Abreu@...opsys.com>,
	Tomer Maimon <tmaimon77@...il.com>,
	Rob Herring <robh+dt@...nel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>,
	Conor Dooley <conor+dt@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	openbmc@...ts.ozlabs.org, netdev@...r.kernel.org,
	devicetree@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 09/16] net: mdio: Add Synopsys DW XPCS
 management interface support

On Thu, Dec 07, 2023 at 04:35:47PM +0300, Serge Semin wrote:
> Hi Andrew
> 
> On Wed, Dec 06, 2023 at 06:01:30PM +0100, Andrew Lunn wrote:
> > The compiler does a better job at deciding what to inline than we
> > humans do. If you can show the compiler is doing it wrong, then we
> > might accept them.
> 
> In general I would have agreed with you especially if the methods were
> heavier than what they are:
> static inline ptrdiff_t dw_xpcs_mmio_addr_format(int dev, int reg)
> {               
>         return FIELD_PREP(0x1f0000, dev) | FIELD_PREP(0xffff, reg);
> }               
>         
> static inline u16 dw_xpcs_mmio_addr_page(ptrdiff_t csr)
> {       
>         return FIELD_GET(0x1fff00, csr);
> }       
> 
> static inline ptrdiff_t dw_xpcs_mmio_addr_offset(ptrdiff_t csr)
> {
>         return FIELD_GET(0xff, csr);
> }
> 
> > But in general, netdev does not like inline in .C
> > file.
> 
> I see. I'll do as you say if you don't change your mind after my
> reasoning below.

This isn't Andrew saying it - you seem to have missed the detail that
"netdev". If Andrew doesn't say it, then DaveM, Jakub or Paolo will.

Have you read the "Inline functions" section in
Documentation/process/4.Coding.rst ?

> > Also, nothing in MDIO is hot path, it spends a lot of time
> > waiting for a slow bus. So inline is likely to just bloat the code for
> > no gain.
> 
> I would have been absolutely with you in this matter, if we were talking
> about a normal MDIO bus. In this case the devices are accessed over
> the system IO-memory. So the bus isn't that slow.
> 
> Regarding the compiler knowing better when to inline the code. Here is
> what it does with the methods above. If the inline keyword is
> specified the compiler will inline all three methods. If the keyword isn't
> specified then dw_xpcs_mmio_addr_format() won't be inlined while the rest
> two functions will be. So the only part at consideration is the
> dw_xpcs_mmio_addr_format() method since the rest of the functions are
> inlined anyway.
> 
> The dw_xpcs_mmio_addr_format() function body is of the 5 asm
> instructions length (on MIPS). Since the function call in this case
> requires two jump instructions (to function and back), one instruction
> to save the previous return address on stack and two instructions for
> the function arguments, the trade-off of having non-inlined function
> are those five additional instructions on each call. There are four
> dw_xpcs_mmio_addr_format() calls. So here is what we get in both
> cases:
> inlined:     5 func ins * 4 calls = 20 ins
> non-inlined: (5 func + 1 jump) ins + (1 jump + 1 ra + 2 arg) ins * 4 calls  = 22 ins
> but seeing the return address needs to be saved anyway in the callers
> here is what we finally get:
> non-inlined: (5 func + 1 jump) ins + (1 jump + 2 arg) ins * 4 calls  = 18 ins
> 
> So unless I am mistaken in some of the aspects if we have the function
> non-inlined then we'll save 2 instructions in the object file, but
> each call would require additional _4_ instructions to execute (2
> jumps and 2 arg creations), which makes the function execution almost
> two times longer than it would have been should it was inlined.

Rather than just focusing on instruction count, you also need to
consider things like branch prediction, prefetching and I-cache
usage. Modern CPUs don't execute instruction-by-instruction anymore.

It is entirely possible that the compiler is making better choices
even if it results in more jumps in the code.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ