Date:   Thu, 16 Mar 2017 12:12:06 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Shannon Nelson' <shannon.nelson@...cle.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>
CC:     "sparclinux@...r.kernel.org" <sparclinux@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 net-next 4/5] sunvnet: count multicast packets

From: Shannon Nelson
> Sent: 16 March 2017 00:18
> To: David Laight; netdev@...r.kernel.org; davem@...emloft.net
> On 3/15/2017 1:50 AM, David Laight wrote:
> > From: Shannon Nelson
> >> Sent: 14 March 2017 17:25
> > ...
> >> +	if (unlikely(is_multicast_ether_addr(eth_hdr(skb)->h_dest)))
> >> +		dev->stats.multicast++;
> >
> > I'd guess that:
> > 	dev->stats.multicast += is_multicast_ether_addr(eth_hdr(skb)->h_dest);
> > generates faster code.
> > Especially if is_multicast_ether_addr(x) is (*x >> 7).

I'd clearly got brain-fade there: the mcast bit is the first transmitted bit
(on ethernet), but the bytes are sent LSB first (like async), so the test is
(*x & 1), not (*x >> 7).
> > 	David
> 
> Hi David, thanks for the comment.  My local instruction level
> performance guru is on vacation this week so I can't do a quick check
> with him today on this.  However, I'm not too worried here since the
> inline code for is_multicast_ether_addr() is simply
> 
> 	return 0x01 & addr[0];
> 
> and objdump tells me that on sparc it compiles down to a simple single
> byte load and compare:
> 
>      325c:	c2 08 80 03 	ldub  [ %g2 + %g3 ], %g1
>      3260:	80 88 60 01 	btst  1, %g1
>      3264:	32 60 00 b3 	bne,a,pn   %xcc, 3530 <vnet_rx_one+0x430>
>      3268:	c2 5c 61 68 	ldx  [ %l1 + 0x168 ], %g1
> 		dev->stats.multicast++;

Followed by a branch that might be marked 'assume taken' so the
normal path takes the branch.
I guess that is followed by 'add 1 to %g1', 'stx %g1, [ %l1 + 0x168 ]'
and a branch to 3530.
GCC must be using that condition to get the bottom of a loop
to 'fall through' to its top!

My version should generate something like:
		ldub  [ %g2 + %g3 ], %g1
		ldx   [ %l1 + 0x168 ], %g2
		and   %g1, 1, %g1
		add   %g1, %g2, %g2
		stx   %g2, [ %l1 + 0x168 ]
While this looks like 5 instructions (rather than 2) it has fewer pipeline
stalls and can be 'spread out' into the surrounding lines of code to
reduce the stalls further.
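
For anyone wanting to compare the generated code, the two forms can be
sketched in plain C as below. This is only an illustration, not the driver
code: struct demo_stats and the demo_/count_ function names are made up,
and __builtin_expect is the GCC/Clang builtin that the kernel's unlikely()
wraps.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 'multicast' field of dev->stats. */
struct demo_stats {
	unsigned long multicast;
};

/* Same test as the helper quoted above: the multicast bit is bit 0
 * of the first address byte (mask 0x01), not bit 7. */
static inline int demo_is_multicast(const uint8_t *addr)
{
	return 0x01 & addr[0];
}

/* Form used in the patch: test, then conditionally increment. */
static void count_with_branch(struct demo_stats *s, const uint8_t *dest)
{
	if (__builtin_expect(demo_is_multicast(dest), 0))
		s->multicast++;
}

/* Suggested form: always add the 0-or-1 result, no branch in the source. */
static void count_branchless(struct demo_stats *s, const uint8_t *dest)
{
	s->multicast += demo_is_multicast(dest);
}
```

Both give the same counts. The second one does write the counter back for
every packet, and whether it is actually faster depends on the cpu and on
what the compiler makes of the surrounding code, so the objdump output is
still the thing to look at.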

> I don't think this driver will ever be used on anything but sparc, so
> I'm not worried about how x86 might compile this.

On x86 gcc is likely to ignore the 'unlikely' and generate a forwards
(predicted not taken) branch around the increment.
I've had to put asm comments in the else part of conditionals like
that to force gcc to generate a forwards jump to the 'unlikely' statements.
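
A rough sketch of what I mean by that, as one possible shape rather than
code from any real driver. The struct and function names are made up, and
the asm statement contains only an assembler comment, so it emits no
instructions. Whether it changes the block layout at all depends on the
gcc version and options, so check the disassembly.

```c
#include <stdint.h>

/* Hypothetical counters, for illustration only. */
struct demo_counters {
	unsigned long multicast;
	unsigned long total;
};

static void demo_rx_count(struct demo_counters *c, const uint8_t *dest)
{
	if (__builtin_expect(dest[0] & 0x01, 0)) {
		/* Rare path that we want placed out of line. */
		c->multicast++;
	} else {
		/* Assembler comment only: gives the common path a
		 * non-empty body without adding any instructions. */
		__asm__ volatile("/* common path */");
	}
	c->total++;
}
```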

	David


