Date:	Mon, 17 Mar 2014 16:00:12 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Thomas Graf' <tgraf@...g.ch>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	David Miller <davem@...emloft.net>,
	John Fastabend <john.fastabend@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH net-next] net: sched: use no more than one page in
 struct fw_head

From: Thomas Graf
> On 03/17/14 at 03:28pm, Thomas Graf wrote:
> > On 03/17/14 at 07:13am, Eric Dumazet wrote:
> > > On Mon, 2014-03-17 at 13:51 +0000, Thomas Graf wrote:
> > > > On 03/16/14 at 09:06am, Eric Dumazet wrote:
> > > > > From: Eric Dumazet <edumazet@...gle.com>
> > > > >
> > > > > In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark
> > > > > classifier") Patrick added a u32 field in fw_head, making it slightly
> > > > > bigger than one page.
> > > > >
> > > > > Change the layout of this structure and let the compiler emit a
> > > > > reciprocal divide for fw_hash(), as this makes the code more readable
> > > > > and more efficient these days.
> > > >
> > > > I think you need to educate me a bit on this. objdump
> > > > spits out the following:
> > > >
> > > > static u32 fw_hash(u32 handle)
> > > > {
> > > >         return handle % HTSIZE;
> > > >   1d:   bf ff 01 00 00          mov    edi,0x1ff
> > > >   22:   89 f0                   mov    eax,esi
> > > >   24:   31 d2                   xor    edx,edx
> > > >   26:   f7 f7                   div    edi
> > > >
> > > > Doesn't look like a reciprocal div to me. Where did I
> > > > screw up or why doesn't gcc optimize it properly?
> > > > --
> > >
> > > That's because on your CPU, gcc knows the divide is cheaper than anything
> > > else (a multiply followed by a shift).
> >
> > OK.
> 
> [0] lists 7-17 cycles for DIV r32 on Nehalem, or 17-28 in terms of
> latency. The benefit of the data fitting into a single page clearly
> outweighs this slight increase in instructions.

Actually it is -Os that forces the divide.
With -O3 the generated code I get for amd64 is a multiply by the reciprocal
to get the quotient, followed by an open-coded multiply by 0x1ff and
then a subtract: 13 instructions, only a few of which are register renames
or are non-dependent.
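
Concretely, the idiom looks roughly like this (a sketch only: the constant
and shifts follow the standard Granlund-Montgomery recipe for dividing by
511 rather than being copied from gcc's actual output, and mod_htsize() is
just an illustrative name):

	#include <stdint.h>

	#define HTSIZE 511	/* 0x1ff, as in the proposed fw_hash() */

	/*
	 * handle % 511 without a DIV: a multiply-high by a precomputed
	 * reciprocal (plus the usual fix-up step, since the full magic
	 * constant for d = 511 does not fit in 32 bits), then multiply
	 * the quotient back by 511 and subtract.
	 */
	static uint32_t mod_htsize(uint32_t x)
	{
		uint32_t t = (uint32_t)(((uint64_t)x * 8405025u) >> 32);
		uint32_t q = (t + ((x - t) >> 1)) >> 8;	/* x / 511 */

		return x - q * HTSIZE;			/* x % 511 */
	}

The final line is the "open-coded multiply by 0x1ff and then a subtract"
mentioned above; with -Os gcc decides a single DIV is smaller than this
whole sequence.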

	David


