Message-ID: <063D6719AE5E284EB5DD2968C1650D6D0F6DF5DA@AcuExch.aculab.com>
Date: Mon, 17 Mar 2014 16:00:12 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Thomas Graf' <tgraf@...g.ch>,
Eric Dumazet <eric.dumazet@...il.com>
CC: David Miller <davem@...emloft.net>,
John Fastabend <john.fastabend@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH net-next] net: sched: use no more than one page in
struct fw_head
From: Thomas Graf
> On 03/17/14 at 03:28pm, Thomas Graf wrote:
> > On 03/17/14 at 07:13am, Eric Dumazet wrote:
> > > On Mon, 2014-03-17 at 13:51 +0000, Thomas Graf wrote:
> > > > On 03/16/14 at 09:06am, Eric Dumazet wrote:
> > > > > From: Eric Dumazet <edumazet@...gle.com>
> > > > >
> > > > > In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark
> > > > > classifier") Patrick added an u32 field in fw_head, making it slightly
> > > > > bigger than one page.
> > > > >
> > > > > Change the layout of this structure and let the compiler emit a
> > > > > reciprocal divide for fw_hash(), as this makes the code more readable
> > > > > and more efficient these days.
> > > >
> > > > I think you need to educate me a bit on this. objdump
> > > > spits out the following:
> > > >
> > > > static u32 fw_hash(u32 handle)
> > > > {
> > > >         return handle % HTSIZE;
> > > >   1d:   bf ff 01 00 00    mov    edi,0x1ff
> > > >   22:   89 f0             mov    eax,esi
> > > >   24:   31 d2             xor    edx,edx
> > > >   26:   f7 f7             div    edi
> > > >
> > > > Doesn't look like a reciprocal div to me. Where did I
> > > > screw up or why doesn't gcc optimize it properly?
> > > > --
> > >
> > > That's because on your cpu, gcc knows the divide is cheaper than anything
> > > else (a multiply followed by a shift)
> >
> > OK.
>
> [0] lists 7-17 cycles for DIV r32 on Nehalem or 17-28 in terms of
> latency. The benefit of the data fitting into a single page clearly
> outweighs this slight increase in instructions.
Actually it is -Os that forces the divide.
With -O3 the code I get for amd64 is a multiply by the reciprocal to
get the quotient, followed by an open-coded multiply by 0x1ff and then
a subtract. That is 13 instructions, only a few of which are register
renames or non-dependent.
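
For illustration, the same three steps can be written out in C. This is
only a sketch of the technique, not the exact sequence gcc emits: the
helper names mod511_by_reciprocal() and check_mod511() are made up here,
HTSIZE is assumed to be 0x1ff as in the objdump output quoted above, and
the constant 0x00804021 is floor(2^32 / 511) + 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTSIZE 0x1ffu	/* 511, assumed from the quoted objdump output */

/* Hypothetical helper: x % 511 without a div instruction. */
static uint32_t mod511_by_reciprocal(uint32_t x)
{
	/*
	 * The full reciprocal ceil(2^41 / 511) needs 33 bits, so multiply
	 * by its low 32 bits and keep the high half of the product.
	 */
	uint32_t t = (uint32_t)(((uint64_t)x * 0x00804021u) >> 32);
	/* Add back the implicit 2^32 * x term without overflowing. */
	uint32_t q = (t + ((x - t) >> 1)) >> 8;

	/*
	 * Open-coded multiply by 0x1ff: q * 511 == (q << 9) - q.
	 * The shift may wrap modulo 2^32, which is harmless because
	 * q * 511 <= x always fits in 32 bits.
	 */
	return x - ((q << 9) - q);
}

/* Spot check against the plain % operator. */
void check_mod511(void)
{
	static const uint32_t samples[] = {
		0, 1, 510, 511, 512, 0x12345678u, 0xffffffffu
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(mod511_by_reciprocal(samples[i]) == samples[i] % HTSIZE);
}
```

The chain t -> q -> (q << 9) - q -> final subtract is fully dependent,
which is the point about few of the instructions being non-dependent.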
David