Message-ID: <20140317155024.GD8956@casper.infradead.org>
Date:	Mon, 17 Mar 2014 15:50:24 +0000
From:	Thomas Graf <tgraf@...g.ch>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	John Fastabend <john.fastabend@...il.com>,
	netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: sched: use no more than one page in struct
 fw_head

On 03/17/14 at 03:28pm, Thomas Graf wrote:
> On 03/17/14 at 07:13am, Eric Dumazet wrote:
> > On Mon, 2014-03-17 at 13:51 +0000, Thomas Graf wrote:
> > > On 03/16/14 at 09:06am, Eric Dumazet wrote:
> > > > From: Eric Dumazet <edumazet@...gle.com>
> > > > 
> > > > In commit b4e9b520ca5d ("[NET_SCHED]: Add mask support to fwmark
> > > > classifier") Patrick added a u32 field in fw_head, making it slightly
> > > > bigger than one page.
> > > > 
> > > > Change the layout of this structure and let the compiler emit a
> > > > reciprocal divide for fw_hash(), as this makes the code more readable
> > > > and more efficient these days.
> > > 
> > > I think you need to educate me a bit on this. objdump
> > > spits out the following:
> > > 
> > > static u32 fw_hash(u32 handle)
> > > {
> > >         return handle % HTSIZE;
> > >   1d:   bf ff 01 00 00          mov    edi,0x1ff
> > >   22:   89 f0                   mov    eax,esi
> > >   24:   31 d2                   xor    edx,edx
> > >   26:   f7 f7                   div    edi
> > > 
> > > Doesn't look like a reciprocal div to me. Where did I
> > > screw up or why doesn't gcc optimize it properly?
> > > --
> > 
> > That's because on your CPU, gcc knows the divide is cheaper than anything
> > else (a multiply followed by a shift).
> 
> OK.
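
For reference, the "multiply followed by a shift" alternative mentioned
above can be sketched in plain C. This is only an illustration under
stated assumptions, not what gcc necessarily emits: the divisor 511 is
taken from the 0x1ff in the disassembly, the names div511_recip,
fw_hash_recip and fw_hash_recip_selfcheck are made up for this sketch,
and the magic constant is derived with the usual round-up reciprocal
method (a 33-bit multiplier, handled with an add-and-shift fixup).

```c
#include <assert.h>
#include <stdint.h>

#define HTSIZE      511u   /* 0x1ff, the divisor seen in the disassembly */
#define RECIP_SHIFT 9      /* ceil(log2(511)) */

/*
 * ceil(2^(32+9) / 511) needs 33 bits. Keep only the low 32 bits here;
 * the implicit top bit is accounted for by the fixup in div511_recip().
 */
#define RECIP_MAGIC \
	((uint32_t)((((uint64_t)1 << (32 + RECIP_SHIFT)) / HTSIZE + 1) - \
		    ((uint64_t)1 << 32)))

/* Hypothetical helper: n / 511 using a multiply and shifts, no div. */
static uint32_t div511_recip(uint32_t n)
{
	/* High half of the 32x32 -> 64 bit product. */
	uint32_t t = (uint32_t)(((uint64_t)n * RECIP_MAGIC) >> 32);

	/* Adds back the implicit 2^32 part of the magic without overflow. */
	return (t + ((n - t) >> 1)) >> (RECIP_SHIFT - 1);
}

/* Hypothetical equivalent of "handle % HTSIZE" without a div. */
static uint32_t fw_hash_recip(uint32_t handle)
{
	return handle - div511_recip(handle) * HTSIZE;
}

/* Compare against the plain % operator on a spread of inputs. */
static int fw_hash_recip_selfcheck(void)
{
	for (uint32_t i = 0; i < 1000000u; i++) {
		uint32_t h = i * 2654435761u;  /* scatter over 32 bits */

		assert(fw_hash_recip(h) == h % HTSIZE);
	}
	assert(fw_hash_recip(0xffffffffu) == 0xffffffffu % HTSIZE);
	return 1;
}
```

Whether this sequence or a single div is faster depends on the target
CPU, which is why the choice is left to the compiler.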

[0] lists 7-17 cycles (reciprocal throughput) for DIV r32 on Nehalem,
or 17-28 cycles in terms of latency. The benefit of the data fitting
into a single page clearly outweighs this slight increase in cost.
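
The size argument can be checked with a small stand-alone sketch. It is
not the patch itself: the 4 KiB page size, the stub type, the struct
names and both HTSIZE formulas are assumptions chosen to match the
description (a one-page bucket array plus a u32 before the change, and
a divisor of 0x1ff on x86-64 after it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

struct fw_filter_stub;           /* hypothetical stand-in for struct fw_filter */

/* Assumed old layout: a full page of bucket pointers plus the mask. */
#define OLD_HTSIZE (SKETCH_PAGE_SIZE / sizeof(struct fw_filter_stub *))
struct fw_head_old {
	struct fw_filter_stub *ht[OLD_HTSIZE];
	uint32_t mask;
};

/* Assumed new layout: give up one bucket so the mask fits in the page. */
#define NEW_HTSIZE \
	((SKETCH_PAGE_SIZE - sizeof(uint32_t)) / sizeof(struct fw_filter_stub *))
struct fw_head_new {
	uint32_t mask;
	struct fw_filter_stub *ht[NEW_HTSIZE];
};

/* The old layout spills past one page, the new one does not. */
static_assert(sizeof(struct fw_head_old) > SKETCH_PAGE_SIZE,
	      "old layout should exceed one page");
static_assert(sizeof(struct fw_head_new) <= SKETCH_PAGE_SIZE,
	      "new layout should fit in one page");

/* The bucket count is no longer a power of two, hence % instead of a mask. */
static inline uint32_t fw_hash_sketch(uint32_t handle)
{
	return (uint32_t)(handle % NEW_HTSIZE);
}
```

With 8-byte pointers NEW_HTSIZE works out to 511, which is consistent
with the 0x1ff loaded into edi in the objdump output quoted earlier.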

Acked-by: Thomas Graf <tgraf@...g.ch>

[0] http://www.agner.org/optimize/instruction_tables.pdf