lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1389739537.31367.273.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Tue, 14 Jan 2014 14:45:37 -0800
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Austin S Hemmelgarn <ahferroin7@...il.com>
Cc:	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	netdev@...r.kernel.org, dborkman@...hat.com,
	linux-kernel@...r.kernel.org, darkjames-ws@...kjames.pl
Subject: Re: [PATCH RFC] reciprocal_divide: correction/update of the
 algorithm

On Tue, 2014-01-14 at 15:53 -0500, Austin S Hemmelgarn wrote:
> On 2014-01-14 14:50, Eric Dumazet wrote:
> > On Tue, 2014-01-14 at 14:22 -0500, Austin S Hemmelgarn wrote:
> > 
> >> I disagree with the statement that current CPU's have reasonably fast
> >> dividers.  A lot of embedded processors and many low-end x86 CPU's do
> >> not in-fact have any hardware divider, and usually provide it using
> >> microcode based emulation if they provide it at all.  The AMD Jaguar
> >> micro-architecture in particular comes to mind, it uses an iterative
> >> division algorithm provided by the microcode that only produces 2 bits
> >> of quotient per cycle, even in the best case (2 8-bit integers and an
> >> integral 8-bit quotient) this still takes 4 cycles, which is twice as
> >> slow as any other math operation on the same processor.
> > 
> > I doubt you run any BPF filter with a divide instruction in it on these
> > platform.
> > 
> > Get real, do not over optimize things where it does not matter.
> > 
> Actually, I have three Jaguar based routers, and use BPF regularly as
> part of their iptables rules to log certain packet types.



1) Have you enabled BPF JIT

2) Do you have divide instructions in a BPF filter, 
   if yes, I would like to have an example.
   (current code works well for small divisors anyway)

3) How much time is spent in BPF compared to the rest of the stack,
   especially if you run iptables...

Spending 2 or 3 days of work to save ~7 cycles for a divide that
probably can be replaced by a shift anyway, while spending 5000 cycles
per packet is what I call not a wise optimization, especially
if dealing with old hardware.

Even on a Jaguar, the proposed alternative 

+       u32 t = (u32)(((u64)a * R.m) >> 32);
+       return (t + ((a - t) >> R.sh1)) >> R.sh2;

will have a similar cost.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ