netdev - Re: [PATCH] tcp_cubic: use 32 bit math

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070310114826.GB1608@1wt.eu>
Date:	Sat, 10 Mar 2007 12:48:26 +0100
From:	Willy Tarreau <w@....eu>
To:	David Miller <davem@...emloft.net>
Cc:	shemminger@...ux-foundation.org, rkuhn@....physik.tu-muenchen.de,
	andi@...stfloor.org, dada1@...mosbay.com, jengelh@...ux01.gwdg.de,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH] tcp_cubic: use 32 bit math

On Wed, Mar 07, 2007 at 07:51:35PM -0800, David Miller wrote:
> From: Stephen Hemminger <shemminger@...ux-foundation.org>
> Date: Wed, 07 Mar 2007 19:10:47 -0800
> 
> > David Miller wrote:
> > > What about Willy Tarreau's supposedly even faster variant?
> > > Or does this incorporate that set of improvements?
> > >   
> > That's what this is:
> >     x = (2 * x + (uint32_t)div64_64(a, (uint64_t)x*(uint64_t)x)) / 3;
> 
> Great, thanks for the clarification.

Oh BTW, I have a newer version with a first approximation of the
cbrt() before the div64_64, which allows us to reduce from 3 div64
to only 2 div64. This results in a version which is twice as fast
as the initial one (ncubic), but with slightly less accuracy (0.286%
compared to 0.247). But I see that other functions such as hcbrt()
had a 1.5% avg error, so I think this is not dramatic.

Also, I managed to remove all other divides, to be kind with CPUs
having a slow divide instruction or no divide at all. Since we compute
on limited range (22 bits), we can multiply then shift right. It shows
me even slightly better time on pentium-m and athlon, with a slightly
higher avg error (0.297% compared to 0.286%), and slightly smaller
code.

I just have to clean experiments from my code to provide a patch.
David, Stephen, are you interested ?

$ ./bictcp
fls(0)=0, fls(1)=1, fls(256)=9
Calibrating
Function     clocks  mean(us) max(us)  std(us)  Avg error
bictcp          936     0.61    24.28     1.99 0.172%
ocubic          886     0.57    23.51     3.18 0.274%
ncubic          644     0.42    16.59     2.18 0.247%
ncubic32        444     0.29    11.47     1.50 0.247%
ncubic32_1      444     0.29    11.56     1.88 0.238%
ncubic32b3      337     0.22     8.67     0.88 0.286%
ncubic_ndiv3    329     0.21     8.46     0.69 0.297%
acbrt           707     0.46    18.05     0.80 0.275%
hcbrt           644     0.42    16.44     0.51 1.580%

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html