linux-kernel - Re: [PATCH v1] lib/vsprintf.c: Even faster decimal conversion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20150311145229.523118d7190cebece1304d2a@linux-foundation.org>
Date:	Wed, 11 Mar 2015 14:52:29 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Rasmus Villemoes <linux@...musvillemoes.dk>
Cc:	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Tejun Heo <tj@...nel.org>, Joe Perches <joe@...ches.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] lib/vsprintf.c: Even faster decimal conversion

On Wed, 11 Mar 2015 00:01:11 +0100 Rasmus Villemoes <linux@...musvillemoes.dk> wrote:

> The most expensive part of decimal conversion is the divisions by 10
> (albeit done using reciprocal multiplication with appropriately chosen
> constants). I decided to see if one could eliminate around half of
> these multiplications by emitting two digits at a time, at the cost of
> a 200 byte lookup table, and it does indeed seem like there is
> something to be gained, especially on 64 bits. Microbenchmarking shows
> improvements ranging from -50% (for numbers uniformly distributed in
> [0, 2^64-1]) to -25% (for numbers heavily biased toward the smaller
> end, a more realistic distribution).
> 
> On a larger scale, perf shows that top, one of the big consumers of
> /proc data, uses 0.5-1.0% fewer cpu cycles.
> 
> I had to jump through some hoops to get the 32 bit code to compile and
> run on my 64 bit machine, so I'm not sure how relevant these numbers
> are, but just for comparison the microbenchmark showed improvements
> between -30% and -10%.
> 
> The bloat-o-meter costs are around 150 bytes (the generated code is a
> little smaller, so it's not the full 200 bytes) on both 32 and 64
> bit. I'm aware that extra cache misses won't show up in a
> microbenchmark as used above, but on the other hand decimal
> conversions often happen in bulk (for example in the case of top).
> 
> I have of course tested that the new code generates the same output as
> the old, for both the first and last 1e10 numbers in [0,2^64-1] and
> 4e9 'random' numbers in-between.
> 
> ...
>
> Test and verification code on github
> <https://github.com/Villemoes/dec>.  It would be nice if someone could
> verify the code on architectures other than x86 - in particular, I
> believe it _should_ work on big-endian, but I have no way of testing
> that. Benchmark numbers would be nice too, of course.

That would be nice.

It would be best to make this as easy as possible for people.  Create a
tarball, attach that to a request sent to linuxppc-dev@...ts.ozlabs.org
along with simple instructions along the lines of

	tar xfz tarball.tar.gz
	cd ./dec
	make
	./whatever

and magic might happen!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/