[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200705281652.09273.dhazelton@enter.net>
Date: Mon, 28 May 2007 16:52:09 -0400
From: Daniel Hazelton <dhazelton@...er.net>
To: Adrian Bunk <bunk@...sta.de>
Cc: Nitin Gupta <nitingupta910@...il.com>,
lkml <linux-kernel@...r.kernel.org>, linux-mm-cc@...top.org,
linuxcompressed-devel@...ts.sourceforge.net,
Andrew Morton <akpm@...ux-foundation.org>,
Richard Purdie <richard@...nedhand.com>,
Bret Towe <magnade@...il.com>,
Satyam Sharma <satyam.sharma@...il.com>
Subject: Re: [RFC] LZO de/compression support - take 6
On Monday 28 May 2007 16:18:40 Daniel Hazelton wrote:
> On Monday 28 May 2007 13:11:15 Adrian Bunk wrote:
> > On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> > > On 5/28/07, Adrian Bunk <bunk@...sta.de> wrote:
> > >...
> > >
> > >> - then ensure that it works correctly on all architectures and
> > >
> > > Already tested on x86, amd64, ppc (by Bret). I do not have machines
> > > from other archs available. Bret tested 'take 3' version but no
> > > changes were introduced in further revisions that could affect
> > > correctness - but still it will be good to have this version tested
> > > too. Only with inclusion in -mm and testing by much wider user base
> > > can make it to mainline (I suppose nobody uses -mm for production use
> > > anyway).
> > >
> > >> document why your version is that much faster than the original
> > >> version and why you know your optimizations have no side effects
>
> With likely(), unlikely() and noinline *not* defined as NOP's performance
> drops:
>
> 10000 run averages:
> 'Tiny LZO':
> Combined: 84.9292 usec
> Compression: 42.4646 usec
> Decompression: 42.4646 usec
> 'miniLZO':
> Combined: 61.3548 usec
> Compression: 43.5648 usec
> Decompression: 17.79 usec
>
> However, I'm worried that my testbed code - likely the Perl script that
> actually loops the test code and collects its output - is somehow faulting,
> as the way that the Compression and Decompression code have the exact same
> value.
>
> I'm going to toss some debugging output in the script and see if I can spot
> the problem.
>
Okay, checked my code and it was all a problem with cpu_to_le16 getting
defined fully instead of just being a NOP. With that problem corrected the
performance returns:
10000 run averages:
'Tiny LZO':
Combined: 60.1028 usec
Compression: 42.0652 usec
Decompression: 18.0376 usec
'miniLZO':
Combined: 61.0932 usec
Compression: 43.4382 usec
Decompression: 17.655 usec
Combined average shows 'Tiny' to be 1.6% faster
Compression in 'Tiny' is 3.2% faster
Decompression in 'Tiny' is 2.2% slower
All in all, the trade-off in this code, with the overall performance of
the 'Tiny' code being faster than the stock miniLZO code I'd like to say that
I'm certain that the decompressor could probably be sped up more, although I
don't know of a place in the kernel where less than half a microsecond would
make a massive impact. (Only place I can think of where this might have a
negative is in SLAB/SLOB/SLUB, and I don't think that a low-level memory
manager like those is a place for a compression algorithm anyway)
Later today or tommorrow I'll start putting together another part to
this "test-bed" for stress-testing the algorithm. Currently I only have a few
ideas for tests:
(for benchmarking)
1) Random input to compressor
2) large input data
3) real-world data (mmap()'d chunk of /dev/mem as input)
(for stability checking)
4) deliberate corruption of compressed data
5) early finish (realloc() compressed data buffer to less than the full size
of the data)
6) late start (change pointer to start of compressed data so the decompressor
doesn't start at the beginning)
When I have a better idea of how the LZO algorithm works I might try creating
an input data-set for the decompressor that would deliberately overflow the
output buffer. This is supposed to be caught by bounds-checking in
the '_safe' version of the decompressor (the version used in 'Tiny' and
used in the benchmarking of miniLZO), however it might be possible to trick
the decompressor. (If I do manage this and it crashes both 'Tiny'
and 'miniLZO' I *will* report it to Herr Oberhumer as well as see if I can
come up with a quick patch for the problem).
As I said before, any suggestions for how to improve the benchmark code are
welcome, as are suggestions for tests I can add to it for stressing the
new 'TinyLZO' implementation. Latest version of the test-bed attached.
Significant changes include:
likely(), unlikely() and noinline are no longer NOP's
when the marked line in helpers.h is commented out, cpu_to_le16 functions as
expected and is no longer a NOP. (Yes, that means this code is now fully
capable of working correctly on a BE system).
DRH
Download attachment "lzo1x-test-bed-5.tar.bz2" of type "application/x-tbz" (26963 bytes)
Powered by blists - more mailing lists