linux-kernel - Re: [RFC] LZO de/compression support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200705281652.09273.dhazelton@enter.net>
Date:	Mon, 28 May 2007 16:52:09 -0400
From:	Daniel Hazelton <dhazelton@...er.net>
To:	Adrian Bunk <bunk@...sta.de>
Cc:	Nitin Gupta <nitingupta910@...il.com>,
	lkml <linux-kernel@...r.kernel.org>, linux-mm-cc@...top.org,
	linuxcompressed-devel@...ts.sourceforge.net,
	Andrew Morton <akpm@...ux-foundation.org>,
	Richard Purdie <richard@...nedhand.com>,
	Bret Towe <magnade@...il.com>,
	Satyam Sharma <satyam.sharma@...il.com>
Subject: Re: [RFC] LZO de/compression support - take 6

On Monday 28 May 2007 16:18:40 Daniel Hazelton wrote:
> On Monday 28 May 2007 13:11:15 Adrian Bunk wrote:
> > On Mon, May 28, 2007 at 09:33:32PM +0530, Nitin Gupta wrote:
> > > On 5/28/07, Adrian Bunk <bunk@...sta.de> wrote:
> > >...
> > >
> > >> - then ensure that it works correctly on all architectures and
> > >
> > > Already tested on x86, amd64, ppc (by Bret). I do not have machines
> > > from other archs available. Bret tested 'take 3' version but no
> > > changes were introduced in further revisions that could affect
> > > correctness - but still it will be good to have this version tested
> > > too. Only with inclusion in -mm and testing by much wider user base
> > > can make it to mainline (I suppose nobody uses -mm for production use
> > > anyway).
> > >
> > >>   document why your version is that much faster than the original
> > >>   version and why you know your optimizations have no side effects
>
> With likely(), unlikely() and noinline *not* defined as NOP's performance
> drops:
>
> 10000 run averages:
> 'Tiny LZO':
>         Combined: 84.9292 usec
>         Compression: 42.4646 usec
>         Decompression: 42.4646 usec
> 'miniLZO':
>         Combined: 61.3548 usec
>         Compression: 43.5648 usec
>         Decompression: 17.79 usec
>
> However, I'm worried that my testbed code - likely the Perl script that
> actually loops the test code and collects its output - is somehow faulting,
> as the way that the Compression and Decompression code have the exact same
> value.
>
> I'm going to toss some debugging output in the script and see if I can spot
> the problem.
>

Okay, checked my code and it was all a problem with cpu_to_le16 getting 
defined fully instead of just being a NOP. With that problem corrected the 
performance returns:

10000 run averages:
'Tiny LZO':
        Combined: 60.1028 usec
        Compression: 42.0652 usec
        Decompression: 18.0376 usec
'miniLZO':
        Combined: 61.0932 usec
        Compression: 43.4382 usec
        Decompression: 17.655 usec

Combined average shows 'Tiny' to be 1.6% faster
Compression in 'Tiny' is 3.2% faster
Decompression in 'Tiny' is 2.2% slower

All in all, the trade-off in this code, with the overall performance of 
the 'Tiny' code being faster than the stock miniLZO code I'd like to say that 
I'm certain that the decompressor could probably be sped up more, although I 
don't know of a place in the kernel where less than half a microsecond would 
make a massive impact. (Only place I can think of where this might have a 
negative is in SLAB/SLOB/SLUB,  and I don't think that a low-level memory 
manager like those is a place for a compression algorithm anyway)

Later today or tommorrow  I'll start putting together another part to 
this "test-bed" for stress-testing the algorithm. Currently I only have a few 
ideas for tests:
(for benchmarking)
1) Random input to compressor
2) large input data
3) real-world data (mmap()'d chunk of /dev/mem as input)

(for stability checking)
4) deliberate corruption of compressed data
5) early finish (realloc() compressed data buffer to less than the full size 
                       of the data)
6) late start (change pointer to start of compressed data so the decompressor
                     doesn't start at the beginning)

When I have a better idea of how the LZO algorithm works I might try creating 
an input data-set for the decompressor that would deliberately overflow the 
output buffer. This is supposed to be caught by bounds-checking in 
the '_safe' version of the decompressor (the version  used in 'Tiny' and  
used in the benchmarking of miniLZO), however it might be possible to trick 
the decompressor. (If I do manage this and it crashes both 'Tiny' 
and 'miniLZO' I *will* report it to Herr Oberhumer as well as see if I can 
come up with a quick patch for the problem).

As I said before, any suggestions for how to improve the benchmark code are 
welcome, as are suggestions for tests I can add to it for stressing the 
new 'TinyLZO' implementation. Latest version of the test-bed attached. 
Significant changes include:
likely(), unlikely() and noinline are no longer NOP's
when the marked line in helpers.h is commented out, cpu_to_le16 functions as 
expected and is no longer a NOP. (Yes, that means this code is now fully 
capable of working correctly on a BE system).

DRH

Download attachment "lzo1x-test-bed-5.tar.bz2" of type "application/x-tbz" (26963 bytes)