lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 6 May 2009 00:08:32 +0200
From:	Lars Ellenberg <lars.ellenberg@...bit.com>
To:	Neil Brown <neilb@...e.de>
Cc:	Philipp Reisner <philipp.reisner@...bit.com>,
	linux-kernel@...r.kernel.org, Jens Axboe <jens.axboe@...cle.com>,
	Greg KH <gregkh@...e.de>,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Sam Ravnborg <sam@...nborg.org>, Dave Jones <davej@...hat.com>,
	Nikanth Karthikesan <knikanth@...e.de>,
	Lars Marowsky-Bree <lmb@...e.de>,
	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
	Kyle Moffett <kyle@...fetthome.net>,
	Bart Van Assche <bart.vanassche@...il.com>
Subject: Re: [PATCH 00/16] DRBD: a block device for HA clusters

> > > but to answer the question:
> > > why bother to implement our own encoding?
> > > because we know a lot about the data to be encoded.
> > > 
> > > the compression of the bitmap transfer we just added very recently.
> > > for a bitmap, with large chunks of bits set or unset, it is efficient
> > > to just code the runlength.
> > > to use gzip in kernel would add yet an other huge overhead for code
> > > tables and so on.
> > > during testing of this encoding, applying it to an already gzip'ed file
> > > was able to compress it even further, btw.
> > > though on english plain text, gzip compression is _much_ more effective.
> > 
> > I just tried a little experiment.
> > I created a 128meg file and randomly set 1000 bits in it.
> > I compressed it with "gzip --best" and the result was 4Meg.  Not
> > particularly impressive.
> > I then tried to compress it wit bzip2 and got 3452 bytes.
> > Now *that* is impressive.  I suspect your encoding might do a little
> > better, but I wonder if it is worth the effort.
> 
> The effort is minimal.
> The cpu overhead is negligible (compared with bzip2, or any other
> generic compression scheme), and the memory overhead is next to none
> (just a small scratch buffer, to assemble the network packet).
> No tables or anything involved.
> Especially the _decoding_ part has this nice property:
>   chunk = 0;
>   while (!eof) {
> 	vli_decode_bits(&rl, input); /* number of unset bits */
> 	chunk += rl;
> 	vli_decode_bits(&rl, input); /* number of set bits */
> 	bitmap_dirty_bits(bitmap, chunk, chunk + rl);
> 	chunk += rl;
>  }
> 
> The source code is there.
> 
> For your example, on average you'd have (128 << 23) / 1000 "clear" bits,
> then one set bit. The encoding transfers
> "first bit unset -- ca. (1<<20), 1, ca. (1<<20), 1, ca. (1<<20), 1, ...",
> using 2 bits for the "1", and up to 29 bit for the "ca. 1<<20".
> should be in the very same ballpark as your bzip2 result.
> 
> > I'm not certain that my test file is entirely realistic, but it is
> > still an interesting experiment.
> 
> It is not ;) but still...
> If you are interessted, I can dig up my throw away user land code,
> that has been used to evaluate various such schemes.
> But it is so ugly that I won't post it to lkml.

I found round about ten different versions of that throw away code.
Oh well.  So I just hacked up an other one.

For your entertainment, prepare some example bitmaps.  From all my
real-world example bitmaps I can see that (at least with 4KiB bitmap
granularity), areas with alternating single-bit (which is the only run
length that does not compress) are rare.

Comments wellcome.  Have fun.

	Lars

View attachment "vli_bitstream_demo.c" of type "text/x-csrc" (28392 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ