Message-ID: <46F19326.1040503@myrealbox.com>
Date: Wed, 19 Sep 2007 17:22:46 -0400
From: Andy Lutomirski <luto@...ealbox.com>
To: linux-kernel@...r.kernel.org, andi@...stfloor.org,
kernel1@...erdogtech.com
Subject: Re: A little coding style nugget of joy
Andi Kleen wrote:
> Matt LaPlante <kernel1@...erdogtech.com> writes:
>
>> Since everyone loves random statistics, here are a few gems to give you a break from your busy day:
>>
>> Number of lines in the 2.6.22 Linux kernel source that include one or more trailing whitespaces: 135209
>> Bytes saved by removing said whitespace: 151809
>
> You don't actually save anything on disk on most file systems
> (essentially everything except reiserfs on current Linux)
> because all files are rounded to block size (normally 4K)
>
> Same in page cache.
This is a terrible assumption in general (that is, whenever filesize %
blocksize is close to uniformly distributed). If you remove one byte and
the data is stored with blocksize B, then you save zero bytes with
probability 1-1/B and B bytes with probability 1/B, so the expected
number of bytes saved is B*(1/B) = 1. Since expectation is linear, if you
remove x bytes, the expected number of bytes saved is x (even if more
than one byte is removed per file).
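The expectation argument above can be checked by brute force. The following is only a sketch, assuming a 4096-byte block size and sizes that are exactly uniform modulo the block size; the function names are made up for illustration, and tail packing or inline data are ignored.

```python
def blocks_used(size: int, block: int) -> int:
    """Blocks needed to store `size` bytes (ceiling division)."""
    return -(-size // block)


def disk_bytes_saved(size: int, removed: int, block: int = 4096) -> int:
    """On-disk bytes freed when `removed` bytes are cut from a file of `size` bytes."""
    return (blocks_used(size, block) - blocks_used(size - removed, block)) * block


def expected_saving(removed: int, block: int = 4096) -> float:
    """Average saving over one full period of sizes, so size % block is uniform.

    Assumes removed <= 10 * block so that every size stays non-negative.
    """
    base = 10 * block  # arbitrary offset, large enough to remove bytes from
    total = sum(disk_bytes_saved(base + r, removed, block) for r in range(block))
    return total / block


if __name__ == "__main__":
    for x in (1, 16, 5000):
        print(x, expected_saving(x))
```

Any single file saves either nothing or a whole number of blocks, but the average over all residues comes out to exactly the number of bytes removed.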
In my tree, about half of the files have size >= 4k, so the uniformity
assumption is probably not _that_ far off the mark.
Alternatively, there are an average of about 16 bytes removed per file,
and there are 11 files which are <= 16 bytes short of a 4k boundary, so
it's not at all unreasonable that we'd save 40-50k (11 blocks of 4k each
is 44k).
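For anyone who wants to measure this on their own tree rather than estimate it, here is a rough sketch. It is not the method used for the numbers above (that is not described in this thread). It assumes a 4096-byte block size, counts only trailing spaces and tabs, and reads every regular file it finds; the names `trailing_bytes` and `tree_saving` are hypothetical.

```python
import os
import re

BLOCK = 4096  # assumed block size; real filesystems may differ

# Runs of spaces/tabs immediately before a line end (or end of file).
TRAILING_WS = re.compile(rb"[ \t]+$", re.MULTILINE)


def trailing_bytes(data: bytes) -> int:
    """Total number of trailing space/tab bytes in `data`."""
    return sum(len(m.group()) for m in TRAILING_WS.finditer(data))


def blocks(size: int, block: int = BLOCK) -> int:
    """Blocks needed to store `size` bytes (ceiling division)."""
    return -(-size // block)


def tree_saving(root: str, block: int = BLOCK):
    """Return (raw bytes removed, on-disk bytes saved) for all files under `root`."""
    raw = on_disk = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as fh:
                data = fh.read()
            removed = trailing_bytes(data)
            raw += removed
            # A file only frees space if stripping it crosses a block boundary.
            on_disk += (blocks(len(data), block)
                        - blocks(len(data) - removed, block)) * block
    return raw, on_disk
```

Calling `tree_saving("path/to/tree")` (the path is a placeholder) returns both totals, so the raw byte count can be compared directly with the block-rounded saving.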
>
> And in tar files bzip2/gzip is very good at compacting them.
That's true.
--Andy