linux-kernel - Re: [PATCH V2 10/16] Squashfs: cache operations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20081029093218.GA26456@logfs.org>
Date:	Wed, 29 Oct 2008 10:32:18 +0100
From:	Jörn Engel <joern@...fs.org>
To:	Phillip Lougher <phillip@...gher.demon.co.uk>
Cc:	akpm@...ux-foundation.org, linux-embedded@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	tim.bird@...sony.com
Subject: Re: [PATCH V2 10/16] Squashfs: cache operations

On Wed, 29 October 2008 01:49:56 +0000, Phillip Lougher wrote:
> +/*
> + * Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
> + * recently accessed data Squashfs uses two small metadata and fragment caches.
> + *
> + * This file implements a generic cache implementation used for both caches,
> + * plus functions layered ontop of the generic cache implementation to
> + * access the metadata and fragment caches.
> + */

I tend to agree with Andrew that a lot of this should be done by the
page cache instead.  One of the problems seems to be that your blocksize
can exceed page size and there really isn't any infrastructure to deal
with such cases yet.  Bufferheads deal with blocks smaller than a page,
not the other way around.

Another is that address spaces are limited ot 16TB on 32bit
architectures.  I guess that should be good enough for a while.  I've
heard some rumors that btrfs actually uses multiple address spaces to
handle this problem, so a good strategy may be to sit back, wait and
simply copy what btrfs does once the issue becomes pressing.

To deal with large blocks, you most likely want to keep you struct
squashfs_cache_entry around and have page->private point to it.  But be
warned, as the whole page->private business is - shall we say - fragile.
You need to supply a number of methods to make things work.  And if you
fail to set any one of them, core code will assume a default, which is
to call into the bufferhead code.  And the backtrace you will receive
some time later has little or no indication that you actually only
missed one method.  I've been meaning to clean this up, but never found
the courage to actually do it.

> +/*
> + * Look-up block in cache, and increment usage count.  If not in cache, read
> + * and decompress it from disk.
> + */
> +struct squashfs_cache_entry *squashfs_cache_get(struct super_block *sb,
> +	struct squashfs_cache *cache, long long block, int length)

I personally prefer u64 instead of long long.  It is a device address
for a 64bit filesystem after all.  Same for next_index.

> +		if (i == cache->entries) {
> +			/*
> +			 * Block not in cache, if all cache entries are locked
> +			 * go to sleep waiting for one to become available.
> +			 */
> +			if (cache->unused == 0) {
> +				cache->waiting++;
> +				spin_unlock(&cache->lock);
> +				wait_event(cache->wait_queue, cache->unused);
> +				spin_lock(&cache->lock);
> +				cache->waiting--;

Maybe rename to no_waiters?  "waiting" looks more like a boolean.

> +			entry->length = squashfs_read_data(sb, entry->data,
> +				block, length, &entry->next_index,
> +				cache->block_size);
> +
> +			spin_lock(&cache->lock);
> +
> +			if (entry->length < 0)
> +				entry->error = entry->length;

entry->error is of type char.  We actually have errno's defined up to
131, so if by whatever freak chance the error is -ENOTRECOVERABLE, this
will convert it to a positive number.  I wouldn't want to debug that.

> +void squashfs_cache_put(struct squashfs_cache *cache,
> +				struct squashfs_cache_entry *entry)
> +{
> +	spin_lock(&cache->lock);
> +	entry->locked--;
> +	if (entry->locked == 0) {

You might want to rename this to "refcount", just to make the name match
the behaviour.

Jörn

-- 
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/