Message-ID: <20120320154159.GD27051@google.com>
Date:	Tue, 20 Mar 2012 08:41:59 -0700
From:	Mandeep Singh Baines <msb@...omium.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	Mandeep Singh Baines <msb@...omium.org>,
	linux-kernel@...r.kernel.org, dm-devel@...hat.com,
	Alasdair G Kergon <agk@...hat.com>,
	Will Drewry <wad@...omium.org>,
	Elly Jones <ellyjones@...omium.org>,
	Milan Broz <mbroz@...hat.com>,
	Olof Johansson <olofj@...omium.org>,
	Steffen Klassert <steffen.klassert@...unet.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] dm: remake of the verity target

Hi Mikulas,

Can you please resend this patch with a proper commit message?
We'd really like to see this merged. Alasdair, other than that,
what work is remaining for verity to be merged?

Regards,
Mandeep

Mikulas Patocka (mpatocka@...hat.com) wrote:
> Hi
> 
> > Hi Mikulas,
> > 
> > This is some nice work. I like that you've been able to abstract a lot
> > of the hash buffer management with dm-bufio. You got rid of the I/O queue.
> > I've been meaning to do that for a while. The prefetch is also nice.
> > We planned to do this but I decided to not do it now in order to get the
> > base functionality in:
> > 
> > http://crosbug.com/25441
> > 
> > However, there are some things that I don't like. I don't like comments
> > either but you have none. You also removed our documentation. You are
> 
> I added some comments. As for documentation, it's OK to use documentation 
> from your patch because the on-disk format and the target arguments are 
> the same (with the enhancement that my code supports different data and 
> metadata block sizes and a variable-length salt).
> 
> > allocated a complete shash_desc per I/O. We only allocate one per CPU.
> 
> Hashing a 4k block takes 174000 cycles, so trying to optimize away a 
> memory latency of about 250 cycles doesn't make much sense.
> 
> I actually observed better performance using verity on ramdisk with 
> workqueue unbound to specific CPUs. The reason is that the ramdisk bio 
> completion routine is always run on the same CPU (the one that submitted 
> the request), so with bound workqueue, everything was executing on one 
> CPU. With unbound workqueue, I got parallelism.
> 
> > We short-circuit the hash at any level. Your implementation can only
> > short-circuit at the lowest level.
> 
> It short-circuits hash at all levels. If the function 
> "verity_verify_level" finds out that "aux->hash_verified" is non-zero, it 
> doesn't do any hashing, it just copies the hash for the lower level. My 
> implementation walks the tree from the top to the bottom, but it doesn't 
> do hash verification if the same block has been verified before.
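A rough user-space model of the short-circuit described above, for readers following the thread. It is only a sketch: every name here (model_tree, model_hash_block, model_verify_chain, the MODEL_* limits) is hypothetical and not taken from the patch, the "verified" array stands in for the per-buffer hash_verified flag, and a counter stands in for the actual hashing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; these names and limits are not from the patch. */
#define MODEL_MAX_LEVELS 8
#define MODEL_MAX_BLOCKS 64

struct model_tree {
	unsigned hash_per_block_bits;	/* log2(hashes per hash block) */
	unsigned levels;		/* number of tree levels, <= MODEL_MAX_LEVELS */
	bool verified[MODEL_MAX_LEVELS][MODEL_MAX_BLOCKS];
	unsigned hash_ops;		/* counts simulated hash computations */
};

/* Index (within its level) of the hash block covering a data block. */
static uint64_t model_hash_block(const struct model_tree *t, uint64_t block,
				 unsigned level)
{
	return block >> ((level + 1) * t->hash_per_block_bits);
}

/* Walk from the top level down, hashing only blocks not yet verified. */
static void model_verify_chain(struct model_tree *t, uint64_t block)
{
	int level;

	/* Fast path: lowest-level hash block already verified, skip the walk. */
	if (t->levels) {
		uint64_t idx0 = model_hash_block(t, block, 0);
		if (idx0 < MODEL_MAX_BLOCKS && t->verified[0][idx0])
			return;
	}

	for (level = (int)t->levels - 1; level >= 0; level--) {
		uint64_t idx = model_hash_block(t, block, (unsigned)level);

		if (idx >= MODEL_MAX_BLOCKS)
			return;		/* outside the toy model's range */
		if (!t->verified[level][idx]) {
			t->hash_ops++;	/* stands in for hashing the block */
			t->verified[level][idx] = true;
		}
	}
}
```

With hash_per_block_bits = 2 and levels = 3, the first lookup of data block 5 costs three simulated hashes; a later lookup of block 6 (same lowest-level hash block) costs none, and block 9 costs only one because the upper levels are already marked verified.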
> 
> All this tree-walking from the root to the bottom is 50 times faster than 
> the actual hashing of the data block (I measured that), so there's not 
> much point in trying to optimize it. I did a simple optimization (don't 
> walk the tree if the lowest block is already verified) and I don't need to 
> do anything complicated, given that it can't improve performance by more 
> than 2%.
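The 2% bound follows from simple arithmetic, sketched here as a small helper. The function name max_saving is hypothetical, and the model assumes the walk and the hash costs simply add up, which is a simplification.

```c
/*
 * Upper bound on the fraction of total time saved if the cheaper step
 * (walk_cost) were eliminated entirely while hash_cost stays unchanged.
 * Both costs must be in the same unit. Hypothetical helper, not from
 * the patch.
 */
static double max_saving(double hash_cost, double walk_cost)
{
	return walk_cost / (hash_cost + walk_cost);
}
```

With the measured 50:1 ratio, max_saving(50.0, 1.0) is 1/51, just under 2%. Applied to the figures quoted earlier in the thread (174000 cycles per 4k hash against about 250 cycles of memory latency), the same calculation gives roughly 0.14%.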
> 
> > I'd like to propose that we get the version we sent upstream and then work
> > together on adding some of your enhancements incrementally.
> 
> If you add dm-bufio support, you end up deleting the majority of the original 
> code anyway. That's why I wrote it from scratch and that's why I didn't 
> attempt to morph your code.
> 
> It's simpler to write the code from scratch and it is also less bug-prone. 
> 
> > Other than
> > the changes we've made to cleanup for upstreaming, the version I
> > submitted is the code we are using in production.
> > 
> > I'm happy to add prefetch now if that is required for merging.
> > 
> > What do you think?
> > 
> > Regards,
> > Mandeep
> 
> This is the version with comments added:
> 
> Mikulas
> 
> ----
> 
> Remake of the google dm-verity patch.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@...hat.com>
> 
> ---
>  drivers/md/Kconfig     |   17 
>  drivers/md/Makefile    |    1 
>  drivers/md/dm-verity.c |  851 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 869 insertions(+)
> 
> Index: linux-3.3-rc6-fast/drivers/md/Kconfig
> ===================================================================
> --- linux-3.3-rc6-fast.orig/drivers/md/Kconfig	2012-03-13 21:46:03.000000000 +0100
> +++ linux-3.3-rc6-fast/drivers/md/Kconfig	2012-03-13 21:46:05.000000000 +0100
> @@ -404,4 +404,21 @@ config DM_VERITY2
>  
>            If unsure, say N.
>  
> +config DM_VERITY
> +	tristate "Verity target support"
> +	depends on BLK_DEV_DM
> +	select CRYPTO
> +	select CRYPTO_HASH
> +	select DM_BUFIO
> +	---help---
> +	  This device-mapper target allows you to create a device that
> +	  transparently integrity checks the data on it. You'll need to
> +	  activate the digests you're going to use in the cryptoapi
> +	  configuration.
> +
> +	  To compile this code as a module, choose M here: the module will
> +	  be called dm-verity.
> +
> +	  If unsure, say N.
> +
>  endif # MD
> Index: linux-3.3-rc6-fast/drivers/md/Makefile
> ===================================================================
> --- linux-3.3-rc6-fast.orig/drivers/md/Makefile	2012-03-13 21:46:03.000000000 +0100
> +++ linux-3.3-rc6-fast/drivers/md/Makefile	2012-03-13 21:46:05.000000000 +0100
> @@ -29,6 +29,7 @@ obj-$(CONFIG_MD_FAULTY)		+= faulty.o
>  obj-$(CONFIG_BLK_DEV_MD)	+= md-mod.o
>  obj-$(CONFIG_BLK_DEV_DM)	+= dm-mod.o
>  obj-$(CONFIG_DM_BUFIO)		+= dm-bufio.o
> +obj-$(CONFIG_DM_VERITY)		+= dm-verity.o
>  obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o
>  obj-$(CONFIG_DM_DELAY)		+= dm-delay.o
>  obj-$(CONFIG_DM_FLAKEY)		+= dm-flakey.o
> Index: linux-3.3-rc6-fast/drivers/md/dm-verity.c
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux-3.3-rc6-fast/drivers/md/dm-verity.c	2012-03-13 22:02:05.000000000 +0100
> @@ -0,0 +1,851 @@
> +/*
> + * Copyright (C) 2012 Red Hat, Inc.
> + *
> + * Author: Mikulas Patocka <mpatocka@...hat.com>
> + *
> + * Based on Chromium dm-verity driver (C) 2011 The Chromium OS Authors
> + *
> + * This file is released under the GPLv2.
> + *
> + * Device mapper target parameters:
> + *	<version>	0
> + *	<data device>
> + *	<hash device>
> + *	<hash start>	(typically 0)
> + *	<block size>	(typically 4096)
> + *	<algorithm>
> + *	<digest>
> + *	optional parameters:
> + *		<salt> (should have 32 bytes for compatibility with Google code)
> + *		<hash block size> (by default it is the same as data block size)
> + *
> + * In the file "/sys/module/dm_verity/parameters/prefetch_cluster" you can set
> + * default prefetch value. Data are read in "prefetch_cluster" chunks from the
> + * hash device. Prefetch cluster greatly improves performance when data and hash
> + * are on the same disk on different partitions.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/device-mapper.h>
> +#include <crypto/hash.h>
> +#include "dm-bufio.h"
> +
> +#define DM_MSG_PREFIX			"verity"
> +
> +#define DM_VERITY_IO_VEC_INLINE		16
> +#define DM_VERITY_MEMPOOL_SIZE		4
> +#define DM_VERITY_DEFAULT_PREFETCH_SIZE	262144
> +
> +#define DM_VERITY_MAX_LEVELS		63
> +
> +static unsigned prefetch_cluster = DM_VERITY_DEFAULT_PREFETCH_SIZE;
> +
> +module_param_named(prefetch_cluster, prefetch_cluster, uint, S_IRUGO | S_IWUSR);
> +
> +struct dm_verity {
> +	struct dm_dev *data_dev;
> +	struct dm_dev *hash_dev;
> +	struct dm_target *ti;
> +	struct dm_bufio_client *bufio;
> +	char *alg_name;
> +	struct crypto_shash *tfm;
> +	u8 *root_digest;	/* digest of the root block */
> +	u8 *salt;		/* salt, its size is salt_size */
> +	unsigned salt_size;
> +	sector_t data_start;	/* data offset in 512-byte sectors */
> +	sector_t hash_start;	/* hash start in blocks */
> +	sector_t data_blocks;	/* the number of data blocks */
> +	sector_t hash_blocks;	/* the number of hash blocks */
> +	unsigned char data_dev_block_bits;	/* log2(data blocksize) */
> +	unsigned char hash_dev_block_bits;	/* log2(hash blocksize) */
> +	unsigned char hash_per_block_bits;	/* log2(hashes in hash block) */
> +	unsigned char levels;	/* the number of tree levels */
> +	unsigned digest_size;	/* digest size for the current hash algorithm */
> +	unsigned shash_descsize;/* the size of temporary space for crypto */
> +
> +	mempool_t *io_mempool;	/* mempool of struct dm_verity_io */
> +	mempool_t *vec_mempool;	/* mempool of bio vector */
> +
> +	struct workqueue_struct *verify_wq;
> +
> +	/* starting blocks for each tree level. 0 is the lowest level. */
> +	sector_t hash_level_block[DM_VERITY_MAX_LEVELS];
> +};
> +
> +struct dm_verity_io {
> +	struct dm_verity *v;
> +	struct bio *bio;
> +
> +	/* original values of bio->bi_end_io and bio->bi_private */
> +	bio_end_io_t *orig_bi_end_io;
> +	void *orig_bi_private;
> +
> +	sector_t block;
> +	unsigned n_blocks;
> +
> +	/* saved bio vector */
> +	struct bio_vec *io_vec;
> +	unsigned io_vec_size;
> +
> +	struct work_struct work;
> +
> +	/* a space for short vectors; longer vectors are allocated separately */
> +	struct bio_vec io_vec_inline[DM_VERITY_IO_VEC_INLINE];
> +
> +	/* variable-size fields, accessible with functions
> +		io_hash_desc, io_real_digest, io_want_digest */
> +	/* u8 hash_desc[crypto_shash_descsize(v->tfm)]; */
> +	/* u8 real_digest[v->digest_size]; */
> +	/* u8 want_digest[v->digest_size]; */
> +};
> +
> +static struct shash_desc *io_hash_desc(struct dm_verity *v, struct dm_verity_io *io)
> +{
> +	return (struct shash_desc *)(io + 1);
> +}
> +
> +static u8 *io_real_digest(struct dm_verity *v, struct dm_verity_io *io)
> +{
> +	return (u8 *)(io + 1) + v->shash_descsize;
> +}
> +
> +static u8 *io_want_digest(struct dm_verity *v, struct dm_verity_io *io)
> +{
> +	return (u8 *)(io + 1) + v->shash_descsize + v->digest_size;
> +}
> +
> +/*
> + * Auxiliary structure appended to each dm-bufio buffer. If the value
> + * hash_verified is nonzero, hash of the block has been verified.
> + *
> + * There is no lock around this value, a race condition can at worst cause
> + * that multiple processes verify the hash of the same buffer simultaneously.
> + * This condition is harmless, so we don't need locking.
> + */
> +struct buffer_aux {
> +	int hash_verified;
> +};
> +
> +/*
> + * Initialize struct buffer_aux for a freshly created buffer.
> + */
> +static void dm_bufio_alloc_callback(struct dm_buffer *buf)
> +{
> +	struct buffer_aux *aux = dm_bufio_get_aux_data(buf);
> +	aux->hash_verified = 0;
> +}
> +
> +/*
> + * Translate input sector number to the sector number on the target device.
> + */
> +static sector_t verity_map_sector(struct dm_verity *v, sector_t bi_sector)
> +{
> +	return v->data_start + dm_target_offset(v->ti, bi_sector);
> +}
> +
> +/*
> + * Return hash position of a specified block at a specified tree level
> + * (0 is the lowest level).
> + * The lowest "hash_per_block_bits"-bits of the result denote hash position
> + * inside a hash block. The remaining bits denote the location of the hash block.
> + */
> +static sector_t verity_position_at_level(struct dm_verity *v, sector_t block,
> +					 int level)
> +{
> +	return block >> (level * v->hash_per_block_bits);
> +}
> +
> +static void verity_hash_at_level(struct dm_verity *v, sector_t block, int level,
> +				 sector_t *hash_block, unsigned *offset)
> +{
> +	sector_t position = verity_position_at_level(v, block, level);
> +
> +	*hash_block = v->hash_level_block[level] + (position >> v->hash_per_block_bits);
> +	if (offset)
> +		*offset = v->digest_size * (position & ((1 << v->hash_per_block_bits) - 1));
> +}
> +
> +/*
> + * Verify hash of a metadata block pertaining to the specified data block
> + * ("block" argument) at a specified level ("level" argument).
> + *
> + * On successful return, io_want_digest(v, io) contains the hash value for
> + * a lower tree level or for the data block (if we're at the lowest level).
> + *
> + * If "skip_unverified" is true, unverified buffer is skipped and 1 is returned.
> + * If "skip_unverified" is false, unverified buffer is hashed and verified
> + * against current value of io_want_digest(v, io).
> + */
> +static int verity_verify_level(struct dm_verity_io *io, sector_t block,
> +			       int level, bool skip_unverified)
> +{
> +	struct dm_verity *v = io->v;
> +	struct dm_buffer *buf;
> +	struct buffer_aux *aux;
> +	u8 *data;
> +	int r;
> +	sector_t hash_block;
> +	unsigned offset;
> +
> +	verity_hash_at_level(v, block, level, &hash_block, &offset);
> +
> +	data = dm_bufio_read(v->bufio, hash_block, &buf);
> +	if (unlikely(IS_ERR(data)))
> +		return PTR_ERR(data);
> +
> +	aux = dm_bufio_get_aux_data(buf);
> +
> +	if (!aux->hash_verified) {
> +		struct shash_desc *desc;
> +		u8 *result;
> +
> +		if (skip_unverified) {
> +			r = 1;
> +			goto release_ret_r;
> +		}
> +
> +		desc = io_hash_desc(v, io);
> +		desc->tfm = v->tfm;
> +		desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> +		r = crypto_shash_init(desc);
> +		if (r < 0) {
> +			DMERR("crypto_shash_init failed: %d", r);
> +			goto release_ret_r;
> +		}
> +
> +		r = crypto_shash_update(desc, data, 1 << v->hash_dev_block_bits);
> +		if (r < 0) {
> +			DMERR("crypto_shash_update failed: %d", r);
> +			goto release_ret_r;
> +		}
> +
> +		r = crypto_shash_update(desc, v->salt, v->salt_size);
> +		if (r < 0) {
> +			DMERR("crypto_shash_update failed: %d", r);
> +			goto release_ret_r;
> +		}
> +
> +		result = io_real_digest(v, io);
> +		r = crypto_shash_final(desc, result);
> +		if (r < 0) {
> +			DMERR("crypto_shash_final failed: %d", r);
> +			goto release_ret_r;
> +		}
> +		if (unlikely(memcmp(result, io_want_digest(v, io), v->digest_size))) {
> +			DMERR_LIMIT("metadata block %llu is corrupted",
> +				(unsigned long long)hash_block);
> +			r = -EIO;
> +			goto release_ret_r;
> +		} else
> +			aux->hash_verified = 1;
> +	}
> +
> +	data += offset;
> +
> +	memcpy(io_want_digest(v, io), data, v->digest_size);
> +
> +	dm_bufio_release(buf);
> +	return 0;
> +
> +release_ret_r:
> +	dm_bufio_release(buf);
> +	return r;
> +}
> +
> +/*
> + * Verify one "dm_verity_io" structure.
> + */
> +static int verity_verify_io(struct dm_verity_io *io)
> +{
> +	struct dm_verity *v = io->v;
> +	unsigned b;
> +	int i;
> +	unsigned vector = 0, offset = 0;
> +	for (b = 0; b < io->n_blocks; b++) {
> +		struct shash_desc *desc;
> +		u8 *result;
> +		int r;
> +		unsigned todo;
> +
> +		if (likely(v->levels)) {
> +			/*
> +			 * First, we try to get the requested hash for
> +			 * the current block. If the hash block itself is
> +			 * verified, zero is returned. If it isn't, this
> +			 * function returns 1 and we fall back to whole
> +			 * chain verification.
> +			 */
> +			int r = verity_verify_level(io, io->block + b, 0, true);
> +			if (likely(!r))
> +				goto test_block_hash;
> +			if (r < 0)
> +				return r;
> +		}
> +
> +		memcpy(io_want_digest(v, io), v->root_digest, v->digest_size);
> +
> +		for (i = v->levels - 1; i >= 0; i--) {
> +			int r = verity_verify_level(io, io->block + b, i, false);
> +			if (unlikely(r))
> +				return r;
> +		}
> +
> +test_block_hash:
> +		desc = io_hash_desc(v, io);
> +		desc->tfm = v->tfm;
> +		desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> +		r = crypto_shash_init(desc);
> +		if (r < 0) {
> +			DMERR("crypto_shash_init failed: %d", r);
> +			return r;
> +		}
> +
> +		todo = 1 << v->data_dev_block_bits;
> +		do {
> +			struct bio_vec *bv;
> +			u8 *page;
> +			unsigned len;
> +
> +			BUG_ON(vector >= io->io_vec_size);
> +			bv = &io->io_vec[vector];
> +			page = kmap_atomic(bv->bv_page, KM_USER0);
> +			len = bv->bv_len - offset;
> +			if (likely(len >= todo))
> +				len = todo;
> +			r = crypto_shash_update(desc,
> +					page + bv->bv_offset + offset, len);
> +			kunmap_atomic(page, KM_USER0);
> +			if (r < 0) {
> +				DMERR("crypto_shash_update failed: %d", r);
> +				return r;
> +			}
> +			offset += len;
> +			if (likely(offset == bv->bv_len)) {
> +				offset = 0;
> +				vector++;
> +			}
> +			todo -= len;
> +		} while (todo);
> +
> +		r = crypto_shash_update(desc, v->salt, v->salt_size);
> +		if (r < 0) {
> +			DMERR("crypto_shash_update failed: %d", r);
> +			return r;
> +		}
> +
> +		result = io_real_digest(v, io);
> +		r = crypto_shash_final(desc, result);
> +		if (r < 0) {
> +			DMERR("crypto_shash_final failed: %d", r);
> +			return r;
> +		}
> +		if (unlikely(memcmp(result, io_want_digest(v, io), v->digest_size))) {
> +			DMERR_LIMIT("data block %llu is corrupted",
> +				(unsigned long long)(io->block + b));
> +			return -EIO;
> +		}
> +	}
> +	BUG_ON(vector != io->io_vec_size);
> +	BUG_ON(offset);
> +	return 0;
> +}
> +
> +/*
> + * End one "io" structure with a given error.
> + */
> +static void verity_finish_io(struct dm_verity_io *io, int error)
> +{
> +	struct bio *bio = io->bio;
> +	struct dm_verity *v = io->v;
> +
> +	bio->bi_end_io = io->orig_bi_end_io;
> +	bio->bi_private = io->orig_bi_private;
> +
> +	if (io->io_vec != io->io_vec_inline)
> +		mempool_free(io->io_vec, v->vec_mempool);
> +	mempool_free(io, v->io_mempool);
> +
> +	bio_endio(bio, error);
> +}
> +
> +static void verity_work(struct work_struct *w)
> +{
> +	struct dm_verity_io *io = container_of(w, struct dm_verity_io, work);
> +
> +	verity_finish_io(io, verity_verify_io(io));
> +}
> +
> +static void verity_end_io(struct bio *bio, int error)
> +{
> +	struct dm_verity_io *io = bio->bi_private;
> +	if (error) {
> +		verity_finish_io(io, error);
> +		return;
> +	}
> +
> +	INIT_WORK(&io->work, verity_work);
> +	queue_work(io->v->verify_wq, &io->work);
> +}
> +
> +/*
> + * Prefetch buffers for the specified io.
> + * The root buffer is not prefetched, it is assumed that it will be cached
> + * all the time.
> + */
> +static void verity_prefetch_io(struct dm_verity *v, struct dm_verity_io *io)
> +{
> +	int i;
> +	for (i = v->levels - 2; i >= 0; i--) {
> +		sector_t hash_block_start;
> +		sector_t hash_block_end;
> +		verity_hash_at_level(v, io->block, i, &hash_block_start, NULL);
> +		verity_hash_at_level(v, io->block + io->n_blocks - 1, i, &hash_block_end, NULL);
> +		if (!i) {
> +			unsigned cluster = prefetch_cluster;
> +			/* barrier to stop GCC from re-reading prefetch_cluster */
> +			barrier();
> +			cluster >>= v->data_dev_block_bits;
> +			if (unlikely(!cluster))
> +				goto no_prefetch_cluster;
> +			if (unlikely(cluster & (cluster - 1)))
> +				cluster = 1 << (fls(cluster) - 1);
> +
> +			hash_block_start &= ~(sector_t)(cluster - 1);
> +			hash_block_end |= cluster - 1;
> +			if (unlikely(hash_block_end >= v->hash_blocks))
> +				hash_block_end = v->hash_blocks - 1;
> +		}
> +no_prefetch_cluster:
> +		dm_bufio_prefetch(v->bufio, hash_block_start,
> +					hash_block_end - hash_block_start + 1);
> +	}
> +}
> +
> +/*
> + * Bio map function. It allocates dm_verity_io structure and bio vector and
> + * fills them. Then it issues prefetches and the I/O.
> + */
> +static int verity_map(struct dm_target *ti, struct bio *bio,
> +		      union map_info *map_context)
> +{
> +	struct dm_verity *v = ti->private;
> +	struct dm_verity_io *io;
> +
> +	if (((unsigned)bio->bi_sector | bio_sectors(bio)) &
> +	    ((1 << (v->data_dev_block_bits - SECTOR_SHIFT)) - 1)) {
> +		DMERR_LIMIT("unaligned io");
> +		return -EIO;
> +	}
> +
> +	if ((bio->bi_sector + bio_sectors(bio)) >>
> +	    (v->data_dev_block_bits - SECTOR_SHIFT) > v->data_blocks) {
> +		DMERR_LIMIT("io out of range");
> +		return -EIO;
> +	}
> +
> +	if (bio_data_dir(bio) == WRITE)
> +		return -EIO;
> +
> +	io = mempool_alloc(v->io_mempool, GFP_NOIO);
> +	io->v = v;
> +	io->bio = bio;
> +	io->orig_bi_end_io = bio->bi_end_io;
> +	io->orig_bi_private = bio->bi_private;
> +	io->block = bio->bi_sector >> (v->data_dev_block_bits - SECTOR_SHIFT);
> +	io->n_blocks = bio->bi_size >> v->data_dev_block_bits;
> +
> +	bio->bi_end_io = verity_end_io;
> +	bio->bi_private = io;
> +	bio->bi_bdev = v->data_dev->bdev;
> +	bio->bi_sector = verity_map_sector(v, bio->bi_sector);
> +
> +	io->io_vec_size = bio->bi_vcnt - bio->bi_idx;
> +	if (io->io_vec_size < DM_VERITY_IO_VEC_INLINE)
> +		io->io_vec = io->io_vec_inline;
> +	else
> +		io->io_vec = mempool_alloc(v->vec_mempool, GFP_NOIO);
> +	memcpy(io->io_vec, bio_iovec(bio),
> +	       io->io_vec_size * sizeof(struct bio_vec));
> +
> +	verity_prefetch_io(v, io);
> +
> +	generic_make_request(bio);
> +
> +	return DM_MAPIO_SUBMITTED;
> +}
> +
> +static int verity_status(struct dm_target *ti, status_type_t type,
> +			 char *result, unsigned maxlen)
> +{
> +	struct dm_verity *v = ti->private;
> +	unsigned sz = 0;
> +	unsigned x;
> +
> +	switch (type) {
> +	case STATUSTYPE_INFO:
> +		result[0] = 0;
> +		break;
> +	case STATUSTYPE_TABLE:
> +		DMEMIT("%u %s %s %llu %u %s ",
> +			0,
> +			v->data_dev->name,
> +			v->hash_dev->name,
> +			(unsigned long long)v->hash_start << (v->hash_dev_block_bits - SECTOR_SHIFT),
> +			1 << v->data_dev_block_bits,
> +			v->alg_name
> +			);
> +		for (x = 0; x < v->digest_size; x++)
> +			DMEMIT("%02x", v->root_digest[x]);
> +		DMEMIT(" ");
> +		if (!v->salt_size)
> +			DMEMIT("-");
> +		else
> +			for (x = 0; x < v->salt_size; x++)
> +				DMEMIT("%02x", v->salt[x]);
> +		if (v->data_dev_block_bits != v->hash_dev_block_bits)
> +			DMEMIT(" %u", 1 << v->hash_dev_block_bits);
> +		break;
> +	}
> +	return 0;
> +}
> +
> +static int verity_ioctl(struct dm_target *ti, unsigned cmd,
> +			unsigned long arg)
> +{
> +	struct dm_verity *v = ti->private;
> +	int r = 0;
> +
> +	if (v->data_start ||
> +	    ti->len != i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT)
> +		r = scsi_verify_blk_ioctl(NULL, cmd);
> +
> +	return r ? : __blkdev_driver_ioctl(v->data_dev->bdev, v->data_dev->mode,
> +				     cmd, arg);
> +}
> +
> +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
> +			struct bio_vec *biovec, int max_size)
> +{
> +	struct dm_verity *v = ti->private;
> +	struct request_queue *q = bdev_get_queue(v->data_dev->bdev);
> +
> +	if (!q->merge_bvec_fn)
> +		return max_size;
> +
> +	bvm->bi_bdev = v->data_dev->bdev;
> +	bvm->bi_sector = verity_map_sector(v, bvm->bi_sector);
> +
> +	return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
> +}
> +
> +static int verity_iterate_devices(struct dm_target *ti,
> +				  iterate_devices_callout_fn fn, void *data)
> +{
> +	struct dm_verity *v = ti->private;
> +	return fn(ti, v->data_dev, v->data_start, ti->len, data);
> +}
> +
> +static void verity_io_hints(struct dm_target *ti, struct queue_limits *limits)
> +{
> +	struct dm_verity *v = ti->private;
> +
> +	if (limits->logical_block_size < 1 << v->data_dev_block_bits)
> +		limits->logical_block_size = 1 << v->data_dev_block_bits;
> +	if (limits->physical_block_size < 1 << v->data_dev_block_bits)
> +		limits->physical_block_size = 1 << v->data_dev_block_bits;
> +	blk_limits_io_min(limits, limits->logical_block_size);
> +}
> +
> +static void verity_dtr(struct dm_target *ti);
> +
> +static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
> +{
> +	struct dm_verity *v;
> +	unsigned num;
> +	unsigned long long hs;
> +	int r;
> +	int i;
> +	sector_t hash_position;
> +	char dummy;
> +
> +	v = kzalloc(sizeof(struct dm_verity), GFP_KERNEL);
> +	if (!v) {
> +		ti->error = "Cannot allocate verity structure";
> +		return -ENOMEM;
> +	}
> +	ti->private = v;
> +	v->ti = ti;
> +
> +	if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) {
> +		ti->error = "Device must be readonly";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	if (argc < 7) {
> +		ti->error = "Not enough arguments";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	if (sscanf(argv[0], "%u%c", &num, &dummy) != 1 ||
> +	    num != 0) {
> +		ti->error = "Invalid version";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	r = dm_get_device(ti, argv[1], FMODE_READ, &v->data_dev);
> +	if (r) {
> +		ti->error = "Data device lookup failed";
> +		goto bad;
> +	}
> +
> +	r = dm_get_device(ti, argv[2], FMODE_READ, &v->hash_dev);
> +	if (r) {
> +		ti->error = "Hash device lookup failed";
> +		goto bad;
> +	}
> +
> +	if (sscanf(argv[3], "%llu%c", &hs, &dummy) != 1 ||
> +	    hs != (sector_t)hs) {
> +		ti->error = "Invalid hash start";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	if (sscanf(argv[4], "%u%c", &num, &dummy) != 1 ||
> +	    !num || (num & (num - 1)) ||
> +	    num < bdev_logical_block_size(v->data_dev->bdev) ||
> +	    num > PAGE_SIZE) {
> +		ti->error = "Invalid data device block size";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +	v->data_dev_block_bits = ffs(num) - 1;
> +	v->hash_dev_block_bits = ffs(num) - 1;
> +
> +	v->alg_name = kstrdup(argv[5], GFP_KERNEL);
> +	if (!v->alg_name) {
> +		ti->error = "Cannot allocate algorithm name";
> +		r = -ENOMEM;
> +		goto bad;
> +	}
> +
> +	v->tfm = crypto_alloc_shash(v->alg_name, 0, 0);
> +	if (IS_ERR(v->tfm)) {
> +		ti->error = "Cannot initialize hash function";
> +		r = PTR_ERR(v->tfm);
> +		v->tfm = NULL;
> +		goto bad;
> +	}
> +	v->digest_size = crypto_shash_digestsize(v->tfm);
> +	if ((1 << v->hash_dev_block_bits) < v->digest_size * 2) {
> +		ti->error = "Digest size too big";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +	v->shash_descsize =
> +		sizeof(struct shash_desc) + crypto_shash_descsize(v->tfm);
> +
> +	v->root_digest = kmalloc(v->digest_size, GFP_KERNEL);
> +	if (!v->root_digest) {
> +		ti->error = "Cannot allocate root digest";
> +		r = -ENOMEM;
> +		goto bad;
> +	}
> +	if (strlen(argv[6]) != v->digest_size * 2 ||
> +	    hex2bin(v->root_digest, argv[6], v->digest_size)) {
> +		ti->error = "Invalid root digest";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	if (argc > 7 && strcmp(argv[7], "-")) {
> +		v->salt_size = strlen(argv[7]) / 2;
> +		v->salt = kmalloc(v->salt_size, GFP_KERNEL);
> +		if (!v->salt) {
> +			ti->error = "Cannot allocate salt";
> +			r = -ENOMEM;
> +			goto bad;
> +		}
> +		if (strlen(argv[7]) != v->salt_size * 2 ||
> +		    hex2bin(v->salt, argv[7], v->salt_size)) {
> +			ti->error = "Invalid salt";
> +			r = -EINVAL;
> +			goto bad;
> +		}
> +	}
> +
> +	if (argc > 8) {
> +		if (sscanf(argv[8], "%u%c", &num, &dummy) != 1 ||
> +		    !num || (num & (num - 1)) ||
> +		    num < bdev_logical_block_size(v->hash_dev->bdev) ||
> +		    num > INT_MAX) {
> +			ti->error = "Invalid hash device block size";
> +			r = -EINVAL;
> +			goto bad;
> +		}
> +		v->hash_dev_block_bits = ffs(num) - 1;
> +	}
> +
> +	if (hs & ((1 << (v->hash_dev_block_bits - SECTOR_SHIFT)) - 1)) {
> +		ti->error = "Hash start not aligned on block boundary";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +	v->hash_start = hs >> (v->hash_dev_block_bits - SECTOR_SHIFT);
> +
> +	if (ti->len > i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT) {
> +		ti->error = "Data device is too small";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	if (ti->len & ((1 << (v->data_dev_block_bits - SECTOR_SHIFT)) - 1)) {
> +		ti->error = "Data device length is not aligned to block size";
> +		r = -EINVAL;
> +		goto bad;
> +	}
> +
> +	v->data_blocks = ti->len >> (v->data_dev_block_bits - SECTOR_SHIFT);
> +
> +	v->hash_per_block_bits =
> +		fls((1 << v->hash_dev_block_bits) / v->digest_size) - 1;
> +
> +	v->levels = 0;
> +	if (v->data_blocks)
> +		while (v->hash_per_block_bits * v->levels < 64 &&
> +		       (unsigned long long)(v->data_blocks - 1) >>
> +		       (v->hash_per_block_bits * v->levels))
> +			v->levels++;
> +
> +	if (v->levels > DM_VERITY_MAX_LEVELS) {
> +		ti->error = "Too many tree levels";
> +		r = -E2BIG;
> +		goto bad;
> +	}
> +
> +	hash_position = v->hash_start;
> +	for (i = v->levels - 1; i >= 0; i--) {
> +		sector_t s;
> +		v->hash_level_block[i] = hash_position;
> +		s = verity_position_at_level(v, v->data_blocks, i);
> +		s = (s >> v->hash_per_block_bits) +
> +		    !!(s & ((1 << v->hash_per_block_bits) - 1));
> +		if (hash_position + s < hash_position) {
> +			ti->error = "Hash device offset overflow";
> +			r = -E2BIG;
> +			goto bad;
> +		}
> +		hash_position += s;
> +	}
> +	v->hash_blocks = hash_position;
> +
> +	v->bufio = dm_bufio_client_create(v->hash_dev->bdev,
> +		1 << v->hash_dev_block_bits, 1, sizeof(struct buffer_aux),
> +		dm_bufio_alloc_callback, NULL);
> +	if (IS_ERR(v->bufio)) {
> +		ti->error = "Cannot initialize dm-bufio";
> +		r = PTR_ERR(v->bufio);
> +		v->bufio = NULL;
> +		goto bad;
> +	}
> +
> +	if (dm_bufio_get_device_size(v->bufio) < v->hash_blocks) {
> +		ti->error = "Hash device is too small";
> +		r = -E2BIG;
> +		goto bad;
> +	}
> +
> +	v->io_mempool = mempool_create_kmalloc_pool(DM_VERITY_MEMPOOL_SIZE,
> +	  sizeof(struct dm_verity_io) + v->shash_descsize + v->digest_size * 2);
> +	if (!v->io_mempool) {
> +		ti->error = "Cannot allocate io mempool";
> +		r = -ENOMEM;
> +		goto bad;
> +	}
> +
> +	v->vec_mempool = mempool_create_kmalloc_pool(DM_VERITY_MEMPOOL_SIZE,
> +					BIO_MAX_PAGES * sizeof(struct bio_vec));
> +	if (!v->vec_mempool) {
> +		ti->error = "Cannot allocate vector mempool";
> +		r = -ENOMEM;
> +		goto bad;
> +	}
> +
> +	/*v->verify_wq = alloc_workqueue("verityd", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 1);*/
> +	/* WQ_UNBOUND greatly improves performance when running on ramdisk */
> +	v->verify_wq = alloc_workqueue("verityd", WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND, num_online_cpus());
> +	if (!v->verify_wq) {
> +		ti->error = "Cannot allocate workqueue";
> +		r = -ENOMEM;
> +		goto bad;
> +	}
> +
> +	return 0;
> +
> +bad:
> +	verity_dtr(ti);
> +	return r;
> +}
> +
> +static void verity_dtr(struct dm_target *ti)
> +{
> +	struct dm_verity *v = ti->private;
> +
> +	if (v->verify_wq)
> +		destroy_workqueue(v->verify_wq);
> +	if (v->vec_mempool)
> +		mempool_destroy(v->vec_mempool);
> +	if (v->io_mempool)
> +		mempool_destroy(v->io_mempool);
> +	if (v->bufio)
> +		dm_bufio_client_destroy(v->bufio);
> +	kfree(v->salt);
> +	kfree(v->root_digest);
> +	if (v->tfm)
> +		crypto_free_shash(v->tfm);
> +	kfree(v->alg_name);
> +	if (v->hash_dev)
> +		dm_put_device(ti, v->hash_dev);
> +	if (v->data_dev)
> +		dm_put_device(ti, v->data_dev);
> +	kfree(v);
> +}
> +
> +static struct target_type verity_target = {
> +	.name		= "verity",
> +	.version	= {1, 0, 0},
> +	.module		= THIS_MODULE,
> +	.ctr		= verity_ctr,
> +	.dtr		= verity_dtr,
> +	.map		= verity_map,
> +	.status		= verity_status,
> +	.ioctl		= verity_ioctl,
> +	.merge		= verity_merge,
> +	.iterate_devices = verity_iterate_devices,
> +	.io_hints	= verity_io_hints,
> +};
> +
> +static int __init dm_verity_init(void)
> +{
> +	int r;
> +	r = dm_register_target(&verity_target);
> +	if (r < 0)
> +		DMERR("register failed %d", r);
> +	return r;
> +}
> +
> +static void __exit dm_verity_exit(void)
> +{
> +	dm_unregister_target(&verity_target);
> +}
> +
> +module_init(dm_verity_init);
> +module_exit(dm_verity_exit);
> +
> +MODULE_AUTHOR("Mikulas Patocka <mpatocka@...hat.com>");
> +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> +MODULE_LICENSE("GPL");
> +
> Index: linux-3.3-rc6-fast/drivers/md/dm-bufio.c
> ===================================================================
> --- linux-3.3-rc6-fast.orig/drivers/md/dm-bufio.c	2012-03-12 22:43:23.000000000 +0100
> +++ linux-3.3-rc6-fast/drivers/md/dm-bufio.c	2012-03-13 15:41:02.000000000 +0100
> @@ -579,7 +579,7 @@ static void write_endio(struct bio *bio,
>  	struct dm_buffer *b = container_of(bio, struct dm_buffer, bio);
>  
>  	b->write_error = error;
> -	if (error) {
> +	if (unlikely(error)) {
>  		struct dm_bufio_client *c = b->c;
>  		(void)cmpxchg(&c->async_write_error, 0, error);
>  	}
> @@ -698,13 +698,20 @@ static void __wait_for_free_buffer(struc
>  	dm_bufio_lock(c);
>  }
>  
> +enum new_flag {
> +	NF_FRESH = 0,
> +	NF_READ = 1,
> +	NF_GET = 2,
> +	NF_PREFETCH = 3
> +};
> +
>  /*
>   * Allocate a new buffer. If the allocation is not possible, wait until
>   * some other thread frees a buffer.
>   *
>   * May drop the lock and regain it.
>   */
> -static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client *c)
> +static struct dm_buffer *__alloc_buffer_wait_no_callback(struct dm_bufio_client *c, enum new_flag nf)
>  {
>  	struct dm_buffer *b;
>  
> @@ -727,6 +734,9 @@ static struct dm_buffer *__alloc_buffer_
>  				return b;
>  		}
>  
> +		if (nf == NF_PREFETCH)
> +			return NULL;
> +
>  		if (!list_empty(&c->reserved_buffers)) {
>  			b = list_entry(c->reserved_buffers.next,
>  				       struct dm_buffer, lru_list);
> @@ -744,9 +754,12 @@ static struct dm_buffer *__alloc_buffer_
>  	}
>  }
>  
> -static struct dm_buffer *__alloc_buffer_wait(struct dm_bufio_client *c)
> +static struct dm_buffer *__alloc_buffer_wait(struct dm_bufio_client *c, enum new_flag nf)
>  {
> -	struct dm_buffer *b = __alloc_buffer_wait_no_callback(c);
> +	struct dm_buffer *b = __alloc_buffer_wait_no_callback(c, nf);
> +
> +	if (!b)
> +		return NULL;
>  
>  	if (c->alloc_callback)
>  		c->alloc_callback(b);
> @@ -866,15 +879,8 @@ static struct dm_buffer *__find(struct d
>   * Getting a buffer
>   *--------------------------------------------------------------*/
>  
> -enum new_flag {
> -	NF_FRESH = 0,
> -	NF_READ = 1,
> -	NF_GET = 2
> -};
> -
>  static struct dm_buffer *__bufio_new(struct dm_bufio_client *c, sector_t block,
> -				     enum new_flag nf, struct dm_buffer **bp,
> -				     int *need_submit)
> +				     enum new_flag nf, int *need_submit)
>  {
>  	struct dm_buffer *b, *new_b = NULL;
>  
> @@ -882,6 +888,19 @@ static struct dm_buffer *__bufio_new(str
>  
>  	b = __find(c, block);
>  	if (b) {
> +found_buffer:
> +		if (nf == NF_PREFETCH)
> +			return NULL;
> +		/*
> +		 * Note: it is essential that we don't wait for the buffer to be
> +		 * read if dm_bufio_get function is used. Both dm_bufio_get and
> +		 * dm_bufio_prefetch can be used in the driver request routine.
> +		 * If the user called both dm_bufio_prefetch and dm_bufio_get on
> +		 * the same buffer, it would deadlock if we waited.
> +		 */
> +		if (nf == NF_GET && unlikely(test_bit(B_READING, &b->state)))
> +			return NULL;
> +
>  		b->hold_count++;
>  		__relink_lru(b, test_bit(B_DIRTY, &b->state) ||
>  			     test_bit(B_WRITING, &b->state));
> @@ -891,7 +910,9 @@ static struct dm_buffer *__bufio_new(str
>  	if (nf == NF_GET)
>  		return NULL;
>  
> -	new_b = __alloc_buffer_wait(c);
> +	new_b = __alloc_buffer_wait(c, nf);
> +	if (!new_b)
> +		return NULL;
>  
>  	/*
>  	 * We've had a period where the mutex was unlocked, so need to
> @@ -900,10 +921,7 @@ static struct dm_buffer *__bufio_new(str
>  	b = __find(c, block);
>  	if (b) {
>  		__free_buffer_wake(new_b);
> -		b->hold_count++;
> -		__relink_lru(b, test_bit(B_DIRTY, &b->state) ||
> -			     test_bit(B_WRITING, &b->state));
> -		return b;
> +		goto found_buffer;
>  	}
>  
>  	__check_watermark(c);
> @@ -957,7 +975,7 @@ static void *new_read(struct dm_bufio_cl
>  	struct dm_buffer *b;
>  
>  	dm_bufio_lock(c);
> -	b = __bufio_new(c, block, nf, bp, &need_submit);
> +	b = __bufio_new(c, block, nf, &need_submit);
>  	dm_bufio_unlock(c);
>  
>  	if (!b || IS_ERR(b))
> @@ -1006,13 +1024,46 @@ void *dm_bufio_new(struct dm_bufio_clien
>  }
>  EXPORT_SYMBOL_GPL(dm_bufio_new);
>  
> +void dm_bufio_prefetch(struct dm_bufio_client *c,
> +		       sector_t block, unsigned n_blocks)
> +{
> +	struct blk_plug plug;
> +
> +	blk_start_plug(&plug);
> +	dm_bufio_lock(c);
> +
> +	for (; n_blocks--; block++) {
> +		int need_submit;
> +		struct dm_buffer *b;
> +		b = __bufio_new(c, block, NF_PREFETCH, &need_submit);
> +		if (unlikely(b != NULL)) {
> +			dm_bufio_unlock(c);
> +
> +			if (need_submit)
> +				submit_io(b, READ, b->block, read_endio);
> +			dm_bufio_release(b);
> +
> +			dm_bufio_cond_resched();
> +
> +			if (!n_blocks)
> +				goto flush_plug;
> +			dm_bufio_lock(c);
> +		}
> +
> +	}
> +
> +	dm_bufio_unlock(c);
> +flush_plug:
> +	blk_finish_plug(&plug);
> +}
> +EXPORT_SYMBOL_GPL(dm_bufio_prefetch);
> +
>  void dm_bufio_release(struct dm_buffer *b)
>  {
>  	struct dm_bufio_client *c = b->c;
>  
>  	dm_bufio_lock(c);
>  
> -	BUG_ON(test_bit(B_READING, &b->state));
>  	BUG_ON(!b->hold_count);
>  
>  	b->hold_count--;
> @@ -1025,6 +1076,7 @@ void dm_bufio_release(struct dm_buffer *
>  		 * invalid buffer.
>  		 */
>  		if ((b->read_error || b->write_error) &&
> +		    !test_bit(B_READING, &b->state) &&
>  		    !test_bit(B_WRITING, &b->state) &&
>  		    !test_bit(B_DIRTY, &b->state)) {
>  			__unlink_buffer(b);
> @@ -1042,6 +1094,8 @@ void dm_bufio_mark_buffer_dirty(struct d
>  
>  	dm_bufio_lock(c);
>  
> +	BUG_ON(test_bit(B_READING, &b->state));
> +
>  	if (!test_and_set_bit(B_DIRTY, &b->state))
>  		__relink_lru(b, LIST_DIRTY);
>  
> Index: linux-3.3-rc6-fast/drivers/md/dm-bufio.h
> ===================================================================
> --- linux-3.3-rc6-fast.orig/drivers/md/dm-bufio.h	2012-03-12 22:43:23.000000000 +0100
> +++ linux-3.3-rc6-fast/drivers/md/dm-bufio.h	2012-03-12 22:43:25.000000000 +0100
> @@ -63,6 +63,14 @@ void *dm_bufio_new(struct dm_bufio_clien
>  		   struct dm_buffer **bp);
>  
>  /*
> + * Prefetch the specified blocks to the cache.
> + * The function starts to read the blocks and returns without waiting for
> + * I/O to finish.
> + */
> +void dm_bufio_prefetch(struct dm_bufio_client *c,
> +		       sector_t block, unsigned n_blocks);
> +
> +/*
>   * Release a reference obtained with dm_bufio_{read,get,new}. The data
>   * pointer and dm_buffer pointer is no longer valid after this call.
>   */
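
For anyone following the thread, the lookup rules that the new NF_PREFETCH
flag adds to the "found_buffer" path of __bufio_new() can be modelled in
user space. This is only a rough sketch, not kernel code: struct toy_buffer
and toy_found_buffer() are hypothetical names made up for illustration, and
the "reading" field stands in for the B_READING bit. It shows just the
decision made when the block is already in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same values as the enum in the patch. */
enum new_flag { NF_FRESH = 0, NF_READ = 1, NF_GET = 2, NF_PREFETCH = 3 };

/* Hypothetical stand-in for struct dm_buffer: only what the model needs. */
struct toy_buffer {
	bool reading;		/* models the B_READING state bit */
	unsigned hold_count;	/* models b->hold_count */
};

/*
 * Models the found_buffer path: what the caller gets back when the
 * requested block is already cached.
 */
static struct toy_buffer *toy_found_buffer(struct toy_buffer *b,
					   enum new_flag nf)
{
	assert(b != NULL);

	/* Already cached, so a prefetch has nothing left to do. */
	if (nf == NF_PREFETCH)
		return NULL;

	/* A "get" must never wait for an in-flight read (deadlock risk). */
	if (nf == NF_GET && b->reading)
		return NULL;

	/* Every other case takes a reference and returns the buffer. */
	b->hold_count++;
	return b;
}
```

In this model a prefetch followed by a get on the same block simply makes
the get miss while the read is in flight, which is the behaviour the comment
in the patch asks for.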