[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.1203152149190.20644@file.rdu.redhat.com>
Date: Fri, 16 Mar 2012 21:16:00 -0400 (EDT)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Will Drewry <wad@...omium.org>
cc: Mandeep Singh Baines <msb@...omium.org>,
linux-kernel@...r.kernel.org, dm-devel@...hat.com,
Alasdair G Kergon <agk@...hat.com>,
Elly Jones <ellyjones@...omium.org>,
Milan Broz <mbroz@...hat.com>,
Olof Johansson <olofj@...omium.org>,
Steffen Klassert <steffen.klassert@...unet.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] dm: remake of the verity target
Hi Will
On Wed, 14 Mar 2012, Will Drewry wrote:
> Hi Mikulas,
>
> This is a nice rewrite and takes advantage of your dm-bufio layer. I
> wish it'd existed (and or we wrote it :) in 2009 when we started this
> work! Some comments below:
>
> > ---
> > +static void verity_prefetch_io(struct dm_verity *v, struct dm_verity_io *io)
> > +{
> > + int i;
> > + for (i = v->levels - 2; i >= 0; i--) {
> > + sector_t hash_block_start;
> > + sector_t hash_block_end;
> > + verity_hash_at_level(v, io->block, i, &hash_block_start, NULL);
> > + verity_hash_at_level(v, io->block + io->n_blocks - 1, i, &hash_block_end, NULL);
> > + if (!i) {
> > + unsigned cluster = prefetch_cluster;
> > + /* barrier to stop GCC from re-reading prefetch_cluster again */
> > + barrier();
> > + cluster >>= v->data_dev_block_bits;
>
> Would:
> unsigned cluster = prefetch_cluster >> v->data_dev_block_bits;
> not have similar behavior without a barrier? (Yeah yeah I could
> compile and see, but I was curious if you already had.)
>
> Since the max iterations here is 61 in a worst-case, I don't think
> it's a big deal to barrier() each time, just thought I'd ask.
>
> > + if (unlikely(!cluster))
> > + goto no_prefetch_cluster;
> > + if (unlikely(cluster & (cluster - 1)))
> > + cluster = 1 << (fls(cluster) - 1);
> > +
> > + hash_block_start &= ~(sector_t)(cluster - 1);
> > + hash_block_end |= cluster - 1;
> > + if (unlikely(hash_block_end >= v->hash_blocks))
> > + hash_block_end = v->hash_blocks - 1;
> > + }
> > +no_prefetch_cluster:
> > + dm_bufio_prefetch(v->bufio, hash_block_start,
> > + hash_block_end - hash_block_start + 1);
The problem here is this. If you look at the code, you think that after
the clause "if (unlikely(!cluster)) goto no_prefetch_cluster;", cluster
can't be zero. But this assumption is wrong. The C compiler is allowed to
transform the above code into:
unsigned cluster;
if (!(prefetch_cluster >> v->data_dev_block_bits))
goto no_prefetch_cluster;
cluster = prefetch_cluster >> v->data_dev_block_bits;
if (unlikely(cluster & (cluster - 1)))
cluster = 1 << (fls(cluster) - 1);
I know it's suboptimal, but the C compiler is just allowed to perform this
transformation. Now, if you know that "prefetch_cluster" can change
asynchronously by another thread running simultaneously, the condition "if
(!(prefetch_cluster >> v->data_dev_block_bits))" is useless ---
prefetch_cluster may change just after this condition and we won't catch
the zero value. (if the cluster value is zero in the above code, it ends
up in hash_block_end being ORed with -1 and the prefetch goes wild over
the whole hash device).
That's why I put that "barrier()" there. It would be better to declare
"prefetch_cluster" as volatile, but the module param macros issue warnings
if the variable is volatile.
Or maybe I can change it this way:
"unsigned cluster = *(volatile unsigned *)&prefetch_cluster", it could be
better than the "barrier()".
> > + case STATUSTYPE_TABLE:
> > + DMEMIT("%u %s %s %llu %u %s ",
> > + 0,
> > + v->data_dev->name,
> > + v->hash_dev->name,
>
> I understand the new approach is to use major:minor instead of the
> device name. I don't care which, but I believe agk@ requested that.
All the device mappers report dm_dev->name in their status routine, so I
do it this way too.
> > +static int verity_ioctl(struct dm_target *ti, unsigned cmd,
> > + unsigned long arg)
> > +{
> > + struct dm_verity *v = ti->private;
> > + int r = 0;
> > +
> > + if (v->data_start ||
> > + ti->len != i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT)
> > + r = scsi_verify_blk_ioctl(NULL, cmd);
> > +
>
> Is it worth supporting ioctl at all given these hoops? Nothing stops
> a privileged user from directly running the ioctl on the underlying
> device/devices, it's just very inconvenient :)
I don't know. The other dm targets attempt to pass-thru ioctls too.
You need ioctl pass-thru if you want to run it over a cd-rom because
the iso9660 filesystem needs to send an ioctl to find its superblock.
Other than that I don't know if other filesystems need ioctls.
> > + if (ti->len > i_size_read(v->data_dev->bdev->bd_inode) >> SECTOR_SHIFT) {
> > + ti->error = "Data device si too small";
>
> s/si/is
>
> Should this also check ti->start + ti->len to ensure it isn't reading
> off the end or do you just rely on the requests failing?
ti->start is the offset in the target table --- so it shouldn't be checked
here (for example, you can map a verity device having 1024 blocks to a
sector offset 1000000 in the table --- so ti->start == 1000000 and ti->len
== 1024 --- in this case, you have test that the underlying device has at
least 1024 blocks, but you shouldn't test it for 1000000 sectors ---
1000000 is offset in the table, not required device size.
But this reminds me that I had the size test wrong in verity_map ...
fixed.
> > +MODULE_AUTHOR("Mikulas Patocka <mpatocka@...hat.com>");
>
> As per linux/module.h, I'd welcome additional authors as per the
> lkml/patch lineage:
> MODULE_AUTHOR("Mandeep Baines <msb@...omium.org>");
> MODULE_AUTHOR("Will Drewry <wad@...omium.org>");
OK, I added you there.
> Regardless, I'll just be happy to see this functionality merge.
>
> > +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking");
> > +MODULE_LICENSE("GPL");
> > +
> > Index: linux-3.3-rc6-fast/drivers/md/dm-bufio.c
>
> This should be in a separate patch I think.
Yes, it is a separate patch.
> > b->hold_count++;
>
> Are these hold_counts safe on architectures with weak memory models?
> Should they be atomic_ts? I haven't looked at them in context, but
> based on what I see here they make me a bit nervous.
>
> Thanks for jumping in to the fray! None of my comments are blocking,
> so I believe the following is appropriate (if not
> s/Signed-off/Reviewed-by/).
>
> Signed-off-by: Will Drewry <wad@...omium.org>
>
> cheers!
> will
hold_count is read or changed only when we hold dm_bufio_client->lock, so
it doesn't have to be atomic.
Mikulas
Powered by blists - more mailing lists