[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4F458D99.30802@redhat.com>
Date: Wed, 22 Feb 2012 18:51:37 -0600
From: Eric Sandeen <sandeen@...hat.com>
To: Lukas Czerner <lczerner@...hat.com>
CC: linux-ext4@...r.kernel.org, tytso@....edu, psusi@...ntu.com
Subject: Re: [PATCH 1/2] e2fsck: Discard only unused parts of inode table
On 2/21/12 10:02 AM, Lukas Czerner wrote:
> When calling e2fsck with '-E discard' option it might happen that
> valid inodes are discarded accidentally. This is because we just
> discard the part of inode table which lies past the highest used
> inode. This is terribly wrong (sorry!).
>
> This patch fixes it so only the free parts of an inode table
> is discarded, leaving used inodes intact. This was tested with highly
> fragmented inode tables with block size 4k and 1k.
>
> Signed-off-by: Lukas Czerner <lczerner@...hat.com>
> Reported-by: Phillip Susi <psusi@...ntu.com>
> ---
> e2fsck/pass5.c | 59 +++++++++++++++++++++++++++++++++++--------------------
> 1 files changed, 37 insertions(+), 22 deletions(-)
>
> diff --git a/e2fsck/pass5.c b/e2fsck/pass5.c
> index 1e836e3..9cc4a20 100644
> --- a/e2fsck/pass5.c
> +++ b/e2fsck/pass5.c
> @@ -94,6 +94,26 @@ static void e2fsck_discard_blocks(e2fsck_t ctx, io_manager manager,
> ctx->options &= ~E2F_OPT_DISCARD;
> }
I could really use a comment here. Is "start" the inode nr within
the group? Is the first inode in the group 0 or 1? The calling
function's loop starts at inode 1 ... but based on some testing
it looks like you call the first free inode in an empty group
"inode 0" so I'll assume that, but comments please.
> +static void e2fsck_discard_inodes(e2fsck_t ctx, int group,
> + int start, int count)
I'm not sure if ints are good here, but it's what the calling
function uses, I guess.
> +{
> + ext2_filsys fs = ctx->fs;
> + blk64_t blk, num;
> +
> + /*
> + * We can discard only blocks containing only unused
> + * inodes in the table.
> + */
> + blk = DIV_ROUND_UP(start,
> + EXT2_INODES_PER_BLOCK(fs->super));
> + count -= (blk * EXT2_INODES_PER_BLOCK(fs->super) - start);
> + blk += ext2fs_inode_table_loc(fs, group);
> + num = count / EXT2_INODES_PER_BLOCK(fs->super);
> +
> + if (num > 0)
> + e2fsck_discard_blocks(ctx, fs->io->manager, blk, num);
> +}
I can't make intuitive sense of that at all, and had to count on my fingers
and draw diagrams. Maybe it's just my old brain. :(
Let's say we have 4 inode blocks in this table, with
1 == used, 0 == unused, and we start counting inodes at 0.
|1111111100000000|0000000000000000|0000000000000000|0000000000000001|
012345678 <- start
so start = 8, count = 55 (note one inode used at the end)
In this case we should get:
blk = 1 (DIV_ROUND_UP(8, 16)) - ok, that's next block after the partial block
count = 55 - ((1 * 16) - 8) = 47
num = 47 / 16 = 2
Ok, so that works, assuming the first inode in the group is "0"
We can make a filesystem kind of like that, to test it:
# mkfs.ext4 -I 256 -b 4096 -N 256 -G 1 -g 256 /dev/loop0 1024
with 4 groups each having 64 inodes.
If I run a e2fsck -E discard on that fresh filesystem, with some debugging, I see:
Pass 5: Checking group summary information
got start 12 count 51; count reduced to 47 inodes, discarding 2 blocks starting at block 36
Ok, that is a little weird; on a fresh filesystem lost+found/ is 11, so the first free
inode is 12; but now we seem to have started counting at 1, not 0?
got start 0 count 63; count reduced to 63 inodes, discarding 3 blocks starting at block 291
got start 0 count 63; count reduced to 63 inodes, discarding 3 blocks starting at block 514
got start 0 count 63; count reduced to 63 inodes, discarding 3 blocks starting at block 803
Oh, now we're back to "the first inode in the group is 0" :(
And here it looks like we are not discarding all the free blocks in the completely-free tables;
the last group has:
Inode table at 803-806 (+35)
217 free blocks, 64 free inodes, 0 directories, 64 unused inodes
but we only discarded 803 through 805, for example, because we were only told to discard
63 inodes, not 64 as I'd have expected since 64 are free.
I can't really make heads or tails out of how we got here, the calling function is so
messy it's really hard for me to keep track of. I think perhaps a clean up series
for the calling function first, and then we can try to properly extend it...?
I think we need ultra-careful attention to which variables are 0-based and which
are 1-based, for starters.
For e2fsck_discard_inodes() I wonder if it could be made more intuitive by:
1) find first totally free block based on start
2) find last totally free block based on the last free inode (start+count-1)
3) find free block length, free_length = last_blk - first_blk + 1
4) e2fsck_discard_blocks(..., ..., first_blk, free_length);
i.e. get rid of the count -= ...
_assuming_ we start counting at 0:
/* round up to next block after a partially free one */
first_blk = DIV_ROUND_UP(start,
EXT2_INODES_PER_BLOCK(fs->super));
first_blk += ext2fs_inode_table_loc(fs, group);
last_free_inode = start + count - 1;
/* round down to block before a partially free one */
last_blk = last_free_inode / EXT2_INODES_PER_BLOCK(fs->super));
free_blks = last_blk - first_blk + 1;
if (free_blks > 0)
e2fsck_discard_blocks(ctx, fs->io->manager, first_blk, free_blks);
I probably have off-by-ones there too, but you get the idea...
-Eric
> +
> #define NO_BLK ((blk64_t) -1)
>
> static void print_bitmap_problem(e2fsck_t ctx, int problem,
> @@ -422,6 +442,7 @@ static void check_inode_bitmaps(e2fsck_t ctx)
> ext2_ino_t i;
> unsigned int free_inodes = 0;
> int group_free = 0;
> + int first_free = fs->super->s_inodes_per_group;
> int dirs_count = 0;
> int group = 0;
> unsigned int inodes = 0;
> @@ -497,6 +518,7 @@ redo_counts:
> * are 0, count the free inode,
> * skip the current block group.
> */
> + first_free = 0;
> inodes = fs->super->s_inodes_per_group - 1;
> group_free = inodes;
> free_inodes += inodes;
> @@ -561,37 +583,27 @@ redo_counts:
> ctx->options &= ~E2F_OPT_DISCARD;
>
> do_counts:
> + inodes++;
> if (bitmap) {
> if (ext2fs_test_inode_bitmap2(ctx->inode_dir_map, i))
> dirs_count++;
> + if (inodes - 1 > first_free) {
> + e2fsck_discard_inodes(ctx, group, first_free,
> + inodes - first_free - 1);
> + first_free = fs->super->s_inodes_per_group;
> + }
> } else if (!skip_group || csum_flag) {
> group_free++;
> free_inodes++;
> + if (first_free > inodes)
> + first_free = inodes;
> }
>
> - inodes++;
> if ((inodes == fs->super->s_inodes_per_group) ||
> (i == fs->super->s_inodes_count)) {
> -
> - free_array[group] = group_free;
> - dir_array[group] = dirs_count;
> -
> - /* Discard inode table */
> - if (ctx->options & E2F_OPT_DISCARD) {
> - blk64_t used_blks, blk, num;
> -
> - used_blks = DIV_ROUND_UP(
> - (EXT2_INODES_PER_GROUP(fs->super) -
> - group_free),
> - EXT2_INODES_PER_BLOCK(fs->super));
> -
> - blk = ext2fs_inode_table_loc(fs, group) +
> - used_blks;
> - num = fs->inode_blocks_per_group -
> - used_blks;
> - e2fsck_discard_blocks(ctx, manager, blk, num);
> - }
> -
> + if (inodes - 1 > first_free)
> + e2fsck_discard_inodes(ctx, group, first_free,
> + inodes - first_free - 1);
> /*
> * If discard zeroes data and the group inode table
> * was not zeroed yet, set itable as zeroed
> @@ -599,12 +611,15 @@ do_counts:
> if ((ctx->options & E2F_OPT_DISCARD) &&
> (io_channel_discard_zeroes_data(fs->io)) &&
> !(ext2fs_bg_flags_test(fs, group,
> - EXT2_BG_INODE_ZEROED))) {
> + EXT2_BG_INODE_ZEROED))) {
> ext2fs_bg_flags_set(fs, group,
> EXT2_BG_INODE_ZEROED);
> ext2fs_group_desc_csum_set(fs, group);
> }
>
> + first_free = fs->super->s_inodes_per_group;
> + free_array[group] = group_free;
> + dir_array[group] = dirs_count;
> group ++;
> inodes = 0;
> skip_group = 0;
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists