[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <20081125083533.GS3186@webber.adilger.int>
Date: Tue, 25 Nov 2008 01:35:33 -0700
From: Andreas Dilger <adilger@....com>
To: Theodore Tso <tytso@....EDU>
Cc: Solofo.Ramangalahy@...l.net, linux-ext4@...r.kernel.org
Subject: Re: [RFC 0/2] ext4: zero uninitialized inode tables
On Nov 25, 2008 00:32 -0500, Theodore Ts'o wrote:
> I would recommend doing the first 32k of the inode table
> first, and once it completes, you can update inode_bg_unavaile so that
> an additional (32k / EXT4_INODE_SIZE(sb)) inodes are available.
I agree with everything Ted says, though I would zero the itable in
chunks of 64kB or even 128kB. Two reasons are because 64kB is the
maximum blocksize for the filesystem, and it doesn't make sense to zero
less than a whole block at once. Secondly, 64kB is more likely to
match with the internal track size of spinning disks, and 128kB is more
likely to match the erase block size of SSDs.
> In terms of how quickly the itable initializer should work, in between
> each block group, as we discussed on the call, the simplest thing for
> it do is to wait for some time period to go by (say, 5 seconds) before
> working on the next block group. The next, slightly more complicated
> scheme would be to set a "last ext4 operation time" field in
> EXT4_SB(sb) which is set any time the ext4 code paths are entered
That would be "s_wtime" already in the on-disk superblock. It wouldn't
kill us to update this occasionally in ext4, though not on disk all
the time.
> (basically, any function in ext4's inode operations, super operations
> or file operations). The itable initalizer would sample that time,
> and before starting to initialize the next block group where
> BG_ITABLE_ZERO is not set, it would check the last ext4 operation time
> field, and if there had been an ext4 operation in the last 5 seconds,
> it would sleep 5 seconds and check again.
Well, I'd say if it has slept 5s then it should submit a block regardless
of whether the filesystem was in use or not. Otherwise the itable may
never be zeroed out if the filesystem is always in use. Adding a rare
64kB write to disk is unlikely to hurt anything, and if people REALLY care
about it they can avoid formatting with "lazy_itable_init".
> This would prevent the itable initializer from running if the filesystem
> is in use, although it will not detect the case where there is a lot
> of mmap'ed I/O going on, but no other ext4 operations.
Wouldn't even mmap operations cause some ext4 methods to be called?
> In the long run, we would really want some kind of I/O activity
> indication from the block device elevator, but that would require
> changes to the core kernel, and the last ext4 operation time is almost
> just as good.
Alternately we could check the journal tid?
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists