[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090916204225.GB84213@freezingfog.local>
Date: Wed, 16 Sep 2009 15:42:25 -0500
From: Will Drewry <redpig@...aspill.org>
To: Andreas Dilger <adilger@....com>
Cc: linux-ext4@...r.kernel.org
Subject: Re: [PATCH][RFC] resize2fs and uninit_bg questions
On Wed, Sep 16, 2009 at 01:08:31PM -0600, Andreas Dilger wrote:
> On Sep 16, 2009 11:24 -0500, Will Drewry wrote:
> > I have a two questions with an accompanying patch for clarification.
> >
> > resize2fs is uninit_bg aware, but when it is expanding an ext4
> > filesystem, it will always zero the inode tables. Is it safe to mimick
> > mke2fs's write_inode_table(.., lazy_flag=1) and leave the new block
> > groups' inode tables marked INODE_UNINIT, BLOCK_UNINIT and _not_ zero
> > out the inode table if uninit_bg is supported?
> >
> > If it is okay, then it means offline resizing upwards can be just as
> > fast as mke2fs. I've attached a patch which is probably incomplete.
> > I'd love feedback as to the feasibility of the change and/or patch
> > quality.
> >
> > As a follow-on, would it be sane to add support like this for
> > online resizing. From a cursory investigation, it looks like
> > setup_new_block_groups() could be modified to not zero itables
> > if uninit_bg is supported, and INODE_ZEROED could be replaced
> > with ΒG_*_UNINIT. However, I'm not sure if that is a naive
> > view. I'm happy to send along a patch illustrating this change
> > if that'd be helpful or welcome.
>
> The question is why you would want to risk disk corruption vs.
> the (likely not performance critical) online resize?
I'm interested in it for a few reasons:
1. it undermines the use of uninit_bg on large filesystems as
e2fsck -f will go back to normal speed, even without those block
groups being 'used'. In my local example, it goes from 14s to 2m24s.
2. it will spread the I/O cost out over time. Online resizing often
means that you don't want to/can't unmount the fs. However, a
large filesystem increase might result in gigabytes of 0s being
written to the backing store which could result in I/O throttling
that makes doing it online less useful. It'd be nice to be able to
optionally amortize that cost as is done if the fs had been mke2fs -O
lazy_itable_init=1 at full size initially.
3. dm/sparse-file-backed ext4 filesystems end up having the file size
expanded quite early as init'ing the itables force extra non-sparse
bytes to be written. Otherwise, ext4+uninit_bg is a really nice fs
type for this purpose.
Would it seriously raise the risk of corruption if uninit_bg is already
in use? Alternately, would a patch to that effect stand a chance of ever
making it upstream?
> > Any and all feedback is appreciated -- even if it just for me
> > to look at the archives/link/etc.
> >
> > diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> > index 1a5d910..9fcc3b9 100644
> > --- a/resize/resize2fs.c
> > +++ b/resize/resize2fs.c
> > @@ -497,8 +497,7 @@ retry:
> >
> > fs->group_desc[i].bg_flags = 0;
> > if (csum_flag)
> > - fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT |
> > - EXT2_BG_INODE_ZEROED;
> > + fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT;
>
> This shouldn't be unconditional, since most users will want the
> safety of having zeroed inode tables.
I've attached a version with it being flagged by a "-l" for lazy.
Thanks for the quick response!
will
Signed-off-by: Will Drewry <redpig@...aspill.org>
---
resize/main.c | 7 +++++--
resize/online.c | 2 +-
resize/resize2fs.8.in | 3 +++
resize/resize2fs.c | 41 +++++++++++++++++++++++++++++------------
resize/resize2fs.h | 5 ++++-
5 files changed, 42 insertions(+), 16 deletions(-)
diff --git a/resize/main.c b/resize/main.c
index 220c192..f04a939 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -39,7 +39,7 @@ char *program_name, *device_name, *io_options;
static void usage (char *prog)
{
- fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-M] [-P] "
+ fprintf (stderr, _("Usage: %s [-d debug_flags] [-f] [-F] [-l] [-M] [-P] "
"[-p] device [new_size]\n\n"), prog);
exit (1);
@@ -189,7 +189,7 @@ int main (int argc, char ** argv)
if (argc && *argv)
program_name = *argv;
- while ((c = getopt (argc, argv, "d:fFhMPpS:")) != EOF) {
+ while ((c = getopt (argc, argv, "d:fFhlMPpS:")) != EOF) {
switch (c) {
case 'h':
usage(program_name);
@@ -209,6 +209,9 @@ int main (int argc, char ** argv)
case 'd':
flags |= atoi(optarg);
break;
+ case 'l':
+ flags |= RESIZE_LAZY_INIT;
+ break;
case 'p':
flags |= RESIZE_PERCENT_COMPLETE;
break;
diff --git a/resize/online.c b/resize/online.c
index 4bc5451..f0b0569 100644
--- a/resize/online.c
+++ b/resize/online.c
@@ -104,7 +104,7 @@ errcode_t online_resize_fs(ext2_filsys fs, const char *mtpt,
* but at least it allows on-line resizing to function.
*/
new_fs->super->s_feature_incompat &= ~EXT4_FEATURE_INCOMPAT_FLEX_BG;
- retval = adjust_fs_info(new_fs, fs, 0, *new_size);
+ retval = adjust_fs_info(new_fs, fs, 0, *new_size, flags);
if (retval)
return retval;
diff --git a/resize/resize2fs.8.in b/resize/resize2fs.8.in
index 3ea7a63..020393e 100644
--- a/resize/resize2fs.8.in
+++ b/resize/resize2fs.8.in
@@ -102,6 +102,9 @@ really useful for doing
.B resize2fs
time trials.
.TP
+.B \-l
+Lazily initialize new inode tables if supported (uninit_bg).
+.TP
.B \-M
Shrink the filesystem to the minimum size.
.TP
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 1a5d910..fee2e3f 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -41,7 +41,8 @@
#endif
static void fix_uninit_block_bitmaps(ext2_filsys fs);
-static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size);
+static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size,
+ int flags);
static errcode_t blocks_to_move(ext2_resize_t rfs);
static errcode_t block_mover(ext2_resize_t rfs);
static errcode_t inode_scan_and_fix(ext2_resize_t rfs);
@@ -106,7 +107,7 @@ errcode_t resize_fs(ext2_filsys fs, blk_t *new_size, int flags,
if (retval)
goto errout;
- retval = adjust_superblock(rfs, *new_size);
+ retval = adjust_superblock(rfs, *new_size, flags);
if (retval)
goto errout;
@@ -292,7 +293,7 @@ static void free_gdp_blocks(ext2_filsys fs,
* filesystem.
*/
errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs,
- ext2fs_block_bitmap reserve_blocks, blk_t new_size)
+ ext2fs_block_bitmap reserve_blocks, blk_t new_size, int flags)
{
errcode_t retval;
int overhead = 0;
@@ -496,9 +497,11 @@ retry:
adjblocks = 0;
fs->group_desc[i].bg_flags = 0;
- if (csum_flag)
- fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT |
- EXT2_BG_INODE_ZEROED;
+ if (csum_flag) {
+ fs->group_desc[i].bg_flags |= EXT2_BG_INODE_UNINIT;
+ if (!(flags & RESIZE_LAZY_INIT))
+ fs->group_desc[i].bg_flags |= EXT2_BG_INODE_ZEROED;
+ }
if (i == fs->group_desc_count-1) {
numblocks = (fs->super->s_blocks_count -
fs->super->s_first_data_block) %
@@ -565,10 +568,10 @@ errout:
* This routine adjusts the superblock and other data structures, both
* in disk as well as in memory...
*/
-static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
+static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size, int flags)
{
ext2_filsys fs;
- int adj = 0;
+ int adj = 0, csum_flag = 0, num = 0;
errcode_t retval;
blk_t group_block;
unsigned long i;
@@ -584,7 +587,7 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
if (retval)
return retval;
- retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size);
+ retval = adjust_fs_info(fs, rfs->old_fs, rfs->reserve_blocks, new_size, flags);
if (retval)
goto errout;
@@ -624,6 +627,9 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
&rfs->itable_buf);
if (retval)
goto errout;
+ /* Track if we can get by with a lazy init */
+ csum_flag = EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM);
memset(rfs->itable_buf, 0, fs->blocksize * fs->inode_blocks_per_group);
group_block = fs->super->s_first_data_block +
@@ -642,10 +648,21 @@ static errcode_t adjust_superblock(ext2_resize_t rfs, blk_t new_size)
/*
* Write out the new inode table
*/
+ if (csum_flag && (flags & RESIZE_LAZY_INIT)) {
+ /* These are _new_ inode tables. No inodes should be in use. */
+ fs->group_desc[i].bg_itable_unused = fs->super->s_inodes_per_group;
+ num = ((((fs->super->s_inodes_per_group -
+ fs->group_desc[i].bg_itable_unused) *
+ EXT2_INODE_SIZE(fs->super)) +
+ EXT2_BLOCK_SIZE(fs->super) - 1) /
+ EXT2_BLOCK_SIZE(fs->super));
+ } else {
+ num = fs->inode_blocks_per_group;
+ }
retval = io_channel_write_blk(fs->io,
- fs->group_desc[i].bg_inode_table,
- fs->inode_blocks_per_group,
- rfs->itable_buf);
+ fs->group_desc[i].bg_inode_table, /* blk */
+ num, /* count */
+ rfs->itable_buf); /* contents */
if (retval) goto errout;
io_channel_flush(fs->io);
diff --git a/resize/resize2fs.h b/resize/resize2fs.h
index fab7290..d4c862c 100644
--- a/resize/resize2fs.h
+++ b/resize/resize2fs.h
@@ -77,6 +77,8 @@ typedef struct ext2_sim_progress *ext2_sim_progmeter;
#define RESIZE_DEBUG_INODEMAP 0x0004
#define RESIZE_DEBUG_ITABLEMOVE 0x0008
+#define RESIZE_LAZY_INIT 0x0010
+
#define RESIZE_PERCENT_COMPLETE 0x0100
#define RESIZE_VERBOSE 0x0200
@@ -129,7 +131,8 @@ extern errcode_t resize_fs(ext2_filsys fs, blk_t *new_size, int flags,
extern errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs,
ext2fs_block_bitmap reserve_blocks,
- blk_t new_size);
+ blk_t new_size,
+ int flags);
extern blk_t calculate_minimum_resize_size(ext2_filsys fs);
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists