[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080728161156.GB4545@skywalker>
Date: Mon, 28 Jul 2008 21:41:56 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To: Mingming Cao <cmm@...ibm.com>
Cc: tytso <tytso@....edu>, Shehjar Tikoo <shehjart@....unsw.edu.au>,
linux-ext4@...r.kernel.org
Subject: Re: [RFC]Ext4: journal credits reservation fixes for DIO,
fallocate and delalloc writepages
On Fri, Jul 25, 2008 at 12:26:42PM -0700, Mingming Cao wrote:
>
......
> > > */
> > > int ext4_ext_calc_credits_for_insert(struct inode *inode,
> > > struct ext4_ext_path *path)
> > > @@ -1930,9 +1931,6 @@ int ext4_ext_calc_credits_for_insert(str
> > > */
> > > needed += (depth * 2) + (depth * 2);
> > >
> > > - /* any allocation modifies superblock */
> > > - needed += 1;
> > > -
> >
> >
> > Why are we dropping the super block modification credit ? An insert of
> > an extent can result in super block modification also. If the goal is to
> > use ext4_writepages_trans_blocks everywhere then this change is correct.
> > But i see many place not using ext4_writepages_trans_blocks.
> >
> ext4_writepages_trans_blocks() will take care of the credits need for
> updating superblock, for just once.ext4_ext_calc_credits_for_insert()
> calculate the credits needed for inserting a single extent, You could
> see in ext4_writepages_trans_blocks(), it will multiple it will the
> total numberof blocks. If we account for superblock credit in
> ext4_ext_calc_credits_for_insert(), then super block updates for
> multiple extents allocation will be overaccounted, and have to remove
> that later in ext4_writepages_trans_blocks()
ext4_ext_calc_credit for insert was not doing it currently with
path != NULL. I attaching a patch which reflect some of the changes
I suggested.
>
> Other places calling ext4_ext_calc_credits_for_insert() (mostly
> migrate.c and defrag,c) are updated to add extra 4 credits, including
> superblock, inode block and quota blocks.
I guess it should not be +4 but should be +2 + 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
>
> > You also need to update the function comment saying that super block
> > update is not accounted.Also it doesn't account for block bitmap,
> > group descriptor and inode block update.
> >
>
> Credits for block bitmap and group descriptors are accounted in
> ext4_ext_calc_credits_for_insert()
>
> inode block update is only accounted once per writepages, so it's
> accounted in
> ext4_writepages_trans_blocks()/ext4_ext_writepages_trans_blocks()
>
....
> > > *
> > > @@ -1142,13 +1133,13 @@ int ext4_get_block(struct inode *inode,
> > > handle_t *handle = ext4_journal_current_handle();
> > > int ret = 0, started = 0;
> > > unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
> > > + int dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
> >
> > Again this should be
> > int dio_credits = ext4_writeblocks_trans(inode, DIO_MAX_BLOCKS);
> >
> No. DIO case is different than writepages(). When get_block() is called
> from DIO path, the get_block() is called in a loop, and the credit is
> reserved inside the loop. Each time get_block(),will return a single
> extent, so we should use EXT4_DATA_TRANS_BLOCKS(inode->i_sb) which is
> calculate a single chunk of allocation credits.
That is true only for extent format. With noextents we need something
like below
if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
dio_credits = ext4_writeblocks_trans_credits_old(inode, max_blocks);
else {
/*
* For extent format get_block return only one extent
*/
dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
}
>
> ext4_da_writepages() is different, we have to reserve the credits
> outside the loop of calling get_block(), since the underlying
> get_block() could be called multiple times, we need to worse case, so
> ext4_writepages_trans_blocks() is called to handling the worse case.
>
> > >
> > > if (create && !handle) {
> > > /* Direct IO write... */
> > > if (max_blocks > DIO_MAX_BLOCKS)
> > > max_blocks = DIO_MAX_BLOCKS;
> > > - handle = ext4_journal_start(inode, DIO_CREDITS +
> > > - 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb));
> > > + handle = ext4_journal_start(inode, dio_credits);
> > > if (IS_ERR(handle)) {
> > > ret = PTR_ERR(handle);
> > > goto out;
> > > @@ -1327,7 +1318,7 @@ static int ext4_write_begin(struct file
> > > struct page **pagep, void **fsdata)
> > > {
> > > struct inode *inode = mapping->host;
....
> >
> > > - int bpp = ext4_journal_blocks_per_page(inode);
> > > - int indirects = (EXT4_NDIR_BLOCKS % bpp) ? 5 : 3;
> > > + int indirects = (EXT4_NDIR_BLOCKS % num) ? 5 : 3;
> > > int ret;
> > >
> > > - if (EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL)
> > > - return ext4_ext_writepage_trans_blocks(inode, bpp);
> > > -
> > > if (ext4_should_journal_data(inode))
> > > - ret = 3 * (bpp + indirects) + 2;
> > > + ret = 3 * (num + indirects) + 2;
> > > else
> > > - ret = 2 * (bpp + indirects) + 2;
> > > + ret = 2 * (num + indirects) + 2;
> >
> >
> > With non journalled moded we should just decrease num. I guess the above
> > should be
> > if (ext4_should_journal_data(inode))
> > ret = 3 * (num + indirects) + 2;
> > else
> > ret = 3 * (num + indirects) + 2 - num;
> >
> >
> >
>
> Well, I think in the journalled mode we need to journal the content of
> the indirect/double indirect blocks also, no?
>
With non journalled mode we still need to journal the indirect, dindirect
and tindirect block so this should be
if (ext4_should_journal_data(inode))
ret = 3 * (num + indirects) + 2;
else
ret = 3 * (num + indirects) + 2 - num;
Attaching the patch for easy review
diff --git a/fs/ext4/defrag.c b/fs/ext4/defrag.c
index 7c819fd..46e9600 100644
--- a/fs/ext4/defrag.c
+++ b/fs/ext4/defrag.c
@@ -1385,8 +1385,9 @@ ext4_defrag_alloc_blocks(handle_t *handle, struct inode *org_inode,
struct buffer_head *bh = NULL;
int err, i, credits = 0;
- credits = ext4_ext_calc_credits_for_single_extent(dest_inode, dest_path)
- + 4;
+ credits = ext4_ext_calc_credits_for_single_extent(dest_inode,
+ dest_path) + 2 +
+ 2 * EXT4_QUOTA_TRANS_BLOCKS(dest_inode->i_sb);
err = ext4_ext_journal_restart(handle,
credits + EXT4_TRANS_META_BLOCKS);
if (err)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 77f4f94..969b1e6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1898,6 +1898,9 @@ static int ext4_ext_rm_idx(handle_t *handle, struct inode *inode,
* for old tree and new tree, for every level we need to reserve
* credits to log the bitmap and block group descriptors
*
+ * This doesn't take into account credit needed for the update of
+ * super block + inode block + quota files
+ *
*/
int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
struct ext4_ext_path *path)
@@ -1909,7 +1912,8 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
depth = ext_depth(inode);
if (le16_to_cpu(path[depth].p_hdr->eh_entries)
< le16_to_cpu(path[depth].p_hdr->eh_max))
- return 1;
+ /* 1 group desc + 1 block bitmap for allocated blocks */
+ return 2;
}
/*
@@ -1919,7 +1923,7 @@ int ext4_ext_calc_credits_for_single_extent(struct inode *inode,
*/
depth = 5;
- /* allocation of new data block(s) */
+ /* 1 group desc + 1 block bitmap for allocated blocks */
needed = 2;
/*
@@ -2059,9 +2063,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
correct_index = 1;
credits += (ext_depth(inode)) + 1;
}
-#ifdef CONFIG_QUOTA
credits += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
err = ext4_ext_journal_restart(handle, credits);
if (err)
@@ -3023,9 +3025,7 @@ int ext4_ext_writeblocks_trans_credits(struct inode *inode, int nrblocks)
else
needed = nrblocks * needed + 2;
-#ifdef CONFIG_QUOTA
needed += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
return needed;
}
@@ -3092,7 +3092,7 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
/*
* credits to insert 1 extent into extent tree
*/
- credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
mutex_lock(&inode->i_mutex);
retry:
while (ret >= 0 && ret < max_blocks) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5a394c8..d2832a1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1126,18 +1126,29 @@ int ext4_get_blocks_wrap(handle_t *handle, struct inode *inode, sector_t block,
up_write((&EXT4_I(inode)->i_data_sem));
return retval;
}
+static int ext4_writeblocks_trans_credits_old(struct inode *, int);
int ext4_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
handle_t *handle = ext4_journal_current_handle();
int ret = 0, started = 0;
unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
- int dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ int dio_credits;
if (create && !handle) {
/* Direct IO write... */
if (max_blocks > DIO_MAX_BLOCKS)
max_blocks = DIO_MAX_BLOCKS;
+ if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
+ dio_credits = ext4_writeblocks_trans_credits_old(inode,
+ max_blocks);
+ else {
+ /*
+ * For extent format get_block return only one extent
+ */
+ dio_credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb);
+ }
+
handle = ext4_journal_start(inode, dio_credits);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -4310,14 +4321,18 @@ static int ext4_writeblocks_trans_credits_old(struct inode *inode, int nrblocks)
if (ext4_should_journal_data(inode))
ret = 3 * (nrblocks + indirects) + 2;
- else
- ret = 2 * (nrblocks + indirects) + 2;
+ else {
+ /*
+ * We still need to journal update for the
+ * indirect, dindirect, and tindirect blocks
+ * only data blocks are not journalled
+ */
+ ret = 3 * (nrblocks + indirects) + 2 - nrblocks;
+ }
-#ifdef CONFIG_QUOTA
/* We know that structure was already allocated during DQUOT_INIT so
* we will be updating only the data blocks + inodes */
- ret += 2*EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
-#endif
+ ret += 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);
return ret;
}
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index 3456094..72488ab 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -55,7 +55,8 @@ static int finish_range(handle_t *handle, struct inode *inode,
*
* extra 4 credits for: 1 superblock, 1 inode block, 2 quotas
*/
- needed = ext4_ext_calc_credits_for_single_extent(inode, path) + 4;
+ needed = ext4_ext_calc_credits_for_single_extent(inode, path) + 2 +
+ 2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb);;
/*
* Make sure the credit we accumalated is not really high
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists