Message-ID: <20080613032006.GC12892@skywalker>
Date:	Fri, 13 Jun 2008 08:50:06 +0530
From:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To:	Mingming Cao <cmm@...ibm.com>
Cc:	Jan Kara <jack@...e.cz>, linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: ext4_page_mkwrite and delalloc
On Thu, Jun 12, 2008 at 02:00:46PM -0700, Mingming Cao wrote:
> On Thu, 2008-06-12 at 23:44 +0530, Aneesh Kumar K.V wrote:
> > Hi,
> > 
> > With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> > with delalloc is to delay the block allocation and make sure we allocate
> > chunks of blocks together at writepages. So I guess we should update
> > ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> 
> I agree; with delayed allocation page_mkwrite is much simpler: it just
> does block reservation to prevent ENOSPC.
> 
> > Taking i_alloc_sem should protect against parallel truncate and the page
> > lock should protect against parallel write_begin/write_end.
> > 
> > How about the patch below ?
> > 
> 
> Do we plan to support page_mkwrite for non-delalloc? The following patch
> seems to suggest that we only do page_mkwrite with delalloc.
Yes, it is needed for non-delalloc too. The primary requirement comes
from the lock inversion patches: with them, we don't do block
allocation in writepage.
> 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index cac132b..7f162cc 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
> >  	return err;
> >  }
> > 
> > -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> > -{
> > -	if (!buffer_mapped(bh)) {
> > -		/*
> > -		 * Mark buffer as dirty so that
> > -		 * block_write_full_page() writes it
> > -		 */
> > -		set_buffer_dirty(bh);
> > -	}
> > -	return 0;
> > -}
> > -
> >  static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
> >  {
> >  	return !buffer_mapped(bh);
> > @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
> >  		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
> >  				       ext4_bh_unmapped))
> >  			goto out_unlock;
> > -		/*
> > -		 * Now mark all the  buffer head dirty so
> > -		 * that writepage can write it
> > -		 */
> > -		walk_page_buffers(NULL, page_buffers(page), 0, len,
> > -					NULL, ext4_bh_prepare_fill);
> >  	}
> >  	/*
> > -	 * OK, we need to fill the hole... Lock the page and do writepage.
> > -	 * We can't do write_begin and write_end here because we don't
> > -	 * have inode_mutex and that allow parallel write_begin, write_end call.
> > +	 * OK, we need to fill the hole... Lock the page and do write_begin
> > +	 * write_end. We are not holding inode.i__mutex here. That allow
> > +	 * parallel write_begin, write_end call.
> >  	 * (lock_page prevent this from happening on the same page though)
> >  	 */
> > -	lock_page(page);
> > -	wbc.range_start = page_offset(page);
> > -	wbc.range_end = page_offset(page) + len;
> > -	ret = mapping->a_ops->writepage(page, &wbc);
> > -	/* writepage unlocks the page */
> > +	ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> > +			len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
> 
> What is this AOP_FLAG_UNINTERRUPTIBLE flag? Also, shouldn't we test
> whether delalloc is enabled?
> 
Since we are not doing any real copy here, I guess we can say that
we won't do a short write. That is what the flag means:
#define AOP_FLAG_UNINTERRUPTIBLE        0x0001 /* will not do a short write */
> > +	if (ret < 0)
> > +		goto out_unlock;
> > +	ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> > +			len, len, page, NULL);
> 
> I am still puzzled why we need to mark the page dirty in write_end here.
> I thought doing only the block reservation in write_begin would be enough;
> we haven't written anything yet...
The reason is to get the ordered and journaled mode behavior correct.
We need to ensure that the metadata allocated in write_begin gets
committed in the right order. To do that, we need to add the
buffer_heads corresponding to the data (page) to the right list in the
journal. write_end mostly does that.
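
To make the ordering argument concrete, here is a small user-space sketch
of the idea. It is not kernel code: every name in it (toy_buf, toy_txn,
toy_write_begin, toy_write_end, toy_commit, toy_stale_data_exposed) is
hypothetical, and it only assumes the usual ordered-mode rule that data
attached to a transaction is written out before that transaction's
metadata commits.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BUFS 8

/* Hypothetical stand-ins for illustration; none of these are kernel types. */
struct toy_buf {
	bool mapped;   /* a block has been allocated for this buffer */
	bool dirty;    /* data still needs to be written */
	bool on_disk;  /* data has reached the disk */
};

struct toy_txn {
	struct toy_buf *ordered[TOY_MAX_BUFS]; /* data to flush before commit */
	size_t nr_ordered;
	int metadata_updates;                  /* pending metadata changes */
	bool committed;
};

/* Models write_begin: allocating the block changes metadata in the txn. */
static void toy_write_begin(struct toy_txn *t, struct toy_buf *b)
{
	if (!b->mapped) {
		b->mapped = true;
		t->metadata_updates++;
	}
}

/*
 * Models write_end in ordered mode: dirty the buffer and attach it to the
 * transaction. Returns 0 on success, -1 if the toy list is full.
 */
static int toy_write_end(struct toy_txn *t, struct toy_buf *b)
{
	if (t->nr_ordered == TOY_MAX_BUFS)
		return -1;
	b->dirty = true;
	t->ordered[t->nr_ordered++] = b;
	return 0;
}

/* Models commit: flush the attached data first, then commit the metadata. */
static void toy_commit(struct toy_txn *t)
{
	for (size_t i = 0; i < t->nr_ordered; i++) {
		t->ordered[i]->on_disk = true;
		t->ordered[i]->dirty = false;
	}
	t->nr_ordered = 0;
	t->metadata_updates = 0;
	t->committed = true;
}

/*
 * True if committed metadata could point at a block whose data never
 * reached the disk.
 */
static bool toy_stale_data_exposed(const struct toy_txn *t,
				   const struct toy_buf *b)
{
	return t->committed && b->mapped && !b->on_disk;
}
```

In this model, calling toy_write_begin and then toy_commit without
toy_write_end leaves the block allocated but its data unwritten, which is
the case the write_end call in the patch is meant to avoid.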
-aneesh