linux-ext4 - Re: ext4_page_mkwrite and delalloc

Message-ID: <20080616141141.GB31567@duck.suse.cz>
Date:	Mon, 16 Jun 2008 16:11:41 +0200
From:	Jan Kara <jack@...e.cz>
To:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc:	Mingming Cao <cmm@...ibm.com>,
	linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: ext4_page_mkwrite and delalloc

  Hi Aneesh,

On Thu 12-06-08 23:44:07, Aneesh Kumar K.V wrote:
> With delalloc we should not do writepage in ext4_page_mkwrite. The idea
> with delalloc is to delay the block allocation and make sure we allocate
> chunks of blocks together in writepages. So I guess we should update
> ext4_page_mkwrite to use write_begin and write_end instead of writepage.
> Taking i_alloc_sem should protect against parallel truncate and the page
> lock should protect against parallel write_begin/write_end.
> 
> How about the patch below?
  In principle the patch looks fine; I would only like to see two things
checked:
  1) Did you do some stress testing of the patch, combining mmapped writes
with ordinary writes to the same file and with truncation, so that we
detect possible locking bugs or data corruption caused by bad locking? This
significantly changes when write_begin / write_end can be called in ext4
(i.e., they are now called without i_mutex; BTW, that is probably worth a
comment before these functions).
  2) How does this change influence CPU load for mmapped accesses? I worry
that the write_begin / write_end path is significantly heavier than just
calling writepage. Probably just mmap a large file, write a single byte
to every page, and use oprofile to measure whether the accumulated time
spent in page_mkwrite changed much.
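A minimal sketch of the measurement described in item 2, assuming a Linux system where `path` lives on the ext4 filesystem under test. It creates a sparse file, maps it MAP_SHARED and stores one byte into each page, so every page takes one write fault and, on a filesystem that provides ->page_mkwrite, one page_mkwrite call. The helper name touch_every_page() and the choice to start from a sparse file are illustrative assumptions; the profiling itself would be done by running a driver around it under oprofile.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) `path`, extend it to `size`
 * bytes, map it shared and write one byte into every page.
 * Returns the number of pages touched, or -1 on error (including size 0).
 */
static long touch_every_page(const char *path, size_t size)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	long touched = 0;
	int fd;

	if (page <= 0 || size == 0)
		return -1;
	/* O_TRUNC followed by ftruncate() leaves a sparse file: every page is
	 * a hole, which is the case the patch routes through write_begin /
	 * write_end instead of returning early. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping holds its own reference to the file */
	if (p == MAP_FAILED)
		return -1;

	/* One store per page: each triggers a single write fault. */
	for (size_t off = 0; off < size; off += (size_t)page) {
		p[off] = 1;
		touched++;
	}

	/* Write the dirty pages back before tearing the mapping down. */
	if (msync(p, size, MS_SYNC) < 0)
		touched = -1;
	munmap(p, size);
	return touched;
}
```

The idea would be to compare the oprofile samples attributed to ext4_page_mkwrite on kernels with and without the patch for the same large file; the msync() call is optional and could be dropped to keep writeback out of the profile.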

  Thanks.
									Honza
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index cac132b..7f162cc 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3543,18 +3543,6 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>  	return err;
>  }
>  
> -static int ext4_bh_prepare_fill(handle_t *handle, struct buffer_head *bh)
> -{
> -	if (!buffer_mapped(bh)) {
> -		/*
> -		 * Mark buffer as dirty so that
> -		 * block_write_full_page() writes it
> -		 */
> -		set_buffer_dirty(bh);
> -	}
> -	return 0;
> -}
> -
>  static int ext4_bh_unmapped(handle_t *handle, struct buffer_head *bh)
>  {
>  	return !buffer_mapped(bh);
> @@ -3596,24 +3584,22 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
>  		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
>  				       ext4_bh_unmapped))
>  			goto out_unlock;
> -		/*
> -		 * Now mark all the  buffer head dirty so
> -		 * that writepage can write it
> -		 */
> -		walk_page_buffers(NULL, page_buffers(page), 0, len,
> -					NULL, ext4_bh_prepare_fill);
>  	}
>  	/*
> -	 * OK, we need to fill the hole... Lock the page and do writepage.
> -	 * We can't do write_begin and write_end here because we don't
> -	 * have inode_mutex and that allow parallel write_begin, write_end call.
> +	 * OK, we need to fill the hole... Lock the page and do write_begin/
> +	 * write_end. We are not holding inode->i_mutex here, which allows
> +	 * parallel write_begin, write_end calls.
>  	 * (lock_page prevent this from happening on the same page though)
>  	 */
> -	lock_page(page);
> -	wbc.range_start = page_offset(page);
> -	wbc.range_end = page_offset(page) + len;
> -	ret = mapping->a_ops->writepage(page, &wbc);
> -	/* writepage unlocks the page */
> +	ret = mapping->a_ops->write_begin(file, mapping, page_offset(page),
> +			len, AOP_FLAG_UNINTERRUPTIBLE, &page, NULL);
> +	if (ret < 0)
> +		goto out_unlock;
> +	ret = mapping->a_ops->write_end(file, mapping, page_offset(page),
> +			len, len, page, NULL);
> +	if (ret < 0)
> +		goto out_unlock;
> +	ret = 0;
>  out_unlock:
>  	up_read(&inode->i_alloc_sem);
>  	return ret;
> 
> If we agree I will send an updated ext4_page_mkwrite.patch and the other
> related patches that need to be updated so that the patch queue applies
> cleanly.
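The early exit the patch keeps, `if (!walk_page_buffers(..., ext4_bh_unmapped)) goto out_unlock;`, skips the hole-filling path when every buffer in [0, len) of the page is already mapped. Below is a simplified user-space model of that check, assuming walk_page_buffers() returns the first nonzero callback result (which is how that test reads). struct mock_bh, mock_walk_page_buffers(), mock_bh_unmapped() and needs_fill() are hypothetical stand-ins, not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only what the walk needs. */
struct mock_bh {
	bool mapped;            /* models buffer_mapped(bh) */
	struct mock_bh *next;   /* circular list, like bh->b_this_page */
};

typedef int (*mock_bh_fn)(struct mock_bh *bh);

/*
 * Call fn on each buffer overlapping the byte range [from, to) of the page
 * and stop at the first nonzero return, which is then propagated.
 */
static int mock_walk_page_buffers(struct mock_bh *head, unsigned blocksize,
				  unsigned from, unsigned to, mock_bh_fn fn)
{
	struct mock_bh *bh = head;
	unsigned block_start = 0;
	int ret = 0;

	do {
		unsigned block_end = block_start + blocksize;

		if (block_end > from && block_start < to) {
			ret = fn(bh);
			if (ret)
				break;
		}
		block_start = block_end;
		bh = bh->next;
	} while (bh != head);
	return ret;
}

/* Same predicate as ext4_bh_unmapped(): nonzero for a hole. */
static int mock_bh_unmapped(struct mock_bh *bh)
{
	return !bh->mapped;
}

/* Models the early exit: true when page_mkwrite still has a hole to fill. */
static bool needs_fill(struct mock_bh *head, unsigned blocksize, unsigned len)
{
	return mock_walk_page_buffers(head, blocksize, 0, len,
				      mock_bh_unmapped) != 0;
}
```

With a page of four 1024-byte blocks whose last block is a hole, needs_fill() is true for len 4096 but false for len 3072, since that range stops short of the hole.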
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR