Date:	Mon, 03 May 2010 17:57:47 -0700
From:	Mingming Cao <cmm@...ibm.com>
To:	djwong@...ibm.com
Cc:	"Theodore Ts'o" <tytso@....edu>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Keith Mannthey <kmannth@...ibm.com>,
	Mingming Cao <mcao@...ibm.com>
Subject: Re: [RFC] ext4: Don't send extra barrier during fsync if there are
 no dirty pages.

On Thu, 2010-04-29 at 16:51 -0700, Darrick J. Wong wrote:
> Hmm.  A while ago I was complaining that an evil program that calls fsync() in
> a loop will send a continuous stream of write barriers to the hard disk.  Ted
> theorized that it might be possible to set a flag in ext4_writepage and clear
> it in ext4_sync_file; if we happen to enter ext4_sync_file and the flag isn't
> set (meaning that nothing has been dirtied since the last fsync()) then we
> could skip issuing the barrier.
> 
> Here's an experimental patch to do something sort of like that.  From a quick
> run with blktrace, it seems to skip the redundant barriers and improves the ffsb
> mail server scores.  However, I haven't done extensive power failure testing to
> see how much data it can destroy.  For that matter I'm not even 100% sure it's
> correct at what it aims to do.
> 
> Just throwing this out there, though.  Nothing's blown up ... yet. :P
> ---
> Signed-off-by: Darrick J. Wong <djwong@...ibm.com>
> ---
> 
>  fs/ext4/ext4.h  |    2 ++
>  fs/ext4/fsync.c |    7 +++++--
>  fs/ext4/inode.c |    5 +++++
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index bf938cf..3b70195 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1025,6 +1025,8 @@ struct ext4_sb_info {
> 
>  	/* workqueue for dio unwritten */
>  	struct workqueue_struct *dio_unwritten_wq;
> +
> +	atomic_t unflushed_writes;
>  };
> 

Just wondering, is this a per-filesystem flag? I think it would be nicer
to make it a per-inode flag: when there is no dirty data in flight for
this inode (rather than for the whole fs), there is no need to issue a
barrier in ext4_sync_file().

Mingming
>  static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 0d0c323..441f872 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -52,7 +52,8 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
>  {
>  	struct inode *inode = dentry->d_inode;
>  	struct ext4_inode_info *ei = EXT4_I(inode);
> -	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> +	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> +	journal_t *journal = sbi->s_journal;
>  	int ret;
>  	tid_t commit_tid;
...

> @@ -102,7 +103,9 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
>  		    (journal->j_flags & JBD2_BARRIER))
>  			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
>  		jbd2_log_wait_commit(journal, commit_tid);
> -	} else if (journal->j_flags & JBD2_BARRIER)
> +	} else if (journal->j_flags & JBD2_BARRIER && atomic_read(&sbi->unflushed_writes)) {
> +		atomic_set(&sbi->unflushed_writes, 0);
>  		blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
> +	}
>  	return ret;
>  }
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5381802..e501abd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2718,6 +2718,7 @@ static int ext4_writepage(struct page *page,
>  	unsigned int len;
>  	struct buffer_head *page_bufs = NULL;
>  	struct inode *inode = page->mapping->host;
> +	struct ext4_sb_info *sbi = EXT4_SB(page->mapping->host->i_sb);
> 
>  	trace_ext4_writepage(inode, page);
>  	size = i_size_read(inode);
> @@ -2726,6 +2727,8 @@ static int ext4_writepage(struct page *page,
>  	else
>  		len = PAGE_CACHE_SIZE;
> 
> +	atomic_set(&sbi->unflushed_writes, 1);
> +
>  	if (page_has_buffers(page)) {
>  		page_bufs = page_buffers(page);
>  		if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
> @@ -2872,6 +2875,8 @@ static int ext4_da_writepages(struct address_space *mapping,
>  	if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
>  		range_whole = 1;
> 
> +	atomic_set(&sbi->unflushed_writes, 1);
> +
>  	range_cyclic = wbc->range_cyclic;
>  	if (wbc->range_cyclic) {
>  		index = mapping->writeback_index;

