[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1272934667.2544.3.camel@mingming-laptop>
Date: Mon, 03 May 2010 17:57:47 -0700
From: Mingming Cao <cmm@...ibm.com>
To: djwong@...ibm.com
Cc: "Theodore Ts'o" <tytso@....edu>,
linux-ext4 <linux-ext4@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Keith Mannthey <kmannth@...ibm.com>,
Mingming Cao <mcao@...ibm.com>
Subject: Re: [RFC] ext4: Don't send extra barrier during fsync if there are
no dirty pages.
On Thu, 2010-04-29 at 16:51 -0700, Darrick J. Wong wrote:
> Hmm. A while ago I was complaining that an evil program that calls fsync() in
> a loop will send a continuous stream of write barriers to the hard disk. Ted
> theorized that it might be possible to set a flag in ext4_writepage and clear
> it in ext4_sync_file; if we happen to enter ext4_sync_file and the flag isn't
> set (meaning that nothing has been dirtied since the last fsync()) then we
> could skip issuing the barrier.
>
> Here's an experimental patch to do something sort of like that. From a quick
> run with blktrace, it seems to skip the redundant barriers and improves the ffsb
> mail server scores. However, I haven't done extensive power failure testing to
> see how much data it can destroy. For that matter I'm not even 100% sure it's
> correct at what it aims to do.
>
> Just throwing this out there, though. Nothing's blown up ... yet. :P
> ---
> Signed-off-by: Darrick J. Wong <djwong@...ibm.com>
> ---
>
> fs/ext4/ext4.h | 2 ++
> fs/ext4/fsync.c | 7 +++++--
> fs/ext4/inode.c | 5 +++++
> 3 files changed, 12 insertions(+), 2 deletions(-)
>
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index bf938cf..3b70195 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1025,6 +1025,8 @@ struct ext4_sb_info {
>
> /* workqueue for dio unwritten */
> struct workqueue_struct *dio_unwritten_wq;
> +
> + atomic_t unflushed_writes;
> };
>
Just wondering is this per filesystem flag? Thought it is nicer to make
this per -inode flag, when there is no dirty data in fly for this inode
(instead of the whole fs), there is no need to call barrier in
ext4_sync_file().
Mingming
> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
> index 0d0c323..441f872 100644
> --- a/fs/ext4/fsync.c
> +++ b/fs/ext4/fsync.c
> @@ -52,7 +52,8 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
> {
> struct inode *inode = dentry->d_inode;
> struct ext4_inode_info *ei = EXT4_I(inode);
> - journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + journal_t *journal = sbi->s_journal;
> int ret;
> tid_t commit_tid;
...
> @@ -102,7 +103,9 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
> (journal->j_flags & JBD2_BARRIER))
> blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
> jbd2_log_wait_commit(journal, commit_tid);
> - } else if (journal->j_flags & JBD2_BARRIER)
> + } else if (journal->j_flags & JBD2_BARRIER && atomic_read(&sbi->unflushed_writes)) {
> + atomic_set(&sbi->unflushed_writes, 0);
> blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
> + }
> return ret;
> }
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5381802..e501abd 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2718,6 +2718,7 @@ static int ext4_writepage(struct page *page,
> unsigned int len;
> struct buffer_head *page_bufs = NULL;
> struct inode *inode = page->mapping->host;
> + struct ext4_sb_info *sbi = EXT4_SB(page->mapping->host->i_sb);
>
> trace_ext4_writepage(inode, page);
> size = i_size_read(inode);
> @@ -2726,6 +2727,8 @@ static int ext4_writepage(struct page *page,
> else
> len = PAGE_CACHE_SIZE;
>
> + atomic_set(&sbi->unflushed_writes, 1);
> +
> if (page_has_buffers(page)) {
> page_bufs = page_buffers(page);
> if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
> @@ -2872,6 +2875,8 @@ static int ext4_da_writepages(struct address_space *mapping,
> if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
> range_whole = 1;
>
> + atomic_set(&sbi->unflushed_writes, 1);
> +
> range_cyclic = wbc->range_cyclic;
> if (wbc->range_cyclic) {
> index = mapping->writeback_index;
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists