lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100429235102.GC15607@tux1.beaverton.ibm.com>
Date:	Thu, 29 Apr 2010 16:51:02 -0700
From:	"Darrick J. Wong" <djwong@...ibm.com>
To:	"Theodore Ts'o" <tytso@....edu>
Cc:	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Keith Mannthey <kmannth@...ibm.com>,
	Mingming Cao <mcao@...ibm.com>
Subject: [RFC] ext4: Don't send extra barrier during fsync if there are no
	dirty pages.

Hmm.  A while ago I was complaining that an evil program that calls fsync() in
a loop will send a continuous stream of write barriers to the hard disk.  Ted
theorized that it might be possible to set a flag in ext4_writepage and clear
it in ext4_sync_file; if we happen to enter ext4_sync_file and the flag isn't
set (meaning that nothing has been dirtied since the last fsync()) then we
could skip issuing the barrier.

Here's an experimental patch to do something sort of like that.  From a quick
run with blktrace, it seems to skip the redundant barriers and improves the ffsb
mail server scores.  However, I haven't done extensive power failure testing to
see how much data it can destroy.  For that matter I'm not even 100% sure it's
correct at what it aims to do.

Just throwing this out there, though.  Nothing's blown up ... yet. :P
---
Signed-off-by: Darrick J. Wong <djwong@...ibm.com>
---

 fs/ext4/ext4.h  |    2 ++
 fs/ext4/fsync.c |    7 +++++--
 fs/ext4/inode.c |    5 +++++
 3 files changed, 12 insertions(+), 2 deletions(-)


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index bf938cf..3b70195 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1025,6 +1025,8 @@ struct ext4_sb_info {
 
 	/* workqueue for dio unwritten */
 	struct workqueue_struct *dio_unwritten_wq;
+
+	atomic_t unflushed_writes;
 };
 
 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 0d0c323..441f872 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -52,7 +52,8 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
 {
 	struct inode *inode = dentry->d_inode;
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	journal_t *journal = sbi->s_journal;
 	int ret;
 	tid_t commit_tid;
 
@@ -102,7 +103,9 @@ int ext4_sync_file(struct file *file, struct dentry *dentry, int datasync)
 		    (journal->j_flags & JBD2_BARRIER))
 			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
 		jbd2_log_wait_commit(journal, commit_tid);
-	} else if (journal->j_flags & JBD2_BARRIER)
+	} else if (journal->j_flags & JBD2_BARRIER && atomic_read(&sbi->unflushed_writes)) {
+		atomic_set(&sbi->unflushed_writes, 0);
 		blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
+	}
 	return ret;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5381802..e501abd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2718,6 +2718,7 @@ static int ext4_writepage(struct page *page,
 	unsigned int len;
 	struct buffer_head *page_bufs = NULL;
 	struct inode *inode = page->mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(page->mapping->host->i_sb);
 
 	trace_ext4_writepage(inode, page);
 	size = i_size_read(inode);
@@ -2726,6 +2727,8 @@ static int ext4_writepage(struct page *page,
 	else
 		len = PAGE_CACHE_SIZE;
 
+	atomic_set(&sbi->unflushed_writes, 1);
+
 	if (page_has_buffers(page)) {
 		page_bufs = page_buffers(page);
 		if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
@@ -2872,6 +2875,8 @@ static int ext4_da_writepages(struct address_space *mapping,
 	if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 		range_whole = 1;
 
+	atomic_set(&sbi->unflushed_writes, 1);
+
 	range_cyclic = wbc->range_cyclic;
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ