lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100629205102.GM15515@tux1.beaverton.ibm.com>
Date:	Tue, 29 Jun 2010 13:51:02 -0700
From:	"Darrick J. Wong" <djwong@...ibm.com>
To:	Mingming Cao <cmm@...ibm.com>
Cc:	Ric Wheeler <rwheeler@...hat.com>, "Theodore Ts'o" <tytso@....edu>,
	linux-ext4 <linux-ext4@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Keith Mannthey <kmannth@...ibm.com>,
	Mingming Cao <mcao@...ibm.com>
Subject: [RFC v2] ext4: Don't send extra barrier during fsync if there are
	no dirty pages.

Hmm.  A while ago I was complaining that an evil program that calls fsync() in
a loop will send a continuous stream of write barriers to the hard disk.  Ted
theorized that it might be possible to set a flag in ext4_writepage and clear
it in ext4_sync_file; if we happen to enter ext4_sync_file and the flag isn't
set (meaning that nothing has been dirtied since the last fsync()) then we
could skip issuing the barrier.

Here's an experimental patch to do something sort of like that.  From a quick
run with blktrace, it seems to skip the redundant barriers and improves the ffsb
mail server scores.  However, I haven't done extensive power failure testing to
see how much data it can destroy.  For that matter I'm not even 100% sure it's
correct at what it aims to do.

This second version of the patch uses the inode state flags and (suboptimally)
also catches directio writes.  It might be a better idea to try to coordinate
all the barrier requests across the whole filesystem, though that's a bit more
difficult.

Signed-off-by: Darrick J. Wong <djwong@...ibm.com>
---

 fs/ext4/ext4.h  |    1 +
 fs/ext4/fsync.c |    5 ++++-
 fs/ext4/inode.c |    7 +++++++
 3 files changed, 12 insertions(+), 1 deletions(-)


diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 19a4de5..d2e8e40 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1181,6 +1181,7 @@ enum {
 	EXT4_STATE_EXT_MIGRATE,		/* Inode is migrating */
 	EXT4_STATE_DIO_UNWRITTEN,	/* need convert on dio done*/
 	EXT4_STATE_NEWENTRY,		/* File just added to dir */
+	EXT4_STATE_DIRTY_DATA,		/* dirty data, need barrier */
 };
 
 #define EXT4_INODE_BIT_FNS(name, field)					\
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 592adf2..96625c3 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -130,8 +130,11 @@ int ext4_sync_file(struct file *file, int datasync)
 			blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
 					NULL, BLKDEV_IFL_WAIT);
 		ret = jbd2_log_wait_commit(journal, commit_tid);
-	} else if (journal->j_flags & JBD2_BARRIER)
+	} else if (journal->j_flags & JBD2_BARRIER &&
+		   ext4_test_inode_state(inode, EXT4_STATE_DIRTY_DATA)) {
 		blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL,
 			BLKDEV_IFL_WAIT);
+		ext4_clear_inode_state(inode, EXT4_STATE_DIRTY_DATA);
+	}
 	return ret;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 42272d6..486d349 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2685,6 +2685,8 @@ static int ext4_writepage(struct page *page,
 	else
 		len = PAGE_CACHE_SIZE;
 
+	ext4_set_inode_state(inode, EXT4_STATE_DIRTY_DATA);
+
 	if (page_has_buffers(page)) {
 		page_bufs = page_buffers(page);
 		if (walk_page_buffers(NULL, page_bufs, 0, len, NULL,
@@ -2948,6 +2950,8 @@ static int ext4_da_writepages(struct address_space *mapping,
 	if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 		range_whole = 1;
 
+	ext4_set_inode_state(inode, EXT4_STATE_DIRTY_DATA);
+
 	range_cyclic = wbc->range_cyclic;
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index;
@@ -3996,6 +4000,9 @@ static ssize_t ext4_direct_IO(int rw, struct kiocb *iocb,
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
 
+	if (rw == WRITE)
+		ext4_set_inode_state(inode, EXT4_STATE_DIRTY_DATA);
+
 	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
 		return ext4_ext_direct_IO(rw, iocb, iov, offset, nr_segs);
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ