lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081009030054.GD17512@mit.edu>
Date:	Wed, 8 Oct 2008 23:00:54 -0400
From:	Theodore Tso <tytso@....edu>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Arjan van de Ven <arjan@...radead.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel@...r.kernel.org, Alan Cox <alan@...rguk.ukuu.org.uk>,
	linux-ext4@...r.kernel.org
Subject: Re: [PATCH] Give kjournald a IOPRIO_CLASS_RT io priority

On Thu, Oct 02, 2008 at 10:24:38PM -0700, Andrew Morton wrote:
> 
> Mount a junk partition with `-oakpm' and run some benchmarks.  If the
> results are "wow" then it's worth spending time on.  If the results are
> "meh" then we can not bother..
> 

I've ported the patch to the ext4 filesystem, and dropped it into the
unstable portion of the ext4 patch queue.  If we can get someone (hi,
Ric!) to run fs_mark with and without -o akpm_lock_hack, I suspect we
will find that it makes quite a large difference on that particular
benchmark, since it is fsync-heavy to force a large number of
transaction, and the creation of the inodes should cause multiple
blocks that will be entangled between the current and committing
transactions.

	      	     	     	      	      	 - Ted

ext4: akpm's locking hack to fix locking delays

This is a port of the following patch from Andrew Morton to ext4:

	http://lkml.org/lkml/2008/10/3/22

This fixes a major contention problem in do_get_write_access() when a
buffer is modified in both the current and committing transaction.

Signed-off-by: "Theodore Ts'o" <tytso@....edu>
Cc: akpm@...ux-foundation.org
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f46a513..23822fb 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -540,6 +540,7 @@ do {									       \
 #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
 #define EXT4_MOUNT_I_VERSION            0x2000000 /* i_version support */
 #define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
+#define EXT4_MOUNT_AKPM_LOCK_HACK	0x10000000 /* akpm lock hack */
 /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
 #define clear_opt(o, opt)		o &= ~EXT4_MOUNT_##opt
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 67ebefb..f4e7157 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -752,6 +752,8 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_puts(seq, ",journal_async_commit");
 	if (test_opt(sb, NOBH))
 		seq_puts(seq, ",nobh");
+	if (test_opt(sb, AKPM_LOCK_HACK))
+		seq_puts(seq, ",akpm_lock_hack");
 	if (!test_opt(sb, EXTENTS))
 		seq_puts(seq, ",noextents");
 	if (test_opt(sb, I_VERSION))
@@ -911,7 +913,7 @@ enum {
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
 	Opt_grpquota, Opt_extents, Opt_noextents, Opt_i_version,
 	Opt_mballoc, Opt_nomballoc, Opt_stripe, Opt_delalloc, Opt_nodelalloc,
-	Opt_inode_readahead_blks
+	Opt_inode_readahead_blks, Opt_akpm_lock_hack,
 };
 
 static match_table_t tokens = {
@@ -973,6 +975,7 @@ static match_table_t tokens = {
 	{Opt_delalloc, "delalloc"},
 	{Opt_nodelalloc, "nodelalloc"},
 	{Opt_inode_readahead_blks, "inode_readahead_blks=%u"},
+	{Opt_akpm_lock_hack, "akpm_lock_hack"},
 	{Opt_err, NULL},
 };
 
@@ -1382,6 +1385,9 @@ set_qf_format:
 				return 0;
 			sbi->s_inode_readahead_blks = option;
 			break;
+		case Opt_akpm_lock_hack:
+			set_opt(sbi->s_mount_opt, AKPM_LOCK_HACK);
+			break;
 		default:
 			printk(KERN_ERR
 			       "EXT4-fs: Unrecognized mount option \"%s\" "
@@ -2534,6 +2540,10 @@ static void ext4_init_journal_params(struct super_block *sb, journal_t *journal)
 		journal->j_flags |= JBD2_BARRIER;
 	else
 		journal->j_flags &= ~JBD2_BARRIER;
+	if (test_opt(sb, AKPM_LOCK_HACK))
+		journal->j_flags |= JBD2_LOCK_HACK;
+	else
+		journal->j_flags &= ~JBD2_LOCK_HACK;
 	spin_unlock(&journal->j_state_lock);
 }
 
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e5d5405..32c288a 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -546,6 +546,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 	int error;
 	char *frozen_buffer = NULL;
 	int need_copy = 0;
+	int locked = 0;
 
 	if (is_handle_aborted(handle))
 		return -EROFS;
@@ -561,7 +562,13 @@ repeat:
 
 	/* @@@ Need to check for errors here at some point. */
 
-	lock_buffer(bh);
+	if (journal->j_flags & JBD2_LOCK_HACK) {
+		if (trylock_buffer(bh))
+			locked = 1;	/* lolz */
+	} else {
+		lock_buffer(bh);
+		locked = 1;
+	}
 	jbd_lock_bh_state(bh);
 
 	/* We now hold the buffer lock so it is safe to query the buffer
@@ -600,7 +607,8 @@ repeat:
 		jbd_unexpected_dirty_buffer(jh);
 	}
 
-	unlock_buffer(bh);
+	if (locked)
+		unlock_buffer(bh);
 
 	error = -EROFS;
 	if (is_handle_aborted(handle)) {
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 66c3499..c614dae 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -967,6 +967,7 @@ struct journal_s
 #define JBD2_FLUSHED	0x008	/* The journal superblock has been flushed */
 #define JBD2_LOADED	0x010	/* The journal superblock has been loaded */
 #define JBD2_BARRIER	0x020	/* Use IDE barriers */
+#define JBD2_LOCK_HACK	0x040	/* akpm's locking hack */
 
 /*
  * Function declarations for the journaling transaction and buffer
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ