Message-Id: <1210957894.3608.23.camel@localhost.localdomain>
Date:	Fri, 16 May 2008 10:11:34 -0700
From:	Mingming Cao <cmm@...ibm.com>
To:	Josef Bacik <jbacik@...hat.com>
Cc:	Jan Kara <jack@...e.cz>, Badari Pulavarty <pbadari@...ibm.com>,
	akpm@...ux-foundation.org, linux-ext4@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Fix DIO EIO error caused by race between
	jbd_commit_transaction() and journal_try_to_drop_buffers()

On Fri, 2008-05-16 at 11:01 -0400, Josef Bacik wrote:

> 
> Got a couple of whitespace problems above it looks like.  Thanks,
> 

Thanks for catching this. Below is the updated patch, with the
whitespace and comments fixed.


---------------------------------------------------
JBD: fix journal_try_to_free_buffers race with
journal_commit_transaction

From: Mingming Cao <cmm@...ibm.com>

This patch fixes a few races between direct IO and the kjournald
commit transaction. An unexpected EIO error is returned to the direct
IO caller when it fails to free those data buffers. This can be
reproduced easily with parallel direct writes and buffered writes to
the same file.

More specifically, those races can cause journal_try_to_free_buffers()
to fail to free the data buffers when jbd is committing the transaction
that has those data buffers on its t_sync_datalist or t_locked_list.
journal_commit_transaction() still holds a reference to those buffers
until the data reaches disk and the buffers are removed from
t_sync_datalist or t_locked_list. This prevents a concurrent
journal_try_to_free_buffers() from freeing those buffers at the same
time, and causes an EIO error to be returned to direct IO.

With this patch, in the direct IO case, when try_to_free_buffers()
fails, we wait for journal_commit_transaction() to finish flushing the
current committing transaction's data buffers to disk, then try to free
those buffers again.

Signed-off-by: Mingming Cao <cmm@...ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@...ibm.com> 
---
 fs/jbd/commit.c      |    1 +
 fs/jbd/journal.c     |    1 +
 fs/jbd/transaction.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/jbd.h  |    3 +++
 4 files changed, 51 insertions(+)

Index: linux-2.6.26-rc1/include/linux/jbd.h
===================================================================
--- linux-2.6.26-rc1.orig/include/linux/jbd.h	2008-05-14 16:36:41.000000000 -0700
+++ linux-2.6.26-rc1/include/linux/jbd.h	2008-05-15 14:12:10.000000000 -0700
@@ -667,6 +667,9 @@ struct journal_s
 	 */
 	wait_queue_head_t	j_wait_transaction_locked;
 
+	/* Wait queue for waiting for data buffers to be flushed to disk */
+	wait_queue_head_t	j_wait_data_flushed;
+
 	/* Wait queue for waiting for checkpointing to complete */
 	wait_queue_head_t	j_wait_logspace;
 
Index: linux-2.6.26-rc1/fs/jbd/commit.c
===================================================================
--- linux-2.6.26-rc1.orig/fs/jbd/commit.c	2008-05-03 11:59:44.000000000 -0700
+++ linux-2.6.26-rc1/fs/jbd/commit.c	2008-05-15 14:12:46.000000000 -0700
@@ -462,6 +462,7 @@ void journal_commit_transaction(journal_
 	 * clean by now, so check that it is in fact empty.
 	 */
 	J_ASSERT (commit_transaction->t_sync_datalist == NULL);
+	wake_up(&journal->j_wait_data_flushed);
 
 	jbd_debug (3, "JBD: commit phase 3\n");
 
Index: linux-2.6.26-rc1/fs/jbd/journal.c
===================================================================
--- linux-2.6.26-rc1.orig/fs/jbd/journal.c	2008-05-14 16:36:41.000000000 -0700
+++ linux-2.6.26-rc1/fs/jbd/journal.c	2008-05-15 14:13:02.000000000 -0700
@@ -660,6 +660,7 @@ static journal_t * journal_init_common (
 		goto fail;
 
 	init_waitqueue_head(&journal->j_wait_transaction_locked);
+	init_waitqueue_head(&journal->j_wait_data_flushed);
 	init_waitqueue_head(&journal->j_wait_logspace);
 	init_waitqueue_head(&journal->j_wait_done_commit);
 	init_waitqueue_head(&journal->j_wait_checkpoint);
Index: linux-2.6.26-rc1/fs/jbd/transaction.c
===================================================================
--- linux-2.6.26-rc1.orig/fs/jbd/transaction.c	2008-05-03 11:59:44.000000000 -0700
+++ linux-2.6.26-rc1/fs/jbd/transaction.c	2008-05-16 09:27:21.000000000 -0700
@@ -1648,12 +1648,49 @@ out:
 	return;
 }
 
+/*
+ * journal_try_to_free_buffers() could race with journal_commit_transaction().
+ * The latter might still hold a reference to the buffers while inspecting
+ * them on t_sync_datalist or t_locked_list.
+ *
+ * journal_try_to_free_buffers() calls this function to wait for the
+ * current transaction to finish syncing data buffers before trying to
+ * free those buffers.
+ *
+ * Called with journal->j_state_lock held.
+ */
+static void journal_wait_for_transaction_sync_data(journal_t *journal)
+{
+	transaction_t *transaction = NULL;
+
+	transaction = journal->j_committing_transaction;
+
+	if (!transaction)
+		return;
+
+	/*
+	 * If the current transaction is flushing and waiting for data buffers
+	 * (t_state is T_FLUSH), wait for the j_wait_data_flushed event
+	 */
+	if (transaction->t_state == T_FLUSH) {
+		DEFINE_WAIT(wait);
+
+		prepare_to_wait(&journal->j_wait_data_flushed,
+			&wait, TASK_UNINTERRUPTIBLE);
+		spin_unlock(&journal->j_state_lock);
+		schedule();
+		finish_wait(&journal->j_wait_data_flushed, &wait);
+		spin_lock(&journal->j_state_lock);
+	}
+	return;
+}
 
 /**
  * int journal_try_to_free_buffers() - try to free page buffers.
  * @journal: journal for operation
  * @page: to try and free
- * @unused_gfp_mask: unused
+ * @gfp_mask: not used for allocation here; used only as a flag to tell
+ *	      whether direct IO is attempting to free buffers.
  *
  *
  * For all the buffers on this page,
@@ -1682,13 +1719,16 @@ out:
  * journal_try_to_free_buffer() is changing its state.  But that
  * cannot happen because we never reallocate freed data as metadata
  * while the data is part of a transaction.  Yes?
+ *
+ * Return 0 on failure, 1 on success
  */
 int journal_try_to_free_buffers(journal_t *journal,
-				struct page *page, gfp_t unused_gfp_mask)
+				struct page *page, gfp_t gfp_mask)
 {
 	struct buffer_head *head;
 	struct buffer_head *bh;
 	int ret = 0;
+	int dio = gfp_mask & __GFP_REPEAT;
 
 	J_ASSERT(PageLocked(page));
 
@@ -1713,7 +1753,31 @@ int journal_try_to_free_buffers(journal_
 		if (buffer_jbd(bh))
 			goto busy;
 	} while ((bh = bh->b_this_page) != head);
+
 	ret = try_to_free_buffers(page);
+
+	/*
+	 * In the case of concurrent direct IO and buffered IO, there are
+	 * a number of places where we could race with
+	 * journal_commit_transaction(): the latter still holds a
+	 * reference to the buffers to be freed while it processes them.
+	 * try_to_free_buffers() then fails to free those buffers,
+	 * resulting in an unexpected EIO error being returned to
+	 * generic_file_direct_IO().
+	 *
+	 * So let's wait for the current transaction to finish flushing
+	 * dirty data buffers before we try to free those buffers
+	 * again. This wait is needed by the direct IO code path only;
+	 * __GFP_REPEAT is passed in gfp_mask from the direct IO code
+	 * path to flag that we need to wait and retry freeing buffers.
+	 */
+	if (ret == 0 && dio) {
+		spin_lock(&journal->j_state_lock);
+		journal_wait_for_transaction_sync_data(journal);
+		ret = try_to_free_buffers(page);
+		spin_unlock(&journal->j_state_lock);
+	}
+
 busy:
 	return ret;
 }
Index: linux-2.6.26-rc1/mm/truncate.c
===================================================================
--- linux-2.6.26-rc1.orig/mm/truncate.c	2008-05-03 11:59:44.000000000 -0700
+++ linux-2.6.26-rc1/mm/truncate.c	2008-05-15 13:13:21.000000000 -0700
@@ -346,7 +346,8 @@ invalidate_complete_page2(struct address
 	if (page->mapping != mapping)
 		return 0;
 
-	if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL))
+	if (PagePrivate(page) &&
+			!try_to_release_page(page, GFP_KERNEL | __GFP_REPEAT))
 		return 0;
 
 	write_lock_irq(&mapping->tree_lock);
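
For anyone who wants to play with the wait-and-retry idea outside the
kernel, here is a minimal userspace sketch using pthreads. It is only an
analogy and not the JBD code: struct demo_journal, DEMO_FLAG_DIO and all
demo_* functions are made-up names. The mutex stands in for
j_state_lock, the condition variable for j_wait_data_flushed, the
"flushing" field for t_state == T_FLUSH, and the flag bit for the
__GFP_REPEAT hint passed in gfp_mask.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag bit; plays the role of __GFP_REPEAT in gfp_mask. */
#define DEMO_FLAG_DIO 0x1u

/* Hypothetical userspace stand-in for the journal, not a kernel type. */
struct demo_journal {
	pthread_mutex_t lock;		/* analogue of j_state_lock */
	pthread_cond_t data_flushed;	/* analogue of j_wait_data_flushed */
	bool flushing;			/* analogue of t_state == T_FLUSH */
	int buffer_refs;		/* references held by the committer */
};

static void demo_journal_init(struct demo_journal *j, int refs)
{
	assert(refs >= 0);
	pthread_mutex_init(&j->lock, NULL);
	pthread_cond_init(&j->data_flushed, NULL);
	j->buffer_refs = refs;
	j->flushing = refs > 0;
}

/* Committer side: drop the references, leave the flush state, wake waiters. */
static void *demo_commit(void *arg)
{
	struct demo_journal *j = arg;

	pthread_mutex_lock(&j->lock);
	j->buffer_refs = 0;
	j->flushing = false;
	pthread_cond_broadcast(&j->data_flushed);
	pthread_mutex_unlock(&j->lock);
	return NULL;
}

/*
 * Caller side: try to free; if that fails and the caller flagged itself
 * as direct IO, wait for the flush to finish and retry once.
 * Returns 0 on failure, 1 on success.
 */
static int demo_try_to_free(struct demo_journal *j, unsigned int flags)
{
	bool dio = (flags & DEMO_FLAG_DIO) != 0;
	int ret;

	pthread_mutex_lock(&j->lock);
	ret = (j->buffer_refs == 0);
	if (!ret && dio) {
		/* Loop guards against spurious wakeups of the condvar. */
		while (j->flushing)
			pthread_cond_wait(&j->data_flushed, &j->lock);
		ret = (j->buffer_refs == 0);
	}
	pthread_mutex_unlock(&j->lock);
	return ret;
}
```

To try it, initialise a demo_journal with a non-zero reference count,
start demo_commit() in a second thread with pthread_create(), and call
demo_try_to_free() with and without DEMO_FLAG_DIO; build with
"cc -pthread". Unlike the patch, which sleeps once on the wait queue,
the sketch re-checks its condition in a loop, because pthread condition
variables may wake spuriously.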


> Josef
