linux-kernel - Re: [regression] Crash in wb_clear


Message-ID: <4C323677.9040209@kernel.dk>
Date:	Mon, 05 Jul 2010 21:45:59 +0200
From:	Jens Axboe <axboe@...nel.dk>
To:	Christoph Hellwig <hch@...radead.org>
CC:	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: [regression] Crash in wb_clear_pending()

On 05/07/10 21.32, Christoph Hellwig wrote:
> On Mon, Jul 05, 2010 at 09:24:39PM +0200, Jens Axboe wrote:
>> The oops itself looks like a recurrence of the missing RCU grace or
>> too early stack wakeup, which should be a 1-2 liner once it's found.
> 
> See the previous thread.  There are at least two issues:
> 
>  - wb_do_writeback checks work->state after it's been freed when we do
>    the second test_bit for WS_ONSTACK
>  - bdi_work_free accesses work->state after waking up the caller doing
>    bdi_wait_on_work_done, which might have re-used the stack space
>    allocated for the work item.
> 
> The fix for that is to get rid of the fragile work->state stuff and the
> bit wakeups by just using a completion and using that as indicator
> for the stack wait.  That's the main change the above patch does.  In
> addition it also merges the two structures used for the writeback
> requests.  Only doing the completion and earlier list removal would
> be something like the untested patch below:

If those two late WS_ONSTACK checks are the only issues left there,
why not just apply the below for 2.6.35?

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 0609607..15ce6ab 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -90,12 +90,13 @@ int writeback_in_progress(struct backing_dev_info *bdi)
 static void bdi_work_free(struct rcu_head *head)
 {
 	struct bdi_work *work = container_of(head, struct bdi_work, rcu_head);
+	int on_stack = test_bit(WS_ONSTACK, &work->state);
 
 	clear_bit(WS_INPROGRESS, &work->state);
 	smp_mb__after_clear_bit();
 	wake_up_bit(&work->state, WS_INPROGRESS);
 
-	if (!test_bit(WS_ONSTACK, &work->state))
+	if (!on_stack)
 		kfree(work);
 }
 
@@ -854,6 +855,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 
 	while ((work = get_next_work_item(bdi, wb)) != NULL) {
 		struct wb_writeback_args args = work->args;
+		int on_stack = test_bit(WS_ONSTACK, &work->state);
 
 		/*
 		 * Override sync mode, in case we must wait for completion
@@ -865,7 +867,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		 * If this isn't a data integrity operation, just notify
 		 * that we have seen this work and we are now starting it.
 		 */
-		if (!test_bit(WS_ONSTACK, &work->state))
+		if (!on_stack)
 			wb_clear_pending(wb, work);
 
 		wrote += wb_writeback(wb, &args);
@@ -874,7 +876,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
 		 * This is a data integrity writeback, so only do the
 		 * notification when we have completed the work.
 		 */
-		if (test_bit(WS_ONSTACK, &work->state))
+		if (on_stack)
 			wb_clear_pending(wb, work);
 	}
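
For readers outside the kernel tree, the idea behind the patch can be
sketched as a small user-space C program. This is only an illustration
under stated assumptions, not the kernel code: every demo_* and DEMO_*
name is hypothetical, the wakeup is simulated by a plain function call,
and the waiter reusing its stack frame is simulated by zeroing the item.
The point it shows is that the on-stack flag is read once, before the
wakeup, and the work item is not read again afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the WS_* state bits; not the kernel's values. */
enum { DEMO_INPROGRESS = 1u << 0, DEMO_ONSTACK = 1u << 1 };

struct demo_work {
	unsigned int state;
};

/*
 * Simulated wakeup. For an on-stack item the waiter may return right
 * away and reuse its stack frame, so model that by clobbering the item.
 */
static void demo_wake_waiter(struct demo_work *work)
{
	if (work->state & DEMO_ONSTACK)
		memset(work, 0, sizeof(*work));
}

/* Returns 1 if the item was freed here, 0 if the waiter owns it. */
static int demo_work_finish(struct demo_work *work)
{
	/* Snapshot the flag while the item is still guaranteed valid. */
	int on_stack = (work->state & DEMO_ONSTACK) != 0;

	work->state &= ~DEMO_INPROGRESS;
	demo_wake_waiter(work);

	/* From here on, use only the snapshot, never work->state. */
	if (!on_stack) {
		free(work);
		return 1;
	}
	return 0;
}

/* Runs both cases; returns 1 if each took the expected path, -1 on OOM. */
static int demo_run(void)
{
	struct demo_work stack_item = { DEMO_INPROGRESS | DEMO_ONSTACK };
	struct demo_work *heap_item = malloc(sizeof(*heap_item));
	int freed_stack, freed_heap;

	if (!heap_item)
		return -1;
	heap_item->state = DEMO_INPROGRESS;

	freed_stack = demo_work_finish(&stack_item);
	freed_heap = demo_work_finish(heap_item);

	return freed_stack == 0 && freed_heap == 1;
}
```

Had demo_work_finish() re-read work->state after the wakeup, the
clobbered on-stack item would look like a heap item and a stack address
would be passed to free(), which mirrors the late test_bit() checks the
patch removes.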
 

-- 
Jens Axboe
