linux-kernel - Re: [PATCH RFC] buffer_head: remove redundant test from wait_on

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xr93ljbbw3u8.fsf@ninji.mtv.corp.google.com>
Date:	Sat, 22 May 2010 23:05:03 -0700
From:	Greg Thelen <gthelen@...gle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Richard Kennedy <richard@....demon.co.uk>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Jens Axboe <jens.axboe@...cle.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Nick Piggin <npiggin@...e.de>, Jeff Mahoney <jeffm@...e.com>,
	reiserfs-devel@...r.kernel.org
Subject: Re: [PATCH RFC]  buffer_head: remove redundant test from wait_on_buffer

Andrew Morton <akpm@...ux-foundation.org> writes:

> On Fri, 16 Apr 2010 11:58:19 +0100
> Richard Kennedy <richard@....demon.co.uk> wrote:
>
>> The comment suggests that when b_count equals zero it is calling
>> __wait_no_buffer to trigger some debug, but as there is no debug in
>> __wait_on_buffer the whole thing is redundant.
>> 
>> AFAICT from the git log this has been the case for at least 5 years, so
>> it seems safe just to remove this.
>> 
>> Signed-off-by: Richard Kennedy <richard@....demon.co.uk>
>> ---
>> 
>> This patch against 2.6.34-rc4
>> compiled & tested on x86_64
>> 
>> regards
>> Richard
>> 
>> 
>> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
>> index 16ed028..4c62dd4 100644
>> --- a/include/linux/buffer_head.h
>> +++ b/include/linux/buffer_head.h
>> @@ -305,15 +305,10 @@ map_bh(struct buffer_head *bh, struct super_block *sb, sector_t block)
>>  	bh->b_size = sb->s_blocksize;
>>  }
>>  
>> -/*
>> - * Calling wait_on_buffer() for a zero-ref buffer is illegal, so we call into
>> - * __wait_on_buffer() just to trip a debug check.  Because debug code in inline
>> - * functions is bloaty.
>> - */
>>  static inline void wait_on_buffer(struct buffer_head *bh)
>>  {
>>  	might_sleep();
>> -	if (buffer_locked(bh) || atomic_read(&bh->b_count) == 0)
>> +	if (buffer_locked(bh))
>>  		__wait_on_buffer(bh);
>>  }
>
> That debug check got inadvertently crippled during some wait_on_bit()
> conversion.
>
> It's still a nasty bug to call wait_on_buffer() against a zero-ref
> buffer so perhaps we should fix it up rather than removing its remains.
>
> diff -puN include/linux/buffer_head.h~buffer_head-remove-redundant-test-from-wait_on_buffer-fix include/linux/buffer_head.h
> --- a/include/linux/buffer_head.h~buffer_head-remove-redundant-test-from-wait_on_buffer-fix
> +++ a/include/linux/buffer_head.h
> @@ -305,10 +305,15 @@ map_bh(struct buffer_head *bh, struct su
>  	bh->b_size = sb->s_blocksize;
>  }
>  
> +/*
> + * Calling wait_on_buffer() for a zero-ref buffer is illegal, so we call into
> + * __wait_on_buffer() just to trip a debug check.  Because debug code in inline
> + * functions is bloaty.
> + */
>  static inline void wait_on_buffer(struct buffer_head *bh)
>  {
>  	might_sleep();
> -	if (buffer_locked(bh))
> +	if (buffer_locked(bh) || atomic_read(&bh->b_count) == 0)
>  		__wait_on_buffer(bh);
>  }
>  
> diff -puN fs/buffer.c~buffer_head-remove-redundant-test-from-wait_on_buffer-fix fs/buffer.c
> --- a/fs/buffer.c~buffer_head-remove-redundant-test-from-wait_on_buffer-fix
> +++ a/fs/buffer.c
> @@ -90,6 +90,12 @@ EXPORT_SYMBOL(unlock_buffer);
>   */
>  void __wait_on_buffer(struct buffer_head * bh)
>  {
> +	/*
> +	 * Calling wait_on_buffer() against a zero-ref buffer is a nasty bug
> +	 * because it will almost always "work".  However this buffer can be
> +	 * reclaimed at any time.  So check for it.
> +	 */
> +	VM_BUG_ON(atomic_read(&bh->b_count) == 0);

My system is failing this VM_BUG_ON() occasionally.  I think this is due to
wait_on_buffer() calls with b_count=0 from locations within fs/buffer.c.  These
occasional b_count=0 callers are caused by buf reads that complete quickly -
after the I/O is issued but before it is waited upon.  Such fs/buffer.c callers
need to either bypass this assertion or increment b_count.  I don't think they
need to grab an b_count reference.  I suggest a bypass routine in the patch
below.  Does this look good?

>  	wait_on_bit(&bh->b_state, BH_Lock, sync_buffer, TASK_UNINTERRUPTIBLE);
>  }
>  EXPORT_SYMBOL(__wait_on_buffer);
> _
>
>
> And while we're there...
>
> This might make reiserfs explode.
>
>
>
> From: Andrew Morton <akpm@...ux-foundation.org>
>
> The first thing __wait_on_buffer()->wait_on_bit() does is to test that the
> bit was set, so the buffer_locked() test is now redundant.  And once we
> remove that, we can remove the check for zero ->b_count also.
>
> And now that wait_on_buffer() unconditionally calls __wait_on_buffer(), we
> can move the might_sleep() check into __wait_on_buffer() to save some text.
>
> The downside of all of this is that wait_on_buffer() against an unlocked
> buffer will now always perform a function call.  Is it a common case?
>
> We can remove __wait_on_buffer() altogether now.  For some strange reason
> reiserfs calls __wait_on_buffer() directly.  Maybe it's passing in
> zero-ref buffers.  If so, we'll get warnings now and shall need to look at
> that.
>
> Cc: Jens Axboe <jens.axboe@...cle.com>
> Cc: Nick Piggin <nickpiggin@...oo.com.au>
> Cc: Richard Kennedy <richard@....demon.co.uk>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
>  fs/buffer.c                 |    2 ++
>  include/linux/buffer_head.h |    4 +---
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff -puN include/linux/buffer_head.h~wait_on_buffer-remove-the-buffer_locked-test include/linux/buffer_head.h
> --- a/include/linux/buffer_head.h~wait_on_buffer-remove-the-buffer_locked-test
> +++ a/include/linux/buffer_head.h
> @@ -312,9 +312,7 @@ map_bh(struct buffer_head *bh, struct su
>   */
>  static inline void wait_on_buffer(struct buffer_head *bh)
>  {
> -	might_sleep();
> -	if (buffer_locked(bh) || atomic_read(&bh->b_count) == 0)
> -		__wait_on_buffer(bh);
> +	__wait_on_buffer(bh);
>  }
>  
>  static inline int trylock_buffer(struct buffer_head *bh)
> diff -puN fs/buffer.c~wait_on_buffer-remove-the-buffer_locked-test fs/buffer.c
> --- a/fs/buffer.c~wait_on_buffer-remove-the-buffer_locked-test
> +++ a/fs/buffer.c
> @@ -90,6 +90,8 @@ EXPORT_SYMBOL(unlock_buffer);
>   */
>  void __wait_on_buffer(struct buffer_head * bh)
>  {
> +	might_sleep();
> +
>  	/*
>  	 * Calling wait_on_buffer() against a zero-ref buffer is a nasty bug
>  	 * because it will almost always "work".  However this buffer can be
> _

From: Greg Thelen <gthelen@...gle.com>

Introduce new routine for waiting on buffers with zero b_count.

In limited cases it is expected that a buffer can have a zero b_count but
still be protected from reclamation.  Waiting on such buffers with
wait_on_buffer() risks failure of the b_count assertion.  To avoid failing
the b_count assertion in the normal wait_on_buffer() path, this patch
introduces a new routine, __wait_on_buffer_unsafe(), for the few cases
that wait on a buffer which may have a zero b_count.  wait_on_buffer()
indirectly asserts that b_count is non-zero.  This assertion is
generally useful, but causes problems for a few cases in fs/buffer.c:
* __block_prepare_write()
* nobh_write_begin()
* block_truncate_page()

Without this patch I found that a virtual machine would occasionally
fail the __wait_on_buffer() b_count assertion when called from
__block_prepare_write().  Visual inspection suggests that the other two
routines could also fail the same b_count assertion.  So all three
routines now make use of the new __wait_on_buffer_unsafe() routine,
which avoids asserting b_count.

Signed-off-by: Greg Thelen <gthelen@...gle.com>
---
 fs/buffer.c |   21 +++++++++++++++------
 1 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 2500ada..c715da4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -92,21 +92,30 @@ void unlock_buffer(struct buffer_head *bh)
 EXPORT_SYMBOL(unlock_buffer);
 
 /*
+ * Block until a buffer comes unlocked.  This routine trusts the caller to
+ * ensure that the buffer will not be reclaimed.  Holding a b_count reference is
+ * one way, page lock is another.
+ */
+static void __wait_on_buffer_unsafe(struct buffer_head *bh)
+{
+	might_sleep();
+	wait_on_bit(&bh->b_state, BH_Lock, sync_buffer, TASK_UNINTERRUPTIBLE);
+}
+
+/*
  * Block until a buffer comes unlocked.  This doesn't stop it
  * from becoming locked again - you have to lock it yourself
  * if you want to preserve its state.
  */
 void __wait_on_buffer(struct buffer_head * bh)
 {
-	might_sleep();
-
 	/*
 	 * Calling wait_on_buffer() against a zero-ref buffer is a nasty bug
 	 * because it will almost always "work".  However this buffer can be
 	 * reclaimed at any time.  So check for it.
 	 */
 	VM_BUG_ON(atomic_read(&bh->b_count) == 0);
-	wait_on_bit(&bh->b_state, BH_Lock, sync_buffer, TASK_UNINTERRUPTIBLE);
+	__wait_on_buffer_unsafe(bh);
 }
 EXPORT_SYMBOL(__wait_on_buffer);
 
@@ -1934,7 +1943,7 @@ static int __block_prepare_write(struct inode *inode, struct page *page,
 	 * If we issued read requests - let them complete.
 	 */
 	while(wait_bh > wait) {
-		wait_on_buffer(*--wait_bh);
+		__wait_on_buffer_unsafe(*--wait_bh);
 		if (!buffer_uptodate(*wait_bh))
 			err = -EIO;
 	}
@@ -2603,7 +2612,7 @@ int nobh_write_begin(struct file *file, struct address_space *mapping,
 		 * for the buffer_head refcounts.
 		 */
 		for (bh = head; bh; bh = bh->b_this_page) {
-			wait_on_buffer(bh);
+			__wait_on_buffer_unsafe(bh);
 			if (!buffer_uptodate(bh))
 				ret = -EIO;
 		}
@@ -2865,7 +2874,7 @@ int block_truncate_page(struct address_space *mapping,
 	if (!buffer_uptodate(bh) && !buffer_delay(bh) && !buffer_unwritten(bh)) {
 		err = -EIO;
 		ll_rw_block(READ, 1, &bh);
-		wait_on_buffer(bh);
+		__wait_on_buffer_unsafe(bh);
 		/* Uhhuh. Read error. Complain and punt. */
 		if (!buffer_uptodate(bh))
 			goto unlock;
-- 
1.7.0.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/