lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 29 Aug 2008 09:11:53 +0200
From:	Jens Axboe <jens.axboe@...cle.com>
To:	Neil Brown <neilb@...e.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>, jurriaan <thunder7@...all.nl>,
	linux-kernel@...r.kernel.org
Subject: Re: 2.6.27-rc4: lots of 'in_atomic():1, irqs_disabled():0' with  software-raid1

On Fri, Aug 29 2008, Neil Brown wrote:
> On Friday August 29, neilb@...e.de wrote:
> > 
> > Here is my (untested yet) patch to address the problem.
> > I'll try to get some testing done and push it out early next week, but
> > if anyone could review and/or test that would be a great help.
> 
> Actually, that was my "haven't even compiled it yet" patch.  The one I
> meant to send was this one.
> 
> Sorry for the noise.
> 
> NeilBrown
> 
> 
> From 3a0646137016c69dbcaeed0114558f67daa4a6f0 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@...e.de>
> Date: Fri, 29 Aug 2008 15:46:38 +1000
> Subject: [PATCH] Fix problem with waiting while holding rcu read lock in md/bitmap.c
> 
> A recent patch to protect the rdev list with rcu locking leaves us
> with a problem because we can sleep on memalloc while holding the
> rcu lock.
> 
> The rcu lock is only needed while walking the linked list as
> uninteresting devices (failed or spares) can be removed at any time.
> 
> So only take the rcu lock while actually walking the linked list.
> Take a refcount on the rdev during the time when we drop the lock
> and do the memalloc to start IO.
> When we return to the locked code, all the interesting devices
> on the list will not have moved, so we can simply use
> list_for_each_continue_rcu to pick up where we left off.
> 
> Signed-off-by: NeilBrown <neilb@...e.de>
> ---
>  drivers/md/bitmap.c |   44 +++++++++++++++++++++++++++++++++++++-------
>  1 files changed, 37 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 7e65bad..c17eb4f 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -238,15 +238,46 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
>  
>  }
>  
> +static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
> +{
> +	/* Iterate the disks of an mddev, using rcu to protect access to the
> +	 * linked list, and raising the refcount of devices we return to ensure
> +	 * they don't disappear while in use.
> +	 * As devices are only added or removed when raid_disk is < 0 and
> +	 * nr_pending is 0 and In_sync is clear, the entries we return will
> +	 * still be in the same position on the list when we re-enter
> +	 * list_for_each_continue_rcu.
> +	 */
> +	struct list_head *pos;
> +	rcu_read_lock();
> +	if (rdev == NULL)
> +		/* start at the beginning */
> +		pos = &mddev->disks;
> +	else
> +		/* release the previous rdev */
> +		rdev_dec_pending(rdev, mddev);
> +	
> +	list_for_each_continue_rcu(pos, &mddev->disks) {
> +		rdev = list_entry(pos, mdk_rdev_t, same_set);
> +		if (rdev->raid_disk >= 0 &&
> +		    test_bit(In_sync, &rdev->flags) &&
> +		    !test_bit(Faulty, &rdev->flags)) {
> +			/* this is a usable devices */
> +			atomic_inc(&rdev->nr_pending);
> +			rcu_read_unlock();
> +			return rdev;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return NULL;
> +}
> +
>  static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>  {
> -	mdk_rdev_t *rdev;
> +	mdk_rdev_t *rdev = NULL;
>  	mddev_t *mddev = bitmap->mddev;
>  
> -	rcu_read_lock();
> -	rdev_for_each_rcu(rdev, mddev)
> -		if (test_bit(In_sync, &rdev->flags)
> -		    && !test_bit(Faulty, &rdev->flags)) {
> +	while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
>  			int size = PAGE_SIZE;
>  			if (page->index == bitmap->file_pages-1)
>  				size = roundup(bitmap->last_page_size,
> @@ -281,8 +312,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
>  				       + page->index * (PAGE_SIZE/512),
>  				       size,
>  				       page);
> -		}
> -	rcu_read_unlock();
> +	}
>  
>  	if (wait)
>  		md_super_wait(mddev);
> -- 
> 1.5.6.5

Looks like an elegant solution to the problem Neil, good stuff!

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ