Message-ID: <050e01c9da5e$8d142b20$0400a8c0@dcccs>
Date: Thu, 21 May 2009 23:53:16 +0200
From: "Janos Haar" <janos.haar@...center.hu>
To: "Neil Brown" <neilb@...e.de>
Cc: <paulmck@...ux.vnet.ibm.com>, <linux-kernel@...r.kernel.org>
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, comm: md1_raid5
Neil, Paul,
The problem is solved.
It was a BIOS bug.
(The Fedora install CD showed the same pauses; I retested with the latest
BIOS version, and the delays are gone. 8-)
Thanks to all of you for your help!
Janos Haar
----- Original Message -----
From: "Neil Brown" <neilb@...e.de>
To: <paulmck@...ux.vnet.ibm.com>
Cc: "Janos Haar" <janos.haar@...center.hu>; <linux-kernel@...r.kernel.org>
Sent: Thursday, May 21, 2009 8:50 AM
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902,
comm: md1_raid5
> On Wednesday May 20, paulmck@...ux.vnet.ibm.com wrote:
>> On Thu, May 21, 2009 at 06:46:15AM +0200, Janos Haar wrote:
>> > Paul,
>> >
>> > Thank you for your attention.
>> > Yes, the PC makes 2-3 second "pauses" and drops this message again
>> > and again.
>> > If I remove the RCU debugging, the message disappears, but the
>> > pauses are still there, and they put a load of 2-3 on the otherwise
>> > idle system.
>> > Can I do anything about it?
>> > Do you suggest using PREEMPT? (This is a server.)
>>
>> One possibility is that the lock that bitmap_daemon_work() acquires is
>> being held for too long. Another possibility is the list traversal in
>> md_check_recovery() that might loop for a long time if the list were
>> excessively long or could be temporarily tied in a knot.
>>
>> Neil, thoughts?
>>
>
> I would be surprised if any of these things takes as long as 3 seconds
> (or even 1 second), but I cannot completely rule it out.
>
> I assume that you mean 3 seconds of continuous running with no
> sleeping, so it cannot be a slow kmalloc that is causing the delay?
>
> bitmap_daemon_work is the most likely candidate as bitmap->chunks
> can be very large (thousands, probably not millions though).
> Taking and dropping the lock every time around that loop doesn't
> really make much sense, does it....
> And it looks like it could actually be optimised quite a bit to skip a
> lot of the iterations in most cases - there are two places where we
> can accelerate 'j' quite a lot.
>
> Janos: Can you try this and see if it makes a difference?
> Thanks.
>
> NeilBrown
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 47c68bc..56df1ce 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1097,14 +1097,12 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  	}
>  	bitmap->allclean = 1;
>
> +	spin_lock_irqsave(&bitmap->lock, flags);
>  	for (j = 0; j < bitmap->chunks; j++) {
>  		bitmap_counter_t *bmc;
> -		spin_lock_irqsave(&bitmap->lock, flags);
> -		if (!bitmap->filemap) {
> +		if (!bitmap->filemap)
>  			/* error or shutdown */
> -			spin_unlock_irqrestore(&bitmap->lock, flags);
>  			break;
> -		}
>
>  		page = filemap_get_page(bitmap, j);
>
> @@ -1121,6 +1119,8 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  					write_page(bitmap, page, 0);
>  					bitmap->allclean = 0;
>  				}
> +				spin_lock_irqsave(&bitmap->lock, flags);
> +				j |= (PAGE_BITS - 1);
>  				continue;
>  			}
>
> @@ -1181,9 +1181,10 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  					ext2_clear_bit(file_page_offset(j), paddr);
>  				kunmap_atomic(paddr, KM_USER0);
>  			}
> -		}
> -		spin_unlock_irqrestore(&bitmap->lock, flags);
> +		} else
> +			j |= PAGE_COUNTER_MASK;
>  	}
> +	spin_unlock_irqrestore(&bitmap->lock, flags);
>
>  	/* now sync the final page */
>  	if (lastpage != NULL) {
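
To make the skip-ahead trick concrete: OR-ing the loop index with a
power-of-two mask jumps it to the last slot of the current page, so the
loop's j++ then lands on the first slot of the next page. Below is a
minimal standalone sketch, not the kernel code: COUNTERS_PER_PAGE and
NCHUNKS are toy values made up for the demo, and the "page needs no
work" case is taken unconditionally, whereas the kernel derives the real
masks (PAGE_BITS, PAGE_COUNTER_MASK) from PAGE_SIZE.

/* sketch.c - skipping ahead by OR-ing the loop index with a
 * power-of-two mask, in the style of "j |= (PAGE_BITS - 1)" and
 * "j |= PAGE_COUNTER_MASK" in the patch above.
 */
#include <stdio.h>

#define COUNTERS_PER_PAGE 16	/* toy value; must be a power of two */
#define NCHUNKS 64		/* toy stand-in for bitmap->chunks */

int main(void)
{
	unsigned long j;
	int visited = 0;

	for (j = 0; j < NCHUNKS; j++) {
		visited++;	/* examine counter j */
		/* Pretend the whole page holding counter j needs no
		 * further work: jump j to the page's last counter so
		 * the loop's j++ lands on the next page. */
		j |= COUNTERS_PER_PAGE - 1;
	}
	printf("visited %d of %d counters\n", visited, NCHUNKS);
	/* prints: visited 4 of 64 counters (one per page) */
	return 0;
}

At kernel scale the saving is much bigger: assuming 4 KiB pages and the
usual 16-bit bitmap counters, PAGE_BITS is 32768 and PAGE_COUNTER_MASK
spans the 2048 counters in a page, so an all-clean bitmap with thousands
of chunks is scanned in a handful of iterations instead of one per chunk.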