linux-kernel - Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, comm: md1_raid5

Message-ID: <050e01c9da5e$8d142b20$0400a8c0@dcccs>
Date:	Thu, 21 May 2009 23:53:16 +0200
From:	"Janos Haar" <janos.haar@...center.hu>
To:	"Neil Brown" <neilb@...e.de>
Cc:	<paulmck@...ux.vnet.ibm.com>, <linux-kernel@...r.kernel.org>
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, comm: md1_raid5

Neil, Paul,

The problem is solved.
It was a BIOS bug.
(The Fedora install CD shows the same pauses; I checked with the latest BIOS
version, and the delays are gone. 8-)

Thanks to both of you for all the help!

Janos Haar

----- Original Message ----- 
From: "Neil Brown" <neilb@...e.de>
To: <paulmck@...ux.vnet.ibm.com>
Cc: "Janos Haar" <janos.haar@...center.hu>; <linux-kernel@...r.kernel.org>
Sent: Thursday, May 21, 2009 8:50 AM
Subject: Re: Fw: RCU detected CPU 1 stall (t=4295904002/751 jiffies)Pid:902, comm: md1_raid5


> On Wednesday May 20, paulmck@...ux.vnet.ibm.com wrote:
>> On Thu, May 21, 2009 at 06:46:15AM +0200, Janos Haar wrote:
>> > Paul,
>> >
>> > Thank you for your attention.
>> > Yes, the PC makes 2-3 second "pauses" and prints this message again and
>> > again.
>> > If I remove the RCU debugging, the message disappears, but the pauses
>> > are still there, and they cause a load of 2-3 on the idle system.
>> > Can I do something?
>> > Do you suggest using PREEMPT? (This is a server.)
>>
>> One possibility is that the lock that bitmap_daemon_work() acquires is
>> being held for too long.  Another possibility is the list traversal in
>> md_check_recovery() that might loop for a long time if the list were
>> excessively long or could be temporarily tied in a knot.
>>
>> Neil, thoughts?
>>
>
> I would be surprised if any of these things take as long as 3 seconds
> (or even 1 second) but I cannot completely rule it out.
>
> I assume that you mean 3 seconds of continuous running with no
> sleeping, so it cannot be a slow kmalloc that is causing the delay?
>
> bitmap_daemon_work is the most likely candidate as bitmap->chunks
> can be very large (thousands, probably not millions though).
> Taking and dropping the lock every time around that loop doesn't
> really make much sense, does it....
> And it looks like it could actually be optimised quite a bit to skip a
> lot of the iterations in most cases - there are two places where we
> can accelerate 'j' quite a lot.
>
> Janos: Can you try this and see if it makes a difference?
> Thanks.
>
> NeilBrown
>
> diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
> index 47c68bc..56df1ce 100644
> --- a/drivers/md/bitmap.c
> +++ b/drivers/md/bitmap.c
> @@ -1097,14 +1097,12 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  }
>  bitmap->allclean = 1;
>
> + spin_lock_irqsave(&bitmap->lock, flags);
>  for (j = 0; j < bitmap->chunks; j++) {
>  bitmap_counter_t *bmc;
> - spin_lock_irqsave(&bitmap->lock, flags);
> - if (!bitmap->filemap) {
> + if (!bitmap->filemap)
>  /* error or shutdown */
> - spin_unlock_irqrestore(&bitmap->lock, flags);
>  break;
> - }
>
>  page = filemap_get_page(bitmap, j);
>
> @@ -1121,6 +1119,8 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  write_page(bitmap, page, 0);
>  bitmap->allclean = 0;
>  }
> + spin_lock_irqsave(&bitmap->lock, flags);
> + j |= (PAGE_BITS - 1);
>  continue;
>  }
>
> @@ -1181,9 +1181,10 @@ void bitmap_daemon_work(struct bitmap *bitmap)
>  ext2_clear_bit(file_page_offset(j), paddr);
>  kunmap_atomic(paddr, KM_USER0);
>  }
> - }
> - spin_unlock_irqrestore(&bitmap->lock, flags);
> + } else
> + j |= PAGE_COUNTER_MASK;
>  }
> + spin_unlock_irqrestore(&bitmap->lock, flags);
>
>  /* now sync the final page */
>  if (lastpage != NULL) {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/ 
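
A note on the `j |= (PAGE_BITS - 1)` line in the quoted patch: OR-ing an
index with a power-of-two-minus-one mask moves it to the last index of its
aligned group, so the loop's own `j++` then lands on the first index of the
next group. Below is a minimal user-space sketch of that idea, not the kernel
code. `GROUP_SIZE`, `scan_with_skip` and `group_is_clean` are hypothetical
names made up for illustration, and the group size of 8 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PAGE_BITS: indices per aligned group. */
#define GROUP_SIZE 8u

/*
 * Walk indices 0..n-1. When the group containing j is marked clean,
 * jump to the last index of that group; the for-loop's j++ then moves
 * on to the first index of the next group.
 *
 * group_is_clean must hold one entry per group, i.e. at least
 * (n + GROUP_SIZE - 1) / GROUP_SIZE entries.
 *
 * Returns how many indices were examined one by one.
 */
size_t scan_with_skip(const int *group_is_clean, size_t n)
{
    size_t examined = 0;

    /* The OR trick only works when GROUP_SIZE is a power of two. */
    assert((GROUP_SIZE & (GROUP_SIZE - 1)) == 0);

    for (size_t j = 0; j < n; j++) {
        if (group_is_clean[j / GROUP_SIZE]) {
            j |= (GROUP_SIZE - 1);  /* e.g. 16 becomes 23 */
            continue;
        }
        examined++;                 /* per-index work would go here */
    }
    return examined;
}
```

With 32 indices and groups 0 and 2 marked clean, only the 16 indices in
groups 1 and 3 are examined; the clean groups cost one iteration each.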