[<prev] [next>] [day] [month] [year] [list]
Message-ID: <464A2EEE.7070509@atipa.com>
Date: Tue, 15 May 2007 17:06:38 -0500
From: Roger Heflin <rheflin@...pa.com>
To: Dave Kleikamp <shaggy@...ux.vnet.ibm.com>
CC: linux-kernel@...r.kernel.org, nfs@...ts.sourceforge.net
Subject: Re: Apparent Deadlock with nfsd/jfs on 2.6.21.1 under bonnie.
Dave Kleikamp wrote:
> Sorry if I'm missing anyone on the reply, but my mail feed is messed up
> and I'm replying from the gmane archive.
>
> On Tue, 15 May 2007 09:08:25 -0500, Roger Heflin wrote:
>
>> Hello,
>>
>> Running 2.6.21.1 (FC6 Dist), with a RHEL client (client
>> appears to not be having issues) I am getting what I believe
>> is a deadlock on the server end. This is with JFS and
>> NFSD, I have not tested yet with a non-JFS filesystem,
>> though our customer indicated that they have duplicated it with
>> the ext3 filesystem.
>
> I don't have an answer to an ext3 deadlock, but this looks like a jfs
> problem that was recently fixed in linux-2.6.22-rc1. I had intended to
> send it to the stable kernel after it was picked up in mainline, but
> hadn't gotten to it yet.
>
> The patch is here:
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=05ec9e26be1f668ccba4ca54d9a4966c6208c611
>
Ok.
My customer reported that he though he had a ext3, so far I have
not been able to duplicate the ext3 hang.
If ext3 survives until tomorrow, I will retest unpatched jfs, and then
patch it and test again.
>> The basic setup is:
>> fiber channel array -> qlogic fiber card -> /dev/sdx -> LVM stripe ->
>> jfs -> nfs.
>>
>> Running bonnie on a NFS share has apparently produced a deadlock. I
>> have ran bonnie several times without having any issues, I don't believe
>> this is a HW issue, we have a couple of other machines configured with
>> slightly different HW and are also able to duplicate this problem on
>> those machines. There are no abnormal messages in dmesg or in the
>> messages file.
>>
>> After having the apparent deadlock I started a dd of a on the deadlocked
>> filesystem and according to vmstat 1 that was actually working, I then
>> did a "mkdir junk" on the deadlocked filesystem and that apparently put
>> the cat into a permanent "D" state. I will include the sysrq -t from
>> before the cat/mkdir and after the cat/mkdir.
>>
>> I believe I can duplicate this again, and other than the processes going
>> into the "D" state everything else seems to work. Other filesytems
>> appear to be functional, I can still login to the machine.
>>
>> Right now the machine is in the deadlocked state, and I will wait for
>> any suggestions of more data to collect or other tests to try.
>
> I haven't tried it on a locked-up system, but you may try waking up the
> [jfsIO] kernel thread with a signal. I'm not sure what signals may get
> through, since the thread doesn't specifically act on a signal.
>
I will try on the next lockup.
Roger
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists