Message-ID: <6278d2220908170653s45df9989t9fa550f7efa0c182@mail.gmail.com>
Date: Mon, 17 Aug 2009 14:53:08 +0100
From: Daniel J Blueman <daniel.blueman@...il.com>
To: Trond Myklebust <Trond.Myklebust@...app.com>
Cc: linux-nfs@...r.kernel.org, Chuck Lever <chuck.lever@...cle.com>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [2.6.31-rc5] oops: NFS4 client manager kthread...
Hi Trond,
On Mon, Aug 17, 2009 at 2:12 PM, Trond Myklebust
<Trond.Myklebust@...app.com> wrote:
> On Sun, 2009-08-16 at 23:40 +0100, Daniel J Blueman wrote:
>> After losing and regaining ethernet link a few times with 2.6.31-rc5
>> [1], I've hit an oops in the NFS4 client manager kthread [2] on my
>> client with NFS4 homedir mount.
>>
>> Do you have a frequent test-case for when the client's manager kthread
>> gets invoked (with and without succeeding callbacks, due to eg a
>> firewall)? Server here is unpatched 2.6.30-rc6; I recall seeing
>> problems when the manager kthread gets invoked, across quite a few
>> kernel releases, just wasn't lucky enough to catch an oops.
>>
>> Oopsing in allow_signal() suggests task state corruption perhaps? I'm
>> downloading the debug kernel to match up the disassembly and line
>> numbers, if that helps? This time, the client had no firewall (but
>> have seen other issues when the callback has failed due to the
>> firewall).
>
> Those aren't Oopses. They are 'soft lockup' warnings. Basically, they're
> saying that the CPU is getting stuck waiting for a spin lock or a mutex.
>
> In this case, it is probably the fact that the state manager is going
> nuts trying to recover, while the connection to the server keeps coming
> up and going down.
>
> What does 'netstat -t' say when you get into this situation?
Whoops; it's true, the stack trace comes from the soft-lockup detector.
There was a single 200s link outage, but the client didn't recover
afterwards; it seems locks are held and never released. I observe the
'192.168.1.250-m' NFS4 manager kthread being created and never going
away, even though IP connectivity with the server is fine afterwards.
I'll reproduce it with stock 2.6.31-rc6 on the client and get 'netstat
-t' output.
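To make that output easier to read once captured, here is a minimal
sketch (not part of any NFS tooling) that keeps only the TCP rows whose
remote port is 2049, the registered NFS port. It assumes the numeric
form 'netstat -tn', since plain 'netstat -t' may print the service name
instead of the port number. The function name nfs_connections and the
sample rows, including the client address 192.168.1.10, are invented for
illustration; only the server address 192.168.1.250 comes from the
kthread name above.

```python
def nfs_connections(netstat_output, port="2049"):
    """Return (local, remote, recv_q, send_q, state) for each TCP row
    whose remote (foreign) port equals `port`.

    Hypothetical helper: expects text in the usual `netstat -tn` layout,
    i.e. Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State.
    """
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner and column-header lines and anything too short.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        proto, recv_q, send_q, local, remote, state = fields[:6]
        # The port is whatever follows the last ':' of the foreign address.
        if remote.rsplit(":", 1)[-1] == port:
            rows.append((local, remote, int(recv_q), int(send_q), state))
    return rows


# Made-up sample in `netstat -tn` layout; not output from the affected client.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.168.1.10:22         192.168.1.20:51514      ESTABLISHED
tcp        0    132 192.168.1.10:793        192.168.1.250:2049      FIN_WAIT1
"""

if __name__ == "__main__":
    for row in nfs_connections(SAMPLE):
        print(row)
```

The connection state and a Send-Q that stays non-zero are the fields
most likely to show whether the socket to the server is stuck or is
being repeatedly torn down and re-established.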
Thanks for looking at this!
Daniel
> Cheers
> Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@...app.com
> www.netapp.com
>
--
Daniel J Blueman