lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 05 Sep 2007 16:55:29 +0100
From:	Trond Myklebust <Trond.Myklebust@...app.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Bret Towe <magnade@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: nfs4 hang regression

On Wed, 2007-09-05 at 08:51 -0700, Andrew Morton wrote:
> > On Wed, 22 Aug 2007 15:41:18 -0700 "Bret Towe" <magnade@...il.com> wrote:
> 
> More than two weeks, you've bisected it and there's no sign of any action?
> 
> I don't see this on Michal's list so perhaps it already got fixed in a different
> thread.  Have you tested current mainline?

I believe Bret already declared himself happy with the fix that was
merged into mainline yesterday.

> > as of commit 3d39c691ff486142dd9aaeac12f553f4476b7a62
> > I got a hang on my clients after something around 26 minutes of uptime
> > the keyboard would stop accepting input
> > mouse would work still, plugging in a spare usb keyboard made no difference
> > another couple odd bits is shutting down would hang, below is a trace
> > of that hang
> > also if your logged into that machine via ssh when you log out it would hang
> > and would require a force terminate
> > 
> > I'm seeing this on 2 machines 1 a g4 mac mini and 2 a athlon64
> > both have a home directory mounted via nfs4
> > reverting the commit above found via bisecting on both machines allows me
> > to use the computers for several hours with no issues
> > 
> > and here is the trace from the athlon64 taken during its hang on shutdown
> > as you can tell from the dates in the trace ive been a bit busy and just
> > now got around to sending this email
> > I'm surprised tho I hadn't seen anyone hitting it
> > let me know what else is needed I'll do what I can and get it out quickly
> > 
> > Aug 12 17:06:35 ghoststar kernel: [ 4791.801262] SysRq : Show State
> > Aug 12 17:06:35 ghoststar kernel: [ 4791.801325]   task
> >         PC stack   pid father
> > Aug 12 17:06:35 ghoststar kernel: [ 4791.801327] init          S
> > 00000000ffffffff     0     1      0
> > Aug 12 17:06:35 ghoststar kernel: [ 4791.801331]  ffff81003ff879b8
> > 0000000000000086 0000000000000008 ffff81003ff84000
> 
> I really wish that hadn't been wordwrapped.
> 
> As a random stab-in-the-dark, does this help?
> 
> --- a/fs/nfs/nfs4renewd.c~a
> +++ a/fs/nfs/nfs4renewd.c
> @@ -133,9 +133,7 @@ nfs4_renewd_prepare_shutdown(struct nfs_
>  void
>  nfs4_kill_renewd(struct nfs_client *clp)
>  {
> -	down_read(&clp->cl_sem);
>  	cancel_delayed_work_sync(&clp->cl_renewd);
> -	up_read(&clp->cl_sem);
>  }

Nope. The problem was the cancel_delayed_work_sync(). It turns out that
this may recurse...

Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ