Message-ID: <20120611105515.3b99942c@tlielax.poochiereds.net>
Date: Mon, 11 Jun 2012 10:55:15 -0400
From: Jeff Layton <jlayton@...hat.com>
To: Boaz Harrosh <bharrosh@...asas.com>
Cc: bfields <bfields@...ldses.org>, Steve Dickson <steved@...hat.com>,
"Myklebust, Trond" <Trond.Myklebust@...app.com>,
Joerg Platte <jplatte@...sa.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
Hans de Bruin <jmdebruin@...net.nl>
Subject: Re: Kernel 3.4.X NFS server regression
On Mon, 11 Jun 2012 17:45:06 +0300
Boaz Harrosh <bharrosh@...asas.com> wrote:
> On 06/11/2012 05:11 PM, Jeff Layton wrote:
>
> > On Mon, 11 Jun 2012 17:05:28 +0300
> > Boaz Harrosh <bharrosh@...asas.com> wrote:
> >
> >> On 06/11/2012 04:51 PM, Jeff Layton wrote:
> >>
> >>>
> >>> That was considered here, but the problem with the usermode helper is
> >>> that you can't pass anything back to the kernel but a simple status
> >>> code (and that's assuming that you wait for it to exit). In the near
> >>> future, we'll need to pass back more info to the kernel for this, so
> >>> the usermode helper callout wasn't suitable.
> >>>
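
To make that concrete: the callout in question is call_usermodehelper(),
and with UMH_WAIT_PROC all the kernel gets back is the helper's exit
status. A minimal sketch, with a made-up helper path:

#include <linux/kmod.h>

/*
 * Sketch only: "/sbin/nfsd-recovery" is hypothetical. Whatever the
 * helper figures out in userspace, the only thing it can hand back
 * through this interface is its exit status.
 */
static int nfsd_recovery_callout(const char *clientid)
{
	char *argv[] = { "/sbin/nfsd-recovery", (char *)clientid, NULL };
	char *envp[] = { "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };

	return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}
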
> >>
> >>
> >> I answered that in my mail; I'll repeat it here. You made a simple
> >> mistake, because it is *easy* to pass back any number and size of
> >> answers from user mode.
> >>
> >> You just set up sysfs entry points that the answers are written
> >> back to. It's easy to make that thread safe with a cookie, but 90%
> >> of the time you don't have to. Say you set up a per-client
> >> structure (identified uniquely); then user mode answers back per
> >> client, and concurrency will do no harm, since you give the same
> >> answer to the same question. And so on; each problem has its own
> >> solution.
> >>
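
If I follow you, the sysfs side would look roughly like the sketch
below; the "answer" attribute and its parsing are made up here just so
we're talking about the same thing:

#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/kernel.h>
#include <linux/errno.h>

static int client_answer;	/* would live in a per-client structure */

/* userspace writes its reply into the sysfs file; the kernel parses it */
static ssize_t answer_store(struct kobject *kobj, struct kobj_attribute *attr,
			    const char *buf, size_t count)
{
	if (kstrtoint(buf, 0, &client_answer))
		return -EINVAL;
	return count;
}

static struct kobj_attribute answer_attr =
	__ATTR(answer, 0200, NULL, answer_store);
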
> >> If you want, we can talk about this; it would be easy for me to set
> >> up a toll-free conference number we can all use.
> >
> > That helpful advice would have been welcome 3-4 months ago, when I
> > first proposed this in detail. With that scheme you're working with
> > multiple upcall/downcall mechanisms, which was something I was keen
> > to avoid.
> >
> > I'm not opposed to moving in that direction, but it basically means
> > you're going to rip out everything I've got here so far and replace it.
> >
> > If you're willing to do that work, I'll be happy to work with you on
> > it, but I don't have the time or inclination to do that on my own right
> > now.
> >
>
>
> No such luck, sorry. I wish I could, but coming from a competing server
> company, you can imagine the priority of that ever happening.
> (Even though I use the Linux server every day for my development and
> am still putting lots of effort into it, mainly in pNFS.)
>
> Hopefully, on re-examining the code, it can all be salvaged just the
> same, if with a lot of code thrown away.
>
> But meanwhile, please address my concern below:
> Boaz Harrosh wrote:
>
> > One more thing, the most important one. We already fixed this in the
> > past, and I was hoping the lesson had been learned. Apparently it was
> > not, and we are doomed to repeat this mistake forever!!
> >
> > Whatever fails, times out, or crashes in the recovery code, we don't
> > give a damn. It should never affect any server-client communication.
> >
> > When the grace period ends, the gates open for the clients, period.
> > *Any* error return from the state recovery code must be carefully
> > ignored and normal operations resumed. At most, on error we move into
> > a mode where any recovery request from a client is accepted, since we
> > don't have any better data to verify it against.
> >
> > Please comb the recovery code to make sure any catastrophe is safely
> > ignored. We already did that before, and it used to work.
>
>
> We should make sure that the state recovery code does not interfere
> with regular operations, and that it fails gracefully and quietly.
>
> We used to have that; apparently it re-broke. Clients should always be
> granted access after the grace period, and the server should be made
> sure not to fail in any situation.
>
> I would look into it, but I'm not up to date anymore; I wish you or
> Bruce could.
>
> Thanks for your work so far; sorry to be the bearer of bad news.
> Boaz
This problem turned out to be a fairly straightforward bug in the
rpc_pipefs queue timeout mechanism that was causing the laundromat job
to hang and hence to keep the state lock held. I just sent a patch
that should fix it.
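
For the curious, the shape of the hang was roughly the pattern below (a
schematic, not the actual nfsd code; do_client_upcall() is a stand-in
for the pipefs upcall whose timeout never fired):

#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(state_lock);

static void do_client_upcall(void)
{
	/* stand-in for the rpc_pipefs upcall; in the bug, this blocked forever */
}

static void laundromat_main(struct work_struct *work)
{
	mutex_lock(&state_lock);
	do_client_upcall();	/* hung here, so state_lock was never dropped */
	mutex_unlock(&state_lock);
}

Every other path that needs state_lock then blocks behind the stuck job.
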
I guess I'm not clear on what you're saying is broken. Modulo the
original bug here, clients are allowed access after the grace period
whether the upcalls are working or not.
What we cannot allow is reclaim requests outside of the grace period,
since we can't verify whether there was conflicting state in the
interim period. That's true whether the server has a functioning client
tracking mechanism or not.
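
In pseudo-nfsd terms, the rule I'm describing is roughly this (a
sketch, not the exact server code; nfserr_grace, nfserr_no_grace, and
nfs_ok are the usual status codes from fs/nfsd/nfsd.h):

#include <linux/types.h>

static __be32 nfsd_check_open(bool in_grace, bool is_reclaim)
{
	if (in_grace && !is_reclaim)
		return nfserr_grace;	/* new opens wait out the grace period */
	if (!in_grace && is_reclaim)
		return nfserr_no_grace;	/* can't verify it, so refuse the reclaim */
	return nfs_ok;			/* everything else proceeds normally */
}

Note that this check doesn't depend on the client tracking upcalls at
all, which is why clients get access after the grace period even when
the upcalls are broken.
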
--
Jeff Layton <jlayton@...hat.com>