[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4FD60472.7020207@panasas.com>
Date: Mon, 11 Jun 2012 17:45:06 +0300
From: Boaz Harrosh <bharrosh@...asas.com>
To: Jeff Layton <jlayton@...hat.com>
CC: bfields <bfields@...ldses.org>, Steve Dickson <steved@...hat.com>,
"Myklebust, Trond" <Trond.Myklebust@...app.com>,
Joerg Platte <jplatte@...sa.net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
Hans de Bruin <jmdebruin@...net.nl>
Subject: Re: Kernel 3.4.X NFS server regression
On 06/11/2012 05:11 PM, Jeff Layton wrote:
> On Mon, 11 Jun 2012 17:05:28 +0300
> Boaz Harrosh <bharrosh@...asas.com> wrote:
>
>> On 06/11/2012 04:51 PM, Jeff Layton wrote:
>>
>>>
>>> That was considered here, but the problem with the usermode helper is
>>> that you can't pass anything back to the kernel but a simple status
>>> code (and that's assuming that you wait for it to exit). In the near
>>> future, we'll need to pass back more info to the kernel for this, so
>>> the usermode helper callout wasn't suitable.
>>>
>>
>>
>> I have answered that in my mail. Repeated here again. Well you made
>> a simple mistake. Because it is *easy* to pass back any number and
>> size of information from user-mode.
>>
>> You just setup a sysfs entry points where the answers are written
>> back to. It's an easy trick to setup a thread safe, way with a
>> cookie but 90% of the time you don't have to. Say you set up
>> a structure of per-client (identified uniquely) then user mode
>> answers back per client, concurrency will not do any harm, since
>> you answer to the same question the same answer. ans so on. Each
>> problem it's own.
>>
>> If you want we can talk about this, it would be easy for me to setup
>> a toll free conference number we can all use.
>
> That helpful advice would have been welcome about 3-4 months ago when I
> first proposed this in detail. At that point you're working with
> multiple upcall/downcall mechanisms, which was something I was keen to
> avoid.
>
> I'm not opposed to moving in that direction, but it basically means
> you're going to rip out everything I've got here so far and replace it.
>
> If you're willing to do that work, I'll be happy to work with you on
> it, but I don't have the time or inclination to do that on my own right
> now.
>
No such luck. sorry. I wish I could, but coming from a competing server
company, you can imagine the priority of that ever happening.
(Even though I use the Linux-Server everyday for my development and
am putting lots of efforts into still, mainly in pnfs)
Hopefully re-examining the code, it could all be salvaged just the
same, only lots of code thrown a way.
But mean-while please address my concern below:
Boaz Harrosh wrote:
> One more thing, the most important one. We have already fixed that in the
> past and I was hoping the lesson was learned. Apparently it was not, and
> we are doomed to do this mistake for ever!!
>
> What ever crap fails times out and crashes, in the recovery code, we don't
> give a dam. It should never affect any Server-client communication.
>
> When the grace periods ends the clients gates opens period. *Any* error
> return from state recovery code must be carefully ignored and normal
> operations resumed. At most on error, we move into a mode where any
> recovery request from client is accepted, since we don't have any better
> data to verify it.
>
> Please comb recovery code to make sure any catastrophe is safely ignored.
> We already did that before and it used to work.
We should make sure that any state recovery code does not interfere with
regular operations. and fails gracefully / shuts up.
We used to have that, apparently it re-broke. Clients should always be granted
access, after grace period. And Server should be made sure not to fail in any
situation.
I would look into it but I'm not uptodate anymore, I wish you or Bruce could.
Thanks for your work so far, sorry to be bearer of bad news
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists