[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <597A1A30-E61B-484B-AB79-31AB83AE7A50@oracle.com>
Date: Mon, 1 Nov 2010 17:12:48 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Trond Myklebust <Trond.Myklebust@...app.com>
Cc: Nick Bowler <nbowler@...iptictech.com>,
LKML Kernel <linux-kernel@...r.kernel.org>,
"J. Bruce Fields" <bfields@...hat.com>,
Linux NFS Mailing List <linux-nfs@...r.kernel.org>
Subject: Re: Regression, bisected: sqlite locking failure on nfs
On Nov 1, 2010, at 4:35 PM, Trond Myklebust wrote:
> On Mon, 2010-11-01 at 15:55 -0400, Chuck Lever wrote:
>> On Nov 1, 2010, at 3:48 PM, Trond Myklebust wrote:
>>
>>> On Mon, 2010-11-01 at 15:45 -0400, Chuck Lever wrote:
>>>> On Nov 1, 2010, at 3:42 PM, Trond Myklebust wrote:
>>>>
>>>>> On Mon, 2010-11-01 at 15:22 -0400, Trond Myklebust wrote:
>>>>>> I suspect nlmclnt_lookup_host() is to blame. It appears to be the _only_
>>>>>> thing in the kernel that actually sets this 'srcaddr' field, and it sets
>>>>>> it to
>>>>>>
>>>>>> const struct sockaddr source = {
>>>>>> .sa_family = AF_UNSPEC,
>>>>>> };
>>>>>>
>>>>>> You triggered the bug by removing the line
>>>>>>
>>>>>> transport->srcaddr.ss_family = family;
>>>>>>
>>>>>> from xs_create_sock().
>>>>>>
>>>>>> Trond
>>>>>
>>>>> Does this fix the regression?
>>>>>
>>>>> Trond
>>>>>
>>>>> ----------------------------------------------------------------------------------------------
>>>>> NLM: Fix a regression in lockd
>>>>>
>>>>> From: Trond Myklebust <Trond.Myklebust@...app.com>
>>>>>
>>>>> Nick Bowler reports:
>>>>> There are no unusual messages on the client... but I just logged into
>>>>> the server and I see lots of messages of the following form:
>>>>>
>>>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>>>> nfsd: request from insecure port (192.168.8.199:35766)!
>>>>>
>>>>> Bisected to commit 9247685088398cf21bcb513bd2832b4cd42516c4 (SUNRPC:
>>>>> Properly initialize sock_xprt.srcaddr in all cases)
>>>>>
>>>>> Apparently, removing the 'transport->srcaddr.ss_family = family' from
>>>>> xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly
>>>>> initialising the srcaddr family to AF_UNSPEC.
>>>>>
>>>>> Reported-by: Nick Bowler <nbowler@...iptictech.com>
>>>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@...app.com>
>>>>> ---
>>>>>
>>>>> fs/lockd/host.c | 5 -----
>>>>> 1 files changed, 0 insertions(+), 5 deletions(-)
>>>>>
>>>>>
>>>>> diff --git a/fs/lockd/host.c b/fs/lockd/host.c
>>>>> index 25e21e4..9ff0c0e 100644
>>>>> --- a/fs/lockd/host.c
>>>>> +++ b/fs/lockd/host.c
>>>>> @@ -238,9 +238,6 @@ struct nlm_host *nlmclnt_lookup_host(const struct sockaddr *sap,
>>>>> const char *hostname,
>>>>> int noresvport)
>>>>> {
>>>>> - const struct sockaddr source = {
>>>>> - .sa_family = AF_UNSPEC,
>>>>> - };
>>>>> struct nlm_lookup_host_info ni = {
>>>>> .server = 0,
>>>>> .sap = sap,
>>>>> @@ -249,8 +246,6 @@ struct nlm_host *nlmclnt_lookup_host(const struct sockaddr *sap,
>>>>> .version = version,
>>>>> .hostname = hostname,
>>>>> .hostname_len = strlen(hostname),
>>>>> - .src_sap = &source,
>>>>> - .src_len = sizeof(source),
>>>>> .noresvport = noresvport,
>>>>> };
>>>>>
>>>>>
>>>>
>>>>
>>>> What about that memcpy() in nlm_lookup_host()? With this patch, wouldn't it be copying garbage into the host's srcaddr field?
>>>>
>>>
>>> It shouldn't. ni->src_len is now zero.
>>
>> Yech. All this still assumes that ANYADDR is all zeroes, and that the memory this is going into is already initialized to zeroes. It's asking for trouble if we re-arrange all this someday.
>>
>> I've got an untested one line patch that should fix this for any upper layer caller. Posting now.
>>
>
> No! Upper layer callers should simply not be setting .saddress. Pretty
> much the only thing that _should_ be setting .saddress is the lockd
> callback, and possibly the nfsv4 server callback.
Then the take 2 patch description should be updated to reflect your actual intent. I don't think it's especially clear from your patch description why "nlmclnt_lookup_host() ... initialising the srcaddr family to AF_UNSPEC" is incorrect from an architectural standpoint. The specifics of "fixing bad behavior" are obvious already, but the architectural problem needs to be documented.
Otherwise I'm OK with the general idea of take 2, since it passes in a NULL pointer when no special bind address is desired, as other callers of rpc_create() do. Of course, take 2 has to pass Nick's test case too.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists