Message-ID: <A862CFCD-76A2-4373-8F44-F156DB38E6A5@redhat.com>
Date: Sun, 08 Sep 2019 07:39:08 -0400
From: "Benjamin Coddington" <bcodding@...hat.com>
To: "Chuck Lever" <chuck.lever@...cle.com>
Cc: "Jason L Tibbitts III" <tibbs@...h.uh.edu>,
"Bruce Fields" <bfields@...ldses.org>,
"Wolfgang Walter" <linux@...m.de>,
"Linux NFS Mailing List" <linux-nfs@...r.kernel.org>,
km@...all.com, linux-kernel@...r.kernel.org
Subject: Re: Regression in 5.1.20: Reading long directory fails
On 6 Sep 2019, at 16:50, Chuck Lever wrote:
>> On Sep 6, 2019, at 4:47 PM, Jason L Tibbitts III <tibbs@...h.uh.edu>
>> wrote:
>>
>>>>>>> "JBF" == J Bruce Fields <bfields@...ldses.org> writes:
>>
>> JBF> Those readdir changes were client-side, right? Based on that I'd
>> JBF> been assuming a client bug, but maybe it'd be worth getting a full
>> JBF> packet capture of the readdir reply to make sure it's legit.
>>
>> I have been working with bcodding on IRC for the past couple of days
>> on this. Fortunately I was able to come up with a way to fill up a
>> directory in such a way that it will fail with certainty and as a
>> bonus doesn't include any user data so I can feel OK about sharing
>> packet captures. I have a capture alongside a kernel trace of the
>> problematic operation in https://www.math.uh.edu/~tibbs/nfs/. Not
>> that I can particularly tell anything useful from that, but bcodding
>> says that it seems to point to some issue in sunrpc.
>>
>> And because I can easily reproduce this and I was able to do a bisect:
>>
>> 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d is the first bad commit
>> commit 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d
>> Author: Chuck Lever <chuck.lever@...cle.com>
>> Date: Mon Feb 11 11:25:41 2019 -0500
>>
>> SUNRPC: Use au_rslack when computing reply buffer size
>>
>> au_rslack is significantly smaller than (au_cslack << 2). Using
>> that value results in smaller receive buffers. In some cases this
>> eliminates an extra segment in Reply chunks (RPC/RDMA).
>>
>> Signed-off-by: Chuck Lever <chuck.lever@...cle.com>
>> Signed-off-by: Anna Schumaker <Anna.Schumaker@...app.com>
>>
>> :040000 040000 d4d1ce2fbe0035c5bd9df976b8c448df85dcb505
>> 7011a792dfe72ff9cd70d66e45d353f3d7817e3e M net
>>
>> But of course, I can't say whether this is the actual bad commit or
>> whether it just introduced a behavior change which alters the
>> conditions under which the problem appears.
>
> The first place I'd start looking is the XDR constants at the head of
> fs/nfs/nfs4xdr.c having to do with READDIR.
>
> The report of behavior changes with the use of krb5p also makes this
> commit plausible.

After sprinkling in some printk's, we're coming up one word short in the
receive buffer. I think we're not accounting for the XDR pad of
buf->pages for NFSv4 READDIR -- but I need to check the RFCs. Does anyone
know whether v4 READDIR results have to be aligned?
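
To make the "one word short" arithmetic concrete, here is a rough user-space
sketch. It is not the sunrpc code: the helper names (xdr_pad_bytes, xdr_words,
words_short) and the sizes in the demo are made up for illustration. The only
thing it relies on is the XDR rule that data is padded out to a multiple of
four bytes, and it assumes the reply size estimate rounds the page payload
down to whole words instead of up.

```c
#include <assert.h>
#include <stddef.h>

/* XDR works in 4-byte units; variable-length data is padded to a multiple of 4. */
#define XDR_UNIT 4u

/* Hypothetical helper: pad bytes that follow `len` bytes of payload. */
static inline size_t xdr_pad_bytes(size_t len)
{
    return (XDR_UNIT - (len % XDR_UNIT)) % XDR_UNIT;
}

/* Hypothetical helper: `len` bytes rounded up to whole 4-byte words. */
static inline size_t xdr_words(size_t len)
{
    return (len + XDR_UNIT - 1) / XDR_UNIT;
}

/*
 * Hypothetical helper: how many words a receive buffer is short if its
 * size was estimated WITHOUT the pad on the page payload (assumption:
 * the estimate truncates page_len to whole words).
 */
static inline size_t words_short(size_t head_words, size_t page_len,
                                 size_t tail_words)
{
    size_t unpadded = head_words + page_len / XDR_UNIT + tail_words;
    size_t padded = head_words + xdr_words(page_len) + tail_words;

    return padded - unpadded;
}

/* Illustrative values only; none of these sizes come from the trace. */
static inline void xdr_pad_demo(void)
{
    assert(xdr_pad_bytes(4093) == 3);       /* 4093 bytes need 3 pad bytes */
    assert(words_short(10, 4093, 2) == 1);  /* unaligned payload: 1 word short */
    assert(words_short(10, 4096, 2) == 0);  /* aligned payload: nothing missing */
}
```

If the hypothesis holds, the shortfall only shows up when the page payload
length is not a multiple of four, which would fit a failure that depends on
the exact directory contents.
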
I also need to check why krb5i is the only auth flavor that cares.

Ben