linux-kernel - Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a976311c-e980-4b29-83ad-b625c7526f4b@talpey.com>
Date: Sun, 9 Feb 2025 13:52:59 -0500
From: Tom Talpey <tom@...pey.com>
To: Jeff Layton <jlayton@...nel.org>, Chuck Lever <chuck.lever@...cle.com>,
 Neil Brown <neilb@...e.de>, Olga Kornievskaia <okorniev@...hat.com>,
 Dai Ngo <Dai.Ngo@...cle.com>, "J. Bruce Fields" <bfields@...ldses.org>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED
 error better

On 2/9/2025 12:05 PM, Jeff Layton wrote:
> On Sun, 2025-02-09 at 11:58 -0500, Tom Talpey wrote:
>> On 2/9/2025 11:51 AM, Jeff Layton wrote:
>>> On Sun, 2025-02-09 at 11:26 -0500, Tom Talpey wrote:
>>>> On 2/8/2025 9:14 PM, Jeff Layton wrote:
>>>>> On Sat, 2025-02-08 at 20:24 -0500, Tom Talpey wrote:
>>>>>> On 2/8/2025 4:07 PM, Chuck Lever wrote:
>>>>>>> On 2/8/25 3:45 PM, Jeff Layton wrote:
>>>>>>>> On Sat, 2025-02-08 at 14:18 -0500, Tom Talpey wrote:
>>>>>>>>> On 2/8/2025 11:08 AM, Jeff Layton wrote:
>>>>>>>>>> On Sat, 2025-02-08 at 13:40 -0500, Tom Talpey wrote:
>>>>>>>>>>> On 2/8/2025 10:02 AM, Jeff Layton wrote:
>>>>>>>>>>>> On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
>>>>>>>>>>>>> On 2/7/25 4:53 PM, Jeff Layton wrote:
>>>>>>>>>>>>>> For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
>>>>>>>>>>>>>> fall back to treating it like a BADSLOT if that fails.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Jeff Layton <jlayton@...nel.org>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>        fs/nfsd/nfs4callback.c | 16 ++++++++++------
>>>>>>>>>>>>>>        1 file changed, 10 insertions(+), 6 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
>>>>>>>>>>>>>> index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
>>>>>>>>>>>>>> --- a/fs/nfsd/nfs4callback.c
>>>>>>>>>>>>>> +++ b/fs/nfsd/nfs4callback.c
>>>>>>>>>>>>>> @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
>>>>>>>>>>>>>>        			goto requeue;
>>>>>>>>>>>>>>        		rpc_delay(task, 2 * HZ);
>>>>>>>>>>>>>>        		return false;
>>>>>>>>>>>>>> +	case -NFS4ERR_SEQ_MISORDERED:
>>>>>>>>>>>>>> +		/*
>>>>>>>>>>>>>> +		 * Reattempt once with seq_nr 1. If that fails, treat this
>>>>>>>>>>>>>> +		 * like BADSLOT.
>>>>>>>>>>>>>> +		 */
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nit: this comment says exactly what the code says. If it were me, I'd
>>>>>>>>>>>>> remove it. Is there a "why" statement that could be made here? Like,
>>>>>>>>>>>>> why retry with a seq_nr of 1 instead of just failing immediately?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> There isn't one that I know of. It looks like Kinglong Mee added it in
>>>>>>>>>>>> 7ba6cad6c88f, but there is no real mention of that in the changelog.
>>>>>>>>>>>>
>>>>>>>>>>>> TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
>>>>>>>>>>>> when we got this error, and we then retry with a seq_nr of 1? Does the
>>>>>>>>>>>> server then treat that as a retransmission?
>>>>>>>>>>>
>>>>>>>>>>> So I assume you mean the requester sent seq_nr 1, saw a reply and sent a
>>>>>>>>>>> subsequent seq_nr 2, to which it gets SEQ_MISORDERED.
>>>>>>>>>>>
>>>>>>>>>>> If so, yes definitely backing up the seq_nr to 1 will result in the
>>>>>>>>>>> peer considering it to be a retransmission, which will be bad.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes, that's what I meant.
>>>>>>>>>>
>>>>>>>>>>>> We might be best off
>>>>>>>>>>>> dropping this and just always treating it like BADSLOT.
>>>>>>>>>>>
>>>>>>>>>>> But, why would this happen? Usually I'd think the peer sent seq_nr X
>>>>>>>>>>> before it received a reply to seq_nr X-1, which would be a peer bug.
>>>>>>>>>>>
>>>>>>>>>>> OTOH, SEQ_MISORDERED is a valid response to an in-progress retry. So,
>>>>>>>>>>> how does the requester know the difference?
>>>>>>>>>>>
>>>>>>>>>>> If treating it as BADSLOT completely resets the sequence, then sure,
>>>>>>>>>>> but either a) the request is still in-progress, or b) if a bug is
>>>>>>>>>>> causing the situation, well it's not going to converge on a functional
>>>>>>>>>>> session.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> With this patchset, on BADSLOT, we'll set SEQ4_STATUS_BACKCHANNEL_FAULT
>>>>>>>>>> in the next forechannel SEQUENCE on the session. That should cause the
>>>>>>>>>> client to (eventually) send a DESTROY_SESSION and create a new one.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, in the meantime, because of the way the callback channel
>>>>>>>>>> update works, the server can end up trying to send the callback again
>>>>>>>>>> on the same session (and maybe more than once). I'm not sure that
>>>>>>>>>> that's a real problem per-se, but it's less than ideal.
>>>>>>>>>>
>>>>>>>>>>> Not sure I have a solid suggestion right now. Whatever the fix, it
>>>>>>>>>>> should capture any subtlety in a comment.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> At this point, I'm leaning toward just treating it like BADSLOT.
>>>>>>>>>> Basically, mark the backchannel faulty, and leak the slot so that
>>>>>>>>>> nothing else uses it. That allows us to send backchannel requests on
>>>>>>>>>> the other slots until the session gets recreated.
>>>>>>>>>
>>>>>>>>> Hmm, leaking the slot is a workable approach, as long as it doesn't
>>>>>>>>> cascade more than a time or two. Some sort of trigger should be armed
>>>>>>>>> to prevent runaway retries.
>>>>>>>>>
>>>>>>>>> It's maybe worth considering what state the peer might be in when this
>>>>>>>>> happens. It too may effectively leak a slot, and if is retaining some
>>>>>>>>> bogus state either as a result of or because of the previous exchange(s)
>>>>>>>>> then this may lead to future hangs/failures. Not pretty, and maybe not
>>>>>>>>> worth trying to guess.
>>>>>>>>>
>>>>>>>>> Tom.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The idea here is that eventually the client should figure out that
>>>>>>>> something is wrong and reestablish the session. Currently we don't
>>>>>>>> limit the number of retries on a callback.
>>>>>>>>
>>>>>>>> Maybe they should time out after a while? If we've retried a callback
>>>>>>>> for more than two lease periods, give up and log something?
>>>>>>>>
>>>>>>>> Either way, I'd consider that to be follow-on work to this set.
>>>>>>>
>>>>>>> As a general comment, I think making a heroic effort to recover in any
>>>>>>> of these cases is probably not worth the additional complexity. Where it
>>>>>>> is required or where we believe it is worth the trouble, that's where we
>>>>>>> want a detailed comment.
>>>>>>>
>>>>>>> What we want to do is ensure forward progress. I'm guessing that error
>>>>>>> conditions are going to be rare, so leaking the slot until a certain
>>>>>>> portion of them are gone, and then indicating a session fault to force
>>>>>>> the client to start over from scratch, is probably the most
>>>>>>> straightforward approach.
>>>>>>>
>>>>>>> So, is there a good reason to retry? There doesn't appear to be any
>>>>>>> reasoning mentioned in the commit log or in nearby comments.
>>>>>>
>>>>>> Agreed on the general comment.
>>>>>>
>>>>>> As for the "any reason to retry" - maybe. If it's a transient error we
>>>>>> don't want to give up early. Unfortunately that appears to be an
>>>>>> ambiguous situation, because SEQ_MISORDERED is allowed in place of
>>>>>> ERR_DELAY. I don't have any great suggestion however.
>>>>>>
>>>>>
>>>>> IMO, we should retry callbacks (basically) indefinitely, unless the
>>>>> NFSv4 client is being torn down (i.e. lease expires or an unmount
>>>>> happened, etc).
>>>>>
>>>>>> Jeff, to your point that the "client should figure out something is
>>>>>> wrong", I'm not sure how you think that will happen. If the server is
>>>>>> making a delegation recall and the client receive code chooses to reject
>>>>>> it at the sequence check, how would that eventually cause the client to
>>>>>> reestablish the session (on the forechannel)?
>>>>>>
>>>>>>
>>>>>
>>>>> In the BADSLOT case, it calls nfsd4_mark_cb_fault(cb->cb_clp), which
>>>>> sets a flag in the client that makes it set
>>>>> SEQ4_STATUS_BACKCHANNEL_FAULT in the next SEQUENCE call.
>>>>
>>>> Aha, that's good. RFC8881 only mentions it twice, but it's normative:
>>>>
>>>> SEQ4_STATUS_BACKCHANNEL_FAULT
>>>>        The server has encountered an unrecoverable fault with the
>>>>        backchannel (e.g., it has lost track of the sequence ID for a slot
>>>>        in the backchannel). The client MUST stop sending more requests on
>>>>        the session's fore channel, wait for all outstanding requests to
>>>>        complete on the fore and back channel, and then destroy the session.
>>>>
>>>> I guess my question is, what if the client ignores it anyway? What
>>>> server code actually forces the recovery?
>>>>
>>>> Tom.
>>>>
>>>
>>> I don't think there is anything that does this right now. Does the RFC
>>> mention what the server should do if that happens? I suppose the server
>>> could just unilaterally destroy the session at some point, and force
>>> the client to reestablish it.
>>
>> Nope. :( Like many other requirements, it's unenforced normatively.
>>
>> Perhaps we could consider this as an erratum, but it's more of an
>> omission. Because of that, it may need IETF discussion ("at some point"
>> needs a MUST). I'll volunteer to open an issue, if you agree to discuss!
> 
> Sure. I don't have a whole lot to add, but my proposal would be:
> 
> If the server sends a SEQUENCE reply with SEQ4_STATUS_BACKCHANNEL_FAULT
> set, then the client has another lease period in which to reestablish
> the session. After that, the server may unilaterally drop the session
> and start returning NFS4ERR_BADSESSION to attempts to use it.
Oh, I don't think it needs to wait one lease period, but that's an
interesting approach. The language needs to be normative and a bit
more clear.

I'll take it to the IETF this week.

Tom.