Message-ID: <3e4d14075482489cd010e4ea621c0bd368700e27.camel@kernel.org>
Date: Sat, 08 Feb 2025 13:02:11 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Chuck Lever <chuck.lever@...cle.com>, Neil Brown <neilb@...e.de>,
 Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
 Tom Talpey <tom@...pey.com>, "J. Bruce Fields" <bfields@...ldses.org>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/7] nfsd: handle CB_SEQUENCE NFS4ERR_SEQ_MISORDERED error better
On Sat, 2025-02-08 at 12:01 -0500, Chuck Lever wrote:
> On 2/7/25 4:53 PM, Jeff Layton wrote:
> > For NFS4ERR_SEQ_MISORDERED, do one attempt with a seqid of 1, and then
> > fall back to treating it like a BADSLOT if that fails.
> >
> > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > ---
> > fs/nfsd/nfs4callback.c | 16 ++++++++++------
> > 1 file changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 10067a34db3afff8d4e4383854ab9abd9767c2d6..d6e3e8bb2efabadda9f922318880e12e1cb2c23f 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1393,6 +1393,16 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > goto requeue;
> > rpc_delay(task, 2 * HZ);
> > return false;
> > + case -NFS4ERR_SEQ_MISORDERED:
> > + /*
> > + * Reattempt once with seq_nr 1. If that fails, treat this
> > + * like BADSLOT.
> > + */
>
> Nit: this comment says exactly what the code says. If it were me, I'd
> remove it. Is there a "why" statement that could be made here? Like,
> why retry with a seq_nr of 1 instead of just failing immediately?
>
There isn't one that I know of. It looks like Kinglong Mee added it in
7ba6cad6c88f, but there is no real mention of that in the changelog.
TBH, I'm not enamored with this remedy either. What if the seq_nr was 2
when we got this error, and we then retry with a seq_nr of 1? Does the
server then treat that as a retransmission? We might be best off
dropping this and just always treating it like BADSLOT.
Thoughts?
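To make the retransmission question concrete, here is a minimal C sketch of how a replier might classify an incoming slot sequence ID, assuming the usual NFSv4.1 slot rule: a seqid equal to the slot's last accepted value is a retransmission, one past it is a new request, and anything else gets NFS4ERR_SEQ_MISORDERED. classify_seqid() and enum seq_verdict are hypothetical names, not nfsd or NFS client code, and the sketch ignores slots with a request still in progress and uncached replies.

```c
#include <stdint.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum seq_verdict {
	SEQ_NEW_REQUEST,	/* seqid is exactly one past the slot's last seqid */
	SEQ_RETRANSMISSION,	/* seqid equals the slot's last seqid: replay */
	SEQ_MISORDERED		/* anything else: NFS4ERR_SEQ_MISORDERED */
};

/*
 * Hypothetical helper. slot_seqid is the last sequence ID the replier
 * accepted on this slot (assumed 0 for a never-used slot); seqid is the
 * value carried in the incoming SEQUENCE/CB_SEQUENCE. The cast keeps the
 * "+ 1" comparison in 32-bit unsigned arithmetic.
 */
static enum seq_verdict classify_seqid(uint32_t slot_seqid, uint32_t seqid)
{
	if (seqid == slot_seqid)
		return SEQ_RETRANSMISSION;
	if (seqid == (uint32_t)(slot_seqid + 1))
		return SEQ_NEW_REQUEST;
	return SEQ_MISORDERED;
}
```

Under this simplified model, a retry with seq_nr 1 is treated as a retransmission only when the replier's slot already holds 1. If the original request carried seq_nr 2 and drew SEQ_MISORDERED, the slot held neither 1 nor 2, so (assuming the replier's slot state doesn't change in between) the retry either succeeds as a new request (slot at 0) or is misordered again and falls through to the BADSLOT handling.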
>
> > + if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > + session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > + goto retry_nowait;
> > + }
> > + fallthrough;
> > case -NFS4ERR_BADSLOT:
> > /*
> > * BADSLOT means that the client and server are out of sync
> > @@ -1403,12 +1413,6 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
> > nfsd4_mark_cb_fault(cb->cb_clp);
> > cb->cb_held_slot = -1;
> > goto retry_nowait;
> > - case -NFS4ERR_SEQ_MISORDERED:
> > - if (session->se_cb_seq_nr[cb->cb_held_slot] != 1) {
> > - session->se_cb_seq_nr[cb->cb_held_slot] = 1;
> > - goto retry_nowait;
> > - }
> > - break;
> > default:
> > nfsd4_mark_cb_fault(cb->cb_clp);
> > }
> >
>
>
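For comparison, the control flow this patch ends up with can be modeled in plain C. This is a simplified userspace sketch, not the nfsd code: the enums, struct cb_slot_state and handle_cb_status() are hypothetical, a single seq_nr field stands in for session->se_cb_seq_nr[cb->cb_held_slot], a fault_marked flag stands in for nfsd4_mark_cb_fault(), and a /* fall through */ comment replaces the kernel's fallthrough pseudo-keyword.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CB_SEQUENCE results the patch handles. */
enum cb_status { CB_OK, CB_SEQ_MISORDERED, CB_BADSLOT, CB_OTHER_ERROR };

/* Hypothetical labels for what the caller should do next. */
enum cb_action {
	ACT_DONE,			/* callback completed */
	ACT_RETRY_SAME_SLOT,		/* resend on the held slot with seq_nr 1 */
	ACT_RELEASE_SLOT_AND_RETRY,	/* drop the slot, then resend */
	ACT_FAULT			/* other errors: fault marked, no retry here */
};

struct cb_slot_state {
	uint32_t seq_nr;	/* seq_nr to send on the held slot */
	int held_slot;		/* slot index, or -1 once released */
	bool fault_marked;	/* stands in for nfsd4_mark_cb_fault() */
};

static enum cb_action handle_cb_status(struct cb_slot_state *s,
				       enum cb_status status)
{
	switch (status) {
	case CB_OK:
		return ACT_DONE;
	case CB_SEQ_MISORDERED:
		/* One retry with seq_nr 1; if we already sent 1, treat as BADSLOT. */
		if (s->seq_nr != 1) {
			s->seq_nr = 1;
			return ACT_RETRY_SAME_SLOT;
		}
		/* fall through */
	case CB_BADSLOT:
		/* Slot state is out of sync: mark the fault and give up the slot. */
		s->fault_marked = true;
		s->held_slot = -1;
		return ACT_RELEASE_SLOT_AND_RETRY;
	default:
		s->fault_marked = true;
		return ACT_FAULT;
	}
}
```

With seq_nr 2 on the slot, the first MISORDERED resets it to 1 and retries on the same slot; a second MISORDERED, with seq_nr already 1, takes the BADSLOT path. Always treating MISORDERED like BADSLOT would amount to deleting the if block in that arm.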
--
Jeff Layton <jlayton@...nel.org>