linux-kernel - Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant


Message-ID: <20250611A18503192e946d6.njha@janestreet.com>
Date: Wed, 11 Jun 2025 14:50:31 -0400
From: Nikhil Jha <njha@...estreet.com>
To: Chuck Lever <chuck.lever@...cle.com>
Cc: Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
 	Jeff Layton <jlayton@...nel.org>, Neil Brown <neilb@...e.de>,
 	Olga Kornievskaia <okorniev@...hat.com>,
 	Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
 	"David S. Miller" <davem@...emloft.net>,
 	Eric Dumazet <edumazet@...gle.com>,
 	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 	Simon Horman <horms@...nel.org>,
 	Steven Rostedt <rostedt@...dmis.org>,
 	Masami Hiramatsu <mhiramat@...nel.org>,
 	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 	linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
 	netdev@...r.kernel.org, linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant

On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
> > When the client retransmits an operation (for example, because the
> > server is slow to respond), a new GSS sequence number is associated with
> > the XID. In the current kernel code the original sequence number is
> > discarded. Subsequently, if a response to the original request is
> > received there will be a GSS sequence number mismatch. A mismatch will
> > trigger another retransmit, possibly repeating the cycle, and after some
> > number of failed retries EACCES is returned.
> > 
> > RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
> > RPCSEC_GSS sequence number of each request it sends” and "compute the
> > checksum of each sequence number in the cache to try to match the
> > checksum in the reply's verifier." This is what FreeBSD’s implementation
> > does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
> > 
> > However, even with this cache, retransmits directly caused by a seqno
> > mismatch can still cause a bad message interleaving that results in this
> > bug. The RFC already suggests ignoring incorrect seqnos on the server
> > side, and this seems symmetric, so this patchset also applies that
> > behavior to the client.
> > 
> > These two patches are *not* dependent on each other. I tested them by
> > delaying packets with a Python script hooked up to NFQUEUE. If it would
> > be helpful I can send this script along as well.
> > 
> > Signed-off-by: Nikhil Jha <njha@...estreet.com>
> > ---
> > Changes since v1:
> >  * Maintain the invariant that the first seqno is always first in
> >    rq_seqnos, so that it doesn't need to be stored twice.
> >  * Minor formatting, and resending with proper mailing-list headers so the
> >    patches are easier to work with.
> > 
> > ---
> > Nikhil Jha (2):
> >       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
> >       sunrpc: don't immediately retransmit on seqno miss
> > 
> >  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
> >  include/trace/events/rpcgss.h  |  4 +--
> >  include/trace/events/sunrpc.h  |  2 +-
> >  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
> >  net/sunrpc/clnt.c              |  9 +++++--
> >  net/sunrpc/xprt.c              |  3 ++-
> >  6 files changed, 64 insertions(+), 30 deletions(-)
> > ---
> > base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
> > change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
> > 
> > Best regards,
> 
> This seems like a sensible thing to do to me.
> 
> Acked-by: Chuck Lever <chuck.lever@...cle.com>
> 
> -- 
> Chuck Lever

Hi,

We've been running this patch for a while now and noticed a (very silly
in hindsight) bug.

maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);

needs to be

maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);

Otherwise the kernel gets stuck in a loop when there are more than two
retries. I can resend this patch, but I noticed it has already made its
way into quite a few trees. Should this be a separate patch instead?
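
To illustrate why the missing increment matters, here is a minimal
user-space sketch of walking a cache of sequence numbers and checking
each one against a reply. It is not the kernel code: toy_checksum(),
toy_validate() and match_cached_seqno() are hypothetical stand-ins, and
the "checksum" is a trivial multiplication rather than a GSS MIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MIC over a sequence number. */
static uint32_t toy_checksum(uint32_t seqno)
{
	return seqno * 2654435761u;
}

/* Hypothetical stand-in for validating one cached seqno against a reply. */
static bool toy_validate(uint32_t cached_seqno, uint32_t reply_verifier)
{
	return toy_checksum(cached_seqno) == reply_verifier;
}

/*
 * Try each cached seqno in turn. Returns the index of the entry that
 * matches the reply verifier, or -1 if none does.
 */
static int match_cached_seqno(const uint32_t *seqnos, size_t count,
			      uint32_t reply_verifier)
{
	size_t i = 0;

	assert(seqnos != NULL || count == 0);

	while (i < count) {
		/*
		 * The post-increment is what moves the walk forward. With
		 * seqnos[i] here, i never changes, so a mismatch on one
		 * entry would be retried forever.
		 */
		if (toy_validate(seqnos[i++], reply_verifier))
			return (int)(i - 1);
	}
	return -1;
}
```

With a cache of {10, 11, 12}, a reply computed over 12 matches at index
2, and a reply computed over a seqno that was never sent returns -1
instead of spinning.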

- Nikhil