Message-ID: <56718ad5-fdb3-4588-bab9-7b9a1879cad0@oracle.com>
Date: Thu, 9 Oct 2025 11:03:55 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: Jeff Layton <jlayton@...nel.org>, NeilBrown <neil@...wn.name>,
        Olga Kornievskaia <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>,
        Tom Talpey <tom@...pey.com>, Trond Myklebust <trondmy@...nel.org>,
        Anna Schumaker <anna@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
        David Howells <dhowells@...hat.com>
Cc: Brandon Adams <brandona@...a.com>, linux-nfs@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/2] sunrpc: add a slot to rqstp->rq_bvec for TCP
 record marker

On 10/9/25 10:40 AM, Jeff Layton wrote:
> We've seen some occurrences of messages like this in dmesg on some knfsd
> servers:
> 
>     xdr_buf_to_bvec: bio_vec array overflow
> 
> Usually followed by messages like this that indicate a short send (note
> that this message is from an older kernel and the amount that it reports
> attempting to send is short by 4 bytes):
> 
>     rpc-srv/tcp: nfsd: sent 1048155 when sending 1048152 bytes - shutting down socket
> 
> svc_tcp_sendmsg() steals a slot in the rq_bvec array for the TCP record
> marker. If the send is an unaligned READ call though, then there may not
> be enough slots in the rq_bvec array in some cases.
> 
> Add a rqstp->rq_bvec_len field and use that to keep track of the length
> of rq_bvec. Use that in place of rq_maxpages where it's iterating over
> the bvec.

Granted, the number of items in rq_pages and the number in rq_bvec
don't have to coincide; they just happen to have been the same,
historically. And each bvec in rq_bvec doesn't necessarily have to
be a page.
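
To make the slot accounting in the patch description concrete, here is
a rough userspace sketch (not kernel code) that counts how many bio_vec
slots a reply would need. The function names and the byte counts in the
usage note are hypothetical. It assumes one slot for a non-empty head,
one per page touched by the payload, one for a non-empty tail, and one
for the TCP record marker:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of pages touched by a payload of
 * page_len bytes that starts page_base bytes into its first page.
 */
static size_t slots_for_pages(size_t page_base, size_t page_len,
			      size_t page_size)
{
	assert(page_size > 0 && page_base < page_size);
	if (page_len == 0)
		return 0;
	/* A payload that starts mid-page spills into one extra page. */
	return (page_base + page_len + page_size - 1) / page_size;
}

/*
 * Hypothetical helper: total slots for one reply. Set tcp to nonzero
 * to account for the extra slot used by the record marker.
 */
static size_t slots_for_reply(size_t head_len, size_t page_base,
			      size_t page_len, size_t tail_len,
			      size_t page_size, int tcp)
{
	size_t n = 0;

	if (tcp)
		n++;		/* TCP record marker */
	if (head_len)
		n++;		/* RPC/NFS reply header */
	n += slots_for_pages(page_base, page_len, page_size);
	if (tail_len)
		n++;		/* XDR padding or trailing data */
	return n;
}
```

With 4096-byte pages, a page-aligned 1 MiB payload touches 256 pages,
while the same payload starting 100 bytes into a page touches 257. Add
the head, the tail, and the record marker, and the unaligned TCP case
needs three slots beyond the page count, which is where an array sized
only by page count can come up short.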


> Fixes: e18e157bb5c8 ("SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call")
> Tested-by: Brandon Adams <brandona@...a.com>
> Signed-off-by: Jeff Layton <jlayton@...nel.org>
> ---
>  fs/nfsd/vfs.c              | 6 +++---
>  include/linux/sunrpc/svc.h | 1 +
>  net/sunrpc/svc.c           | 4 +++-
>  net/sunrpc/svcsock.c       | 4 ++--
>  4 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 77f6879c2e063fa79865100bbc2d1e64eb332f42..6c7224570d2dadae21876e0069e0b2e0551af0fa 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1111,7 +1111,7 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	v = 0;
>  	total = dio_end - dio_start;
> -	while (total && v < rqstp->rq_maxpages &&
> +	while (total && v < rqstp->rq_bvec_len &&
>  	       rqstp->rq_next_page < rqstp->rq_page_end) {
>  		len = min_t(size_t, total, PAGE_SIZE);
>  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> @@ -1200,7 +1200,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	v = 0;
>  	total = *count;
> -	while (total && v < rqstp->rq_maxpages &&
> +	while (total && v < rqstp->rq_bvec_len &&
>  	       rqstp->rq_next_page < rqstp->rq_page_end) {
>  		len = min_t(size_t, total, PAGE_SIZE - base);
>  		bvec_set_page(&rqstp->rq_bvec[v], *rqstp->rq_next_page,
> @@ -1318,7 +1318,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	if (stable && !fhp->fh_use_wgather)
>  		kiocb.ki_flags |= IOCB_DSYNC;
>  
> -	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> +	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, payload);
>  	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
>  	since = READ_ONCE(file->f_wb_err);
>  	if (verf)
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 5506d20857c318774cd223272d4b0022cc19ffb8..0ee1f411860e55d5e0131c29766540f673193d5f 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -206,6 +206,7 @@ struct svc_rqst {
>  
>  	struct folio_batch	rq_fbatch;
>  	struct bio_vec		*rq_bvec;
> +	u32			rq_bvec_len;
>  
>  	__be32			rq_xid;		/* transmission id */
>  	u32			rq_prog;	/* program number */
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 4704dce7284eccc9e2bc64cf22947666facfa86a..a6bdd83fba77b13f973da66a1bac00050ae922fe 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -706,7 +706,9 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
>  	if (!svc_init_buffer(rqstp, serv, node))
>  		goto out_enomem;
>  
> -	rqstp->rq_bvec = kcalloc_node(rqstp->rq_maxpages,
> +	/* +1 for the TCP record marker */
> +	rqstp->rq_bvec_len = rqstp->rq_maxpages + 1;

What bugs me about this is that svc_prepare_thread() shouldn't have
specific knowledge about the needs of transports. But I don't have a
better idea...
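
For readers following along, the sizing pattern in this hunk can be
modeled in userspace roughly as below. This is only a sketch under
stated assumptions: struct slot, struct req, and all function names are
hypothetical stand-ins, and fill_slots() returns -1 on overflow purely
to keep the model simple (that is not a claim about how
xdr_buf_to_bvec() reports the condition):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec. */
struct slot {
	const void *base;
	size_t len;
};

/* Hypothetical stand-in for the relevant svc_rqst fields. */
struct req {
	struct slot *vec;	/* plays the role of rq_bvec */
	unsigned int vec_len;	/* plays the role of rq_bvec_len */
	unsigned int maxpages;	/* plays the role of rq_maxpages */
};

/* Size the array as maxpages + 1, as the patch does. */
static int req_init(struct req *r, unsigned int maxpages)
{
	r->maxpages = maxpages;
	r->vec_len = maxpages + 1;	/* +1 for the TCP record marker */
	r->vec = calloc(r->vec_len, sizeof(*r->vec));
	return r->vec ? 0 : -1;
}

/* Fill 'needed' slots if they fit in 'cap'; -1 models an overflow. */
static int fill_slots(struct slot *vec, unsigned int cap,
		      unsigned int needed)
{
	unsigned int i;

	if (needed > cap)
		return -1;
	for (i = 0; i < needed; i++)
		vec[i].len = 1;	/* placeholder payload segment */
	return (int)needed;
}

/* Slot 0 holds the marker; the payload gets the remaining slots. */
static int tcp_send_slots(struct req *r, unsigned int payload_slots)
{
	int n;

	r->vec[0].len = 4;	/* 4-byte record marker */
	n = fill_slots(r->vec + 1, r->vec_len - 1, payload_slots);
	return n < 0 ? -1 : n + 1;
}
```

In this model, with maxpages set to 4, a payload needing 4 slots fits
alongside the marker (5 slots total), whereas an array of only 4
entries would leave 3 for the payload once the marker takes slot 0.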


> +	rqstp->rq_bvec = kcalloc_node(rqstp->rq_bvec_len,
>  				      sizeof(struct bio_vec),
>  				      GFP_KERNEL, node);
>  	if (!rqstp->rq_bvec)
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 377fcaaaa061463fc5c85fc09c7a8eab5e06af77..2075ddec250b3fdb36becca4a53f1c0536f8634a 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -740,7 +740,7 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
>  	if (svc_xprt_is_dead(xprt))
>  		goto out_notconn;
>  
> -	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, xdr);
> +	count = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_bvec_len, xdr);
>  
>  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
>  		      count, rqstp->rq_res.len);
> @@ -1244,7 +1244,7 @@ static int svc_tcp_sendmsg(struct svc_sock *svsk, struct svc_rqst *rqstp,
>  	memcpy(buf, &marker, sizeof(marker));
>  	bvec_set_virt(rqstp->rq_bvec, buf, sizeof(marker));
>  
> -	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_maxpages - 1,
> +	count = xdr_buf_to_bvec(rqstp->rq_bvec + 1, rqstp->rq_bvec_len - 1,
>  				&rqstp->rq_res);
>  
>  	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
> 


-- 
Chuck Lever
