[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <641361cd8d704_33b0cc20823@willemb.c.googlers.com.notmuch>
Date:   Thu, 16 Mar 2023 14:37:01 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     David Howells <dhowells@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>
Cc:     David Howells <dhowells@...hat.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Christoph Hellwig <hch@...radead.org>,
        Jens Axboe <axboe@...nel.dk>, Jeff Layton <jlayton@...nel.org>,
        Christian Brauner <brauner@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        netdev@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: RE: [RFC PATCH 03/28] tcp: Support MSG_SPLICE_PAGES
David Howells wrote:
> Make TCP's sendmsg() support MSG_SPLICE_PAGES.  This causes pages to be
> spliced from the source iterator if possible (the iterator must be
> ITER_BVEC and the pages must be spliceable).
> 
> This allows ->sendpage() to be replaced by something that can handle
> multiple multipage folios in a single transaction.
> 
> Signed-off-by: David Howells <dhowells@...hat.com>
> cc: Eric Dumazet <edumazet@...gle.com>
> cc: "David S. Miller" <davem@...emloft.net>
> cc: Jakub Kicinski <kuba@...nel.org>
> cc: Paolo Abeni <pabeni@...hat.com>
> cc: Jens Axboe <axboe@...nel.dk>
> cc: Matthew Wilcox <willy@...radead.org>
> cc: netdev@...r.kernel.org
> ---
>  net/ipv4/tcp.c | 59 +++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 53 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 288693981b00..77c0c69208a5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1220,7 +1220,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>  	int flags, err, copied = 0;
>  	int mss_now = 0, size_goal, copied_syn = 0;
>  	int process_backlog = 0;
> -	bool zc = false;
> +	int zc = 0;
>  	long timeo;
>  
>  	flags = msg->msg_flags;
> @@ -1231,17 +1231,24 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>  		if (msg->msg_ubuf) {
>  			uarg = msg->msg_ubuf;
>  			net_zcopy_get(uarg);
> -			zc = sk->sk_route_caps & NETIF_F_SG;
> +			if (sk->sk_route_caps & NETIF_F_SG)
> +				zc = 1;
>  		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
>  			uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
>  			if (!uarg) {
>  				err = -ENOBUFS;
>  				goto out_err;
>  			}
> -			zc = sk->sk_route_caps & NETIF_F_SG;
> -			if (!zc)
> +			if (sk->sk_route_caps & NETIF_F_SG)
> +				zc = 1;
> +			else
>  				uarg_to_msgzc(uarg)->zerocopy = 0;
>  		}
> +	} else if (unlikely(flags & MSG_SPLICE_PAGES) && size) {
> +		if (!iov_iter_is_bvec(&msg->msg_iter))
> +			return -EINVAL;
> +		if (sk->sk_route_caps & NETIF_F_SG)
> +			zc = 2;
>  	}
The commit message mentions MSG_SPLICE_PAGES as an internal flag.
It can be passed from userspace. The code anticipates that and checks
preconditions.
A side effect is that legacy applications that may already be setting
this bit in the flags now start failing. Most socket types are
historically permissive and simply ignore undefined flags.
With MSG_ZEROCOPY we chose to be extra cautious and added
SOCK_ZEROCOPY, only testing the MSG_ZEROCOPY bit if this socket option
is explicitly enabled. Perhaps more cautious than necessary, but FYI.
Powered by blists - more mailing lists
 
