lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <952877.1685694220@warthog.procyon.org.uk>
Date: Fri, 02 Jun 2023 09:23:40 +0100
From: David Howells <dhowells@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: dhowells@...hat.com, Linus Torvalds <torvalds@...ux-foundation.org>,
    netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
    Willem de Bruijn <willemdebruijn.kernel@...il.com>,
    David Ahern <dsahern@...nel.org>,
    Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>,
    linux-mm@...ck.org, linux-kernel@...r.kernel.org,
    Chuck Lever <chuck.lever@...cle.com>,
    Boris Pismenny <borisp@...dia.com>,
    John Fastabend <john.fastabend@...il.com>,
    Christoph Hellwig <hch@...radead.org>
Subject: Re: Bug in short splice to socket?

Jakub Kicinski <kuba@...nel.org> wrote:

> On Thu, 01 Jun 2023 18:14:40 +0100 David Howells wrote:
> > The answer then might be to make TLS handle a zero-length send()
> 
> IDK. Eric added MSG_SENDPAGE_NOTLAST 11 years ago, to work around 
> this exact problem. Your refactoring happens to break it and what
> you're saying sounds to me more or less like "MSG_SENDPAGE_NOTLAST 
> is unnecessary, it's user's fault".
> 
> A bit unconvincing. Maybe Eric would chime in, I'm not too familiar
> with the deadly mess of the unchecked sendmsg()/sendpage() flags.

Not so much the "user's fault" as we couldn't fulfill what the user asked - so
should we leave it to the user to work out how to clean it up rather than
automatically allowing the socket to flush (if cancellation might be an option
instead)?

The problem I have with NOTLAST is that it won't be asserted if the short read
only occupies a single pipe_buf.  We don't know that we won't get some more
data on the next pass.

An alternative way to maintain the current behaviour might be to have
splice_direct_to_actor() call the actor with sd->total_len == 0 if
do_splice_to() returned 0 and SPLICE_F_MORE wasn't set by the caller
(ie. !more).  Attached is a change to do that.  It becomes simpler if/once my
splice_to_socket() patches are applied - but I don't really want to push that
until all the protocols that support sendpage() support sendmsg() +
MSG_SPLICE_PAGES as well[*].

[*] Though if you're okay having a small window where TLS copies data rather
    than splicing, I could push the splice_to_socket() patches *first*.  TCP
    and AF_UNIX splice already support MSG_SPLICE_PAGES so that would bump
    their efficiency.

David
---
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..84e9ca06db47 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -643,6 +643,22 @@ static void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_des
 		wakeup_pipe_writers(pipe);
 }
 
+/*
+ * Pass a zero-length record to the splice-write actor with SPLICE_F_MORE
+ * turned off to allow the network to see MSG_MORE deasserted.
+ */
+static ssize_t splice_from_pipe_zero(struct pipe_inode_info *pipe,
+				     struct splice_desc *sd,
+				     splice_actor *actor)
+{
+	struct pipe_buffer buf = {
+		.page	= ZERO_PAGE(0),
+		.ops	= &nosteal_pipe_buf_ops,
+	};
+
+	return actor(pipe, &buf, sd);
+}
+
 /**
  * __splice_from_pipe - splice data from a pipe to given actor
  * @pipe:	pipe to splice from
@@ -662,6 +678,9 @@ ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
 	int ret;
 
 	splice_from_pipe_begin(sd);
+	if (!sd->total_len)
+		return splice_from_pipe_zero(pipe, sd, actor);
+
 	do {
 		cond_resched();
 		ret = splice_from_pipe_next(pipe, sd);
@@ -956,13 +975,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 	 */
 	bytes = 0;
 	len = sd->total_len;
+
+	/* Don't block on output, we have to drain the direct pipe. */
 	flags = sd->flags;
+	sd->flags &= ~SPLICE_F_NONBLOCK;
 
 	/*
-	 * Don't block on output, we have to drain the direct pipe.
+	 * We signal MORE until we've read sufficient data to fulfill the
+	 * request and we keep signalling it if the caller set it.
 	 */
-	sd->flags &= ~SPLICE_F_NONBLOCK;
 	more = sd->flags & SPLICE_F_MORE;
+	sd->flags |= SPLICE_F_MORE;
 
 	WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));
 
@@ -971,21 +994,19 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 		loff_t pos = sd->pos, prev_pos = pos;
 
 		ret = do_splice_to(in, &pos, pipe, len, flags);
-		if (unlikely(ret <= 0))
+		if (unlikely(ret < 0))
 			goto out_release;
 
 		read_len = ret;
 		sd->total_len = read_len;
 
 		/*
-		 * If more data is pending, set SPLICE_F_MORE
-		 * If this is the last data and SPLICE_F_MORE was not set
-		 * initially, clears it.
+		 * If we now have sufficient data to fulfill the request then
+		 * we clear SPLICE_F_MORE if it was not set initially.
 		 */
-		if (read_len < len)
-			sd->flags |= SPLICE_F_MORE;
-		else if (!more)
+		if ((read_len == 0 || read_len >= len) && !more)
 			sd->flags &= ~SPLICE_F_MORE;
+
 		/*
 		 * NOTE: nonblocking mode only applies to the input. We
 		 * must not do the output in nonblocking mode as then we
@@ -1005,6 +1026,8 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
 			sd->pos = prev_pos + ret;
 			goto out_release;
 		}
+		if (read_len < 0)
+			goto out_release;
 	}
 
 done:
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f63e4405cf34..5d48391da16c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -995,6 +995,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
 		}
 	}
 
+	if (!msg_data_left(msg) && eor)
+		goto copied;
+
 	while (msg_data_left(msg)) {
 		if (sk->sk_err) {
 			ret = -sk->sk_err;


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ