[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <909595.1685639680@warthog.procyon.org.uk>
Date: Thu, 01 Jun 2023 18:14:40 +0100
From: David Howells <dhowells@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: dhowells@...hat.com, Jakub Kicinski <kuba@...nel.org>,
netdev@...r.kernel.org, "David S.
Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>,
Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Chuck Lever <chuck.lever@...cle.com>,
Boris Pismenny <borisp@...dia.com>,
John Fastabend <john.fastabend@...il.com>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: Bug in short splice to socket?
Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > However, this might well cause a malfunction in UDP, for example.
> > MSG_MORE corks the current packet, so if I ask sendfile() say shove 32K
> > into a packet, if, say, 16K is read from the source and entirely
> > transcribed into the packet,
>
> If you use splice() for UDP, I don't think you would normally expect
> to get all that well-defined packet boundaries.
Actually, it will. Attempting to overfill a UDP packet with splice will get
you -EMSGSIZE. It won't turn a splice into more than one UDP packet.
I wonder if the right solution actually is to declare that the problem is
userspace's. If you ask it to splice Z amount of data and it can't manage
that because the source dries up prematurely, then make it so that you assume
it always passed MSG_MORE and returns a short splice to userspace. Userspace
can retry the splice/sendfile or do an empty sendmsg() to cap the message (or
cancel it). Perhaps flushing a short message is actually a *bad* idea.
The answer then might be to make TLS handle a zero-length send() and fix the
test cases. Would the attached changes then work for you?
David
---
diff --git a/fs/splice.c b/fs/splice.c
index 3e06611d19ae..237688b0700b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -956,13 +956,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
*/
bytes = 0;
len = sd->total_len;
+
+ /* Don't block on output, we have to drain the direct pipe. */
flags = sd->flags;
+ sd->flags &= ~SPLICE_F_NONBLOCK;
/*
- * Don't block on output, we have to drain the direct pipe.
+ * We signal MORE until we've read sufficient data to fulfill the
+ * request and we keep signalling it if the caller set it.
*/
- sd->flags &= ~SPLICE_F_NONBLOCK;
more = sd->flags & SPLICE_F_MORE;
+ sd->flags |= SPLICE_F_MORE;
WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));
@@ -978,14 +982,12 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
sd->total_len = read_len;
/*
- * If more data is pending, set SPLICE_F_MORE
- * If this is the last data and SPLICE_F_MORE was not set
- * initially, clears it.
+ * If we now have sufficient data to fulfill the request then
+ * we clear SPLICE_F_MORE if it was not set initially.
*/
- if (read_len < len)
- sd->flags |= SPLICE_F_MORE;
- else if (!more)
+ if (read_len >= len && !more)
sd->flags &= ~SPLICE_F_MORE;
+
/*
* NOTE: nonblocking mode only applies to the input. We
* must not do the output in nonblocking mode as then we
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f63e4405cf34..5d48391da16c 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -995,6 +995,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
}
}
+ if (!msg_data_left(msg) && eor)
+ goto copied;
+
while (msg_data_left(msg)) {
if (sk->sk_err) {
ret = -sk->sk_err;
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index e699548d4247..7df31583f2a4 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -377,7 +377,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
char buf[TLS_PAYLOAD_MAX_LEN];
uint16_t test_payload_size;
int size = 0;
- int ret;
+ int ret = 0;
char filename[] = "/tmp/mytemp.XXXXXX";
int fd = mkstemp(filename);
off_t offset = 0;
@@ -398,6 +398,9 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
size -= ret;
}
+ if (ret < chunk_size)
+ EXPECT_EQ(send(self->fd, "", 0, 0), 0);
+
EXPECT_EQ(recv(self->cfd, buf, test_payload_size, MSG_WAITALL),
test_payload_size);
Powered by blists - more mailing lists