Message-ID: <CAHk-=whgpCNzmQfTAUY7D8P6t9TgzoLx9Uauu7YGQpgZtg-SYg@mail.gmail.com>
Date: Fri, 2 Jun 2023 12:53:42 -0400
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: David Howells <dhowells@...hat.com>, netdev@...r.kernel.org,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
David Ahern <dsahern@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Chuck Lever <chuck.lever@...cle.com>,
Boris Pismenny <borisp@...dia.com>,
John Fastabend <john.fastabend@...il.com>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: Bug in short splice to socket?
On Fri, Jun 2, 2023 at 12:39 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> Can we add an optional splice_end / short_splice / splice_underflow /
> splice_I_did_not_mean_to_set_more_on_the_previous_call_sorry callback
> to struct file_operations?
A splice_end() operation might well be the simplest model, but I think
it's broken.
It would certainly be easy to implement: file descriptors that don't
care about SPLICE_F_MORE - so most of them - would just leave it as
NULL, and the splice code could decide to call it *if* it had left the
last splice with SPLICE_F_MORE, _and_ the user hadn't set it, _and_ the
file descriptor wants that information.
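
A rough userspace sketch of that three-part condition, to make the proposal concrete. Everything prefixed `sketch_` is hypothetical and is not kernel code: the real struct file_operations has no splice_end member, and the flag value is just copied from the UAPI header so the example builds anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as SPLICE_F_MORE in <linux/splice.h>; redefined for portability. */
#define SKETCH_SPLICE_F_MORE 0x04u

/* Hypothetical stand-in for struct file_operations with the proposed hook. */
struct sketch_file_ops {
    void (*splice_end)(void *ctx);  /* NULL when the file does not care */
};

/* Example hook: counts how many times it was told "that really was the end". */
static void sketch_count_end(void *ctx)
{
    ++*(int *)ctx;
}

/*
 * Call the hook only if all three conditions hold:
 *   - the file provides the hook,
 *   - the last chunk was pushed with MORE set internally,
 *   - the user did not ask for MORE.
 * Returns true if the hook was called.
 */
static bool sketch_maybe_splice_end(const struct sketch_file_ops *ops,
                                    unsigned int user_flags,
                                    bool last_chunk_had_more,
                                    void *ctx)
{
    if (ops->splice_end == NULL)
        return false;
    if (!last_chunk_had_more)
        return false;
    if (user_flags & SKETCH_SPLICE_F_MORE)
        return false;
    ops->splice_end(ctx);
    return true;
}
```

Note that nothing in this check knows *why* the splice stopped, which is the problem discussed next.
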
But I think one of the problems here is one of "what the hell is the
meaning of that bit"?
In particular, think about what happens if a signal is pending, and we
return with a partially completed write? There potentially *is* more
data to be sent, it's just not sent by *this* splice() call, as user
space has to handle the signal first.
What are the semantics of SPLICE_F_MORE in that kind of situation?
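
To illustrate the ambiguity, here is a toy model (not kernel code, all names hypothetical) of a single splice() call that may be cut short by a signal. It compares an "end of call means end of data" rule against whether the requested range was really finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy description of one splice() call as seen after it returns. */
struct sketch_call {
    size_t want;      /* bytes the caller asked to move */
    size_t done;      /* bytes actually moved before returning */
    bool user_more;   /* caller passed SPLICE_F_MORE */
};

/* "End of call" rule: the call returned and the user did not set MORE. */
static bool sketch_end_rule_says_final(const struct sketch_call *c)
{
    return !c->user_more;
}

/* What the caller meant: the requested range is only finished if all of it moved. */
static bool sketch_request_finished(const struct sketch_call *c)
{
    return c->done == c->want;
}
```

For a call interrupted at 40 of 100 bytes without MORE set, the first function says "final" while the second says "not finished": the rule fires even though user space will come back with the rest once the signal is handled.
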
Which is why I really think that it would be *so* much better if we
really let the whole SPLICE_F_MORE bit be a signal from the *input*
side.
I know I've been harping on this, but just from a "sane semantics"
standpoint, I really think the only thing that *really* makes sense is
for the input side of a splice to say "I gave you X amount of data,
but I have more to give".
And that would *literally* be the semantic meaning of that SPLICE_F_MORE bit.
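
A minimal sketch of what "I gave you X amount of data, but I have more to give" could look like if the input side reported it. This is a userspace toy with hypothetical names, assuming a source that knows its own length; it is not how any existing splice_read implementation works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy input side: a source of known total size and a read position. */
struct sketch_source {
    size_t total;
    size_t pos;
};

/* What the input side hands over: a length plus "I have more to give". */
struct sketch_chunk {
    size_t len;
    bool more;
};

/* Pull up to max bytes; 'more' depends only on the source's own state. */
static struct sketch_chunk sketch_pull(struct sketch_source *src, size_t max)
{
    size_t left = src->total - src->pos;
    size_t n = left < max ? left : max;
    struct sketch_chunk c;

    src->pos += n;
    c.len = n;
    c.more = src->pos < src->total;
    return c;
}
```

Because `more` is computed from the source alone, it comes out the same whether the chunks are pulled in one call or across several calls separated by signal handling.
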
Wouldn't it be lovely to have some actual documented meaning to it,
which does *not* depend on things like ".. but what if a signal
happens" issues?
And yes, it's entirely possible that I'm missing something, and I'm
misunderstanding what people really want, but I do feel like this is a
somewhat subtle area, and if people really care about the exact
semantics of SPLICE_F_MORE, then we need to *have* exact semantics for
it.
And no, I don't think "splice_end()" can be that exact semantics -
even if it's simple - exactly because splice() is an interruptible
operation, so the "end" of a splice() is simply not a stable thing.
I also do wonder how much we care. What are the situations where the
packet boundaries can really matter in the real world? I ask exactly
because I'm not 100% convinced we've had super-stable behavior here.
The fact that a test-case never triggers signal handling in the middle
of a splice() call isn't exactly a huge surprise. The test case
probably doesn't *have* signals. But it just means that the test-case
isn't all that real-life.
Linus