linux-kernel - Re: Bug in short splice to socket?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=whgpCNzmQfTAUY7D8P6t9TgzoLx9Uauu7YGQpgZtg-SYg@mail.gmail.com>
Date:   Fri, 2 Jun 2023 12:53:42 -0400
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     David Howells <dhowells@...hat.com>, netdev@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        David Ahern <dsahern@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Chuck Lever <chuck.lever@...cle.com>,
        Boris Pismenny <borisp@...dia.com>,
        John Fastabend <john.fastabend@...il.com>,
        Christoph Hellwig <hch@...radead.org>
Subject: Re: Bug in short splice to socket?

On Fri, Jun 2, 2023 at 12:39 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> Can we add an optional splice_end / short_splice / splice_underflow /
> splice_I_did_not_mean_to_set_more_on_the_previous_call_sorry callback
> to struct file_operations?

A splice_end() operation might well be the simplest model, but I think
it's broken.

It would certainly be easy to implement: file descriptor that doesn't
care about SPLICE_F_MORE - so most of them - would just leave it as
NULL, and the splice code could decide to call it *if* it had left the
last splice with SPLICE_F_MORE, _and_ the user hadn't set it, and the
file descriptor wants that information.

But I think one of the problems here is one of "what the hell is the
meaning of that bit"?

In particular, think about what happens if a signal is pending, and we
return with a partially completed write? There potentially *is* more
data to be sent, it's just not sent by *this* splice() call, as user
space has to handle the signal first.

What is the semantics of SPLICE_F_MORE in that kind of situation?

Which is why I really think that it would be *so* much better if we
really let the whole SPLICE_F_MORE bit be a signal from the *input*
side.

I know I've been harping on this, but just from a "sane semantics"
standpoint, I really think the only thing that *really* makes sense is
for the input side of a splice to say "I gave you X amount of data,
but I have more to give".

And that would *literally* be the semantic meaning of that SPLICE_F_MORE bit.

Wouldn't it be lovely to have some actual documented meaning to it,
which does *not* depend on things like ".. but what if a signal
happens" issues?

And yes, it's entirely possible that I'm missing something, and I'm
misunderstanding what people really want, but I do feel like this is a
somewhat subtle area, and if people really care about the exact
semantics of SPLICE_F_MORE, then we need to *have* exact semantics for
it.

And no, I don't think "splice_end()" can be that exact semantics -
even if it's simple - exactly because splice() is an interruptible
operation, so the "end" of a splice() is simply not a stable thing.

I also do wonder how much we care. What are the situations where the
packet boundaries can really matter in actual real world. Exactly
because I'm not 100% convinced we've had super-stable behavior here.

The fact that a test-case never triggers signal handling in the middle
of a splice() call isn't exactly a huge surprise. The test case
probably doesn't *have* signals. But it just means that the test-case
isn't all that real-life.

              Linus