netdev - Re: Bug in short splice to socket?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wji_2UwFMkUYkygsYRek05NwaQkH-vA=yKQtQS9Js+urQ@mail.gmail.com>
Date: Thu, 1 Jun 2023 11:12:17 -0400
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Howells <dhowells@...hat.com>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Paolo Abeni <pabeni@...hat.com>, Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
	David Ahern <dsahern@...nel.org>, Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	Chuck Lever <chuck.lever@...cle.com>, Boris Pismenny <borisp@...dia.com>, 
	John Fastabend <john.fastabend@...il.com>, Christoph Hellwig <hch@...radead.org>
Subject: Re: Bug in short splice to socket?

On Thu, Jun 1, 2023 at 10:34 AM David Howells <dhowells@...hat.com> wrote:
>
> At the moment, it transcribes 16 pages at a time.  I could make it set
> MSG_MORE only if (a) SPLICE_F_MORE was passed into the splice() syscall or (b)
> there's yet more data in the buffer.

That would at least be a good first step.

> However, this might well cause a malfunction in UDP, for example.  MSG_MORE
> corks the current packet, so if I ask sendfile() say shove 32K into a packet,
> if, say, 16K is read from the source and entirely transcribed into the packet,

If you use splice() for UDP, I don't think you would normally expect
to get all that well-defined packet boundaries.

That said, I think *this* part of splice_direct_to_actor() is correct:

                if (read_len < len)
                        sd->flags |= SPLICE_F_MORE;    <- WRONG
                else if (!more)
                        sd->flags &= ~SPLICE_F_MORE; <- CORRECT

ie if we've used up all of the 'len' argument, *and* 'more' wasn't set
in the incoming flags, then at that point we should clear
SPLICE_F_MORE.

So that means that UDP packets boundaries will be honored at the
'splice()' system call 'len' argument.

Obviously packet boundaries might happen before that - ie depending on
what the packet size limits are.

But the "set SPLICE_F_MORE" bit is just wrong. The generic code simply
does not know enough to make that determination.

> if I understand what you're proposing, MSG_MORE wouldn't get set and the
> packet would be transmitted early.

No, I'm saying that MSG_MORE should be set depending on what the
splice *input* says.

If the splice input knows that it has more to give but stopped early
for whatever reason (typically that the splice pipe buffers filled up,
but that's not necessarily the *only* reason), then it should set
SPLICE_F_MORE.

But this is literally only something that the input side *can* know.

And as you mention, some input sides cannot know even that. Regular
files typically know if there is more data. Other dynamic sources may
simply not know. And if they know, they just shouldn't set
SPLICE_F_MORE.

Of course, SPLICE_F_MORE may then be set because the *user* passed in
that flag, but that's a completely separate issue. The user may pass
in that flag because the user wants maximally sized packets, and knows
that other things will be fed into the destination (not even
necessarily through splice) after the splice.

So you really have multiple different reasons why SPLICE_F_MORE might
get set, but that

                if (read_len < len)

is *not* a valid reason. And no, extending that logic with more random
logic is still not a valid reason.

            Linus