netdev - Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe

Message-ID: <CAHk-=wjixHw6n_R5TQWW1r0a+GgFAPGw21KMj6obkzr3qXXbYA@mail.gmail.com>
Date: Thu, 29 Jun 2023 11:53:13 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: Matt Whitlock <kernel@...twhitlock.name>, David Howells <dhowells@...hat.com>, 
	netdev@...r.kernel.org, Dave Chinner <david@...morbit.com>, Jens Axboe <axboe@...nel.dk>, 
	linux-fsdevel@...ck.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe

On Thu, 29 Jun 2023 at 11:34, Matthew Wilcox <willy@...radead.org> wrote:
>
> I think David muddied the waters by talking about vmsplice().  The
> problem encountered is with splice() from the page cache.  Reading
> the documentation,
>
>        splice()  moves  data  between two file descriptors without copying
>        between kernel address space and user address space.  It transfers up  to
>        len bytes of data from the file descriptor fd_in to the file descriptor
>        fd_out, where one of the file descriptors must refer to a pipe.

Well, the original intent really always was that it's about zero-copy.

So I do think that the answer to your test-program is that yes, it
really traditionally *should* output "new".

A splice from a file acts like a scatter-gather mmap() in the kernel.
It's the original intent, and it's the whole reason it's noticeably
faster than doing a write.
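
The test program under discussion is not reproduced in this message. As a minimal sketch of the behaviour being described, assuming a Linux system and a regular file on a page-cache-backed filesystem, the following hypothetical helper (the name splice_then_overwrite and the 3-byte payload are illustrative, not from the thread) splices "old" from a file into a pipe, overwrites the file with "new", and only then reads the pipe. With the traditional zero-copy semantics the pipe holds a reference to the page-cache page, so the read would be expected to return "new"; the actual result can depend on kernel version and filesystem.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write "old" to `path`,
 * splice those 3 bytes into a pipe, overwrite the file with "new",
 * and only then read the pipe into `out` (which must hold 3 bytes).
 * Returns the number of bytes read, or -1 on any error.
 */
static ssize_t splice_then_overwrite(const char *path, char *out)
{
    int pipefd[2];
    loff_t off = 0;
    ssize_t n = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (pipe(pipefd) < 0) {
        close(fd);
        return -1;
    }

    /*
     * Order matters: the splice happens before the second pwrite, and
     * the pipe is drained only afterwards. Traditionally the splice
     * queues a page reference in the pipe rather than a copy.
     */
    if (pwrite(fd, "old", 3, 0) == 3 &&
        splice(fd, &off, pipefd[1], NULL, 3, 0) == 3 &&
        pwrite(fd, "new", 3, 0) == 3)
        n = read(pipefd[0], out, 3);

    close(pipefd[0]);
    close(pipefd[1]);
    close(fd);
    return n;
}
```

Calling splice_then_overwrite("/tmp/splice_demo.txt", buf) with a 3-byte buffer and printing the buffer shows which of the two strings the reader of the pipe actually sees on a given system.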

Now, do I then agree that splice() has turned out to be a nasty morass
of problems?  Yes.

And I even agree that while I actually *think* that your test program
should output "new" (because that is the whole point of the exercise),
it also means that people who use splice() need to *understand* that,
and it's much too easy to get things wrong if you don't understand
that the whole point of splice is to act as a kind of ad-hoc in-kernel
mmap thing.

And to make matters worse, for mmap() we actually do have some
coherency helpers. For splice(), the page ref stays around.

It's kind of like GUP and page pinning - another area where we have
had lots of problems and lots of nasty semantics and complications
with other VM operations over the years.

So I really *really* don't want to complicate splice() even more to
give it some new semantics that it has never ever really had, because
people didn't understand it and used it wrong.

Quite the reverse. I'd be willing to *simplify* splice() by just
saying "it was all a mistake", and just turning it into wrappers
around read/write. But those patches would have to be radical
simplifications, not adding yet more crud on top of the pain that is
splice().

Because it will hurt performance. And I'm ok with that as long as it
comes with huge simplifications. What I'm *not* ok with is "I mis-used
splice, now I want splice to act differently, so let's make it even
more complicated".
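
For contrast, here is a user-space sketch of what copy semantics look like: the data is read into a private buffer and written to the pipe, so a later overwrite of the file cannot change what the pipe reader sees. This is only an illustration of the read/write behaviour, under the assumption of a POSIX system; it is not the kernel change being proposed, and the helper name copy_then_overwrite is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): same sequence as a
 * splice-based test, but the 3 bytes are copied through a user
 * buffer with pread() and write() instead of being spliced.
 * Returns the number of bytes read from the pipe, or -1 on error.
 */
static ssize_t copy_then_overwrite(const char *path, char *out)
{
    int pipefd[2];
    char tmp[3];
    ssize_t n = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (pipe(pipefd) < 0) {
        close(fd);
        return -1;
    }

    /*
     * The pipe receives a copy of "old", so the second pwrite to the
     * file has no effect on the bytes already queued in the pipe.
     */
    if (pwrite(fd, "old", 3, 0) == 3 &&
        pread(fd, tmp, 3, 0) == 3 &&
        write(pipefd[1], tmp, 3) == 3 &&
        pwrite(fd, "new", 3, 0) == 3)
        n = read(pipefd[0], out, 3);

    close(pipefd[0]);
    close(pipefd[1]);
    close(fd);
    return n;
}
```

Here the pipe reader gets "old" regardless of later writes to the file, at the cost of the extra copy that zero-copy splice was meant to avoid.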

               Linus