linux-kernel - Re: [RFC PATCH 11/10] pipe: Add fsync() support [ver #2]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wgzRU9RjkZG0L9_yrnFN69REkrSokTQOGZMUkvdispvuQ@mail.gmail.com>
Date:   Sat, 2 Nov 2019 16:02:13 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     David Howells <dhowells@...hat.com>,
        Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Nicolas Dichtel <nicolas.dichtel@...nd.com>, raven@...maw.net,
        Christian Brauner <christian@...uner.io>,
        keyrings@...r.kernel.org, linux-usb@...r.kernel.org,
        linux-block <linux-block@...r.kernel.org>,
        LSM List <linux-security-module@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 11/10] pipe: Add fsync() support [ver #2]

On Sat, Nov 2, 2019 at 3:30 PM Andy Lutomirski <luto@...capital.net> wrote:
>
> So you allocate memory, vmsplice, and munmap() without reusing it?

You can re-use it as much as you want. Just don't write to it.

So the traditional argument for this was "I do a caching http server".
If you don't ever load the data into user space at all and just push
file data out, you just use splice() from the file to the target. But
if you generate some of the data in memory, and you cache it, you use
vmsplice().

And then it really is very easy to set up: make sure you generate your
caches with a new clean private mmap, and you can throw them out with
munmap (or just over-mmap it with the new cache, of course).

If you don't cache it, then there's no advantage to vmsplice() - just
write() it and forget about it. The whole (and only) point of
vmsplice() is when you want to zero-copy the data, and that's
generally likely only an advantage if you can do it multiple times.

But I don't think anybody actually _did_ any of that. But that's
basically the argument for the three splice operations:
write/vmsplice/splice(). Which one you use depends on the lifetime and
the source of your data. write() is obviously for the copy case (the
source data might not be stable), while splice() is for the "data from
another source", and vmsplace() is "data is from stable data in my
vm".

There's the reverse op, of course, but we never implemented that:
mmap() on the pipe could do the reverse of a vmsplice() (moving from
the pipe to the vm), but it would only work if everything was
page-aligned, which it effectively never is. It's basically a
benchmark-only operation.

And the existence of vmsplice() is because we actually had code to
play games with making write() do a zero-copy but mark the source as
being COW. It was _wonderful_ for benchmarks, and was completely
useless for real world case because in the real world you always took
the COW fault. So vmsplice() is basically a "hey, I know what I'm
doing, and you can just take the page as-is because the source is
stable".

             Linus