Message-ID: <20230626-vorverlegen-setzen-c7f96e10df34@brauner>
Date:   Mon, 26 Jun 2023 11:32:16 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Ahelenia Ziemiańska 
        <nabijaczleweli@...ijaczleweli.xyz>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        David Howells <dhowells@...hat.com>,
        Jens Axboe <axboe@...nel.dk>
Subject: Re: Pending splice(file -> FIFO) always blocks read(FIFO),
 regardless of O_NONBLOCK on read side?

On Mon, Jun 26, 2023 at 03:12:09AM +0200, Ahelenia Ziemiańska wrote:
> Hi! (starting with get_maintainers.pl fs/splice.c,
>      idk if that's right though)
> 
> Per fs/splice.c:
>  * The traditional unix read/write is extended with a "splice()" operation
>  * that transfers data buffers to or from a pipe buffer.
> so I expect splice() to work just about the same as read()/write()
> (and, to a large extent, it does so).
> 
> Thus, a refresher on pipe read() semantics
> (quoting Issue 8 Draft 3; Linux when writing with write()):
> 60746  When attempting to read from an empty pipe or FIFO:
> 60747  • If no process has the pipe open for writing, read( ) shall return 0 to indicate end-of-file.
> 60748  • If some process has the pipe open for writing and O_NONBLOCK is set, read( ) shall return
> 60749    −1 and set errno to [EAGAIN].
> 60750  • If some process has the pipe open for writing and O_NONBLOCK is clear, read( ) shall
> 60751    block the calling thread until some data is written or the pipe is closed by all processes that
> 60752    had the pipe open for writing.
> 
> However, I've observed that this is not the case when splicing from
> something that sleeps on read to a pipe, and that in that case all
> readers block, /including/ ones that are reading from fds with
> O_NONBLOCK set!
> 
> As an example, consider these two programs:
> -- >8 --
> // wr.c
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdio.h>
> int main() {
>   while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0)
>     ;
>   fprintf(stderr, "wr: %m\n");
> }
> -- >8 --
> 
> -- >8 --
> // rd.c
> #define _GNU_SOURCE
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> int main() {
>   fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);
> 
>   char buf[64 * 1024] = {};
>   for (ssize_t rd;;) {
> #if 1
>     while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR)
>       ;
> #else
>     while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) == -1 &&
>            errno == EINTR)
>       ;
> #endif
>     fprintf(stderr, "rd=%zd: %m\n", rd);
>     write(1, buf, rd);
> 
>     errno = 0;
>     sleep(1);
>   }
> }
> -- >8 --
> 
> Thus:
> -- >8 --
> a$ make rd wr
> a$ mkfifo fifo
> a$ ./rd < fifo                           b$ echo qwe > fifo
> rd=4: Success
> qwe
> rd=0: Success
> rd=0: Success                            b$ sleep 2 > fifo
> rd=-1: Resource temporarily unavailable
> rd=-1: Resource temporarily unavailable
> rd=0: Success
> rd=0: Success                            
> rd=-1: Resource temporarily unavailable  b$ /bin/cat > fifo
> rd=-1: Resource temporarily unavailable
> rd=4: Success                            abc
> abc
> rd=-1: Resource temporarily unavailable
> rd=4: Success                            def
> def
> rd=0: Success                            ^D
> rd=0: Success
> rd=0: Success                            b$ ./wr > fifo
> -- >8 --
> and nothing. Until you actually type a line (or a few) into teletype b
> so that the splice completes, at which point so does the read.
> 
> An even simpler case is 
> -- >8 --
> $ ./wr | ./rd
> abc
> def
> rd=8: Success
> abc
> def
> ghi
> jkl
> rd=8: Success
> ghi
> jkl
> ^D
> wr: Success
> rd=-1: Resource temporarily unavailable
> rd=0: Success
> rd=0: Success
> -- >8 --
> 
> splice flags don't do anything.
> Tested on bookworm (6.1.27-1) and Linus' HEAD (v6.4-rc7-234-g547cc9be86f4).
> 
> You could say this is a "denial of service", since this is a valid
> way of following pipes (and, sans SIGIO, the only portable one),

splice() may block on either of the two file descriptors if it doesn't
have O_NONBLOCK set, even if SPLICE_F_NONBLOCK is raised.

SPLICE_F_NONBLOCK in splice_file_to_pipe() is only relevant if the pipe
is full. If the pipe isn't full then the write is attempted. That of
course involves reading the data to splice from the source file. If the
source file isn't O_NONBLOCK that read may block holding pipe_lock().
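
To illustrate the pipe-full case, here is a minimal sketch, assuming a
Linux system with a writable /tmp. The helper names fill_pipe() and
splice_into_full_pipe() are made up for this example and are not part of
any API. It fills a pipe, puts the write end back into blocking mode, and
then splices from a regular file with SPLICE_F_NONBLOCK. The expectation
is that the call fails with EAGAIN instead of waiting for space.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: fill the pipe behind wfd completely and leave
 * wfd in blocking mode again. Returns 0 on success, -1 on error. */
static int fill_pipe(int wfd) {
  char page[4096] = {0};
  int fl = fcntl(wfd, F_GETFL);
  if (fl == -1 || fcntl(wfd, F_SETFL, fl | O_NONBLOCK) == -1)
    return -1;
  while (write(wfd, page, sizeof(page)) > 0)
    ;
  if (errno != EAGAIN) /* anything but "pipe is full" is a real error */
    return -1;
  return fcntl(wfd, F_SETFL, fl & ~O_NONBLOCK);
}

/* Hypothetical helper: splice from a small regular file into a full
 * pipe with SPLICE_F_NONBLOCK. Returns the errno splice() failed with,
 * 0 if it unexpectedly succeeded, or -1 if the setup failed. */
static int splice_into_full_pipe(void) {
  char path[] = "/tmp/splice-demo-XXXXXX";
  int p[2] = {-1, -1};
  int result = -1;
  int file = mkstemp(path);
  if (file == -1)
    return -1;
  unlink(path); /* the open fd keeps the file alive */

  if (write(file, "abc\n", 4) == 4 && pipe(p) == 0 && fill_pipe(p[1]) == 0) {
    loff_t off = 0;
    ssize_t n = splice(file, &off, p[1], NULL, 4, SPLICE_F_NONBLOCK);
    result = n == -1 ? errno : 0;
  }

  if (p[0] != -1) {
    close(p[0]);
    close(p[1]);
  }
  close(file);
  return result;
}
```

Neither descriptor has O_NONBLOCK set at the time of the splice() call,
so the EAGAIN here comes from the flag alone. Without the flag the same
call would sleep until a reader drains the pipe.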

If you raise O_NONBLOCK on the source fd in wr.c then your problems go
away. This is pretty long-standing behavior. Splice would have to be
refactored to not rely on pipe_lock(). That's likely major work with a
good portion of regressions if the past is any indication.
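
A minimal sketch of that change, assuming Linux and a blocking output
pipe. The helper names set_nonblocking() and splice_until_eof() are
invented for this example. The idea is to wait for input in poll(),
where pipe_lock() is not held, instead of inside splice().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd's open file description.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd) {
  int fl = fcntl(fd, F_GETFL);
  if (fl == -1)
    return -1;
  return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Hypothetical helper: splice from in to out until EOF.
 * Assumes out is a pipe in blocking mode, so that EAGAIN can only mean
 * "no input available yet". Returns 0 on EOF, -1 on error. */
static int splice_until_eof(int in, int out) {
  if (set_nonblocking(in) == -1)
    return -1;
  for (;;) {
    ssize_t n = splice(in, NULL, out, NULL, 128 * 1024 * 1024, 0);
    if (n > 0)
      continue; /* moved some data, try for more */
    if (n == 0)
      return 0; /* end of input */
    if (errno == EINTR)
      continue;
    if (errno != EAGAIN)
      return -1;
    /* No input right now: sleep in poll() rather than in splice(). */
    struct pollfd pfd = {.fd = in, .events = POLLIN};
    if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
      return -1;
  }
}
```

With this, wr.c reduces to a main() that calls splice_until_eof(0, 1).
Note that O_NONBLOCK is a property of the open file description, so if
stdin is a terminal inherited from the shell, the flag stays set for
every other process sharing it after wr exits.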

If you need the ability to read from a pipe fully asynchronously with
splice right now, then io_uring will at least allow you to punt that read
into an async worker thread, as far as I can tell.
