lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <qk6hjuam54khlaikf2ssom6custxf5is2ekkaequf4hvode3ls@zgf7j5j4ubvw>
Date:   Mon, 26 Jun 2023 03:12:09 +0200
From:   Ahelenia Ziemiańska 
        <nabijaczleweli@...ijaczleweli.xyz>
To:     Alexander Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Pending splice(file -> FIFO) always blocks read(FIFO), regardless of
 O_NONBLOCK on read side?

Hi! (starting with get_maintainers.pl fs/splice.c,
     idk if that's right though)

Per fs/splice.c:
 * The traditional unix read/write is extended with a "splice()" operation
 * that transfers data buffers to or from a pipe buffer.
so I expect splice() to work just about the same as read()/write()
(and, to a large extent, it does so).

Thus, a refresher on pipe read() semantics
(quoting Issue 8 Draft 3; Linux when writing with write()):
60746  When attempting to read from an empty pipe or FIFO:
60747  • If no process has the pipe open for writing, read( ) shall return 0 to indicate end-of-file.
60748  • If some process has the pipe open for writing and O_NONBLOCK is set, read( ) shall return
60749    −1 and set errno to [EAGAIN].
60750  • If some process has the pipe open for writing and O_NONBLOCK is clear, read( ) shall
60751    block the calling thread until some data is written or the pipe is closed by all processes that
60752    had the pipe open for writing.

However, I've observed that this is not the case when splicing from
something that sleeps on read to a pipe, and that in that case all
readers block, /including/ ones that are reading from fds with
O_NONBLOCK set!

As an example, consider these two programs:
-- >8 --
// wr.c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
int main() {
  while (splice(0, 0, 1, 0, 128 * 1024 * 1024, 0) > 0)
    ;
  fprintf(stderr, "wr: %m\n");
}
-- >8 --

-- >8 --
// rd.c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main() {
  fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);

  char buf[64 * 1024] = {};
  for (ssize_t rd;;) {
#if 1
    while ((rd = read(0, buf, sizeof(buf))) == -1 && errno == EINTR)
      ;
#else
    while ((rd = splice(0, 0, 1, 0, 128 * 1024 * 1024, 0)) == -1 &&
           errno == EINTR)
      ;
#endif
    fprintf(stderr, "rd=%zd: %m\n", rd);
    write(1, buf, rd);

    errno = 0;
    sleep(1);
  }
}
-- >8 --

Thus:
-- >8 --
a$ make rd wr
a$ mkfifo fifo
a$ ./rd < fifo                           b$ echo qwe > fifo
rd=4: Success
qwe
rd=0: Success
rd=0: Success                            b$ sleep 2 > fifo
rd=-1: Resource temporarily unavailable
rd=-1: Resource temporarily unavailable
rd=0: Success
rd=0: Success                            
rd=-1: Resource temporarily unavailable  b$ /bin/cat > fifo
rd=-1: Resource temporarily unavailable
rd=4: Success                            abc
abc
rd=-1: Resource temporarily unavailable
rd=4: Success                            def
def
rd=0: Success                            ^D
rd=0: Success
rd=0: Success                            b$ ./wr > fifo
-- >8 --
and nothing. Until you actually type a line (or a few) into teletype b
so that the splice completes, at which point so does the read.

An even simpler case is 
-- >8 --
$ ./wr | ./rd
abc
def
rd=8: Success
abc
def
ghi
jkl
rd=8: Success
ghi
jkl
^D
wr: Success
rd=-1: Resource temporarily unavailable
rd=0: Success
rd=0: Success
-- >8 --

splice flags don't do anything.
Tested on bookworm (6.1.27-1) and Linus' HEAD (v6.4-rc7-234-g547cc9be86f4).

You could say this is a "denial of service", since this is a valid
way of following pipes (and, sans SIGIO, the only portable one),
and this makes it so the reader is completely blocked,
and impervious to all deadly signals (incl. SIGKILL).
I've also observed strace hanging (but it responded to SIGKILL).


Rudimentary analysis shows that sys_splice() -> __do_splice() ->
do_splice() -> {!(ipipe && opipe) -> !(ipipe) -> (ipipe)}:
splice_file_to_pipe which then does
-- >8 --
	pipe_lock(opipe);
	ret = wait_for_space(opipe, flags);
	if (!ret)
		ret = do_splice_to(in, offset, opipe, len, flags);
	pipe_unlock(opipe);
-- >8 --
conversely:
-- >8 --
static ssize_t
pipe_read(struct kiocb *iocb, struct iov_iter *to)
{
        size_t total_len = iov_iter_count(to);
        struct file *filp = iocb->ki_filp;
        struct pipe_inode_info *pipe = filp->private_data;
        bool was_full, wake_next_reader = false;
        ssize_t ret;

        /* Null read succeeds. */
        if (unlikely(total_len == 0))
                return 0;

        ret = 0;
        __pipe_lock(pipe);
-- >8 --
so, naturally(?), all non-empty reads wait for the splice to end.

To validate this, I've applied the following
(which I'm 100% sure is wrong and breaks unrelated stuff):
-- >8 --
diff --git a/fs/pipe.c b/fs/pipe.c
index 2d88f73f585a..a76535839d32 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -240,6 +240,11 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
 	if (unlikely(total_len == 0))
 		return 0;
 
+	if ((filp->f_flags & O_NONBLOCK) || (iocb->ki_flags & IOCB_NOWAIT)) {
+		if (mutex_is_locked(&pipe->mutex))
+			return -EAGAIN;
+	}
+
 	ret = 0;
 	__pipe_lock(pipe);
-- >8 --
which does make the aforementioned cases less pathological
(naturally this now means it always EAGAINs even if there is data to be
 read since the splice() loop keeps it locked, but you get the picture).

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ