[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3fgbhh5bxahcutguae7suxf6y54sjnwxld5gwnuvmnqqksiw2w@tarta.nabijaczleweli.xyz>
Date: Sat, 16 Dec 2023 06:36:07 +0100
From:
Ahelenia Ziemiańska <nabijaczleweli@...ijaczleweli.xyz>
To: Jens Axboe <axboe@...nel.dk>
Cc: Christian Brauner <brauner@...nel.org>,
Alexander Viro <viro@...iv.linux.org.uk>, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RERESEND 10/11] splice: file->pipe: -EINVAL for
non-regular files w/o FMODE_NOWAIT
On Fri, Dec 15, 2023 at 08:47:15AM -0700, Jens Axboe wrote:
> On 12/14/23 11:45 AM, Ahelenia Ziemiańska wrote:
> > We request non-blocking I/O in the generic implementation, but some
> > files ‒ ttys ‒ only check O_NONBLOCK. Refuse them here, lest we
> > risk sleeping with the pipe locked for indeterminate lengths of
> > time.
> A worthy goal here is ensuring that _everybody_ honors IOCB_NOWAIT,
> rather than just rely on O_NONBLOCK. This does involve converting to
> ->read_iter/->write_iter if the driver isn't already using it, but some
> of them already have that, yet don't check IOCB_NOWAIT or treat it the
> same as O_NONBLOCK.
This doesn't really mean much to me, sorry.
> Adding special checks like this is not a good idea, imho.
That's what Linus said I should do so that's what I did
https://lore.kernel.org/linux-fsdevel/CAHk-=wimmqG_wvSRtMiKPeGGDL816n65u=Mq2+H3-=uM2U6FmA@mail.gmail.com/
I can't fix the tty layer :/
> > This also masks inconsistent wake-ups (usually every second line)
> > when splicing from ttys in icanon mode.
> >
> > Regular files don't /have/ a distinct O_NONBLOCK mode,
> > because they always behave non-blockingly, and for them FMODE_NOWAIT is
> > used in the purest sense of
> > /* File is capable of returning -EAGAIN if I/O will block */
> > which is not set by the vast majority of filesystems,
> > and it's not the semantic we want here.
>
> The main file systems do very much set it, like btrfs, ext4, and xfs. If
> you look at total_file_systems / ones_flagging_it the ratio may be high,
> but in terms of installed userbase, the majority definitely will have
> it. Also see comment on cover letter for addressing this IOCB_NOWAIT
> confusion.
Reassessing
[1] https://lore.kernel.org/linux-fsdevel/5osglsw36dla3mubtpsmdwdid4fsdacplyd6acx2igo4atogdg@yur3idyim3cc/
I see FMODE_NOWAIT in
blockdevs
/dev/{null,zero,random,urandom}
btrfs/ext4/f2fs/ocfs2/xfs
eventfd
pipes
sockets
tun/tap
which means that vfat/fuse/nfs/tmpfs/ramfs/procfs/sysfs don't.
(zfs also doesn't, but that's not for this list.)
I don't know if that's actually a "majority" in a meaningful sense,
I agree, but I think I primarily committed to this exclusion because
tmpfs/ramfs didn't.
I s'pose ramfs can already be tagged since it already returns
-EAGAIN when I/O would block (never).
tmpfs not being spliceable is also questionable.
But this'd also mean effectively deleting
afs_file_splice_read
ceph_splice_read
coda_file_splice_read
ecryptfs_splice_read_update_atime
fuse_dev_splice_read
nfs_file_splice_read
orangefs_file_splice_read
shmem_file_splice_read
v9fs_file_splice_read
(not to mention the many others (adfs/affs/bfs/bcachefs/cramfs/erofs/fat/hfs*/hostfs/hpfs/jffs2/jfs/minix/nilfs/ntfs/omfs/reiserfs/isofs/sysv/ubifs/udf/ufs/vboxsf/squashfs/romfs)
which just use the filemap impl verbatim).
There's no point to restricting splice access on a per-filesystem level
(which this'd do), since to mount a malicious network filesystem you
need to be root.
A denial of service attack makes no sense if you're already root.
(Maybe except for fuse, which people typically run suid;
that I could see potentially making sense to disable..)
I have indeed managed to confuse myself into the NOWAIT hole,
but this is actually about
"not letting unprivileged users escalate into
hanging system daemons by writing to a pipe"
rather than
"if we ever hold the pipe lock for >2µs we die instantly".
O_NONBLOCK filtered by FMODE_NOWAIT is used as a semantic proxy for
the 10 different types of files anyone can create that we know are safe.
Anyone can open a socket and not write to it, so we must refuse to
splice from a socket with no data in it.
But only root can mount filesystems, so a regular file is always safe.
And, actually defining a slightly-heuristic per-file policy in the syscall
itself is stupid, you've talked me out of this.
This check only actually applies to the generic copy_splice_read()
implementation, since the "real"/non-generic splices
(fiemap_splice_read/per-filesystem ‒
all the others that this patchset touches)
are already known to be safe
(and aren't reads so FMODE_NOWAIT doesn't factor in at all).
I've dropped this patch and have instead added this to 01/11:
diff --git a/fs/splice.c b/fs/splice.c
index f8bfc9cf8cdc..6d369d7d56d5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -331,0 +332,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
+ /*
+ * This generic implementation is only safe w.r.t. the pipe lock
+ * if the file actually respects IOCB_NOWAIT, which ttys don't.
+ */
+ if (!(in->f_mode & FMODE_NOWAIT))
+ return -EINVAL;
(Indeed, in many ways, Linus' post to which I reply in [1] pretty much
says this explcitly. Actually he literally says this. I just don't
realise and instead of adding the snippet to copy_splice_read(),
which he already diffed and talks about, I copied it to the syscall.)
Now I just need to re-consider the prose in a way
that avoids this deeply embarrassing IOCB_NOWAIT/regular-file nonsense,
and this series ends up being just "fixing splice implementations"
without also special-casing the syscall itself.
Thanks for asking the right questions.
Sorry for longposting.
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists