[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20191126100010.GA15941@deco.navytux.spb.ru>
Date: Tue, 26 Nov 2019 10:00:20 +0000
From: Kirill Smelkov <kirr@...edi.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Kenneth R. Crudup" <kenny@...ix.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Commit 0be0ee71 ("fs: properly and reliably lock f_pos in fdget_pos()") breaking userspace
On Mon, Nov 25, 2019 at 07:39:34PM -0800, Linus Torvalds wrote:
> On Mon, Nov 25, 2019 at 7:21 PM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > Of course, this may fix the f_pos locking issue, but replace it with a
> > "oops, the character device driver tried to look at *pos anyway", and
> > that will give you a nice OOPS instead.
>
> Confirmed. At least the x86 firmware update code uses
> "simple_read_from_buffer()", which does use the file position, but
> doesn't actually allow llseek().
>
> So no, "it's a character device no llseek" does not mean that it acts
> as a pure streaming device with no file position, and we'd actually
> have to mark individual drivers (either by adding 'stream_open()' in
> their open routines, or adding the extra field to 'struct
> file_operations' that I mentioned).
>
> I think I'll have to revert that trial commit. I'll give it another
> day in case somebody has a better idea, but it looks like it's too
> early to do that nice cleanup as things are now.
>
> Linus
Linus, thanks for heads up and for pushing FMODE_STREAM conversion
forward.
I still believe the most practical approach to do it is via automation -
via extending and teaching stream_open.cocci to find other places that
actually don't use ppos / ki->ki_pos at all and at least warn that
stream_open() should be used for that file. Else it will be a lot of
pain.
Of course we can manually convert the core cases like pipes and sockets.
But those things are easy to do. On the other hand by manually
converting them, we lesser the extent to where the automation is applied
and tested, and that leaves us with just many small corner cases that
will be left in potentially racy/deadlocked state until someone actually
hit that problem and cares to debug it to understand what is going on.
Those will be many different places and so likely many different people,
and since debugging concurrency issues is not easy, it will likely last for
many years. Of course things like KCSAN helps a lot, but, if I
understand correctly, they do not provide full coverage for the whole
kernel, especially they are likely not providing coverage for leaf
kernel bits in drivers.
Of course, I might be missing something, and there is a way to e.g.
combine automation+pain approaches. Then that would be good to make
combined progress.
I still keep in mind extending stream_open.cocci myself, but for now I
cannot delve into it before finishing my transactional filesystem...
Kirill
P.S. even though I was put into cc there, somehow I did not received any
notification email for commits d8e464ecc17b (vfs: mark pipes and sockets as
stream-like file descriptors) and 0be0ee71816b (vfs: properly and
reliably lock f_pos in fdget_pos()).
P.P.S completely untested, but looks sane:
---- 8< ----
>From 634e1af8775aa27799b4879fdb527e4a3b8b31ef Mon Sep 17 00:00:00 2001
From: Kirill Smelkov <kirr@...edi.com>
Date: Tue, 26 Nov 2019 12:40:56 +0300
Subject: [PATCH] socket: No need to check for ki_pos != 0 anymore
Commit d8e464ecc17b (vfs: mark pipes and sockets as stream-like file
descriptors) converted sockets to use stream_open() which forbids llseek
and sets FMODE_STREAM for which VFS ksys_read/ksys_write routines make
sure not to change file->f_pos at all and always pass either ppos=NULL,
or ki->ki_pos=0 to fops->read/write, because there is no file position
for stream-like files.
Drop unnecessary checks.
Signed-off-by: Kirill Smelkov <kirr@...edi.com>
---
net/socket.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/net/socket.c b/net/socket.c
index 17bc1eee198a..d8c583c755ae 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -959,9 +959,6 @@ static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
if (file->f_flags & O_NONBLOCK)
msg.msg_flags = MSG_DONTWAIT;
- if (iocb->ki_pos != 0)
- return -ESPIPE;
-
if (!iov_iter_count(to)) /* Match SYS5 behaviour */
return 0;
@@ -978,9 +975,6 @@ static ssize_t sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
.msg_iocb = iocb};
ssize_t res;
- if (iocb->ki_pos != 0)
- return -ESPIPE;
-
if (file->f_flags & O_NONBLOCK)
msg.msg_flags = MSG_DONTWAIT;
--
2.20.1
Powered by blists - more mailing lists