[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZG0slV2BhSZkRL_y@codewreck.org>
Date: Wed, 24 May 2023 06:13:57 +0900
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Christian Brauner <brauner@...nel.org>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>,
Pavel Begunkov <asml.silence@...il.com>,
Stefan Roesch <shr@...com>, Clay Harris <bugs@...ycon.org>,
Dave Chinner <david@...morbit.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
io-uring@...r.kernel.org
Subject: Re: [PATCH v2 1/6] fs: split off vfs_getdents function of getdents64
syscall
Christian Brauner wrote on Tue, May 23, 2023 at 05:39:08PM +0200:
> > @@ -362,11 +369,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
> > };
> > int error;
> >
> > - f = fdget_pos(fd);
> > - if (!f.file)
> > - return -EBADF;
> > -
> > - error = iterate_dir(f.file, &buf.ctx);
> > + error = iterate_dir(file, &buf.ctx);
>
> So afaict this isn't enough.
> If you look into iterate_shared() you should see that it uses and
> updates f_pos. But that can't work for io_uring and also isn't how
> io_uring handles read and write. You probably need to use a local pos
> similar to what io_uring does in rw.c for rw->kiocb.ki_pos. But in
> contrast simply disallow any offsets for getdents completely. Thus not
> relying on f_pos anywhere at all.
Using a private offset from the sqe was the previous implementation
discussed around here[1], and Al Viro pointed out that the iterate
filesystem implementations don't validate the offset makes sense as it's
either costly or for some filesystems downright impossible, so I went
into a don't let users modify it approach.
[1] https://lore.kernel.org/all/20211221164004.119663-1-shr@fb.com/T/#m517583f23502f32b040c819d930359313b3db00c
I agree it's not how io_uring usually works -- it dislikes global
states -- but it works perfectly well as long as you don't have multiple
users on the same file, which the application can take care of.
Not having any offsets would work for small directories but make reading
large directories impossible so some sort of continuation is required,
which means we need to keep the offset around; I also suggested keeping
the offset in argument as the previous version but only allowing the
last known offset (... so ultimately still updating f_pos anyway as we
don't have anywhere else to store it) or 0, but if we're going to do
that it looks much simpler to me to expose the same API as getdents.
--
Dominique Martinet | Asmadeus
Powered by blists - more mailing lists