[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240528193624.GH2118490@ZenIV>
Date: Tue, 28 May 2024 20:36:24 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: Alice Ryhl <aliceryhl@...gle.com>
Cc: a.hindborg@...sung.com, alex.gaynor@...il.com, arve@...roid.com,
benno.lossin@...ton.me, bjorn3_gh@...tonmail.com,
boqun.feng@...il.com, brauner@...nel.org, cmllamas@...gle.com,
dan.j.williams@...el.com, dxu@...uu.xyz, gary@...yguo.net,
gregkh@...uxfoundation.org, joel@...lfernandes.org,
keescook@...omium.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, maco@...roid.com, ojeda@...nel.org,
peterz@...radead.org, rust-for-linux@...r.kernel.org,
surenb@...gle.com, tglx@...utronix.de, tkjos@...roid.com,
tmgross@...ch.edu, wedsonaf@...il.com, willy@...radead.org,
yakoyoku@...il.com, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v6 3/8] rust: file: add Rust abstraction for `struct file`
On Mon, May 27, 2024 at 04:03:56PM +0000, Alice Ryhl wrote:
> > In other words, if current->files->count is equal to 1 at fdget() time
> > we can skip incrementing refcount. Matching fdput() would need to
> > skip decrement, of course. Note that we must record that (borrowed
> > vs. cloned) in struct fd - the condition cannot be rechecked at fdput()
> > time, since the table that had been shared at fdget() time might no longer
> > be shared by the time of fdput().
>
> This is great! It matches my understanding. I didn't know the details
> about current->files and task->files.
>
> You should copy this to the kernel documentation somewhere. :)
Probably, after it's turned into something more coherent - and after
the description of struct fd scope rules is corrected ;-/
Correction in question: you _are_ allowed to move reference from
descriptor table while in scope of struct fd; what you are not allowed
is dropping that reference until the end of scope.
That's what binder_do_fd_close() is about - binder_deferred_fd_close()
is called in a struct fd scope (anything called from ->unlocked_ioctl()
instances is). It *does* remove a struct file reference from
descriptor table:
twcb->file = file_close_fd(fd);
moves that reference to twcb->file, and subsequent
get_file(twcb->file);
filp_close(twcb->file, current->files);
completes detaching it from the table[*] and the reference itself
is dropped via task_work, which is going to be executed on the
way out to userland, definitely after we leave the scope of
struct fd.
Incidentally, I'm very tempted to unexport close_fd(); in addition to
being a source of bugs when called from ioctl on user-supplied descriptor
it encourages racy crap - just look at e.g. 1819200166ce
"drm/amdkfd: Export DMABufs from KFD using GEM handles", where we
call drm_gem_prime_handle_to_fd(), immediately followed by
dmabuf = dma_buf_get(fd);
close_fd(fd);
dup2() from another thread with guessed descriptor number as target and
you've got a problem... It's not a violation of fdget() use rules
(it is called from ioctl, but descriptor is guaranteed to be different
from the one passed to ioctl(2)), but it's still wrong. Would take
some work, though...
"Detaching from the table" bit also needs documenting, BTW. If you look
at that thing, you'll see that current->files is converted to fl_owner_t,
which is an opaque pointer. What happens is that dnotify and POSIX lock
use current->files as opaque tags (->dn_owner and ->flc_owner resp.) and
filp_close() (well, filp_flush() these days) needs to be called to
purge all of those associated with given struct file and given tag value.
That needs to be done between removal of file reference from table and
destruction of the table itself and it guarantees that those opaque references
won't outlive the table (more importantly, don't survive until a different
files_struct instance is allocated at the same address).
[*] NB: might make sense to export filp_flush(), since that's what this
sequence boils down to. We really need a better identifier, though -
"filp" part is a leftover from OSDI, AFAICT; that's a hungarism for
"file pointer" and it makes no sense. file_flush() would be better,
IMO - or flush_file(), for that matter.
Powered by blists - more mailing lists