[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240527160356.3909000-1-aliceryhl@google.com>
Date: Mon, 27 May 2024 16:03:56 +0000
From: Alice Ryhl <aliceryhl@...gle.com>
To: viro@...iv.linux.org.uk
Cc: a.hindborg@...sung.com, alex.gaynor@...il.com, aliceryhl@...gle.com,
arve@...roid.com, benno.lossin@...ton.me, bjorn3_gh@...tonmail.com,
boqun.feng@...il.com, brauner@...nel.org, cmllamas@...gle.com,
dan.j.williams@...el.com, dxu@...uu.xyz, gary@...yguo.net,
gregkh@...uxfoundation.org, joel@...lfernandes.org, keescook@...omium.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, maco@...roid.com,
ojeda@...nel.org, peterz@...radead.org, rust-for-linux@...r.kernel.org,
surenb@...gle.com, tglx@...utronix.de, tkjos@...roid.com, tmgross@...ch.edu,
wedsonaf@...il.com, willy@...radead.org, yakoyoku@...il.com
Subject: Re: [PATCH v6 3/8] rust: file: add Rust abstraction for `struct file`
Al Viro <viro@...iv.linux.org.uk> writes:
> > > You obviously are aware of this but I'm just spelling it out. Iirc,
> > > there will practically only ever be one light refcount per file.
> > >
> > > For a light refcount to be used we know that the file descriptor table
> > > isn't shared with any other task. So there are no threads that could
> > > concurrently access the file descriptor table. We also know that the
> > > file descriptor table cannot become shared while we're in system call
> > > context because the caller can't create new threads and they can't
> > > unshare the file descriptor table.
> > >
> > > So there's only one fdget() caller (Yes, they could call fdget()
> > > multiple times and then have to do fdput() multiple times but that's a
> > > level of weirdness that we don't need to worry about.).
> >
> > Hmm. Is it not the case that different processes with different file
> > descriptor tables could reference the same underlying `struct file` and
> > both use light refcounts to do so, as long as each fd table is not
> > shared? So there could be multiple light refcounts to the same `struct
> > file` at the same time on different threads.
>
> Relevant rules:
>
> * Each file pointer in any descriptor table contributes to refcount
> of file.
>
> * All assignments to task->files are done by the task itself or,
> during task creation, by its parent The latter happens before the task
> runs for the first time. The former is done with task_lock(current)
> held.
>
> * current->files is always stable. The object it points to
> is guaranteed to stay alive at least until you explicitly change
> current->files.
> * task->files is stable while you are holding task_lock(task).
> The object it points to is guaranteed to stay alive until you release
> task_lock(task).
> * task->files MAY be fetched (racily) without either of the
> above, but it should not be dereferenced - the memory may be freed
> and reused right after you've fetched the pointer.
>
> * descriptor tables are refcounted by table->count.
> * descriptor table is created with ->count equal to 1 and
> destroyed when its ->count reaches 0.
> * each task with task->files == table contributes to table->count.
> * before the task dies, its ->files becomes NULL (see exit_files()).
> * when task is born (see copy_process() and copy_files())) the parent
> is responsible for setting the value of task->files and making sure that
> refcounts are correct; that's the only case where one is allowed to acquire
> an extra reference to existing table (handling of clone(2) with COPY_FILES).
>
> * the only descriptor table one may modify is that pointed to
> by current->files. Any access to other threads' descriptor tables is
> read-only.
>
> * struct fd is fundamentally thread-local. It should never be
> passed around, put into shared data structures, etc.
>
> * if you have done fdget(N), the matching fdput() MUST be done
> before the caller modifies the Nth slot of its descriptor table,
> spawns children that would share the descriptor table.
>
> * fdget() MAY borrow a reference from caller's descriptor table.
> That can be done if current->files->count is equal to 1.
> In that case we can be certain that the file reference we fetched from
> our descriptor table will remain unchanged (and thus contributing to refcount
> of file) until fdput(). Indeed,
> + at the time of fdget() no other thread has task->files pointing
> to our table (otherwise ->count would be greater than 1).
> + our thread will remain the sole owner of descriptor table at
> least until fdput(). Indeed, the first additional thread with task->files
> pointing to our table would have to have been spawned by us and we are
> forbidden to do that (rules for fdget() use)
> + no other thread could modify our descriptor table (they would
> have to share it first).
> + we are allowed to modify our table, but we are forbidden to touch
> the slot we'd copied from (rules for fdget() use).
>
> In other words, if current->files->count is equal to 1 at fdget() time
> we can skip incrementing refcount. Matching fdput() would need to
> skip decrement, of course. Note that we must record that (borrowed
> vs. cloned) in struct fd - the condition cannot be rechecked at fdput()
> time, since the table that had been shared at fdget() time might no longer
> be shared by the time of fdput().
This is great! It matches my understanding. I didn't know the details
about current->files and task->files.
You should copy this to the kernel documentation somewhere. :)
> > And this does *not* apply to `fdget_pos`, which checks the refcount of
> > the `struct file` instead of the refcount of the fd table.
>
> False. fdget_pos() is identical to fdget() as far as file refcount
> handling goes. The part that is different is that grabbing ->f_pos_lock
> is sensitive to file refcount in some cases. This is orthogonal to
> "does this struct fd contribute to file refcount".
Sorry, I see now that I didn't phrase that quite right. What I meant is
that there are ways of sharing a `struct file` reference during an fdget
scope that are not dangerous, but where it *would be* dangerous if it
was an fdget_pos scope instead. Specifically, the reason they are
dangerous is that they can lead to a data race on the file position if
the fdget_pos scope did not take the f_pos_lock mutex.
For example, during an `fdget(N)` scope, you can always do a `get_file`
and then send it to another process and `fd_install` it into that other
process. There's no way that this could result in the deletion of the
Nth entry of `current->files`.
However, during an `fdget_pos(N)` scope, then it is *not* the case that
it's always okay to send a `get_file` reference to another thread and
`fd_install` it. Because after the remote process returns from the
syscall in which we `fd_install`ed the file, the remote process could
proceed to call another syscall that in turn modifies the file position.
And if the original `fdget_pos(N)` scope modifies the file position
after sending the `get_file` reference, then that could be a data race
on f_pos.
> Again, "light" references are tied to thread; they can only be created
> if we are guaranteed that descriptor table's slot they came from will
> remain unchanged for as long as the reference is used.
>
> And yes, there may be several light references to the same file - both
> in different processes that do not share descriptor table *and* in the
> same thread, if e.g. sendfile(in_fd, out_fd, ...) is called with
> in_fd == out_fd.
Thanks for confirming this!
I hope this reply along with my reply to Christian Brauner also
addresses your other thread. Let me know if it doesn't.
Alice
Powered by blists - more mailing lists