Message-ID: <YjOpedPDj+3KCJjk@zeniv-ca.linux.org.uk>
Date: Thu, 17 Mar 2022 21:34:49 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: Imran Khan <imran.f.khan@...cle.com>
Cc: tj@...nel.org, gregkh@...uxfoundation.org,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH v7 1/8] kernfs: Introduce interface to access
global kernfs_open_file_mutex.
On Thu, Mar 17, 2022 at 06:26:05PM +1100, Imran Khan wrote:
> @@ -570,9 +571,10 @@ static void kernfs_put_open_node(struct kernfs_node *kn,
>  					struct kernfs_open_file *of)
>  {
>  	struct kernfs_open_node *on = kn->attr.open;
> +	struct mutex *mutex = NULL;
>  	unsigned long flags;
>
> -	mutex_lock(&kernfs_open_file_mutex);
> +	mutex = kernfs_open_file_mutex_lock(kn);
>  	spin_lock_irqsave(&kernfs_open_node_lock, flags);

Can that ever be reached with local interrupts disabled?  I mean, what is
that spin_lock_irqsave() about?

> @@ -745,11 +747,12 @@ static int kernfs_fop_release(struct inode *inode, struct file *filp)
>  {
>  	struct kernfs_node *kn = inode->i_private;
>  	struct kernfs_open_file *of = kernfs_of(filp);
> +	struct mutex *lock = NULL;
>
>  	if (kn->flags & KERNFS_HAS_RELEASE) {
> -		mutex_lock(&kernfs_open_file_mutex);
> +		lock = kernfs_open_file_mutex_lock(kn);
>  		kernfs_release_file(kn, of);
> -		mutex_unlock(&kernfs_open_file_mutex);
> +		mutex_unlock(lock);

Careful - you are about to remove the existing exclusion between *all*
->release() instances, same node or not.

In particular, if some driver had them manipulate a driver-local list of
some kind, relying upon the kernfs to provide the exclusion, it'd break
as soon as you turn that thing into per-node (or hashed) mutex.

It's _probably_ safe, seeing that the one and only instance of ->release()
in the entire tree (cgroup_file_release()) is rather limited in what
it's doing, and while it calls a submethod (cftype.release()) there's only
a couple of instances of that (cgroup_procs_release() and
cgroup_pressure_release(), both in kernel/cgroup/cgroup.c).  Neither
seems to rely upon the global exclusion.

However, that's a change of rules and it needs to be documented as such.
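
To make the change of rules concrete, here is a rough userspace sketch of
what a hashed mutex would mean for callers.  Everything in it is made up
for illustration (the table size, the id used as hash key, and the names
open_file_mutex_ptr() and open_file_mutex_lock()); pthread mutexes stand in
for kernel ones, and the real patch series may well pick the mutex
differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_OPEN_FILE_MUTEXES 4	/* hypothetical table size */

/* Hypothetical stand-in for struct kernfs_node. */
struct node {
	unsigned int id;	/* hypothetical hash key */
};

static pthread_mutex_t open_file_mutexes[NR_OPEN_FILE_MUTEXES] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Pick the mutex covering this node; different nodes may get different ones. */
pthread_mutex_t *open_file_mutex_ptr(const struct node *n)
{
	assert(n != NULL);
	return &open_file_mutexes[n->id % NR_OPEN_FILE_MUTEXES];
}

/* Lock the mutex covering this node and return it so the caller can unlock. */
pthread_mutex_t *open_file_mutex_lock(const struct node *n)
{
	pthread_mutex_t *m = open_file_mutex_ptr(n);

	pthread_mutex_lock(m);
	return m;
}
```

With this layout, holding the mutex for one node says nothing about any
other node unless the two happen to hash to the same slot, so a ->release()
instance touching driver-global state would have to bring its own locking.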
Incidentally, what's the point of having kernfs_open_node->refcnt
atomic_t? All users are under kernfs_open_node_lock... AFAICS,
it's simply "->files is non-empty or something is in
kernfs_drain_open_files() for the node in question", so I'm not
even sure we want a counter there...
Note that kernfs_drain_open_files() can't overlap with
kernfs_fops_open() adding to the list of files (and we seriously
rely upon that - you don't want ops->release() called while in
the middle of ops->open()). kernfs_fops_open() starts with
grabbing an active reference; kernfs_drain_open_files() is
not called until we had
* prevented new active references being grabbed and
* waited for all active references to be dropped.
So kernfs_drain_open_files() can do the following:
1) optimistically check for ->attr.open being NULL;
bugger off if it is. We know that nobody could be currently
trying to add anything to it, mutex or no mutex.
2) grab the mutex
3) recheck ->attr.open; it might have become NULL.
If it had, unlock and bugger off.
4) walk the list, doing unmaps/releases.
5) unlock and bugger off.
The only thing doing removals from the list is
kernfs_put_open_node() and it grabs that mutex.
So it can't get to the "remove from list, free the container
of list head" until we are through.

IOW, there's no reason to hold a reference to kernfs_open_node
in kernfs_drain_open_files() at all.  And that makes ->refcnt
completely useless - kernfs_put_open_node() should do
	list_del(&of->list);
	if (list_empty(&on->files))
		kn->attr.open = NULL;
	else
		on = NULL;
and to hell with refcounting.
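
The drain steps and the refcount-free put can be sketched together in
userspace C.  This is only a model under stated assumptions: the struct and
function names are hypothetical stand-ins rather than kernfs API, a pthread
mutex plays kernfs_open_file_mutex, a singly linked list replaces list_head,
and the active-reference protocol that makes the unlocked check safe is not
modelled at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names are kernfs API. */
struct open_file {
	struct open_file *next;
	int released;
};

struct open_node {
	struct open_file *files;	/* non-empty for as long as the node exists */
};

struct node {
	_Atomic(struct open_node *) open;	/* plays the role of kn->attr.open */
};

static pthread_mutex_t open_file_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Unlink 'of'; free the open_node once its list is empty.  No refcount. */
void put_open_node(struct node *n, struct open_file *of)
{
	struct open_node *on;
	struct open_file **p;

	pthread_mutex_lock(&open_file_mutex);
	on = atomic_load(&n->open);
	assert(on != NULL);
	for (p = &on->files; *p; p = &(*p)->next) {
		if (*p == of) {
			*p = of->next;
			break;
		}
	}
	if (!on->files)
		atomic_store(&n->open, NULL);
	else
		on = NULL;	/* list still in use, nothing to free */
	pthread_mutex_unlock(&open_file_mutex);
	free(on);		/* free(NULL) is a no-op */
}

/* Returns the number of files released by this call. */
int drain_open_files(struct node *n)
{
	struct open_node *on;
	struct open_file *of;
	int count = 0;

	/* 1) optimistic check; relies on nobody being able to add files now */
	if (!atomic_load(&n->open))
		return 0;

	/* 2) grab the mutex */
	pthread_mutex_lock(&open_file_mutex);

	/* 3) recheck; the last put_open_node() may have run in the meantime */
	on = atomic_load(&n->open);
	if (!on) {
		pthread_mutex_unlock(&open_file_mutex);
		return 0;
	}
	assert(on->files != NULL);	/* invariant kept by put_open_node() */

	/* 4) walk the list; removals need the mutex, so it can't change */
	for (of = on->files; of; of = of->next) {
		if (!of->released) {
			of->released = 1;
			count++;
		}
	}

	/* 5) unlock */
	pthread_mutex_unlock(&open_file_mutex);
	return count;
}
```

Note that drain_open_files() never pins the open_node: the mutex alone keeps
put_open_node() from freeing it during the walk.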

As a matter of fact, we can do even better - make freeing
that thing rcu-delayed, use rcu_assign_pointer() for stores,
rcu_dereference() for loads and have kernfs_notify() do
	rcu_read_lock();
	on = rcu_dereference(kn->attr.open);
	if (on) {
		atomic_inc(&on->event);
		wake_up_interruptible(&on->poll);
	}
	rcu_read_unlock();
and kernfs_open_node_lock becomes useless - all places that
grab it are under kernfs_open_file_mutex.
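
For readers without a kernel tree at hand, the publish/lookup half of that
scheme can be approximated in userspace with C11 atomics.  This is a loose
analogue and not RCU: a release store stands in for rcu_assign_pointer(), an
acquire load for rcu_dereference(), a counter for the wakeup, and the
grace-period-delayed freeing (the part that actually makes the lockless
reader safe) is not modelled.  All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernfs API. */
struct open_node {
	atomic_int event;	/* bumped on every notification */
	atomic_int wakeups;	/* stands in for wake_up_interruptible() */
};

struct node {
	_Atomic(struct open_node *) open;	/* plays the role of kn->attr.open */
};

/* Writer side: release store, loosely analogous to rcu_assign_pointer(). */
void publish_open_node(struct node *n, struct open_node *on)
{
	assert(n != NULL);
	atomic_store_explicit(&n->open, on, memory_order_release);
}

/*
 * Reader side: acquire load, loosely analogous to rcu_dereference().
 * Takes no lock.  Returns 1 if somebody was notified, 0 if the node
 * had no open files.
 */
int notify(struct node *n)
{
	struct open_node *on;

	assert(n != NULL);
	on = atomic_load_explicit(&n->open, memory_order_acquire);
	if (!on)
		return 0;
	atomic_fetch_add(&on->event, 1);
	atomic_fetch_add(&on->wakeups, 1);
	return 1;
}
```

In this model nothing ever frees the open_node while a reader might still
hold the pointer; in the kernel that guarantee would have to come from the
rcu-delayed freeing mentioned above.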