Date:   Mon, 21 Mar 2022 17:55:53 +0000
From:   Al Viro <viro@...iv.linux.org.uk>
To:     Tejun Heo <tj@...nel.org>
Cc:     Imran Khan <imran.f.khan@...cle.com>, gregkh@...uxfoundation.org,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH v7 7/8] kernfs: Replace per-fs rwsem with hashed rwsems.

On Mon, Mar 21, 2022 at 06:46:53AM -1000, Tejun Heo wrote:
> On Mon, Mar 21, 2022 at 07:29:45AM +0000, Al Viro wrote:
> ...
> > stabilizing the tree topology.  Turn it into rwlock if you wish,
> > with that thing being a reader and existing users - writers.
> > And don't bother with further scaling, until and unless you see a real
> > contention on it.
> 
> Given how rare these renames are, in the (unlikely) case the rename rwsem
> becomes a problem, we should probably just switch it to a percpu_rwsem.

Why bother with rwsem, when we don't need anything blocking under it?
DEFINE_RWLOCK instead of DEFINE_SPINLOCK and don't make it static.

Again, we already have a spinlock protecting ->parent and ->name.
Existing users:

kernfs_name() - can be shared.
kernfs_path_from_node() - can be shared.

pr_cont_kernfs_name() - exclusive, since that thing writes into a static buffer.
pr_cont_kernfs_path() - exclusive, same reasons.

kernfs_get_parent() - can be shared, but its callers need to be reviewed;
that's the prime breeding ground for rename races.

kernfs_walk_ns() - this is fucking insane; on the surface, it needs to
be exclusive due to the use of the same static buffer.  It uses that
buffer to generate a pathname, *THEN* walks over it with strsep().
That's an... interesting approach, for the lack of other printable
terms - we walk the chain of ancestors, concatenating their names
into a buffer and separating those names with slashes, then we walk
that buffer, searching for slashes...  WTF?

kernfs_rename_ns() - exclusive; that's where the tree topology gets
changed.
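
For illustration, the build-then-split pattern described for kernfs_walk_ns()
above can be modelled in userspace roughly as follows.  This is only a sketch
of the description in this message, not code taken from fs/kernfs/dir.c;
struct node, build_path() and split_path() are hypothetical names.

```c
#define _DEFAULT_SOURCE		/* strsep() on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tree node; the root has an empty name. */
struct node {
	const struct node *parent;
	const char *name;
};

/* Step 1: walk the ancestors and write "/a/b/c" into buf, root first. */
static size_t build_path(const struct node *n, char *buf, size_t len)
{
	size_t off;
	int ret;

	assert(len > 0);
	if (!n->parent) {
		buf[0] = '\0';
		return 0;
	}
	off = build_path(n->parent, buf, len);
	if (off >= len)
		return off;	/* already truncated, stop appending */
	ret = snprintf(buf + off, len - off, "/%s", n->name);
	if (ret > 0)
		off += (size_t)ret;
	return off;
}

/* Step 2: walk the same buffer again, cutting it at every slash. */
static int split_path(char *path, const char **out, int max)
{
	char *p = path, *name;
	int n = 0;

	while (n < max && (name = strsep(&p, "/")) != NULL) {
		if (*name == '\0')
			continue;	/* empty piece before the leading '/' */
		out[n++] = name;
	}
	return n;
}
```

Note that split_path() modifies the buffer in place (strsep() overwrites each
slash with a NUL), which is why a shared static buffer would force exclusive
locking around the whole sequence.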

So we can just turn that spinlock into rwlock, replace the existing
uses with read_lock()/read_unlock() in kernfs_{name,path_from_node,get_parent}
and with write_lock()/write_unlock() in the rest of fs/kernfs/dir.c,
make it non-static, put extern into kernfs-internal.h and there you
go...

Wait a sec; what happens if e.g. kernfs_path_from_node() races with
__kernfs_remove()?  We do _not_ clear ->parent, but we do drop references
that used to pin what it used to point to, unless I'm misreading that
code...  Or is it somehow prevented by drain-related logic?  Seeing
that it seems to be possible to have kernfs_path_from_node() called from
an interrupt context, that could be delicate...
