lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YjoCqqs+UDxZ55nM@slm.duckdns.org>
Date:   Tue, 22 Mar 2022 07:08:58 -1000
From:   Tejun Heo <tj@...nel.org>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Imran Khan <imran.f.khan@...cle.com>, gregkh@...uxfoundation.org,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH v7 7/8] kernfs: Replace per-fs rwsem with hashed
 rwsems.

Hello,

On Tue, Mar 22, 2022 at 02:40:54AM +0000, Al Viro wrote:
> Sorry, misread that thing - the reason it copies the damn thing at all is
> the use of strsep().  Yecch...  Rule of the thumb regarding strsep() use,
> be it in kernel or in the userland: don't.  It's almost never the right
> primitive to use.

Yeah, it's ugly. I was being lazy on top of the existing interface.

> Lookups should use qstr; it has both the length and place for hash.
> Switch kernfs_find_ns() to that (and lift the calculation of length
> into the callers that do not have it - note that kernfs_iop_lookup()
> does) and you don't need the strsep() shite (or copying) anymore.

and yeah this would be cleaner.

> That would allow for kernfs_walk_ns() to take kernfs_rename_lock shared.
> 
> HOWEVER, that's not the only lock needed there and this patchset is
> broken in that respect - it locks the starting node, then walks the
> path.  Complete with lookups in rbtrees of children in the descendents
> of that node and those are *not* locked.
> 
> > > Wait a sec; what happens if e.g. kernfs_path_from_node() races with
> > > __kernfs_remove()?  We do _not_ clear ->parent, but we do drop references
> > > that used to pin what it used to point to, unless I'm misreading that
> > > code...  Or is it somehow prevented by drain-related logics?  Seeing
> > > that it seems to be possible to have kernfs_path_from_node() called from
> > > an interrupt context, that could be delicate...
> > 
> > kernfs_remove() is akin to freeing of the node and all its descendants. The
> > caller shouldn't be racing that against any other operations in the subtree.
> 
> That's interesting...  My impression had been that some of these functions
> could be called from interrupt contexts (judging by the spin_lock_irqsave()
> in there).  What kind of async contexts those are, and what do you use to
> make sure they don't leak into overlap with kernfs_remove()?

The spin_lock_irqsave()'s are there because they're often used when printing
messages which can happen from any context. e.g. cpuset ends up calling into
them to print current's cgroup under rcu_read_lock(), iocost to print
warning message under an irq-safe lock. In both and similar cases, the
caller knows that the cgroup is accessible which in turn guarantees that the
kernfs node hasn't be deleted.

> In particular, if you ever use those from RCU callbacks, you need to make
> sure that you have a grace period somewhere; the wait_event() you have in
> kernfs_drain() won't do it.

I don't know of cases where they are called from rcu callbacks although
there are cases where they're used from rcu criticial sections. That said,
at least in cgroup's case, cgroup destruction path has a grace period and
that's how a rcu critical section is safe to deref a cgroup and the
kernfs_node inside it.

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ