linux-kernel - Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ecf41abd583d5d2c775d9d385ea2a0af7b275037.camel@themaw.net>
Date:   Sat, 19 Dec 2020 08:53:40 +0800
From:   Ian Kent <raven@...maw.net>
To:     Fox Chen <foxhlchen@...il.com>
Cc:     Tejun Heo <tj@...nel.org>, Greg KH <gregkh@...uxfoundation.org>,
        akpm@...ux-foundation.org, dhowells@...hat.com,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        miklos@...redi.hu, ricklind@...ux.vnet.ibm.com,
        sfr@...b.auug.org.au, viro@...iv.linux.org.uk
Subject: Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency
 improvement

On Fri, 2020-12-18 at 21:20 +0800, Fox Chen wrote:
> On Fri, Dec 18, 2020 at 7:21 PM Ian Kent <raven@...maw.net> wrote:
> > On Fri, 2020-12-18 at 16:01 +0800, Fox Chen wrote:
> > > On Fri, Dec 18, 2020 at 3:36 PM Ian Kent <raven@...maw.net>
> > > wrote:
> > > > On Thu, 2020-12-17 at 10:14 -0500, Tejun Heo wrote:
> > > > > Hello,
> > > > > 
> > > > > On Thu, Dec 17, 2020 at 07:48:49PM +0800, Ian Kent wrote:
> > > > > > > What could be done is to make the kernfs node attr_mutex
> > > > > > > a pointer and dynamically allocate it but even that is
> > > > > > > too
> > > > > > > costly a size addition to the kernfs node structure as
> > > > > > > Tejun has said.
> > > > > > 
> > > > > > I guess the question to ask is, is there really a need to
> > > > > > call kernfs_refresh_inode() from functions that are usually
> > > > > > reading/checking functions.
> > > > > > 
> > > > > > Would it be sufficient to refresh the inode in the
> > > > > > write/set
> > > > > > operations in (if there's any) places where things like
> > > > > > setattr_copy() is not already called?
> > > > > > 
> > > > > > Perhaps GKH or Tejun could comment on this?
> > > > > 
> > > > > My memory is a bit hazy but invalidations on reads is how
> > > > > sysfs
> > > > > namespace is
> > > > > implemented, so I don't think there's an easy around that.
> > > > > The
> > > > > only
> > > > > thing I
> > > > > can think of is embedding the lock into attrs and doing xchg
> > > > > dance
> > > > > when
> > > > > attaching it.
> > > > 
> > > > Sounds like your saying it would be ok to add a lock to the
> > > > attrs structure, am I correct?
> > > > 
> > > > Assuming it is then, to keep things simple, use two locks.
> > > > 
> > > > One global lock for the allocation and an attrs lock for all
> > > > the
> > > > attrs field updates including the kernfs_refresh_inode()
> > > > update.
> > > > 
> > > > The critical section for the global lock could be reduced and
> > > > it
> > > > changed to a spin lock.
> > > > 
> > > > In __kernfs_iattrs() we would have something like:
> > > > 
> > > > take the allocation lock
> > > > do the allocated checks
> > > >   assign if existing attrs
> > > >   release the allocation lock
> > > >   return existing if found
> > > > othewise
> > > >   release the allocation lock
> > > > 
> > > > allocate and initialize attrs
> > > > 
> > > > take the allocation lock
> > > > check if someone beat us to it
> > > >   free and grab exiting attrs
> > > > otherwise
> > > >   assign the new attrs
> > > > release the allocation lock
> > > > return attrs
> > > > 
> > > > Add a spinlock to the attrs struct and use it everywhere for
> > > > field updates.
> > > > 
> > > > Am I on the right track or can you see problems with this?
> > > > 
> > > > Ian
> > > > 
> > > 
> > > umm, we update the inode in kernfs_refresh_inode, right??  So I
> > > guess
> > > the problem is how can we protect the inode when
> > > kernfs_refresh_inode
> > > is called, not the attrs??
> > 
> > But the attrs (which is what's copied from) were protected by the
> > mutex lock (IIUC) so dealing with the inode attributes implies
> > dealing with the kernfs node attrs too.
> > 
> > For example in kernfs_iop_setattr() the call to setattr_copy()
> > copies
> > the node attrs to the inode under the same mutex lock. So, if a
> > read
> > lock is used the copy in kernfs_refresh_inode() is no longer
> > protected,
> > it needs to be protected in a different way.
> > 
> 
> Ok, I'm actually wondering why the VFS holds exclusive i_rwsem for
> .setattr but
>  no lock for .getattr (misdocumented?? sometimes they have as you've
> found out)?
> What does it protect against?? Because .permission does a similar
> thing
> here -- updating inode attributes, the goal is to provide the same
> protection level
> for .permission as for .setattr, am I right???

As far as the documentation goes that's probably my misunderstanding
of it.

It does happen that the VFS makes assumptions about how call backs
are meant to be used.

Read like call backs, like .getattr() and .permission() are meant to
be used, well, like read like functions so the VFS should be ok to
take locks or not based on the operation context at hand.

So it's not about the locking for these call backs per se, it's about
the context in which they are called.

For example, in link_path_walk(), at the beginning of the component
lookup loop (essentially for the containing directory at that point),
may_lookup() is called which leads to a call to .permission() without
any inode lock held at that point.

But file opens (possibly following a path walk to resolve a path)
are different.

For example, do_filp_open() calls path_openat() which leads to a
call to open_last_lookups(), which leads to a call to .permission()
along the way. And in this case there are two contexts, an open()
create or one without create, the former needing the exclusive inode
lock and the later able to use the shared lock.

So it's about the locking needed for the encompassing operation that
is being done not about those functions specifically.

TBH the VFS is very complex and Al has a much, much better
understanding of it than I do so he would need to be the one to answer
whether it's the file systems responsibility to use these calls in the
way the VFS expects.

My belief is that if a file system needs to use a call back in a way
that's in conflict with what the VFS expects it's the file systems'
responsibility to deal with the side effects.

Ian