Date:	Mon, 13 Jul 2015 19:54:28 -0500
From:	Ben Myers <bpm@....com>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [RFC] freeing unlinked file indefinitely delayed

Hey Al,

On Mon, Jul 13, 2015 at 08:56:46PM +0100, Al Viro wrote:
> On Mon, Jul 13, 2015 at 01:17:51PM -0500, Ben Myers wrote:
> > > For one thing, this patch does *not* check for i_nlink at all.
> > 
> > I agree that no checking of i_nlink has the advantage of brevity.
> > Anyone who is using dentry.d_fsdata with an open_by_handle workload (if
> > there are any) will be affected.
> 
> Translate, please.  What does d_fsdata have to do with anything above?

If my understanding of your patch is correct, any disconnected dentry will be
killed immediately at dput instead of being put on the dentry LRU.  I have some
code that I think may rely on the old behavior with regard to
DCACHE_DISCONNECTED: the disconnected dentry sits on the LRU, there are lots of
open_by_handle calls, and nothing keeps a reference on the inode between
open_by_handle/close except that dentry's reference.  Maybe there are better
ways to do that and I just need to look for a workaround.
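
To make the workload concrete, here is a rough userspace sketch (mine, not
code from this thread; it needs CAP_DAC_READ_SEARCH): the handle is obtained
once with name_to_handle_at(), then the file is repeatedly opened by handle
and closed, so between iterations nothing but the cached (disconnected)
dentry pins the inode.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct file_handle *fh;
	int mount_id, mount_fd, fd, i;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <file> <mountpoint>\n", argv[0]);
		return 1;
	}

	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	fh->handle_bytes = MAX_HANDLE_SZ;
	if (name_to_handle_at(AT_FDCWD, argv[1], fh, &mount_id, 0) < 0) {
		perror("name_to_handle_at");
		return 1;
	}

	mount_fd = open(argv[2], O_RDONLY);	/* passed to open_by_handle_at() */

	for (i = 0; i < 1000; i++) {
		fd = open_by_handle_at(mount_fd, fh, O_RDONLY);
		if (fd < 0) {
			perror("open_by_handle_at");	/* e.g. ESTALE once the inode is gone */
			return 1;
		}
		/* ... use fd ... */
		close(fd);	/* with the patch, a disconnected dentry dies right here */
	}
	return 0;
}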

I wonder if there are others who will be affected by this change in behavior,
in particular whether there are filesystems that need to keep internal state in
d_fsdata across open_by_handle calls.  They won't be able to use d_fsdata for
this workload very well anymore.

> > > For another, there's no such thing as 'filesystems internal lock' for
> > > i_nlink protection - that's handled by i_mutex...  And what does
> > > iget() have to do with any of that?
> > 
> > i_mutex is good enough only for local filesystems.
> > Network/clustered/distributed filesystems need to take an internal lock
> > to provide exclusion between this ->unlink and a ->link on another host.
> > That's where I'm coming from with iget().  
> > 
> > Maybe plumbing i_op.unlink with another argument to return i_nlink is
> > something to consider?  A helper for the few filesystems that need to do
> > this might be good enough in the near term.
> 
> ????
> 
> a) iget() had been gone since way back
> b) it never had been called by VFS - it's a filesystem's responsibility
> c) again, what the hell does iget() or its replacements have to do with
> dentry eviction?  It does *NOT* affect dentry refcount.  Never had.

I misspoke with regard to iget.  It is the locking for i_nlink that I am
concerned about.

> d) checks for _inode_ retention in icache are done by filesystem code, which
> is certainly free to use its locks.  Incidentally, for normal filesystems
> no locks are needed at all - everything that changes ->i_nlink is holding
> a reference to the in-core inode, so in a situation when its refcount is zero
> and ->i_lock is held (providing an exclusion with icache lookups), ->i_nlink
> is guaranteed to be stable.

Thanks for that explanation, that's what I needed.  For filesystems which have a
distributed lock manager involved on multiple hosts, i_nlink is not stable on an
individual host even with i_lock held.  You would also need to hold whatever
finer-grained lock the filesystem is protecting i_nlink with.  So the VFS
_can't_ know when i_nlink has gone to zero, only the filesystem can.  That's
what I was trying to get at.

> e) why would VFS possibly want to know if there are links remaining after
> successful ->unlink()?

The VFS wouldn't.  I guess the argument goes that it would be nice to keep the
old behavior and still cache disconnected dentries, and leave it to the
filesystems that require it to destroy the anonymous dentries in ->unlink once
they know that i_nlink has gone to zero, having taken whatever internal locks
they need to adequately protect i_nlink from other hosts.  That's where a
helper to prune disconnected dentries would come in handy (see the sketch
below).
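
A rough sketch of what I mean, not actual code from any tree: everything with
a my_fs_ prefix is a hypothetical stand-in for the filesystem's own routines
(its cluster/DLM lock and its on-disk unlink), while d_prune_aliases() is the
existing VFS helper that drops unused aliases of an inode, including
disconnected ones.

#include <linux/fs.h>
#include <linux/dcache.h>

static int my_fs_unlink(struct inode *dir, struct dentry *dentry)
{
	struct inode *inode = d_inode(dentry);
	int err;

	/* hypothetical cluster-wide lock so i_nlink can't change under us */
	my_fs_cluster_lock(inode);

	err = my_fs_do_unlink(dir, dentry);	/* hypothetical on-disk unlink */
	if (!err) {
		drop_nlink(inode);
		if (inode->i_nlink == 0)
			/* last link gone: drop unused (incl. disconnected) aliases */
			d_prune_aliases(inode);
	}

	my_fs_cluster_unlock(inode);
	return err;
}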
 
> I'm sorry, but you are not making any sense...

I'm sorry for the confusion; it took me a little while to digest this.  The
change in caching behavior could affect more than NFS.  Any filesystem that
expects disconnected dentries to remain cached is affected.

Thanks,
Ben
