Message-ID: <20150516144527.20b89194@notabene.brown>
Date:	Sat, 16 May 2015 14:45:27 +1000
From:	NeilBrown <neilb@...e.de>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andreas Dilger <adilger@...ger.ca>,
	Dave Chinner <david@...morbit.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU
 symlinks

On Sat, 16 May 2015 02:47:18 +0100 Al Viro <viro@...IV.linux.org.uk> wrote:

> On Sat, May 16, 2015 at 11:25:03AM +1000, NeilBrown wrote:
> > But surely those things can be managed with a spinlock.
> > 
> > I think a big part of the problem is that the VFS tries to control
> > filesystems rather than provide services to them.
> 
> What with being the thing syscalls talk to for sending the requests to
> filesystems...  Do you really want to push the pathname resolution into
> fs code?  You've looked at it lately, right?

Yes, I've looked lately :-)
I think that all of RCU-walk, and probably some of REF-walk should happen
before the filesystem gets to see anything.
But once you hit a non-positive dentry or the parent of the target name, I'd
rather hand over to the FS.

NFSv4 has the ability to look up multiple components in a single LOOKUP call.
VFS doesn't give it a chance to try because it wants to go step-by-step, and
wants each entry in the cache to have an inode etc.

The earlier the filesystem gets control, the less completely-general the VFS
needs to be.

> 
> > I'm not convinced that serialising 'lookup' calls is vital.  If two threads
> > find a 'not-validated' dentry, and both try to look up the inode, they
> > will both ultimately get the same struct_inode from the icache, and will both
> > succeed in connecting it to the dentry.  Obviously it would be better to
> > avoid two concurrent NFS "LOOKUP" requests, but that is a problem for NFS to
> > solve.  I suspect that using d_fsdata to point to a pending LOOKUP request
> > would allow the "second" thread to wait for that request to finish.  Other
> > filesystems would take a completely different approach.
> 
> See upthread regarding multiple negative dentries with the same name and fun
> consequences thereof.  There might be _NO_ inode.  At all.  dcache has a large
> negative component and without it you'd get really fucked on NFS as soon
> as you try to compile anything.  Shitloads of headers, looked up in a lot of
> directories.  Most of the lookups ending up negative.  We really do need that
> stuff...

Of course negative dentries are important and having multiple would be
unfortunate.  I don't suggest that for a moment.
I'm suggesting three different states for a dentry: positive, negative, don't
know.  "don't know" is a new state that isn't currently allowed.

While a filesystem is performing 'lookup', doing its own locking or not, the
dentry would be "don't know".  Anything that needed to know would block
somewhere in the filesystem code on whatever lock or waitqueue or whatever
that the filesystem developer felt was appropriate.  On i_mutex if
generic_foo() was in use.
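
To make the three states and the blocking concrete, here is a rough
userspace sketch using pthreads.  It is only a model: none of these names
(model_dentry, model_lookup_begin, model_lookup_done) exist in the kernel,
the mutex and condition variable merely stand in for whatever lock or
waitqueue a filesystem would pick, and treating inode number 0 as "no such
name" is an assumption made for brevity.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model; nothing here is an existing kernel interface. */
enum dstate { D_UNKNOWN, D_NEGATIVE, D_POSITIVE };

struct model_dentry {
	pthread_mutex_t lock;		/* stand-in for the fs's own lock */
	pthread_cond_t resolved;	/* stand-in for a fs-private waitqueue */
	enum dstate state;
	int lookup_in_flight;		/* someone is already asking the fs */
	unsigned long ino;		/* meaningful only when D_POSITIVE */
};

static void model_dentry_init(struct model_dentry *d)
{
	pthread_mutex_init(&d->lock, NULL);
	pthread_cond_init(&d->resolved, NULL);
	d->state = D_UNKNOWN;
	d->lookup_in_flight = 0;
	d->ino = 0;
}

/*
 * Returns 1 if the caller now owns the lookup and must later call
 * model_lookup_done().  Returns 0 once the state is known, sleeping
 * first if another thread's lookup is still pending.
 */
static int model_lookup_begin(struct model_dentry *d)
{
	int mine = 0;

	pthread_mutex_lock(&d->lock);
	if (d->state == D_UNKNOWN && !d->lookup_in_flight) {
		d->lookup_in_flight = 1;
		mine = 1;
	} else {
		while (d->state == D_UNKNOWN)
			pthread_cond_wait(&d->resolved, &d->lock);
	}
	pthread_mutex_unlock(&d->lock);
	return mine;
}

/* Record the result (ino == 0 means negative) and wake every waiter. */
static void model_lookup_done(struct model_dentry *d, unsigned long ino)
{
	pthread_mutex_lock(&d->lock);
	assert(d->lookup_in_flight);
	d->ino = ino;
	d->state = ino ? D_POSITIVE : D_NEGATIVE;
	d->lookup_in_flight = 0;
	pthread_cond_broadcast(&d->resolved);
	pthread_mutex_unlock(&d->lock);
}
```

In this model only one dentry exists per name, so a negative result is
recorded once and shared; the second thread simply sleeps until the first
one's request has finished.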

If NFSv4 did a multi-component lookup, the intermediate dentries would be
"don't know" even while they had children.  For local filesystems, that sort
of thing would never happen.  For NFS - which has to allow for random changes
on the server anyway - it is just part of the game.
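
As a sketch of what that might look like, again as a userspace model with
made-up names (model_dentry, model_compound_lookup): it covers only the
success case and assumes the single reply tells us about the final
component and nothing about the ones in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; nothing here is an existing kernel interface. */
enum dstate { D_UNKNOWN, D_NEGATIVE, D_POSITIVE };

struct model_dentry {
	const char *name;
	enum dstate state;
	struct model_dentry *parent;
};

/*
 * Model one multi-component lookup of names[0]/.../names[n-1] below
 * root.  Every intermediate entry is linked into the chain but left
 * D_UNKNOWN; only the final component is marked D_POSITIVE.
 */
static void model_compound_lookup(struct model_dentry *root,
				  const char *const *names, size_t n,
				  struct model_dentry *out)
{
	struct model_dentry *parent = root;
	size_t i;

	assert(out != NULL && n > 0);
	for (i = 0; i < n; i++) {
		out[i].name = names[i];
		out[i].parent = parent;
		out[i].state = D_UNKNOWN;	/* has a child, still unknown */
		parent = &out[i];
	}
	out[n - 1].state = D_POSITIVE;		/* the one thing we learned */
}
```

For "a/b/c" this leaves "a" and "b" in the "don't know" state with "c"
positive beneath them.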

NeilBrown

