Date:	Fri, 15 May 2015 17:45:56 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	NeilBrown <neilb@...e.de>
Cc:	Andreas Dilger <adilger@...ger.ca>,
	Dave Chinner <david@...morbit.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC][PATCHSET v3] non-recursive pathname resolution & RCU symlinks

On Fri, May 15, 2015 at 4:30 PM, NeilBrown <neilb@...e.de> wrote:
>
> .. and I've been wondering what to do about i_mutex and NFS.  I've had
> customer reports of slowness in creating files that seems to be due to
> i_mutex on the directory being held over the whole 'create' RPC, so only one
> of those can be in flight at the one time.
> "make  -j" on a large source directory can easily want to create lots of
> "*.o" files at "the same time".
>
> And NFS doesn't need i_mutex at all because the server will provide the
> needed guarantees.

So i_mutex on a directory is probably the nastiest lock we have in the fs layer.

It's used for several different half-related things:

 - serialize filename creation/deletion

   This is partly for the benefit of the filesystem itself (and not
helpful for NFS, as you note), but it's also very much about making
sure we have uniqueness guarantees at the VFS layer too.

   So even with NFS, it's not just "the server provides the needed
guarantees", because some of the guarantees are really client-local.

   For example, simply that we only ever have one single dentry for a
particular name, and that we only ever have one active lookup per
dentry. Those things happen independently of - and before - the server
even sees the operation.

   So the whole local directory tree consistency ends up depending on this.

 - readdir(). This is mostly to make it hard for filesystems to do the
wrong thing when there is concurrent file creation.

I suspect readdir could fairly easily push the i_mutex down from the
caller and into the filesystem, and then filesystems might narrow down
the use (or even get rid of it). The initial patch might even be
automated with coccinelle. However, rather few loads actually have a
lot of readdir() activity, and samba is probably the only major one.
I've seen benchmarks where it matters, but they are rare (and I
haven't seen one in literally years).
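
As a rough illustration of what "pushing the lock down" means, here is a minimal userspace sketch in C with pthreads. Everything in it (demo_dir, demo_iterate_*, demo_readdir_*) is a hypothetical stand-in and not a kernel interface; it only shows the mechanical step of moving the lock from the generic caller into the per-filesystem callback, which is the first step before a filesystem could narrow or drop it.

```c
#include <pthread.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct demo_dir;
typedef int (*demo_iterate_fn)(struct demo_dir *dir, int *count);

struct demo_dir {
    pthread_mutex_t lock;     /* plays the role of the directory's i_mutex */
    int nentries;             /* pretend directory contents */
    demo_iterate_fn iterate;  /* per-"filesystem" readdir callback */
};

static void demo_dir_init(struct demo_dir *dir, int nentries,
                          demo_iterate_fn iterate)
{
    pthread_mutex_init(&dir->lock, NULL);
    dir->nentries = nentries;
    dir->iterate = iterate;
}

/* Callback written on the assumption that the caller holds dir->lock. */
static int demo_iterate_unlocked(struct demo_dir *dir, int *count)
{
    *count = dir->nentries;
    return 0;
}

/*
 * Mechanically converted callback: same body, but it takes the lock
 * itself.  A filesystem could later shrink this critical section.
 */
static int demo_iterate_self_locking(struct demo_dir *dir, int *count)
{
    pthread_mutex_lock(&dir->lock);
    int ret = demo_iterate_unlocked(dir, count);
    pthread_mutex_unlock(&dir->lock);
    return ret;
}

/* "Before": the generic caller wraps every callback in the lock. */
static int demo_readdir_caller_locked(struct demo_dir *dir, int *count)
{
    pthread_mutex_lock(&dir->lock);
    int ret = dir->iterate(dir, count);
    pthread_mutex_unlock(&dir->lock);
    return ret;
}

/* "After": the generic caller no longer locks; the callback decides. */
static int demo_readdir_pushed_down(struct demo_dir *dir, int *count)
{
    return dir->iterate(dir, count);
}
```

The two readdir paths return the same result here; the only difference is who owns the locking decision. That is why the initial conversion is a plausible candidate for an automated rewrite.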

So the readdir case could probably be at least relaxed fairly easily.
But the thing that tends to hurt on more loads is, as you note, the
filename lookup/creation/movement case. And that's much harder to fix.

Al, do you have any ideas? Personally, I've wanted to make i_mutex a
rwsem for a long time, but right now pretty much everything uses it
for exclusion. For example, filename lookup is clearly just reading
the directory, so it should take a rwsem for reading, right? No. Not
the way it is done now. Filename lookup wants the directory inode
exclusively because that guarantees that we create just one dentry and
call the filesystem ->lookup only once on that dentry.
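
To make the exclusion argument concrete, here is a minimal userspace sketch in C with pthreads. The names (toy_dir, toy_dentry, toy_lookup, fs_lookups) are hypothetical and this is not how the dcache is implemented; it only models the property described above, that the "is it cached?" check and the "create it" step happen under one exclusive lock, so each name gets exactly one entry and one filesystem lookup.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct toy_dentry {
    char name[32];
    struct toy_dentry *next;
};

struct toy_dir {
    pthread_mutex_t lock;        /* plays the role of the directory's i_mutex */
    struct toy_dentry *children; /* cached names for this directory */
    int fs_lookups;              /* how many times the "filesystem" was asked */
};

static void toy_dir_init(struct toy_dir *dir)
{
    pthread_mutex_init(&dir->lock, NULL);
    dir->children = NULL;
    dir->fs_lookups = 0;
}

/* Find a cached entry by name; the caller must hold dir->lock. */
static struct toy_dentry *toy_find(struct toy_dir *dir, const char *name)
{
    for (struct toy_dentry *d = dir->children; d; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/*
 * Look up a name, creating the entry on a cache miss.  The check and the
 * insert happen under one exclusive lock, so two racing callers can never
 * both miss and both allocate an entry for the same name.
 */
static struct toy_dentry *toy_lookup(struct toy_dir *dir, const char *name)
{
    pthread_mutex_lock(&dir->lock);
    struct toy_dentry *d = toy_find(dir, name);
    if (!d) {
        d = calloc(1, sizeof(*d));
        if (d) {
            strncpy(d->name, name, sizeof(d->name) - 1);
            d->next = dir->children;
            dir->children = d;
            dir->fs_lookups++;   /* stands in for the single ->lookup call */
        }
    }
    pthread_mutex_unlock(&dir->lock);
    return d;
}
```

If toy_lookup() took the lock only for reading, two threads could both see a miss for the same name and both insert, which is exactly the duplicate-dentry problem. A rwsem conversion would need some other way to serialize lookups on a single name.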

Again, there tend to be no simple benchmarks or loads that people care
about that show this. Most of the time it's fairly hard to see.

                      Linus
