linux-kernel - Re: [patch 00/27] [rfc] vfs scalability patchset


Message-ID: <20090425041829.GX8633@ZenIV.linux.org.uk>
Date:	Sat, 25 Apr 2009 05:18:29 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	npiggin@...e.de
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 00/27] [rfc] vfs scalability patchset

On Sat, Apr 25, 2009 at 11:20:20AM +1000, npiggin@...e.de wrote:
> Here is my current patchset for improving vfs locking scalability. Since
> last posting, I have fixed several bugs, solved several more problems, and
> done an initial sweep of filesystems (autofs4 is probably the trickiest,
> and unfortunately I don't have a good test setup here for that yet, but
> at least I've looked through it).
> 
> Also started to tackle files_lock, vfsmount_lock, and inode_lock.
> (I included my mnt_want_write patches before the vfsmount_lock scalability
> stuff because that just made it a bit easier...). These appear to be the
> problematic global locks in the vfs.
> 
> It's running stably here so far on basic stress testing here on several file
> systems (xfs, tmpfs, ext?). But it still might eat your data of course.
> 
> Would be very interested in any feedback.

First of all, I happily admit that wrt locking I'm a barbarian, and proud
of it.  I.e. a simpler locking scheme beats a theoretical improvement, unless
we have really good evidence that there's a real-world problem.  All things
equal, complexity loses.  All things not quite equal - ditto.  The number of
fuckups is at least quadratic in the number of lock types, with quite a big
chunk on top added by each per-something kind of lock.

That said, I like the mnt_want_write part, the vfsmount_lock splitup (modulo
several questions) and _maybe_ doing something about files_lock.
As in "would seriously consider merging next cycle".  I'd keep the
dcache and icache parts separate for now.

However, files_lock part 2 looks very dubious - if nothing else, I would
expect that you'll get *more* cross-CPU traffic that way, since the CPU
where the final fput() runs will correlate only weakly (if at all) with the
one where open() had been done.  So you are getting more cachelines bouncing.
I want to see the numbers for this one, and on different kinds of loads,
but as it is I'm very sceptical.  BTW, could you try to collect stats
along the lines of "CPU #i has done N_{i,j} removals from sb list for
files that had been in list #j"?
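
A minimal userspace sketch of the N_{i,j} bookkeeping asked for above, not
code from the patchset.  It assumes one file list per CPU, which is how the
question reads.  All names here (NR_CPUS_SKETCH, removals, record_removal,
cross_cpu_fraction) are hypothetical, and the plain shared array is only
for illustration: real instrumentation would presumably keep per-CPU
counters so that the measurement does not itself bounce cachelines.

```c
#include <assert.h>

/* Hypothetical CPU count for the sketch; not a kernel constant. */
#define NR_CPUS_SKETCH 4

/*
 * removals[i][j] counts removals done by CPU i of files that had been
 * on list j, i.e. N_{i,j}.  Zero-initialised as a file-scope array.
 */
unsigned long removals[NR_CPUS_SKETCH][NR_CPUS_SKETCH];

/* Record one removal: removing_cpu took a file off list_cpu's list. */
void record_removal(int removing_cpu, int list_cpu)
{
	assert(removing_cpu >= 0 && removing_cpu < NR_CPUS_SKETCH);
	assert(list_cpu >= 0 && list_cpu < NR_CPUS_SKETCH);
	removals[removing_cpu][list_cpu]++;
}

/*
 * Fraction of all recorded removals where the removing CPU differs from
 * the list's CPU (off-diagonal entries).  Returns 0.0 if nothing has
 * been recorded yet.
 */
double cross_cpu_fraction(void)
{
	unsigned long total = 0, remote = 0;
	int i, j;

	for (i = 0; i < NR_CPUS_SKETCH; i++) {
		for (j = 0; j < NR_CPUS_SKETCH; j++) {
			total += removals[i][j];
			if (i != j)
				remote += removals[i][j];
		}
	}
	return total ? (double)remote / (double)total : 0.0;
}
```

A matrix dominated by its diagonal would mean the final fput() mostly runs
on the CPU that did the open(); heavy off-diagonal counts would point to
the extra cross-CPU traffic described above.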

Splitting files_lock on per-sb basis might be an interesting variant, too.

Another thing: could you pull outright bugfixes as early as possible in the
queue?