linux-kernel - Re: [PATCH -tip] remove the BKL: Replace BKL in mount/umount syscalls with a mutex

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090417180832.GL26366@ZenIV.linux.org.uk>
Date:	Fri, 17 Apr 2009 19:08:32 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Ingo Molnar <mingo@...e.hu>,
	Alessio Igor Bogani <abogani@...ware.it>,
	Alexander Viro <viro@....linux.org.uk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH -tip] remove the BKL: Replace BKL in mount/umount
	syscalls with a mutex

On Fri, Apr 17, 2009 at 10:21:06AM -0700, Linus Torvalds wrote:

> Of course, right now we do hold the BKL over _multiple_ downcalls, so in 
> that sense it's not actually totally 100% correct and straightforward to 
> just move it down. Eg in the generic_shutdown_super() case we do
> 
> 	lock_kernel();
> 	  ->write_super();
> 	  ->put_super();
> 	invalidate_inodes();
> 	unlock_kernel();
> 
> and obviously if we split it up so that we push a lock_kernel() into both, 
> we end up unlocking in between. I doubt anything cares, but it's still a 
> technical difference.

No, that's OK.  Anything that would expect on lack of blocking between
the callers of ->write_super() and ->put_super() is simply insane.  Not
that other callers of ->write_super() had been under BKL, while we are
at it...

> There are similar issues with 'remount' holding the BKL over longer 
> sequences.
> 
> Btw, the superblock code really does seem to depend on lock_kernel. Those 
> "sb->s_flags" accesses are literally not protected by anything else afaik.

Modifications in there *should* be protected by ->s_umount.  Except that
emergency_remount() does down_read() instead of down_write(), for some
reason.  And that fs going r/o on error very likely will not hold any
locks at all, BKL included.

Note that most of the readers really couldn't care less about protection.
Single-shot tests for one bit like "is this fs mounted noatime right now?" 
are OK as is - we don't *care* if it races with remount and no way to
do anything about such race anyway.

Read-only is the main exception; we should be mostly OK since the per-vfsmount
r/o rework, but "I have an error and I'll go r/o now" stuff is still messy.

> That said, I think that fs/locks.c is likely a much bigger issue. Very few 
> people care about any realtimeness of mount/unmount/remount. But file 
> locking? That is much more likely to be an issue.

That is much more likely to require really non-trivial work, BTW.  That code
is a *mess* and inventing sane locking for it will be painful.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/