Message-ID: <alpine.LFD.2.00.0905011350500.5379@localhost.localdomain>
Date: Fri, 1 May 2009 14:11:31 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...e.hu>
cc: Frederic Weisbecker <fweisbec@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Jeff Mahoney <jeffm@...e.com>,
ReiserFS Development List <reiserfs-devel@...r.kernel.org>,
Chris Mason <chris.mason@...cle.com>,
Alexander Beregalov <a.beregalov@...il.com>,
Alessio Igor Bogani <abogani@...ware.it>,
Jonathan Corbet <corbet@....net>,
Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH 0/6] kill-the-BKL/reiserfs3: performance improvements,
faster than Bkl based scheme
On Fri, 1 May 2009, Ingo Molnar wrote:
> * Ingo Molnar <mingo@...e.hu> wrote:
>
> > The BKL is clearly removed at a faster rate with such debugging
> > measures in place. With such measures the BKL _really_ hurts, and
> > very visibly so - and that results in active removal.
>
> Btw., if you can think of any way to create a cleaner debug tool
> here i'd be glad to start a new tree that adds only _that_ tool -
> and that could perhaps be dealt with independently and possibly
> pulled upstream too, if clean enough.
Quite frankly, for something as clearly defined as a filesystem, I
would literally suggest starting out with a few trivial stages:
Stage 1:
- raw search-and-replace of [un]lock_kernel()
sed 's/unlock_kernel()/reiserfs_unlock(sb)/g'
sed 's/lock_kernel()/reiserfs_lock(sb)/g'
and then obviously fix it up so that 'sb' always exists (ie you would
need to add a few
	struct super_block *sb = inode->i_sb;
lines manually to make the end result work).
- add that 'reiserfs_[un]lock()' pair around every single VFS entry-point
that currently gets called with lock_kernel(). They'd _still_ get
called with lock-kernel too (you're not fixing that), but now they'd
_also_ take the new lock.
- make reiserfs_[un]lock(sb) just do [un]lock_kernel(), and use that as a
known good starting point. Nothing has really changed (and you in fact
_increased_ the nesting on the BKL, since now the BKL is gotten twice
in those areas that were called from the VFS with the BKL held), but
you now have a mindless and pretty trivial conversion where you still
have a working system, and nothing really changed - except you now have
the beginnings of a nice abstraction.
IOW, "stage 1" is all totally mindless, and never breaks anything. It's
very much designed to be "obviously correct".
Stage 2:
- switch over reiserfs_[un]lock() to a per-superblock mutex, and enable
lockdep, and start looking for nesting problems.
IOW, "stage 2" is when you make a _minimal_ change to now enable all the
good lockdep infrastructure. But it's also designed to just do _one_
thing: look at nesting. Nothing else.
Stage 3:
- once you've fixed all nesting problems (well, most of them - enable a
swapfile on that filesystem and I bet you'll find a few more), you
might want to look at performance issues. In particular, you want to
drop the lock over at least the most _common_ blocking operations, if
at all possible.
And stage 3 is where it would make absolutely _tons_ of sense to just add
some very simple infrastructure for having a per-thread "IO warning
counter", and then simply increment/decrement that counter in
reiserfs_[un]lock().
But exactly because we do NOT want to get warnings about BKL use in all
the _other_ subsystems, we don't want to mix this up with the BKL counter
that we already have. You could probably even use the tracing
infrastructure to do this: enable tracing in reiserfs_lock(), disable it
in reiserfs_unlock(), and look at what you catch in between.
I dunno. That's how I would go about it if I just wanted to clean up a
specific subsystem, and didn't want to tie it together with getting
everything else right too.
Linus