linux-kernel - Re: [PATCH]: Fix SMP-reordering race in mark_buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 2 Apr 2008 19:12:07 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	mikulas@...ax.karlin.mff.cuni.cz, viro@...iv.linux.org.uk,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH]: Fix SMP-reordering race in mark_buffer_dirty

On Wed, 2 Apr 2008 16:52:14 -0700 (PDT) Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> 
> 
> On Wed, 2 Apr 2008, Andrew Morton wrote:
> > 
> > But then the test-and-set of an already-set flag would newly cause the
> > cacheline to be dirtied, requiring additional bus usage to write it back?
> 
> Looking around a bit, I don't see any realistic case where this could 
> possibly be the case and that is performance-sensitive.

You sure?  A pretty common case would be overwrite of an already-dirty page
and from a quick read the only place where we modify bh.b_state is the
set_buffer_uptodate() and clear_buffer_new() in __block_commit_write(),
both of which could/should be converted to use the same trick.  Like
__block_prepare_write(), which already does

	if (!buffer_uptodate(bh))
		set_buffer_uptodate(bh);

What happened here was back in about, umm, 2001 we discovered one or two
code paths which when optimised in this way led to overall-measurably (not
just oprofile-measurably) improvements.  I don't recall which ones they
were.

So we then said oh-goody and sprinkled the same pattern all over the place
on the off-chance.  But I'm sure that over the ages we've let that
optimisation rot (witness __block_commit_write() above).

These were simpler times, and we didn't worry our little heads overly much
about this ordering stuff.

> The VFS-level uses of mark_buffer_dirty() seem to all be coupled with 
> other uses that clear or set other bits in the buffer status word, so the 
> cacheline will always apparently be dirty. 

As I say, I expect we could fix this if we want to.  The key point here is
that a page overwrite does not do lock_buffer(), so it should be possible
to do the whole operation without modifying bh.b_state.  If we wish to do
that.

> The low-level filesystems sometimes do it for things like block bitmap 
> changes, and I could imagine that there it actually (a) is no longer in 
> the cache and (b) the buffer really was dirty to start with, but in ext3 
> for example, you'd end up in journal_dirty_metadata which spinlocks on the 
> BH_State bit first etc etc.

Yeah, ext3 is probably a lost cause.  Anyone who cares about performance is
using ext2 and a UPS ;)

> So the cacheline *will* be dirty, and this function doesn't seem like it 
> could possibly ever show up in a real profile for any real load anyway, so 
> it seems odd to try to optimize it this way.
> 

I don't think the world would end if we took it out.  Particularly as not
many people use ext2 and we already broke it.

The trick will be to hunt down all the other places where we did a similar
thing and check them.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/