Message-ID: <alpine.LFD.1.00.0804021422570.14670@woody.linux-foundation.org>
Date: Wed, 2 Apr 2008 14:31:46 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>
cc: viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org
Subject: Re: [PATCH]: Fix SMP-reordering race in mark_buffer_dirty
On Wed, 2 Apr 2008, Mikulas Patocka wrote:
>
> So you're right, the gain of mfence is so little that you can remove it
> and use only test_set_buffer_dirty.
Well, I suspect that part of the issue is that quite often you end up
paying for *both* the barrier and the atomic op, because the buffer wasn't
already dirty to begin with.
Re-dirtying a dirty buffer is pretty common for things like bitmap blocks
etc, so it's probably a worthy optimization if it has no cost, and on
Core 2 I suspect your version is worth it, but it's not like it's going to
be necessarily a 99% kind of case. I suspect quite a lot of the
mark_buffer_dirty() calls are actually on clean buffers.
(Of course, a valid argument is that if it was already dirty, we'll skip
the other expensive parts, so only the "already dirty" case is worth
optimizing for. Maybe true. There might also be cases where it means one
less dirty cacheline in memory.)
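
Just to make it concrete, the two shapes being compared look roughly like
this (a sketch, not the actual fs/buffer.c code; __dirty_slowpath() is a
made-up stand-in for the real page-dirtying work, and buffer_dirty() /
test_set_buffer_dirty() are the usual buffer_head accessors):

#include <linux/buffer_head.h>

/* Made-up stand-in for the expensive "actually dirty the page" path. */
static void __dirty_slowpath(struct buffer_head *bh)
{
	/* page-dirtying work elided */
}

/*
 * Variant A: test-first fast path.  If the buffer already looks dirty,
 * a full barrier plus a re-check lets us return without doing any
 * atomic operation at all.
 */
static void mark_dirty_fastpath(struct buffer_head *bh)
{
	if (buffer_dirty(bh)) {
		/*
		 * Order the caller's earlier stores to the buffer contents
		 * before the dirty-bit test; otherwise the test could be
		 * reordered before those stores, and we could return early
		 * on a dirty bit that a concurrent writeback is just about
		 * to clear.
		 */
		smp_mb();
		if (buffer_dirty(bh))
			return;
	}
	if (!test_set_buffer_dirty(bh))
		__dirty_slowpath(bh);
}

/*
 * Variant B: just do the atomic test-and-set unconditionally; the locked
 * operation is itself a full barrier, so no explicit smp_mb() is needed.
 */
static void mark_dirty_simple(struct buffer_head *bh)
{
	if (!test_set_buffer_dirty(bh))
		__dirty_slowpath(bh);
}
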
> I don't know if there are other architectures where smp_mb() would be
> significantly faster than test_and_set_bit.
Probably none, since test_and_set_bit() implies a smp_mb(), and
generally the bigger cost is in the barrier than in the bit setting
itself.
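
On x86, for example, the value-returning bit op ends up as a lock-prefixed
bts, and the lock prefix is already a full barrier. A hand-written sketch
of that shape, just to illustrate (this is not the kernel's real arch/x86
bitops code):

/*
 * x86-64-only sketch: atomically set bit 'nr' in *addr and return its old
 * value.  The lock prefix makes the whole read-modify-write globally
 * ordered, which is why no separate mfence/smp_mb() is needed around it.
 */
static inline int sketch_test_and_set_bit(long nr, volatile unsigned long *addr)
{
	int oldbit;

	asm volatile("lock; btsq %2, %1\n\t"
		     "sbb %0, %0"		/* oldbit = CF ? -1 : 0 */
		     : "=r" (oldbit), "+m" (*addr)
		     : "r" (nr)
		     : "memory", "cc");

	return oldbit != 0;
}

So the comparison is really one mfence (for smp_mb()) against one locked
read-modify-write.
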
Core 2 is the outlier in having a noticeably faster "mfence" than atomic
instructions (and judging by noises Intel makes, Nehalem will undo that
outlier).
Linus