Message-ID: <alpine.LFD.1.00.0804030724210.14670@woody.linux-foundation.org>
Date: Thu, 3 Apr 2008 07:34:37 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: mikulas@...ax.karlin.mff.cuni.cz, viro@...iv.linux.org.uk,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH]: Fix SMP-reordering race in mark_buffer_dirty
On Wed, 2 Apr 2008, Andrew Morton wrote:
>
> You sure? A pretty common case would be overwrite of an already-dirty page
> and from a quick read the only place where we modify bh.b_state is the
> set_buffer_uptodate() and clear_buffer_new() in __block_commit_write(),
> both of which could/should be converted to use the same trick. Like
> __block_prepare_write(), which already does
>
> 	if (!buffer_uptodate(bh))
> 		set_buffer_uptodate(bh);

Well, that particular optimization is safe, but it's safe because
"uptodate" is sticky. Once it gets set, it is never reset.

But in general, it's simply a bad idea to do

	if (read)
		atomic-read-modify-write;

because it so often has races. This is pretty much exactly the same bug as
we had not long ago with

	if (!waitqueue_empty(..))
		wake_up(..);

and for very similar reasons - the "read" part is very fast, yes, but it's
also by definition not actually doing all the careful things that the
atomic operation (whether a CPU-atomic one, or a software-written atomic
with a spinlock one) does.

> What happened here was back in about, umm, 2001 we discovered one or two
> code paths which when optimised in this way led to overall-measurably (not
> just oprofile-measurably) improvements. I don't recall which ones they
> were.
>
> So we then said oh-goody and sprinkled the same pattern all over the place
> on the off-chance. But I'm sure that over the ages we've let that
> optimisation rot (witness __block_commit_write() above).

And the problem is that

	if (!buffer_uptodate(bh))
		set_buffer_uptodate(bh);

really isn't "the same" optimization at all as

	if (!buffer_dirty(bh) && !test_set_buffer_dirty(bh)) {
		..

and the latter is simply fundamentally different: unlike "uptodate", the
dirty bit is not sticky, so it can be cleared again behind our back.

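To make the sticky/non-sticky distinction concrete, here is a small
single-threaded replay of one bad interleaving. It is only a sketch:
`toy_buffer`, `replay_skipped_set()` and the `on_disk` field are made-up
names for illustration, not kernel code, and the "reordering" is simulated
by simply running the steps in the order another CPU might observe them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer; not the kernel's buffer_head. */
struct toy_buffer {
	int  data;     /* in-memory contents */
	int  on_disk;  /* what the writeback-like step last wrote out */
	bool flag;     /* plays "dirty" or "uptodate", per scenario */
};

/*
 * Replay one fixed interleaving of "if (!flag) set flag" on CPU A.
 * flag_can_be_cleared selects whether a second CPU may reset the flag
 * (like dirty) or never does (like uptodate, which is sticky).
 * Returns true if the flag ends up clear, i.e. the skipped set was needed.
 */
static bool replay_skipped_set(bool flag_can_be_cleared)
{
	struct toy_buffer b = { .data = 1, .on_disk = 0, .flag = true };

	/* CPU A: the flag read, observed ahead of A's own data store. */
	bool a_saw_flag = b.flag;
	assert(a_saw_flag);	/* this replay assumes the flag started set */

	/* CPU B: writeback-like step, only possible for a clearable flag. */
	if (flag_can_be_cleared) {
		b.flag = false;
		b.on_disk = b.data;	/* writes out the OLD data */
	}

	/* CPU A: the data store becomes visible only now. */
	b.data = 2;

	/* CPU A: decides on the stale read, so the set is skipped. */
	if (!a_saw_flag)
		b.flag = true;

	return !b.flag;
}
```

With `replay_skipped_set(true)` the buffer ends up holding new data with the
flag clear, so nothing would ever write it out; with
`replay_skipped_set(false)` nobody can clear the flag, and the stale read is
harmless.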
> As I say, I expect we could fix this if we want to. The key point here is
> that a page overwrite does not do lock_buffer(), so it should be possible
> to do the whole operation without modifying bh.b_state. If we wish to do
> that.

Well, if we really want to do this op, then I'd rather make it really
obvious in the code what the smp_mb is for, and also make sure that we
don't unnecessarily do *both* the smp_mb and the actual already-serialized
bit operation.

But I'd be even happier if we only did these kinds of things when we have
real performance data showing that they help.

Linus
---
fs/buffer.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 9819632..39ff144 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1181,7 +1181,20 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
 void mark_buffer_dirty(struct buffer_head *bh)
 {
 	WARN_ON_ONCE(!buffer_uptodate(bh));
-	if (!buffer_dirty(bh) && !test_set_buffer_dirty(bh))
+
+	/*
+	 * Very *carefully* optimize the it-is-already-dirty case.
+	 *
+	 * Don't let the final "is it dirty" escape to before we
+	 * perhaps modified the buffer.
+	 */
+	if (buffer_dirty(bh)) {
+		smp_mb();
+		if (buffer_dirty(bh))
+			return;
+	}
+
+	if (!test_set_buffer_dirty(bh))
 		__set_page_dirty(bh->b_page, page_mapping(bh->b_page), 0);
 }
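
For readers who want to poke at the shape of the patched function outside
the kernel, the following is a rough userspace model using C11 atomics. It
rests on several assumptions: `buffer_model`, `BH_DIRTY` as bit 0, the
`model_*` helpers and the `page_dirtied` counter are all invented for this
sketch; `atomic_thread_fence(memory_order_seq_cst)` is used as an
approximate stand-in for smp_mb(), and a sequentially consistent
`atomic_fetch_or()` as an approximate stand-in for the kernel's fully
ordered test-and-set bit operation. It shows the control flow only and
proves nothing about the kernel's memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BH_DIRTY (1u << 0)	/* hypothetical bit position for the model */

/* Hypothetical model of a buffer head; not the kernel structure. */
struct buffer_model {
	atomic_uint  state;		/* flag bits, like bh->b_state */
	unsigned int page_dirtied;	/* counts __set_page_dirty-like calls */
};

/* Cheap, unordered read of the dirty bit (the "read" fast path). */
static bool model_buffer_dirty(struct buffer_model *bh)
{
	return atomic_load_explicit(&bh->state, memory_order_relaxed) & BH_DIRTY;
}

/* Ordered read-modify-write; returns the previous value of the bit. */
static bool model_test_set_dirty(struct buffer_model *bh)
{
	return atomic_fetch_or(&bh->state, BH_DIRTY) & BH_DIRTY;
}

static void model_mark_buffer_dirty(struct buffer_model *bh)
{
	assert(bh != NULL);

	/*
	 * Fast path: only trust "already dirty" if it is still true after
	 * a full fence, so the check cannot be satisfied by a read taken
	 * before our earlier stores to the buffer.
	 */
	if (model_buffer_dirty(bh)) {
		atomic_thread_fence(memory_order_seq_cst);
		if (model_buffer_dirty(bh))
			return;
	}

	/* Slow path: the ordered RMW; the fence above was not executed
	 * unless the bit was seen set and then seen clear again. */
	if (!model_test_set_dirty(bh))
		bh->page_dirtied++;
}
```

Calling `model_mark_buffer_dirty()` twice on a clean buffer bumps
`page_dirtied` once; the second call returns through the fenced fast path
without touching `state`.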