[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1157555559.23501.25.camel@dyn9047017100.beaverton.ibm.com>
Date: Wed, 06 Sep 2006 08:12:38 -0700
From: Badari Pulavarty <pbadari@...ibm.com>
To: Jan Kara <jack@...e.cz>
Cc: Andrew Morton <akpm@...l.org>,
Anton Altaparmakov <aia21@....ac.uk>, sct@...hat.com,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [RFC][PATCH] set_page_buffer_dirty should skip unmapped buffers
On Wed, 2006-09-06 at 14:47 +0200, Jan Kara wrote:
> Hi,
>
> > On Fri, 2006-09-01 at 10:18 -0700, Andrew Morton wrote:
> > > > > > Kernel BUG at fs/buffer.c:2791
> > > > > > invalid opcode: 0000 [1] SMP
> > > > > >
> > > > > > Its complaining about BUG_ON(!buffer_mapped(bh)).
> >
> > Here is the change that seems to cause the problem. Jana Kara
> > introduced a new mode "SWRITE" for ll_rw_block() - where it
> > waits for buffer to be unlocked (WRITE will skip locked
> > buffers) + jbd journal_commit_transaction() has been changed
> > to make use of SWRITE.
> >
> > http://marc.theaimsgroup.com/?l=linux-fsdevel&m=112109788702895&w=2
> >
> > Theoritically same race (between journal_commit_transaction() and
> > journal_unmap_buffer()+set_page_dirty()) could exist before his changes
> > - which should trigger bug in submit_bh(). But I can't seem to hit
> > it without his changes. My guess is ll_rw_block() is always skipping
> > those cleaned up buffers, before page gets dirtied again ..
> I think that the change to ll_rw_block() just widens the window much
> more...
>
Yes.
> > Andrew, what should we do ? Do you suggest handling this in jbd
> > itself (like this patch) ?
> Actually that part of commit code needs rewrite anyway (and after that
> rewrite you get rid of ll_rw_block()) because of other problems - the
> code assumes that whenever buffer is locked, it is being written to disk
> which is not true... I have some preliminary patches for that but they
> are not very nice and so far I didn't have enough time to find a nice
> solution.
Are you okay with current not-so-elegant fix ?
Thanks,
Badari
Patch to fix: Kernel BUG at fs/buffer.c:2791
on 1k (2k) filesystems while running fsx.
journal_commit_transaction collects lots of dirty buffer from
and does a single ll_rw_block() to write them out. ll_rw_block()
locks the buffer and checks to see if they are dirty and submits
them for IO.
In the mean while, journal_unmap_buffers() as part of
truncate can unmap the buffer and throw it away. Since its
a 1k (2k) filesystem - each page (4k) will have more than
one buffer_head attached to the page and and we can't free
up buffer_heads attached to the page (if we are not
invalidating the whole page).
Now, any call to set_page_dirty() (like msync_interval)
could end up setting all the buffer heads attached to
this page again dirty, including the ones those got
cleaned up :(
So, -not-so-elegant fix would be to have make jbd skip all
the buffers that are not mapped.
Signed-off-by: Badari Pulavarty <pbadari@...ibm.com>
---
fs/jbd/commit.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
Index: linux-2.6.18-rc5/fs/jbd/commit.c
===================================================================
--- linux-2.6.18-rc5.orig/fs/jbd/commit.c 2006-08-27 20:41:48.000000000 -0700
+++ linux-2.6.18-rc5/fs/jbd/commit.c 2006-09-01 10:43:54.000000000 -0700
@@ -160,6 +160,34 @@ static int journal_write_commit_record(j
return (ret == -EIO);
}
+static void jbd_write_buffers(int nr, struct buffer_head *bhs[])
+{
+ int i;
+
+ for (i = 0; i < nr; i++) {
+ struct buffer_head *bh = bhs[i];
+
+ lock_buffer(bh);
+
+ /*
+ * case 1: Buffer could have been flushed by now,
+ * if so nothing to do for us.
+ * case 2: Buffer could have been unmapped up by
+ * journal_unmap_buffer - but still attached to the
+ * page. Any calls to set_page_dirty() would dirty
+ * the buffer even though its not mapped. If so,
+ * we need to skip them.
+ */
+ if (buffer_mapped(bh) && test_clear_buffer_dirty(bh)) {
+ bh->b_end_io = end_buffer_write_sync;
+ get_bh(bh);
+ submit_bh(WRITE, bh);
+ continue;
+ }
+ unlock_buffer(bh);
+ }
+}
+
/*
* journal_commit_transaction
*
@@ -356,7 +384,7 @@ write_out_data:
jbd_debug(2, "submit %d writes\n",
bufs);
spin_unlock(&journal->j_list_lock);
- ll_rw_block(SWRITE, bufs, wbuf);
+ jbd_write_buffers(bufs, wbuf);
journal_brelse_array(wbuf, bufs);
bufs = 0;
goto write_out_data;
@@ -379,7 +407,7 @@ write_out_data:
if (bufs) {
spin_unlock(&journal->j_list_lock);
- ll_rw_block(SWRITE, bufs, wbuf);
+ jbd_write_buffers(bufs, wbuf);
journal_brelse_array(wbuf, bufs);
spin_lock(&journal->j_list_lock);
}
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists