Message-ID: <20080605135413.GI8942@skywalker>
Date: Thu, 5 Jun 2008 19:24:13 +0530
From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
To: Jan Kara <jack@...e.cz>
Cc: cmm@...ibm.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] ext4: Fix delalloc sync hang with journal lock inversion

On Mon, Jun 02, 2008 at 12:27:59PM +0200, Jan Kara wrote:
> On Mon 02-06-08 15:29:56, Aneesh Kumar K.V wrote:
> > On Mon, Jun 02, 2008 at 11:35:00AM +0200, Jan Kara wrote:
> > > > BUG_ON(buffer_locked(bh));
> > > > if (buffer_dirty(bh))
> > > > mpage_add_bh_to_extent(mpd, logical, bh);
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index 789b6ad..655b8bf 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > > @@ -881,7 +881,12 @@ int write_cache_pages(struct address_space *mapping,
> > > >  	pagevec_init(&pvec, 0);
> > > >  	if (wbc->range_cyclic) {
> > > >  		index = mapping->writeback_index; /* Start from prev offset */
> > > > -		end = -1;
> > > > +		/*
> > > > +		 * write only till the specified range_end even in cyclic mode
> > > > +		 */
> > > > +		end = wbc->range_end >> PAGE_CACHE_SHIFT;
> > > > +		if (!end)
> > > > +			end = -1;
> > > >  	} else {
> > > >  		index = wbc->range_start >> PAGE_CACHE_SHIFT;
> > > >  		end = wbc->range_end >> PAGE_CACHE_SHIFT;
> > > Are you sure you won't break other users of range_cyclic with this
> > > change?
> > >
> > I haven't run any specific test to verify that. The concern was that if
> > we force cyclic mode for the writeout in delalloc, we may start the
> > writeout from a different offset than specified and end up writing
> > more. So the change was to use the offset specified. A quick look at
> > the kernel suggested most range_cyclic users have range_end set to 0.
> > I haven't audited the full kernel yet; I will do that. Meanwhile, if you
> > think it is risky to make this change, I guess we should drop this
> > part. But I guess we can keep the change below.
> Hmm, I've just got an idea: it may be better to introduce a new flag
> for the wbc, say range_cont, meaning that we start the scan at
> writeback_index (or at range_start if writeback_index is not set) and
> end at range_end. That way we don't have to worry about interfering
> with other range_cyclic users, and in principle range_cyclic was
> originally meant for other uses anyway...
>
Something like below? With this, ext4_da_writepages would have:

	pgoff_t writeback_index = 0;
	.....
	if (!wbc->range_cyclic) {
		/*
		 * If range_cyclic is not set, force range_cont
		 * and save the old writeback_index.
		 */
		wbc->range_cont = 1;
		writeback_index = mapping->writeback_index;
		mapping->writeback_index = 0;
	}
	...
	mpage_da_writepages(..)
	..
	if (writeback_index)
		mapping->writeback_index = writeback_index;
	return ret;
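
For context, here is a rough, self-contained sketch of the calling pattern the
patch description below refers to: write_cache_pages() driven in chunks sized
by the journal credits one transaction can take, with range_cont carrying the
scan position between calls. The names example_writepages,
example_max_pages_per_txn and example_writepage are hypothetical placeholders
for this sketch, not the actual ext4 code:

	#include <linux/fs.h>
	#include <linux/pagemap.h>
	#include <linux/writeback.h>

	/* Hypothetical helper: how many pages one transaction's credits cover. */
	static long example_max_pages_per_txn(struct inode *inode)
	{
		return 64;	/* placeholder; a real fs computes this from credits */
	}

	/* Hypothetical per-page callback; a real fs would map blocks and do IO. */
	static int example_writepage(struct page *page, struct writeback_control *wbc,
				     void *data)
	{
		unlock_page(page);
		return 0;
	}

	static int example_writepages(struct address_space *mapping,
				      struct writeback_control *wbc)
	{
		long pages_left = wbc->nr_to_write;
		pgoff_t saved_index = 0;
		int ret = 0;

		if (!wbc->range_cyclic) {
			/* Force a continuing range scan; remember the old cursor. */
			wbc->range_cont = 1;
			saved_index = mapping->writeback_index;
			mapping->writeback_index = 0;
		}

		while (pages_left > 0) {
			/* One transaction's worth of pages at a time. */
			long chunk = example_max_pages_per_txn(mapping->host);

			if (chunk > pages_left)
				chunk = pages_left;
			wbc->nr_to_write = chunk;

			/* a real filesystem would start a transaction here ... */
			ret = write_cache_pages(mapping, wbc, example_writepage,
						mapping);
			/* ... and stop it here */

			/* write_cache_pages() decrements nr_to_write per page written */
			pages_left -= chunk - wbc->nr_to_write;
			if (ret || wbc->nr_to_write > 0)
				break;	/* error, or no more dirty pages in the range */
		}

		if (saved_index)
			mapping->writeback_index = saved_index;
		wbc->nr_to_write = pages_left;
		return ret;
	}
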
mm: Add range_cont mode for writeback.
From: Aneesh Kumar K.V <aneesh.kumar@...ux.vnet.ibm.com>
Filesystems like ext4 need to start a new transaction in
writepages for block allocation. This happens with delayed
allocation, and there is a limit to how many credits we can
request from the journal layer. So we call write_cache_pages
multiple times, with wbc->nr_to_write set to the maximum value
allowed by the journal credits available.

Add a new mode to writeback that enables us to handle this
behaviour. If mapping->writeback_index is not set, we use
wbc->range_start to find the start index, and at the end of
write_cache_pages we store the index in writeback_index. The next
call to write_cache_pages will start writeout from writeback_index.
We also limit writing to the specified wbc->range_end.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@...ux.vnet.ibm.com>
---
 include/linux/writeback.h |    1 +
 mm/page-writeback.c       |   10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index f462439..0d8573e 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -63,6 +63,7 @@ struct writeback_control {
 	unsigned for_writepages:1;	/* This is a writepages() call */
 	unsigned range_cyclic:1;	/* range_start is cyclic */
 	unsigned more_io:1;		/* more io to be dispatched */
+	unsigned range_cont:1;
 };

 /*
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 789b6ad..014a9f2 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -882,6 +882,12 @@ int write_cache_pages(struct address_space *mapping,
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
+	} else if (wbc->range_cont) {
+		if (!mapping->writeback_index)
+			index = wbc->range_start >> PAGE_CACHE_SHIFT;
+		else
+			index = mapping->writeback_index;
+		end = wbc->range_end >> PAGE_CACHE_SHIFT;
 	} else {
 		index = wbc->range_start >> PAGE_CACHE_SHIFT;
 		end = wbc->range_end >> PAGE_CACHE_SHIFT;
@@ -954,7 +960,9 @@ int write_cache_pages(struct address_space *mapping,
 		index = 0;
 		goto retry;
 	}
-	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))
+	if (wbc->range_cyclic ||
+		(range_whole && wbc->nr_to_write > 0) ||
+		wbc->range_cont)
 		mapping->writeback_index = index;
 	return ret;
 }
--