Message-Id: <1245386680.2560.416.camel@ymzhang>
Date: Fri, 19 Jun 2009 12:44:40 +0800
From: "Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
chris.mason@...cle.com, david@...morbit.com, hch@...radead.org,
akpm@...ux-foundation.org, jack@...e.cz, richard@....demon.co.uk,
damien.wyart@...e.fr, dedekind1@...il.com, fweisbec@...il.com
Subject: Re: [PATCH 0/15] Per-bdi writeback flusher threads v10
On Thu, 2009-06-18 at 14:35 +0200, Jens Axboe wrote:
> On Thu, Jun 18 2009, Zhang, Yanmin wrote:
> > On Thu, 2009-06-18 at 07:13 +0200, Jens Axboe wrote:
> > > On Thu, Jun 18 2009, Zhang, Yanmin wrote:
> > > > On Tue, 2009-06-16 at 21:53 +0200, Jens Axboe wrote:
> > > > > On Tue, Jun 16 2009, Jens Axboe wrote:
> > > > > > On Tue, Jun 16 2009, Zhang, Yanmin wrote:
> > > > > > > On Fri, 2009-06-12 at 14:54 +0200, Jens Axboe wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Here's the 10th version of the writeback patches. Changes since v9:
> > > > > > > >
> > > > > > > > - Fix bdi task exit race leaving work on the list, flush it after we
> > > > > > > > know we cannot be found anymore.
> > > > > > > > - Rename flusher tasks from bdi-foo to flush-foo. Should make it
> > > > > > > > clearer to the casual observer.
> > > > > > > > - Fix a problem with the btrfs bdi register patch that would spew
> > > > > > > > warnings for > 1 mounted btrfs file system.
> > > > > > > > - Rebase to current -git, there were some conflicts with the latest work
> > > > > > > > from viro/hch.
> > > > > > > > - Fix a block layer core problem where stacked devices would overwrite
> > > > > > > > the bdi state, causing problems and warning spew.
> > > > > > > > - In bdi_writeback_all(), in the rare occurrence of a work allocation
> > > > > > > > failure, restart scanning from the beginning. Then we can drop the
> > > > > > > > bdi_lock mutex before diving into bdi specific writeback.
> > > > > > > > - Convert bdi_lock to a spinlock.
> > > > > > > > - Use spin_trylock() in bdi_writeback_all(), if this isn't a data
> > > > > > > > integrity writeback. Debatable, I kind of like it...
> > > > > > > > - Get rid of BDI_CAP_FLUSH_FORKER, just check for match with the
> > > > > > > > default_backing_dev_info.
> > > > > > > > - Fix race in list checking in bdi_forker_task().
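The spin_trylock() item above describes an opportunistic-locking pattern: a background (non-integrity) pass gives up if another caller already holds the list lock, while a data integrity pass must wait for it. The sketch below shows that idea in user-space C under stated assumptions: the spinlock built on a C11 atomic_flag, and the names spin_trylock_(), writeback_all(), list_lock and the enum values, are hypothetical stand-ins, not the kernel's spinlock_t API or the actual bdi_writeback_all() code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for WB_SYNC_NONE / WB_SYNC_ALL. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Minimal user-space spinlock on a C11 atomic_flag (stand-in for spinlock_t). */
typedef struct {
    atomic_flag held;
} spin_t;

static bool spin_trylock_(spin_t *l)
{
    /* test_and_set returns the previous value: false means we took the lock. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void spin_lock_(spin_t *l)
{
    while (!spin_trylock_(l))
        ; /* busy-wait until the holder releases it */
}

static void spin_unlock_(spin_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical lock protecting a list of devices. */
static spin_t list_lock = { ATOMIC_FLAG_INIT };

/*
 * Returns true if the scan ran, false if it was skipped because the
 * lock was contended and this was only an opportunistic pass.
 */
static bool writeback_all(enum sync_mode mode)
{
    if (mode == SYNC_ALL)
        spin_lock_(&list_lock);       /* integrity: must not be skipped */
    else if (!spin_trylock_(&list_lock))
        return false;                 /* someone else is already scanning */

    /* ... walk the device list and queue writeback work here ... */

    spin_unlock_(&list_lock);
    return true;
}
```

The cost of this choice is that a skipped background pass does no work of its own, which is only acceptable because no caller depends on that pass for correctness; integrity callers always take the lock.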
> > > > > > > >
> > > > > > > >
> > > > > > > > For ease of patching, I've put the full diff here:
> > > > > > > >
> > > > > > > > http://kernel.dk/writeback-v10.patch
> > > > > > > Jens,
> > > > > > >
> > > > > > > I applied the patch to 2.6.30 and got a conflict. The attachment is
> > > > > > > the patch I ported to 2.6.30. Did I miss anything?
> > > > > > >
> > > > > > >
> > > > > > > With the patch, the kernel reports the messages below on 2 machines.
> > > > > > >
> > > > > > > INFO: task sync:29984 blocked for more than 120 seconds.
> > > > > > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > sync D ffff88002805e300 6168 29984 24581
> > > > > > > ffff88022f84b780 0000000000000082 7fffffffffffffff ffff880133dbfe70
> > > > > > > 0000000000000000 ffff88022e2b4c50 ffff88022e2b4fd8 00000001000c7bb8
> > > > > > > ffff88022f513fd0 ffff880133dbfde8 ffff880133dbfec8 ffff88022d5d13c8
> > > > > > > Call Trace:
> > > > > > > [<ffffffff802b69e4>] ? bdi_sched_wait+0x0/0xd
> > > > > > > [<ffffffff80780fde>] ? schedule+0x9/0x1d
> > > > > > > [<ffffffff802b69ed>] ? bdi_sched_wait+0x9/0xd
> > > > > > > [<ffffffff8078158d>] ? __wait_on_bit+0x40/0x6f
> > > > > > > [<ffffffff802b69e4>] ? bdi_sched_wait+0x0/0xd
> > > > > > > [<ffffffff80781628>] ? out_of_line_wait_on_bit+0x6c/0x78
> > > > > > > [<ffffffff8024a426>] ? wake_bit_function+0x0/0x23
> > > > > > > [<ffffffff802b67ac>] ? bdi_writeback_all+0x12a/0x152
> > > > > > > [<ffffffff802b6805>] ? generic_sync_sb_inodes+0x31/0xde
> > > > > > > [<ffffffff802b6935>] ? sync_inodes_sb+0x83/0x88
> > > > > > > [<ffffffff802b6980>] ? __sync_inodes+0x46/0x8f
> > > > > > > [<ffffffff802b94f2>] ? do_sync+0x36/0x5a
> > > > > > > [<ffffffff802b9538>] ? sys_sync+0xe/0x12
> > > > > > > [<ffffffff8020b9ab>] ? system_call_fastpath+0x16/0x1b
> > > > > >
> > > > > > I don't think it is your backport, for some reason the v10 missed a
> > > > > > change that I think could solve this race. If not, there's another in
> > > > > > there that I need to look at.
> > > > > >
> > > > > > So against your current base, could you try with the below added as
> > > > > > well? The printk() is just so we can see if this triggers for you or
> > > > > > not.
> > > > >
> > > > > OK, that won't work, since we need to actually wait for the work to be
> > > > > flushed, otherwise we wreck things when we free the bdi immediately
> > > > > after that.
> > > > >
> > > > > Can you try with this patch?
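The point above (wait for queued work to be flushed before the bdi is freed), together with the v10 changelog item "flush it after we know we cannot be found anymore", implies a teardown ordering: first make the object unreachable so no new work can be queued, then run whatever is still queued, and only then free it. The sketch below illustrates just that ordering. It is single-threaded and hypothetical: struct dev, struct work, queue_work(), drain() and unregister_dev() are invented names, and the locking and cross-thread waiting that real kernel code needs are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item and device; not the kernel's bdi_work/backing_dev_info. */
struct work {
    struct work *next;
    bool done;
};

struct dev {
    bool registered;   /* can submitters still find this device? */
    struct work *head; /* work queued but not yet run */
};

/* Submitters may only queue work on a device they can still find. */
static bool queue_work(struct dev *d, struct work *w)
{
    if (!d->registered)
        return false;
    w->next = d->head;
    d->head = w;
    return true;
}

/* Run every queued item; returns how many were run. */
static int drain(struct dev *d)
{
    int n = 0;
    while (d->head) {
        struct work *w = d->head;
        d->head = w->next;
        w->done = true; /* stands in for doing the writeback */
        n++;
    }
    return n;
}

/*
 * Teardown order: hide the device first, then flush what is left.
 * Only after this returns is it safe for the caller to free d.
 */
static int unregister_dev(struct dev *d)
{
    d->registered = false;
    return drain(d);
}
```

Reversing the two steps in unregister_dev() would leave a window in which new work could be queued after the drain, which is the "work left on the list" situation the changelog describes.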
> > > > Jens,
> > > >
> > > > I tested the patch below on 4 machines (running all fio sub-test cases twice,
> > > > which takes more than 10 hours). The previous 2 machines didn't hang this time.
> > > > Unfortunately, the 3rd machine still hung. I double-checked the disassembled code
> > > > of the kernel and made sure bdi_start_fn really calls wb_do_writeback.
> > >
> > > Sorry, I should have made that clearer when posting v11. This patch
> > > won't fully solve the problem; however, the v11 patch series should. So if
> > > you test with that, hopefully all soft hangs should be gone.
> > Ok. I will start new testing against V11. I will also add some debugging code
> > to V11.
>
> Great, thanks! There's a small issue with v11 that you should be aware
> of. The test for bdi_add_default_flusher_task() was inverted. I'm
> attaching a diff at the end. The interesting bit is the 2nd hunk of
> backing-dev.c, the others are just cleanups.
Jens,
I did extensive testing with fio (especially the aio randread test, which triggered
the hang), ffsb, and a couple of other tests, and didn't hit the hang.
So V11 does fix the issue.
From a performance point of view, there is no big difference from older versions.
Yanmin
>
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index 6815f8b..e623c57 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -107,7 +107,6 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
> long nr_pages, enum writeback_sync_modes sync_mode);
> int bdi_writeback_task(struct bdi_writeback *wb);
> void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
> -void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
> void bdi_add_flusher_task(struct backing_dev_info *bdi);
> int bdi_has_dirty_io(struct backing_dev_info *bdi);
>
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index b4517ee..c2eec72 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -37,6 +37,8 @@ static int bdi_sync_supers(void *);
> static void sync_supers_timer_fn(unsigned long);
> static void arm_supers_timer(void);
>
> +static void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
> +
> #ifdef CONFIG_DEBUG_FS
> #include <linux/debugfs.h>
> #include <linux/seq_file.h>
> @@ -496,7 +498,7 @@ static int bdi_forker_task(void *ptr)
> list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
> if (bdi->wb.task)
> continue;
> - if (!list_empty(&bdi->work_list) &&
> + if (list_empty(&bdi->work_list) &&
> !bdi_has_dirty_io(bdi))
> continue;
>
> @@ -607,7 +609,7 @@ static int flusher_add_helper_test(struct backing_dev_info *bdi)
> * Add the default flusher task that gets created for any bdi
> * that has dirty data pending writeout
> */
> -void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
> +static void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
> {
> bdi_add_one_flusher_task(bdi, flusher_add_helper_test);
> }
>
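The second backing-dev.c hunk is the one that matters: bdi_forker_task() should skip a bdi only when it already has a flusher task, or when it has neither queued work nor dirty I/O. With the inverted test, a bdi with queued work but no dirty inodes was skipped (so that work had no thread to run it), while a completely idle bdi was not. The sketch below reduces both versions of the check to plain booleans so the difference can be seen directly; struct bdi_state, skip_buggy() and skip_fixed() are hypothetical names, with comments mapping each field to the expression it replaces in the hunk.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state the forker loop inspects. */
struct bdi_state {
    bool has_task;     /* bdi->wb.task != NULL */
    bool work_pending; /* !list_empty(&bdi->work_list) */
    bool dirty_io;     /* bdi_has_dirty_io(bdi) */
};

/* v11 as posted: the inverted test. Returns true if the bdi is skipped. */
static bool skip_buggy(const struct bdi_state *b)
{
    if (b->has_task)
        return true;
    return b->work_pending && !b->dirty_io;
}

/* With the fix: skip only when there is nothing to do. */
static bool skip_fixed(const struct bdi_state *b)
{
    if (b->has_task)
        return true;
    return !b->work_pending && !b->dirty_io;
}
```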