Message-ID: <f0a9a62a43c4419687c69fff15a7c043@SGPMBX1004.APAC.bosch.com>
Date: Wed, 22 Jun 2016 11:55:12 +0000
From: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
To: "jack@...e.cz" <jack@...e.cz>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
Theodore Ts'o <tytso@....edu>
Subject: FW: ext4 out of order when use cfq scheduler
Hi Kara,
I saw the patch "fix data exposure after a crash" in the kernel 4.6.2 release.
I remember you provided two more patches for performance optimization. Could you tell me whether they are necessary?
Thanks.
Best regards
Weller HUANG
-----Original Message-----
From: Jan Kara [mailto:jack@...e.cz]
Sent: Wednesday, March 16, 2016 4:10 AM
To: Theodore Ts'o <tytso@....edu>
Cc: Jan Kara <jack@...e.cz>; HUANG Weller (CM/ESW12-CN) <Weller.Huang@...bosch.com>; linux-ext4@...r.kernel.org; Li, Michael <huayil@....qualcomm.com>
Subject: Re: ext4 out of order when use cfq scheduler
On Tue 15-03-16 15:46:33, Jan Kara wrote:
> On Tue 15-03-16 11:46:34, Jan Kara wrote:
> > On Mon 14-03-16 10:36:35, Ted Tso wrote:
> > > On Mon, Mar 14, 2016 at 08:39:28AM +0100, Jan Kara wrote:
> > > > No, that won't be enough. blkdev_issue_flush() is not guaranteed
> > > > to do anything to IOs which have not reported completion before
> > > > blkdev_issue_flush() was called. Specifically, CFQ will queue
> > > > submitted bio in its internal RB tree, following flush request
> > > > completely bypasses this tree and goes directly to the disk
> > > > where it flushes caches. And only later CFQ decides to schedule
> > > > async writeback from the flusher thread which is queued in the RB tree...
> > >
> > > Oh, right. I am forgetting about the flushing machinery rewrite.
> > > Thanks for pointing that out.
> > >
> > > But what we *could* do is to swap those two calls and then in the
> > > case where delalloc is enabled, we could maintain a list of inodes
> > > where we only need to call filemap_fdatawait(), and not initiate
> > > writeback for any dirty pages which had been caused by non-allocating writes.
> >
> > We actually don't need to swap those two calls - page is already
> > marked as under writeback in
> >
> > mpage_map_and_submit_buffers() -> mpage_submit_page ->
> > ext4_bio_write_page
> >
> > which gets called while we still hold the transaction handle. I
> > agree calling filemap_fdatawait() from JBD2 during commit should be
> > enough to fix issues with delalloc writeback. I'm just somewhat
> > afraid that it will be more fragile: If we add inode to
> > transaction's list in ext4_map_blocks(), we are pretty sure there's
> > no way to allocate block to an inode without introducing data
> > exposure issues (which are then very hard to spot). If we depend on
> > callers of ext4_map_blocks() to properly add inode to appropriate
> > transaction list, we have much more places to check. I'll think whether we could make this more robust.
>
> OK, I have something - Huang, can you check whether the attached
> patches also fix your data exposure issues please? The first patch is
> the original fix, patch two is a cleanup, patches 3 and 4 implement
> the speedup suggested by Ted. Patches are only lightly tested so far.
> I'll run more comprehensive tests later and in particular I want to
> check whether the additional complexity actually brings us some
> advantage at least for workloads which redirty pages in addition to
> writing some new ones using delayed allocation.
OK, there was a bug in patch 3. Attached is a new version of patches 3 and 4.
Honza
View attachment "0003-jbd2-Add-support-for-avoiding-data-writes-during-tra.patch" of type "text/x-patch" (7651 bytes)
View attachment "0004-ext4-Do-not-ask-jbd2-to-write-data-for-delalloc-buff.patch" of type "text/x-patch" (4389 bytes)
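
The ordering problem described in the quoted discussion (a flush request bypassing writes still sitting in the I/O scheduler's queue) can be illustrated with a toy model. This is only a sketch under loud assumptions: `ToyDisk`, `ToyScheduler`, and `run_scenario` are hypothetical names invented for illustration, not kernel APIs, and the model ignores everything about CFQ except the one property mentioned in the thread, namely that a flush only covers writes that have already reached the device.

```python
from collections import deque


class ToyDisk:
    """Hypothetical disk with a volatile write cache and stable media."""

    def __init__(self):
        self.cache = []    # writes received but not yet durable
        self.durable = []  # writes that would survive a crash

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        # A cache flush only makes durable what the disk has already received.
        self.durable.extend(self.cache)
        self.cache.clear()


class ToyScheduler:
    """Hypothetical scheduler: writes are queued, flushes bypass the queue."""

    def __init__(self, disk):
        self.disk = disk
        self.queued = deque()

    def submit_write(self, block):
        # Stands in for a bio parked in the scheduler's internal tree.
        self.queued.append(block)

    def submit_flush(self):
        # The flush goes straight to the disk, ignoring queued writes.
        self.disk.flush()

    def dispatch_all(self):
        # Stands in for waiting until the queued writes have completed.
        while self.queued:
            self.disk.write(self.queued.popleft())


def run_scenario(wait_for_data_first):
    """Return the blocks that are durable right after the flush."""
    disk = ToyDisk()
    sched = ToyScheduler(disk)
    sched.submit_write("data")
    if wait_for_data_first:
        sched.dispatch_all()
    sched.submit_flush()
    return list(disk.durable)


if __name__ == "__main__":
    print("flush without waiting:", run_scenario(False))
    print("wait, then flush:     ", run_scenario(True))
```

In this model, flushing without first waiting for the data write leaves nothing durable, while waiting for completion before the flush makes the data durable. That mirrors the argument in the thread for waiting on data writeback during commit rather than relying on the flush alone.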