linux-ext4 - FW: ext4 out of order when use cfq scheduler


Message-ID: <f0a9a62a43c4419687c69fff15a7c043@SGPMBX1004.APAC.bosch.com>
Date:	Wed, 22 Jun 2016 11:55:12 +0000
From:	"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
To:	"jack@...e.cz" <jack@...e.cz>
CC:	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Theodore Ts'o <tytso@....edu>
Subject: FW: ext4 out of order when use cfq scheduler

Hi Kara,

I saw the patch "fix data exposure after a crash" in the kernel 4.6.2 release.
I remember you provided two more patches for performance optimization. Could you tell me whether they are necessary?

Thanks. 
Best regards

 Weller HUANG



-----Original Message-----
From: Jan Kara [mailto:jack@...e.cz] 
Sent: Wednesday, March 16, 2016 4:10 AM
To: Theodore Ts'o <tytso@....edu>
Cc: Jan Kara <jack@...e.cz>; HUANG Weller (CM/ESW12-CN) <Weller.Huang@...bosch.com>; linux-ext4@...r.kernel.org; Li, Michael <huayil@....qualcomm.com>
Subject: Re: ext4 out of order when use cfq scheduler

On Tue 15-03-16 15:46:33, Jan Kara wrote:
> On Tue 15-03-16 11:46:34, Jan Kara wrote:
> > On Mon 14-03-16 10:36:35, Ted Tso wrote:
> > > On Mon, Mar 14, 2016 at 08:39:28AM +0100, Jan Kara wrote:
> > > > No, that won't be enough. blkdev_issue_flush() is not guaranteed 
> > > > to do anything to IOs which have not reported completion before
> > > > blkdev_issue_flush() was called. Specifically, CFQ will queue 
> > > > submitted bio in its internal RB tree, following flush request 
> > > > completely bypasses this tree and goes directly to the disk 
> > > > where it flushes caches. And only later CFQ decides to schedule 
> > > > async writeback from the flusher thread which is queued in the RB tree...
> > > 
> > > > Oh, right.  I am forgetting about the flushing machinery rewrite.
> > > Thanks for pointing that out.
> > > 
> > > But what we *could* do is to swap those two calls and then in the 
> > > case where delalloc is enabled, could maintain a list of inodes 
> > > where we only need to call filemap_fdatawait(), and not initiate 
> > > writeback for any dirty pages which had been caused by non-allocating writes.
> > 
> > We actually don't need to swap those two calls - page is already 
> > marked as under writeback in
> > 
> >   mpage_map_and_submit_buffers() -> mpage_submit_page() -> ext4_bio_write_page()
> > 
> > which gets called while we still hold the transaction handle. I 
> > agree calling filemap_fdatawait() from JBD2 during commit should be 
> > enough to fix issues with delalloc writeback. I'm just somewhat 
> > afraid that it will be more fragile: If we add inode to 
> > transaction's list in ext4_map_blocks(), we are pretty sure there's 
> > no way to allocate block to an inode without introducing data 
> > exposure issues (which are then very hard to spot). If we depend on 
> > callers of ext4_map_blocks() to properly add inode to appropriate 
> > transaction list, we have many more places to check. I'll think about
> > whether we could make this more robust.
> 
> OK, I have something - Huang, can you check whether the attached 
> patches also fix your data exposure issues please? The first patch is 
> the original fix, patch two is a cleanup, patches 3 and 4 implement 
> the speedup suggested by Ted. Patches are only lightly tested so far.  
> I'll run more comprehensive tests later and in particular I want to 
> check whether the additional complexity actually brings us some 
> advantage at least for workloads which redirty pages in addition to 
> writing some new ones using delayed allocation.

OK, there was a bug in patch 3. Attached is a new version of patches 3 and 4.
							Honza

View attachment "0003-jbd2-Add-support-for-avoiding-data-writes-during-tra.patch" of type "text/x-patch" (7651 bytes)

View attachment "0004-ext4-Do-not-ask-jbd2-to-write-data-for-delalloc-buff.patch" of type "text/x-patch" (4389 bytes)
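
Archive note: the ordering problem described in the quoted message of Mon 14-03-16 (a flush request bypassing writes still held in CFQ's internal tree) can be illustrated with a toy model. The sketch below is not kernel code and does not reproduce the attached patches. ToyDisk, commit_flush_only and commit_wait_then_flush are hypothetical names, and dispatch_all() only stands in for waiting on writeback completion, in the spirit of the filemap_fdatawait() call discussed above.

```python
from collections import deque


class ToyDisk:
    """Toy model: an I/O scheduler queue in front of a disk with a volatile cache."""

    def __init__(self):
        self.sched_queue = deque()  # writes held by the scheduler, not yet dispatched
        self.cache = set()          # writes completed into the volatile write cache
        self.durable = set()        # writes that would survive a power cut

    def submit_write(self, block):
        # Submitting only queues the write; it has not reached the disk yet.
        self.sched_queue.append(block)

    def dispatch_all(self):
        # Assumption: stands in for waiting until queued writes report completion.
        while self.sched_queue:
            self.cache.add(self.sched_queue.popleft())

    def flush(self):
        # The flush bypasses the scheduler queue and only persists what is
        # already in the disk cache.
        self.durable |= self.cache
        self.cache.clear()


def commit_flush_only(disk, data_block):
    """Submit data, then flush without waiting: the data may still be queued."""
    disk.submit_write(data_block)
    disk.flush()
    return data_block in disk.durable


def commit_wait_then_flush(disk, data_block):
    """Submit data, wait for it to complete, then flush."""
    disk.submit_write(data_block)
    disk.dispatch_all()
    disk.flush()
    return data_block in disk.durable
```

In this model commit_flush_only() leaves the data block non-durable because the flush never sees it, while commit_wait_then_flush() makes it durable. That is the reason given in the thread for waiting on data writeback during commit rather than relying on blkdev_issue_flush() alone.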