linux-kernel - Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090417123805.GC7117@mit.edu>
Date:	Fri, 17 Apr 2009 08:38:05 -0400
From:	Theodore Tso <tytso@....edu>
To:	Andrea Righi <righi.andrea@...il.com>
Cc:	Paul Menage <menage@...gle.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	agk@...rceware.org, akpm@...ux-foundation.org, axboe@...nel.dk,
	baramsori72@...il.com, Carl Henrik Lunde <chlunde@...g.uio.no>,
	dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
	eric.rannaud@...il.com, fernando@....ntt.co.jp,
	Hirokazu Takahashi <taka@...inux.co.jp>,
	Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
	dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
	roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
	Satoshi UCHIDA <s-uchida@...jp.nec.com>,
	subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
	Jens Axboe <jens.axboe@...cle.com>,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

On Tue, Apr 14, 2009 at 10:21:20PM +0200, Andrea Righi wrote:
> Delaying journal IO can unnecessarily delay other independent IO
> operations from different cgroups.
> 
> Add BIO_RW_META flag to the ext3 journal IO that informs the io-throttle
> subsystem to account but not delay journal IO and avoid potential
> priority inversion problems.

So this worries me for two reasons.  First of all, the meaning of
BIO_RW_META is not well defined, but I'm concerned that you are using
the flag in a manner that in a way that wasn't its original intent.
I've included Jens on the cc list so he can comment on that score.

Secondly, there are many more locations than these which can end up
causing I/O which will ending up causing the journal commit to block
until they are completed.  I've done a lot of work in the past few
weeks to make sure those writes get marked using BIO_RW_SYNC.  In
data=ordered mode, the journal commit will block waiting for data
blocks to be written out, and that implies you really need to treat as
high priority all of the block writes that are marked with the
BIO_RW_SYNC flag.

The flip side of this is it may end up making your I/O controller to
leaky; that is, someone might be able to evade your I/O controller's
attempt to impose limits by using fsync() all the time.  This is a
hard problem, though, because filesystem I/O is almost always
intertwined.

What sort of scenarios and workloads are you envisioning might use
this I/O controller?  And can you say more about the specifics about
the priority inversion problem you are concerned about?

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/