Message-ID: <20090417123805.GC7117@mit.edu>
Date:	Fri, 17 Apr 2009 08:38:05 -0400
From:	Theodore Tso <tytso@....edu>
To:	Andrea Righi <righi.andrea@...il.com>
Cc:	Paul Menage <menage@...gle.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	agk@...rceware.org, akpm@...ux-foundation.org, axboe@...nel.dk,
	baramsori72@...il.com, Carl Henrik Lunde <chlunde@...g.uio.no>,
	dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
	eric.rannaud@...il.com, fernando@....ntt.co.jp,
	Hirokazu Takahashi <taka@...inux.co.jp>,
	Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
	dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
	roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
	Satoshi UCHIDA <s-uchida@...jp.nec.com>,
	subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
	Jens Axboe <jens.axboe@...cle.com>,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

On Tue, Apr 14, 2009 at 10:21:20PM +0200, Andrea Righi wrote:
> Delaying journal IO can unnecessarily delay other independent IO
> operations from different cgroups.
> 
> Add BIO_RW_META flag to the ext3 journal IO that informs the io-throttle
> subsystem to account but not delay journal IO and avoid potential
> priority inversion problems.

So this worries me for two reasons.  First of all, the meaning of
BIO_RW_META is not well defined, but I'm concerned that you are using
the flag in a way that wasn't its original intent.
I've included Jens on the cc list so he can comment on that score.

Secondly, there are many more locations than these that can issue
I/O which will end up causing the journal commit to block until it
completes.  I've done a lot of work in the past few weeks to make
sure those writes get marked using BIO_RW_SYNC.  In
data=ordered mode, the journal commit will block waiting for data
blocks to be written out, and that implies you really need to treat as
high priority all of the block writes that are marked with the
BIO_RW_SYNC flag.
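A toy model of why the data=ordered dependency turns throttling into priority inversion: if the commit cannot finish until every data block attached to the transaction is written, then a delay imposed on one cgroup's data write delays the commit, and any fsync() waiting on it, for every cgroup. The struct and function names below are hypothetical, and the model deliberately ignores the commit's own journal write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one data write the commit must wait for. */
struct pending_write {
    const char *owner;     /* cgroup that dirtied the block (label only) */
    uint64_t submit_ms;    /* when the write was queued */
    uint64_t throttle_ms;  /* delay imposed by the I/O controller */
    uint64_t service_ms;   /* time the device needs for the write */
};

/*
 * In data=ordered mode the commit cannot complete before all attached
 * data writes are on disk, so its completion time is the latest
 * completion among them, regardless of which cgroup owns each write.
 */
uint64_t commit_done_ms(const struct pending_write *w, size_t n)
{
    assert(w != NULL || n == 0);

    uint64_t done = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = w[i].submit_ms + w[i].throttle_ms + w[i].service_ms;
        if (t > done)
            done = t;
    }
    return done;
}
```

With one unthrottled 5 ms write the commit finishes at 5 ms; add a second 5 ms write from a cgroup throttled by 200 ms and the shared commit finishes at 205 ms, so the unthrottled cgroup waits too.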

The flip side of this is that it may end up making your I/O controller
too leaky; that is, someone might be able to evade your I/O controller's
attempt to impose limits by using fsync() all the time.  This is a
hard problem, though, because filesystem I/O is almost always
intertwined.

What sort of scenarios and workloads are you envisioning might use
this I/O controller?  And can you say more about the specifics of
the priority inversion problem you are concerned about?

Regards,

						- Ted
