linux-kernel - Re: [RFC] writeback and cgroup

Message-ID: <20120424155843.GG26708@redhat.com>
Date:	Tue, 24 Apr 2012 11:58:43 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Fengguang Wu <fengguang.wu@...el.com>, Tejun Heo <tj@...nel.org>,
	Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
	sjayaraman@...e.com, andrea@...terlinux.com, jmoyer@...hat.com,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	kamezawa.hiroyu@...fujitsu.com, lizefan@...wei.com,
	containers@...ts.linux-foundation.org, cgroups@...r.kernel.org,
	ctalbott@...gle.com, rni@...gle.com, lsf@...ts.linux-foundation.org
Subject: Re: [RFC] writeback and cgroup

On Tue, Apr 24, 2012 at 04:56:55PM +0200, Jan Kara wrote:

[..]
> > > I think having separate weights for sync IO groups and async IO is not
> > > very appealing. There should be one notion of group weight, with bandwidth
> > > distributed among groups according to their weight.
> > 
> > There has to be some scheme, either explicit or implicit. Maybe
> > you are bearing in mind some "equal split among queues" policy? For
> > example, if the cgroup has 9 active sync queues and 1 async queue,
> > split the weight equally among the 10 queues? Then the sync IOs get a 90%
> > share, and the async writes get a 10% share.
>   Maybe I misunderstand, but there doesn't have to be (and in fact isn't)
> any split among sync / async IO in CFQ. At each moment, we choose a queue
> with the highest score and dispatch a couple of requests from it. Then we
> go and choose again. The score of the queue depends on several factors
> (like age of requests, whether the queue is sync or async, IO priority,
> etc.).
> 
> Practically, over a longer period the system will stabilize on some ratio,
> but that ratio depends on the load, so your system should not impose some
> artificial direct/buffered split but rather deal with the reality of
> how the IO scheduler decides to dispatch requests...

Yes. CFQ does not have the notion of giving a fixed share to async
requests. In fact, right now it is so biased in favor of sync requests
that in some cases it can starve async writes or introduce long delays,
resulting in "task hung for 120 seconds" warnings.

So if there are issues w.r.t. how the disk is shared between sync and async IO
within a cgroup, they should be handled at the IO scheduler level. Writeback
code trying to dictate that ratio sounds odd.
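To make the "no fixed share" point concrete, here is a toy model, purely a sketch: the IoQueue class, the score() formula, SYNC_BONUS and BATCH are all hypothetical and are not CFQ's real data structures or scoring. It follows the loop described above: pick the highest-scoring non-empty queue, dispatch a couple of requests from it, then choose again. No sync/async ratio is configured anywhere.

```python
from dataclasses import dataclass

# Toy model only: this scoring is an assumption for illustration,
# not CFQ's actual algorithm.
SYNC_BONUS = 4  # hypothetical extra score that biases toward sync queues
BATCH = 2       # "dispatch a couple of requests" per decision


@dataclass
class IoQueue:
    name: str
    is_sync: bool    # True for reads/direct IO, False for async writeback
    prio: int        # 0 (highest) .. 7 (lowest), like best-effort ioprio levels
    pending: int     # requests waiting in this queue
    waited: int = 0  # decisions since this queue last dispatched


def score(q: IoQueue) -> int:
    """Hypothetical score: request age + sync bonus + priority boost."""
    return q.waited + (SYNC_BONUS if q.is_sync else 0) + (7 - q.prio)


def run(queues: list[IoQueue], rounds: int) -> tuple[int, int]:
    """Repeatedly pick the best-scoring busy queue and dispatch a batch.

    Returns (sync_dispatched, async_dispatched).
    """
    sync_done = async_done = 0
    for _ in range(rounds):
        busy = [q for q in queues if q.pending > 0]
        if not busy:
            break
        best = max(busy, key=score)  # max() keeps the first queue on ties
        n = min(BATCH, best.pending)
        best.pending -= n
        if best.is_sync:
            sync_done += n
        else:
            async_done += n
        # The dispatched queue's age resets; every other busy queue ages.
        for q in busy:
            q.waited = 0 if q is best else q.waited + 1
    return sync_done, async_done


# Load A: a busy reader competing with the flusher.
print(run([IoQueue("reader", True, 4, 100), IoQueue("flusher", False, 4, 100)], 24))
# -> (40, 8)
# Load B: same scheduler, but the reader issues only a little IO.
print(run([IoQueue("reader", True, 4, 4), IoQueue("flusher", False, 4, 100)], 24))
# -> (4, 44)
```

Same scoring, two loads, two very different sync/async ratios. The bias toward sync also shows: with a busy reader, the flusher only wins a decision once its age outweighs the sync bonus.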

> 
> > For dirty throttling w/o cgroup awareness, balance_dirty_pages()
> > splits the writeout bandwidth equally among all dirtier tasks. Since
> > cfq works with queues, it seems most natural for it to do equal split
> > among all queues (inside the cgroup).
>   Well, but we also have IO priorities which change which queue should get
> preference.
> 
> > I'm not sure whether, when there are N dd tasks doing direct IO, cfq will
> > continuously run N sync queues for them (without many dynamic queue
> > deletions and recreations). If that is the case, it should be trivial
> > to support the queue-based fair split in the global async queue
> > scheme. Otherwise I'll have some trouble detecting the N value when
> > trying to do the N:1 sync:async weight split.
>   Also, sync queues for several processes can get merged when CFQ
> observes that these processes cooperate on one area of disk, and get
> split again when the processes stop cooperating. I don't think you really want
> to second-guess what CFQ does inside...

Agreed. Trying to predict what CFQ will do and then trying to influence the
sync/async ratio based on root cgroup weight does not seem to be the
right way, especially since that would also mean either assuming that everything
in the root group is sync or having to split the sync/async weight notion.
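A small worked example of why the N in an N:1 sync:async split is fragile. Both helpers are hypothetical: equal_queue_split() models the "equal split among queues" policy from the quoted discussion, and queues_seen_by_scheduler() assumes, for illustration only, that each pair of cooperating tasks ends up served from one merged sync queue.

```python
from fractions import Fraction


def equal_queue_split(nr_sync: int, nr_async: int = 1) -> tuple[Fraction, Fraction]:
    """Hypothetical "equal split among queues" policy: each active queue in
    the cgroup gets the same slice, so sync:async = nr_sync:nr_async."""
    total = nr_sync + nr_async
    return Fraction(nr_sync, total), Fraction(nr_async, total)


def queues_seen_by_scheduler(nr_tasks: int, merged_pairs: int) -> int:
    """Assumed merging model: each pair of cooperating tasks shares one
    merged sync queue, so the scheduler sees fewer queues than tasks."""
    return nr_tasks - merged_pairs


# 9 dd tasks doing direct IO plus one async queue, no merging: 9/10 vs 1/10.
print(equal_queue_split(queues_seen_by_scheduler(9, 0)))
# Same 9 tasks, but 4 pairs cooperate and get merged: 5/6 vs 1/6.
print(equal_queue_split(queues_seen_by_scheduler(9, 4)))
```

Nothing about the workload changed between the two calls; only the scheduler's internal queue merging did, yet the async share computed on the writeback side moved from 10% to about 17%.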

The sync/async ratio is an IO scheduler thing and is not fixed. So the writeback
layer making assumptions and changing weights sounds very awkward to me.

Thanks
Vivek