Date:	Mon, 13 Dec 2010 22:29:27 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Gui Jianfeng <guijianfeng@...fujitsu.com>
Cc:	Jens Axboe <axboe@...nel.dk>, Corrado Zoccolo <czoccolo@...il.com>,
	Chad Talbott <ctalbott@...gle.com>,
	Nauman Rafique <nauman@...gle.com>,
	Divyesh Shah <dpshah@...gle.com>,
	linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/8 v2] Introduce CFQ group hierarchical scheduling and
 "use_hierarchy" interface

On Tue, Dec 14, 2010 at 11:06:26AM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > On Mon, Dec 13, 2010 at 09:44:10AM +0800, Gui Jianfeng wrote:
> >> Hi
> >>
> >> Previously, I posted a patchset to add support for CFQ group hierarchical
> >> scheduling by putting all CFQ queues into a hidden group and scheduling that
> >> group alongside the other CFQ groups under their parent. The patchset is
> >> available here:
> >> http://lkml.org/lkml/2010/8/30/30
> >>
> >> Vivek thought this approach wasn't very intuitive and that we should treat CFQ
> >> queues and groups at the same level. Here is the new approach for hierarchical
> >> scheduling, based on Vivek's suggestion. The biggest change to CFQ is that
> >> it gets rid of the cfq_slice_offset logic and makes use of vdisktime for CFQ
> >> queue scheduling, just as CFQ groups do. But I still give cfqqs a jump
> >> in vdisktime based on ioprio; thanks to Vivek for pointing this out. Now CFQ
> >> queues and CFQ groups use the same scheduling algorithm.
> > 
> > Hi Gui,
> > 
> > Thanks for the patches. A few thoughts:
> > 
> > - I think we can implement the vdisktime jump logic for both cfq queues and
> >   cfq groups. So any entity (queue/group) which is being backlogged fresh
> >   will get the vdisktime jump, but anything which has been using its slice
> >   will get queued at the end of the tree.
> 
> Vivek,
> 
> A vdisktime jump for both CFQ queues and CFQ groups is ok with me.
> What do you mean by "anything which has been using its slice will get queued
> at the end of the tree"?
> Currently, if a CFQ entity uses up its time slice, we'll update its vdisktime,
> so why should we put it at the end of the tree?

Sorry, what I actually meant was that any queue/group which has been using
its slice and is being requeued will be queued at a position based on the
vdisktime calculation, with no boost logic required. Queues/groups which get
queued fresh get a vdisktime boost. That way, once we set slice_idle=0 and
group_idle=0, we might get good bandwidth utilization and at the same time
some service differentiation for higher-weight queues/groups.
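
To make sure we mean the same placement rule, here is a toy userspace model
of it (the names and constants below are made up for illustration; this is
not the actual CFQ or patch code):

#include <stdio.h>

typedef unsigned long long u64;

struct entity {
	u64 vdisktime;		/* key on the service tree */
	unsigned int weight;	/* derived from ioprio or group weight */
};

#define BASE_WEIGHT	500ULL	/* arbitrary reference weight */
#define BOOST_UNIT	6000ULL	/* arbitrary boost scale */

/* Charge for the slice used, scaled by weight (heavier entities pay less). */
static u64 scale_slice(u64 slice_used, const struct entity *e)
{
	return slice_used * BASE_WEIGHT / e->weight;
}

/*
 * Freshly backlogged entity: small jump over min_vdisktime, smaller for
 * higher weight, so it gets served soon. Entity requeued after using its
 * slice: no boost, it just pays for the service it received.
 */
static void place_entity(u64 min_vdisktime, struct entity *e,
			 u64 slice_used, int new_backlog)
{
	if (new_backlog)
		e->vdisktime = min_vdisktime + BOOST_UNIT * BASE_WEIGHT / e->weight;
	else
		e->vdisktime = min_vdisktime + scale_slice(slice_used, e);
}

int main(void)
{
	struct entity fresh = { .weight = 900 };	/* newly backlogged, high weight */
	struct entity busy  = { .weight = 100 };	/* has been using its slice */

	place_entity(10000, &fresh, 0, 1);
	place_entity(10000, &busy, 8000, 0);

	/* fresh sorts before busy, so it gets served sooner */
	printf("fresh=%llu busy=%llu\n", fresh.vdisktime, busy.vdisktime);
	return 0;
}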

> 
> 
> > 
> > - Have you done testing in true hierarchical mode? That is, create at least
> >   two levels of hierarchy and see if bandwidth division is happening
> >   properly. Something like the following:
> > 
> > 			root
> > 		       /    \ 
> > 		    test1  test2
> > 	           /    \   /  \
> > 		  G1    G2  G3  G4
> 
> Yes, I tested with two levels, and it works fine.
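
For anyone reproducing that kind of two-level test, a throwaway helper along
these lines sets up the groups (this is just an illustrative sketch -- it
assumes the blkio controller is mounted at /sys/fs/cgroup/blkio, and the
weights are arbitrary):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed blkio cgroup mount point -- adjust for your setup. */
#define BLKIO "/sys/fs/cgroup/blkio"

static void make_group(const char *cg, int weight)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), BLKIO "/%s", cg);
	mkdir(path, 0755);		/* errors (e.g. EEXIST) ignored for brevity */

	snprintf(path, sizeof(path), BLKIO "/%s/blkio.weight", cg);
	f = fopen(path, "w");
	if (f) {
		fprintf(f, "%d\n", weight);
		fclose(f);
	}
}

int main(void)
{
	/* Parents first, then the leaf groups under them. */
	make_group("test1", 500);
	make_group("test2", 500);
	make_group("test1/G1", 300);
	make_group("test1/G2", 700);
	make_group("test2/G3", 300);
	make_group("test2/G4", 700);
	return 0;
}

Tasks doing the IO then get moved into G1-G4 via each group's tasks file, and
the per-group bandwidth is compared against the weights.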
> 
> > 
> > - On what kind of storage have you been doing your testing? I have noticed
> >   that the IO controller works well only with idling on, and with idling on,
> >   performance is bad on high-end storage. The simple reason is that
> >   a storage array can support multiple IOs at the same time, and if we are
> >   idling on a queue or group in an attempt to provide fairness, it hurts.
> >   It hurts even more if we are doing random IO (which I assume is more
> >   typical of real workloads).
> >  
> >   So we need to come up with proper logic so that we can provide some
> >   kind of fairness even with idling disabled. I think that's where this
> >   vdisktime jump logic comes into the picture, and it is important to get
> >   it right.
> > 
> >   So can you also do some testing with idling disabled (both queue
> >   and group) and see if the vdisktime logic helps provide some kind of
> >   service differentiation? I think results will vary based on what the
> >   storage is and what queue depth you are driving. You can even try
> >   this testing on an SSD.
> 
> I tested on SATA. I will do more tests with idling disabled.

Ok, actually SATA with a low queue depth is the case where the block IO
controller works best. I am also keen to make it work well for SSDs and faster
storage like storage arrays without losing too much throughput in the process.
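
For the no-idle runs, it is just a matter of writing 0 to the CFQ tunables
before starting the jobs. A throwaway sketch (it assumes the device is sdb and
that CFQ is the active scheduler; group_idle is only there on kernels with the
group idling support):

#include <stdio.h>

/* Write 0 to a CFQ iosched tunable for the example device sdb. */
static int disable_tunable(const char *name)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/sdb/queue/iosched/%s", name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "0\n");
	return fclose(f);
}

int main(void)
{
	disable_tunable("slice_idle");
	disable_tunable("group_idle");
	return 0;
}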

Thanks
Vivek
