Date:	Wed, 4 Apr 2012 07:52:24 -0700
From:	Shaohua Li <shli@...nel.org>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Tao Ma <tm@....ma>, Tejun Heo <tj@...nel.org>, axboe@...nel.dk,
	ctalbott@...gle.com, rni@...gle.com, linux-kernel@...r.kernel.org,
	cgroups@...r.kernel.org, containers@...ts.linux-foundation.org
Subject: Re: IOPS based scheduler (Was: Re: [PATCH 18/21] blkcg: move
 blkio_group_conf->weight to cfq)

2012/4/4 Vivek Goyal <vgoyal@...hat.com>:
> On Wed, Apr 04, 2012 at 05:35:49AM -0700, Shaohua Li wrote:
>
> [..]
>> >> How are iops_weight and switching different from the CFQ group
>> >> scheduling logic? I think Shaohua was talking about using similar
>> >> logic. What would you do fundamentally differently so that you get
>> >> service differentiation without idling?
>> > I am thinking of differentiating the groups by iops, so if there are
>> > 3 groups (with weights 100, 200, 300) we can let them submit 1 io,
>> > 2 ios and 3 ios in a round-robin way. With an Intel SSD, every io can
>> > be finished within 100us, so the maximum latency for one io is about
>> > 600us, still less than 1ms. But with cfq, if all the cgroups are busy,
>> > we have to switch between these groups on a millisecond timescale,
>> > which means the maximum latency will be 6ms. That is terrible for some
>> > applications, since they use SSDs now.
>> Yes, with iops-based scheduling we do queue switching for every request.
>> Doing the same thing between groups is quite straightforward. The only
>> issue I found is that this will introduce more process context switches;
>> that isn't a big issue for io-bound applications, but it depends. It cuts
>> latency a lot, which I guess is more important for web 2.0 applications.
>
> In iops_mode(), expire each cfqq after dispatching one request or a small
> batch of requests and you should get the same behavior (with slice_idle=0
> and group_idle=0). So why write a new scheduler?
>
> The only thing is that, with the above, the current code will provide iops
> fairness only for groups. We should be able to tweak queue scheduling to
> support iops fairness as well.
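(For reference, the idling knobs mentioned above are per-device sysfs
tunables under /sys/block/<dev>/queue/iosched/. A minimal userspace
illustration of switching them off, equivalent to echoing 0 into the two
files; "sda" is just an example device name:

/*
 * Illustration only: disable CFQ's queue and group idling, i.e. the
 * slice_idle=0 / group_idle=0 setup referred to above.
 */
#include <stdio.h>

static int write_tunable(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	write_tunable("/sys/block/sda/queue/iosched/slice_idle", "0");
	write_tunable("/sys/block/sda/queue/iosched/group_idle", "0");
	return 0;
}
)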
Agreed, we can tweak cfq to make it support iops fairness, because the two
are conceptually the same. The question is whether that turns into a mess:
CFQ is quite complicated already, and in iops mode a lot of its code isn't
required (idling, queue merging, thinktime/seek detection and so on), since
the scheduler would only target SSDs. With the recent iocontext cleanup, the
iops scheduler code is actually quite short.
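To make the round-robin idea above concrete, here is a rough toy sketch
(not CFQ code; the structure and helper names are made up) of
weight-proportional, iops-based dispatch across groups: each busy group
gets a per-round budget proportional to its weight, so a 100/200/300 split
issues 1, 2 and 3 requests per round with no idling in between.

/*
 * Toy model of weight-proportional, iops-based round-robin dispatch.
 * Not kernel code: struct iops_group and dispatch_one() are made-up
 * stand-ins for whatever the real scheduler would use.
 */
#include <stdio.h>
#include <stddef.h>

struct iops_group {
	unsigned int weight;	/* e.g. 100, 200, 300 */
	unsigned int queued;	/* requests waiting in this group */
};

/* Pretend to issue one request from @g to the device. */
static void dispatch_one(struct iops_group *g)
{
	g->queued--;
}

/*
 * One scheduling round: each busy group may dispatch weight / weight_unit
 * requests before we move on, so service is proportional to weight
 * without any idling between groups.
 */
static void dispatch_round(struct iops_group *groups, size_t nr,
			   unsigned int weight_unit)
{
	for (size_t i = 0; i < nr; i++) {
		struct iops_group *g = &groups[i];
		unsigned int budget = g->weight / weight_unit;

		while (budget && g->queued) {
			dispatch_one(g);
			budget--;
		}
	}
}

int main(void)
{
	struct iops_group groups[] = {
		{ .weight = 100, .queued = 10 },
		{ .weight = 200, .queued = 10 },
		{ .weight = 300, .queued = 10 },
	};

	/* weight_unit = 100 => the groups issue 1, 2 and 3 requests per round */
	dispatch_round(groups, 3, 100);

	for (size_t i = 0; i < 3; i++)
		printf("group %zu: %u requests still queued\n",
		       i, groups[i].queued);
	return 0;
}

With roughly 100us per request on such an SSD, a full round across the
three groups stays in the hundreds of microseconds, which is where the
600us worst-case number quoted above comes from.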
