Message-ID: <20100722140044.GA28684@redhat.com>
Date: Thu, 22 Jul 2010 10:00:44 -0400
From: Vivek Goyal <vgoyal@...hat.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: linux-kernel@...r.kernel.org, axboe@...nel.dk, nauman@...gle.com,
dpshah@...gle.com, guijianfeng@...fujitsu.com, jmoyer@...hat.com,
czoccolo@...il.com
Subject: Re: [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle
tunable V3
On Thu, Jul 22, 2010 at 01:56:02AM -0400, Christoph Hellwig wrote:
> On Wed, Jul 21, 2010 at 03:06:18PM -0400, Vivek Goyal wrote:
> > On high end storage (I tested on an HP EVA storage array with 12 SATA
> > disks in RAID 5),
>
> That's actually quite low end storage for a server these days :)
>
Yes it is. Just that this is the best I got access to. :-)
> > So this is not the default mode. This new tunable, group_idle, allows one
> > to set slice_idle=0 to disable some of the CFQ features and primarily use
> > the group service differentiation feature.
>
> While this is better than before, needing a sysfs tweak to get any
> performance out of any kind of server class hardware is still pretty
> horrible. And slice_idle=0 is not exactly the most obvious parameter
> I would look for either. So having some way to automatically disable
> this mode based on hardware characteristics would be really useful,
An IO scheduler that can change its behavior based on the underlying storage
properties would be ideal and most convenient. For that we will need some
kind of auto-tuning in CFQ, where we monitor the ongoing IO (for
sequentiality, for block size) and then try to make some predictions about
the storage properties.

Auto-tuning is a little hard to implement, so I thought that as a first step
we can make sure things work reasonably well with the help of tunables, and
then look into auto-tuning later.

I was actually thinking of writing a user space utility which can issue some
specific IO patterns to the disk/lun and set up the IO scheduler tunables
automatically.
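
Purely to illustrate that idea (everything below, from the device name to the
probing method and the threshold, is my own assumption and not part of the
patchset), such a utility could look roughly like this:

#!/usr/bin/env python
# Hypothetical sketch: probe the device with small random reads and, if it
# behaves like a fast multi-spindle array or SSD, turn off queue idling.
# A real tool would use O_DIRECT and aligned buffers so the page cache does
# not distort the measurement.
import os, time, random

DEV = "sdb"                                    # device to probe (assumption)
IOSCHED = "/sys/block/%s/queue/iosched" % DEV  # CFQ tunables live here

def random_read_iops(path, samples=256, blksize=4096):
    # Time small random reads and return an approximate IOPS figure.
    fd = os.open(path, os.O_RDONLY)
    size = os.lseek(fd, 0, os.SEEK_END)
    start = time.time()
    for _ in range(samples):
        off = random.randrange(0, max(1, size // blksize)) * blksize
        os.lseek(fd, off, os.SEEK_SET)
        os.read(fd, blksize)
    os.close(fd)
    return samples / (time.time() - start)

iops = random_read_iops("/dev/%s" % DEV)
if iops > 400:  # crude threshold for "not a single slow SATA spindle"
    with open(os.path.join(IOSCHED, "slice_idle"), "w") as f:
        f.write("0")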
> and if that's not possible at least make sure it's very obviously
> documented and easily found using web searches.
Sure. I think I will create a new file, Documentation/block/cfq-iosched.txt,
and document this new mode there. Because this mode is primarily useful for
group scheduling, I will also add some info in
Documentation/cgroups/blkio-controller.txt.
>
> Btw, what effect does slice_idle=0 with your patches have on single SATA
> disk and single SSD setups?
I am not expecting IOPS mode to have any major effect in a non-group setup,
on any kind of storage.

IOW, currently if one sets slice_idle=0 in CFQ, we become almost like
deadline (with some differences here and there). The notion of ioprio almost
disappears, except that in some cases you can still see some service
differentiation among queues of different priority levels.

With this patchset, slice_idle=0 switches CFQ to IOPS mode. We will still
show deadline-like behavior; the only difference is that there will be no
service differentiation among ioprio levels.
I am not bothering to fix that now, because in slice_idle=0 mode the notion
of ioprio is so weak and unpredictable that I think it is not worth fixing at
this point. If somebody is looking for service differentiation with
slice_idle=0, using cgroups might turn out to be a better bet.
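
Purely to illustrate what that means in practice with the blkio controller
(the mount point, group names, weights and pids below are assumptions; the
real interface is described in Documentation/cgroups/blkio-controller.txt):

# Sketch: two blkio cgroups with a 4:1 proportional weight ratio.
import os

ROOT = "/cgroup/blkio"   # assumes the blkio controller is mounted here

def make_group(name, weight):
    # Create a blkio cgroup and give it a proportional weight (100-1000).
    path = os.path.join(ROOT, name)
    if not os.path.isdir(path):
        os.mkdir(path)
    with open(os.path.join(path, "blkio.weight"), "w") as f:
        f.write(str(weight))
    return path

def add_task(path, pid):
    # Move a process into the group so its IO is charged to that group.
    with open(os.path.join(path, "tasks"), "w") as f:
        f.write(str(pid))

db     = make_group("db",     800)   # should get ~4x the service of...
backup = make_group("backup", 200)   # ...this group when both are busy
add_task(db, 1234)                   # hypothetical pids
add_task(backup, 5678)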
In summary, in a non-cgroup setup with slice_idle=0, one should not see a
significant change with this patchset on any kind of storage. With
slice_idle=0, CFQ stops idling and achieves much better throughput, and even
in IOPS mode it will continue doing that.

The difference is primarily visible to cgroup users, where we get better
accounting done in IOPS mode and are able to provide service differentiation
among groups in a more predictable manner.
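
For completeness, the tunable combination that enables this mode is just the
two sysfs knobs discussed above (the device name and the group_idle value
below are assumptions; group_idle only exists with this patchset applied):

# Sketch: put CFQ into IOPS mode while keeping idling at the group level.
IOSCHED = "/sys/block/sdb/queue/iosched"

with open(IOSCHED + "/slice_idle", "w") as f:
    f.write("0")   # no per-queue idling -> CFQ charges groups in IOPS
with open(IOSCHED + "/group_idle", "w") as f:
    f.write("8")   # still idle on the group so group fairness is preserved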
Thanks
Vivek