linux-kernel - Re: Performance regression in IO scheduler still there


Message-ID: <4e5e476b0911100937s31767d1dh52831126c5e8cf47@mail.gmail.com>
Date:	Tue, 10 Nov 2009 18:37:57 +0100
From:	Corrado Zoccolo <czoccolo@...il.com>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	Jan Kara <jack@...e.cz>, jens.axboe@...cle.com,
	LKML <linux-kernel@...r.kernel.org>,
	Chris Mason <chris.mason@...cle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mike Galbraith <efault@....de>
Subject: Re: Performance regression in IO scheduler still there

On Tue, Nov 10, 2009 at 5:47 PM, Jeff Moyer <jmoyer@...hat.com> wrote:
> Corrado Zoccolo <czoccolo@...il.com> writes:
>
>> Jeff, Jens,
>> do you think we should try to do more auto-tuning of cfq parameters?
>> Looking at those numbers for SANs, I think we are being suboptimal in
>> some cases.
>> E.g. sequential read throughput is lower than random read.
>
> I investigated this further, and this was due to a problem in the
> benchmark.  It was being run with only 500 samples for random I/O and
> 65536 samples for sequential.  After fixing this, we see random I/O is
> slower than sequential, as expected.
Ok.
>> I also think that current slice_idle and slice_sync values are good
>> for devices with 8ms seek time, but they are too high for non-NCQ
>> flash devices, where "seek" penalty is under 1ms, and we still prefer
>> idling.
>
> Do you have numbers to back that up?  If not, throw a fio job file over
> the fence and I'll test it on one such device.
>
No numbers yet; it is based on reasoning.
Currently idling is based on the assumption that we can wait up to
10ms, to get a better request than jumping far away, since the jump
will likely cost more than that. If the jump costs around 1ms, like on
flash cards, then waiting 10ms is surely wasted time.
On the other hand, on flash cards a random write could cost 50ms or
more, so we will need to differentiate the last idle before switching
to async writes from the inter-read idles. This should be possible
with the new workload based infrastructure, but we need to measure
those characteristic times in order to use them in the heuristics.
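
The reasoning above can be put as a small sketch. This is not cfq code: the
function names, the 10ms cap used as a default, and the rule "never wait longer
than the alternative would cost" are assumptions made only to illustrate the
argument with the figures quoted in this mail.

```python
# Hypothetical sketch of the idling argument; not taken from cfq.
# All names and the min() rule are illustrative assumptions.

def idle_budget_ms(alternative_cost_ms: float, max_idle_ms: float = 10.0) -> float:
    """Waiting is only worthwhile up to the cost of the alternative."""
    return min(alternative_cost_ms, max_idle_ms)


def pick_idle_ms(seek_cost_ms: float,
                 write_cost_ms: float,
                 next_is_async_write: bool,
                 max_idle_ms: float = 10.0) -> float:
    """Choose an idle window for the two cases discussed above.

    Between reads, the alternative to idling is a seek.
    Before switching to async writes, the alternative is a random write.
    """
    cost = write_cost_ms if next_is_async_write else seek_cost_ms
    return idle_budget_ms(cost, max_idle_ms)


if __name__ == "__main__":
    # Flash card: ~1ms "seek", ~50ms random write (figures from this mail).
    print(pick_idle_ms(1.0, 50.0, next_is_async_write=False))
    print(pick_idle_ms(1.0, 50.0, next_is_async_write=True))
    # Rotating disk with an 8ms seek.
    print(pick_idle_ms(8.0, 8.0, next_is_async_write=False))
```

With these inputs the inter-read idle on flash shrinks to the seek cost, while
the last idle before async writes stays at the cap, which is the
differentiation argued for above.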

>> If we agree on this, should the measurement part (I'm thinking of
>> measuring things like seek time, throughput, etc.) be added to the
>> common elevator code, or done inside cfq?
>
> Well, if it's something that is of interest to others, then pushing it
> up a layer makes sense.  If only CFQ is going to use it, keep it there.
If the direction is to have only one intelligent I/O scheduler, as the
removal of anticipatory indicates, then it is the latter. I don't
think noop or deadline will ever make any use of them.
But it could still be useful for reporting performance as seen by the
kernel, after the page cache.
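
As a rough illustration of what such a measurement part might track, here is a
minimal sketch, assuming per-request completion times are available and that
requests are already classified as sequential or not. The class, its fields,
the smoothing factor, and the use of a moving average are all hypothetical
choices, not an existing kernel interface.

```python
# Hypothetical sketch of per-device characteristic-time tracking.
# Names, the EWMA choice and alpha are assumptions for illustration only.

class DeviceStats:
    def __init__(self, alpha: float = 0.125):
        self.alpha = alpha            # weight given to each new sample
        self.seek_ms = None           # smoothed service time of non-sequential requests
        self.throughput_mb_s = None   # smoothed throughput of sequential requests

    def _ewma(self, old, sample):
        # The first sample seeds the average; later ones are blended in.
        return sample if old is None else old + self.alpha * (sample - old)

    def record(self, service_ms: float, nbytes: int, sequential: bool) -> None:
        """Account one completed request."""
        if sequential:
            if service_ms > 0:
                mb_s = (nbytes / 1e6) / (service_ms / 1e3)
                self.throughput_mb_s = self._ewma(self.throughput_mb_s, mb_s)
        else:
            # Rough approximation: service time of a non-sequential request
            # is treated as the seek cost (transfer time is not subtracted).
            self.seek_ms = self._ewma(self.seek_ms, service_ms)


if __name__ == "__main__":
    stats = DeviceStats(alpha=0.5)
    stats.record(8.0, 4096, sequential=False)
    stats.record(4.0, 4096, sequential=False)
    stats.record(10.0, 1_000_000, sequential=True)
    print(stats.seek_ms, stats.throughput_mb_s)
```

The same numbers could feed both the idling heuristics and a report of
performance as seen below the page cache.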

Thanks
Corrado
>
> Cheers,
> Jeff
>