Message-ID: <4e5e476b1001130005p4acfdd55na387f925ad6078f3@mail.gmail.com>
Date: Wed, 13 Jan 2010 09:05:21 +0100
From: Corrado Zoccolo <czoccolo@...il.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Jens Axboe <jens.axboe@...cle.com>,
Linux-Kernel <linux-kernel@...r.kernel.org>,
Jeff Moyer <jmoyer@...hat.com>,
Shaohua Li <shaohua.li@...el.com>,
Gui Jianfeng <guijianfeng@...fujitsu.com>,
Yanmin Zhang <yanmin_zhang@...ux.intel.com>
Subject: Re: [PATCH] cfq-iosched: rework seeky detection
On Wed, Jan 13, 2010 at 12:17 AM, Corrado Zoccolo <czoccolo@...il.com> wrote:
> On Tue, Jan 12, 2010 at 11:36 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
>>> The fact is, can we reliably determine which of those two setups we
>>> have from cfq?
>>
>> I have no idea at this point of time but it looks like determining this
>> will help.
>>
>> Maybe something like keeping track of the number of processes on the
>> "sync-noidle" tree and the average read times when the sync-noidle tree
>> is being served. Over a
>> period of time we need to monitor what's the number of processes
>> (threshold), after which average read time goes up. For sync-noidle we can
>> then drive "queue_depth=nr_threshold" and once queue depth reaches that,
>> then idle on the process. So for single spindle, I guess tipping point
>> will be 2 processes and we can idle on sync-noidle process. For more
>> spindles, tipping point will be higher.
>>
>> These are just some random thoughts.
> It seems reasonable.
I think, though, that the implementation will be complex.
We would have to limit this to request sizes that are <= stripe size
(larger requests hit more disks and have a much lower optimal queue
depth), so we would need to add a new service_tree (the set would
become: SYNC_IDLE_LARGE, SYNC_IDLE_SMALL, SYNC_NOIDLE, ASYNC), and the
optimization would apply only to the SYNC_IDLE_SMALL tree.
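
To make the proposed split concrete, here is a rough sketch of how a
queue could be assigned to one of the four trees. It is only an
illustration: struct queue_info, pick_tree() and the stripe_bytes
parameter are hypothetical names, and it assumes the scheduler somehow
knows the stripe size, which is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four trees named above; the enum itself is a hypothetical sketch. */
enum service_tree { SYNC_IDLE_LARGE, SYNC_IDLE_SMALL, SYNC_NOIDLE, ASYNC };

/* Hypothetical per-queue summary, not an existing cfq structure. */
struct queue_info {
    bool sync;             /* queue issues synchronous I/O */
    bool seeky;            /* queue was classified as seeky */
    size_t avg_req_bytes;  /* average request size */
};

/* Assumption: stripe_bytes is known to the caller. */
static enum service_tree pick_tree(const struct queue_info *q,
                                   size_t stripe_bytes)
{
    assert(q != NULL && stripe_bytes > 0);

    if (!q->sync)
        return ASYNC;
    if (q->seeky)
        return SYNC_NOIDLE;
    /* Sequential sync queue: split on request size vs. stripe size. */
    return q->avg_req_bytes <= stripe_bytes ? SYNC_IDLE_SMALL
                                            : SYNC_IDLE_LARGE;
}
```

Only queues that land in SYNC_IDLE_SMALL would be candidates for the
deeper queue depth.
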
Moreover, we can't just dispatch K queues and then idle on the last
one. We need to keep a set of K active queues and wait on any of
them. This makes the optimization very complex, for what I think is
little gain: we usually don't have sequential streams of
small requests, unless we misuse mmap or direct I/O.
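
A minimal sketch of the bookkeeping a "set of K active queues" would
need, just to show where the complexity comes from. All names here
(struct active_set and the active_set_* helpers) are hypothetical, and
timers, locking and fairness are left out entirely. The point is that an
idling queue keeps its slot until its own idle timer expires, so we can
be waiting on several queues at once.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACTIVE 8   /* hypothetical upper bound on K */

struct active_set {
    int depth;                 /* K: queues allowed to be active at once */
    int count;                 /* slots currently in use */
    int queue_id[MAX_ACTIVE];
    bool idling[MAX_ACTIVE];   /* queue is empty and we are waiting on it */
};

static void active_set_init(struct active_set *s, int depth)
{
    assert(depth >= 1 && depth <= MAX_ACTIVE);
    s->depth = depth;
    s->count = 0;
}

static int active_set_find(const struct active_set *s, int id)
{
    for (int i = 0; i < s->count; i++)
        if (s->queue_id[i] == id)
            return i;
    return -1;
}

/* Admit a queue only if a slot is free; idling queues still hold theirs. */
static bool active_set_admit(struct active_set *s, int id)
{
    if (s->count >= s->depth || active_set_find(s, id) >= 0)
        return false;
    s->queue_id[s->count] = id;
    s->idling[s->count] = false;
    s->count++;
    return true;
}

/* Queue ran dry: start waiting on it while the others keep dispatching. */
static bool active_set_start_idle(struct active_set *s, int id)
{
    int i = active_set_find(s, id);
    if (i < 0)
        return false;
    s->idling[i] = true;
    return true;
}

/* New request: true if its queue is active and may dispatch right away. */
static bool active_set_request_arrived(struct active_set *s, int id)
{
    int i = active_set_find(s, id);
    if (i < 0)
        return false;
    s->idling[i] = false;
    return true;
}

/* Idle timer fired for an idling queue: release its slot. */
static bool active_set_idle_expired(struct active_set *s, int id)
{
    int i = active_set_find(s, id);
    if (i < 0 || !s->idling[i])
        return false;
    s->count--;
    s->queue_id[i] = s->queue_id[s->count];
    s->idling[i] = s->idling[s->count];
    return true;
}
```

Even this toy version needs per-queue idle state; a real one would also
need one timer per slot, or a single timer tracking the earliest expiry.
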
BTW, the mmap problem could easily be fixed by adding
madvise(MADV_WILLNEED) to the userspace program when it deals with data.
I think we only have to worry about binaries here.
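
For a data file, the userspace side of that madvise hint could look
roughly like this. It is a sketch, not taken from any particular
program: map_with_readahead() is a made-up helper name, and the hint is
purely advisory, so the kernel is free to ignore it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for madvise() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a file read-only and ask the kernel to read it ahead.
 * Returns the mapping (length stored in *len_out) or NULL on failure.
 * The caller is expected to munmap() the result.
 */
static void *map_with_readahead(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len_out != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    /* Advisory only: a failure here is not fatal, so it is ignored. */
    (void)madvise(p, (size_t)st.st_size, MADV_WILLNEED);

    *len_out = (size_t)st.st_size;
    return p;
}
```

With the hint in place, the page faults that follow should mostly find
their pages already read in or in flight, instead of each fault
triggering its own small read.
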
> Something similar to what we do to reduce depth for async writes.
> Can you see if you get similar BW improvements also for parallel
> sequential direct I/Os with block size < stripe size?
Thanks,
Corrado