linux-kernel - Re: cfq-iosched preempt issues

Date:	Thu, 03 Mar 2011 09:05:33 +0800
From:	Shaohua Li <shaohua.li@...el.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Jeff Moyer <jmoyer@...hat.com>,
	"jaxboe@...ionio.com" <jaxboe@...ionio.com>,
	"czoccolo@...il.com" <czoccolo@...il.com>,
	"guijianfeng@...fujitsu.com" <guijianfeng@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: cfq-iosched preempt issues

On Thu, 2011-03-03 at 05:27 +0800, Vivek Goyal wrote:
> On Wed, Mar 02, 2011 at 04:05:30PM -0500, Jeff Moyer wrote:
> > Vivek Goyal <vgoyal@...hat.com> writes:
> > 
> > > On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> > >> queue preemption is good for some workloads and not for others. With commit
> > >> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> > >> 1. In a multi-threaded workload, each thread runs a random read/write (for
> > >> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> > >> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> > >> threads are waiting for writes and, on the other hand, fewer threads are
> > >> doing reads. This makes the queue depth small, so performance drops a little.
> > >> So in this case, speeding up writes can speed up reads too, but we can't
> > >> detect it.
> > >> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> > >> What if there are two queues, one sync and one async? If the sync queue's
> > >> think time is small, we can treat it as the sole queue, because the sync queue
> > >> will preempt the async queue, so we don't need to care about the async
> > >> queue's latency. The issue existed before, but f8ae6e3eb825 amplifies it.
> > >> Below is a patch for it.
> > >> 
> > >> Any idea?
> > >
> > > CFQ is already very complicated, let's try to keep it simple. Because it
> > > is complicated, making it hierarchical for cgroup becomes even harder.
> > >
> > > IIUC, you are saying that the cfqd->busy_queues check is not sufficient as
> > > it also takes async queues into account.
> > >
> > > So we can keep another count say, cfqd->busy_sync_queues and if there
> > > are no busy_sync_queues, allow unlimited depth and that should be
> > > a really simple few lines change.
> > 
> > That covers workload 2, but what about 1?  I'm really not sure what the
> > workload there is.
> 
> But CFQ can't track whether reads are stuck behind pending writes. And the
> whole philosophy is to give importance to READS and not WRITES. So I
> am not sure what we can do about the first case.
I'm also not sure we should handle that case, since we should give READs
priority.

> If we are really worried about performance and willing to lose isolation
> in the process (read vs write isolation, or isolation across groups), then
> maybe we can think of implementing another tunable, say min_queue_depth.
> That tells CFQ not to idle if it is not driving min_queue_depth.
NCQ disks pose a lot of challenges for CFQ. It is hard to utilize the
full disk queue depth without losing isolation. A tunable seems the
best option for people who don't care so much about latency.
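
To make the two ideas above concrete, here is a minimal userspace sketch,
not kernel code. The struct and both helper functions are hypothetical
stand-ins, not the real struct cfq_data or cfq_may_dispatch. It assumes
"no busy_sync_queues" means the queue being dispatched is the only busy
sync queue, and that min_queue_depth is compared against the number of
requests in flight; neither detail is settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified scheduler state (not the kernel's cfq_data). */
struct sched_state {
	unsigned int busy_queues;      /* all queues with pending requests */
	unsigned int busy_sync_queues; /* proposed counter: busy sync queues only */
	unsigned int in_flight;        /* requests currently at the device */
	unsigned int min_queue_depth;  /* proposed tunable; 0 keeps idling as is */
};

/*
 * Depth is unlimited if the queue is the sole busy queue (the existing
 * rule), or, under the proposal, if it is a sync queue and the only busy
 * sync queue, so that pending async queues no longer force a limit.
 */
static bool depth_unlimited(const struct sched_state *s, bool queue_is_sync)
{
	/* The sync count is a subset of the total count. */
	assert(s->busy_sync_queues <= s->busy_queues);

	if (s->busy_queues == 1)
		return true;
	return queue_is_sync && s->busy_sync_queues == 1;
}

/*
 * Idling is allowed only once the device is driven to at least
 * min_queue_depth; below that, skip idling and keep dispatching.
 */
static bool may_idle(const struct sched_state *s)
{
	return s->in_flight >= s->min_queue_depth;
}
```

With one sync and one async queue busy, depth_unlimited() is true for the
sync queue and false for the async one. Setting min_queue_depth trades
isolation for throughput, as discussed above.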
