Date:	Wed, 27 Apr 2011 09:04:03 -0400
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Jens Axboe <jaxboe@...ionio.com>, linux-kernel@...r.kernel.org
Subject: Re: submitting read(1%)/write(99%) IO within a kernel thread, vs
 doing it in userspace (aio) with CFQ shows drastic drop. Ideas?

On Tue, Apr 26, 2011 at 02:33:21PM -0400, Vivek Goyal wrote:
> On Tue, Apr 26, 2011 at 01:37:32PM -0400, Konrad Rzeszutek Wilk wrote:
> > 
> > I was hoping you could shed some light on a peculiar problem I am seeing
> > (this is with the PV block backend I posted recently [1]).
> > 
> > I am using the IOmeter fio test with two threads, modified slightly
> > (please see at the bottom). The "disk" the I/Os are being done on is an iSCSI
> > disk that on the other side is an LIO TCM 10G RAMdisk. The network is 1Gb and
> > the line speed when doing just full-blown random reads or full random writes
> > is 112MB/s (native or from the guest).
> > 
> > I launch a guest and inside the guest I run the fio iometer job. When launching
> > the guest I have the option of using two different block backends:
> > the kernel one (simple code [1] doing 'submit_bio') or the userspace one (which
> > uses the AIO library and opens the disk with O_DIRECT). The throughput and submit
> > latency are widely different for this particular workload. If I swap the I/O
> > scheduler in the host for the iSCSI disk from 'cfq' to 'deadline' or 'noop',
> > throughput and latencies become the same (CPU usage does not, but that is not
> > important here). Here is a simple table with the numbers:
> > 
> > IOmeter       |       |      |          |
> > 64K, randrw   |  NOOP | CFQ  | deadline |
> > randrwmix=80  |       |      |          |
> > MB/s (rd/wr)  |       |      |          |
> > --------------+-------+------+----------+
> > blkback       |103/27 |32/10 | 102/27   |
> > --------------+-------+------+----------+
> > QEMU qdisk    |103/27 |102/27| 102/27   |
> > --------------+-------+------+----------+
> > 
> > What I found out is that if I pollute the request ring with just one
> > different type of I/O operation (so 99% is WRITE and I stick 1% READ in it),
> > the I/O throughput plummets if I use the kernel thread. But that problem does
> > not show up when the I/O operations are plumbed through the AIO library.
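
(Illustration only: a fio job along the lines described above, i.e. 64K blocks,
random read/write at an 80/20 mix, two threads against the iSCSI disk. The
device path, ioengine and queue depth are assumptions, not values from the
original job file.)

    [global]
    ioengine=libaio        ; assumed; any O_DIRECT-capable async engine would do
    direct=1
    bs=64k
    rw=randrw
    rwmixread=80           ; the "randrwmix=80" from the table above
    iodepth=64             ; illustrative
    runtime=60
    thread
    numjobs=2              ; the two threads

    [iometer]
    filename=/dev/sdX      ; hypothetical path for the iSCSI disk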
> 
> Konrad,
> 
> I suspect that the difference is sync vs async requests. In the case of
> a kernel thread submitting IO, I think all the WRITEs might be being
> considered async and will go into a different queue. If you mix in
> some READs, those are always sync and will go into a different queue.
> In the presence of a sync queue, CFQ will idle on it and choke the WRITEs
> in an attempt to improve the latencies of the READs.
> 
> In the case of AIO, I am assuming it is direct IO, so both READs and WRITEs
> will be considered sync, go into a single queue, and no choking
> of WRITEs will take place.
> 
> Can you run blktrace on your host iSCSI device (15-20 seconds) and upload
> the traces somewhere? That might give us some ideas.
> 
> If you flag the bios you are preparing in the kernel thread as sync
> (using the REQ_SYNC flag), this problem might disappear (only if my
> analysis is right, that is :-)).
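
(A sketch of the blktrace capture suggested above; /dev/sdX stands in for
whatever the host's iSCSI device actually is:)

    # record ~20 seconds of block-layer events from the iSCSI device
    blktrace -d /dev/sdX -w 20 -o iscsi-cfq
    # merge the per-CPU binary traces into a readable text dump
    blkparse -i iscsi-cfq > iscsi-cfq.txt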

Your analysis was spot on, dead right. Thank you!
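
(For reference, a minimal sketch of what the suggested change might look like
in a kernel backend of that era, where submit_bio() still took the rw flags as
its first argument; the helper name is made up, and the point is only the
WRITE | REQ_SYNC tagging.)

    #include <linux/bio.h>   /* struct bio; pulls in REQ_SYNC via blk_types.h */
    #include <linux/fs.h>    /* READ/WRITE and submit_bio() */

    /*
     * Hypothetical helper for the kernel-thread backend: tag WRITE bios as
     * sync so CFQ queues them alongside the READs instead of parking them
     * on the async queue while it idles on the sync one.
     */
    static void blkback_submit_write(struct bio *bio)
    {
            submit_bio(WRITE | REQ_SYNC, bio);
    }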
