Message-ID: <20121219113202.GD7742@redhat.com>
Date: Wed, 19 Dec 2012 13:32:02 +0200
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Rolf Eike Beer <eike-kernel@...tec.de>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
gaowanlong@...fujitsu.com, hutao@...fujitsu.com,
linux-scsi@...r.kernel.org,
virtualization@...ts.linux-foundation.org, rusty@...tcorp.com.au,
asias@...hat.com, stefanha@...hat.com, nab@...ux-iscsi.org
Subject: Re: [PATCH v2 0/5] Multiqueue virtio-scsi, and API for piecewise
buffer submission
On Wed, Dec 19, 2012 at 09:52:59AM +0100, Paolo Bonzini wrote:
> On 18/12/2012 23:18, Rolf Eike Beer wrote:
> > Paolo Bonzini wrote:
> >> Hi all,
> >>
> >> this series adds multiqueue support to the virtio-scsi driver, based
> >> on Jason Wang's work on virtio-net. It uses a simple queue steering
> >> algorithm that expects one queue per CPU. LUNs in the same target always
> >> use the same queue (so that commands are not reordered); queue switching
> >> occurs when the request being queued is the only one for the target.
> >> Also based on Jason's patches, the virtqueue affinity is set so that
> >> each CPU is associated to one virtqueue.
> >>
> >> I tested the patches with fio, using up to 32 virtio-scsi disks backed
> >> by tmpfs on the host. These numbers are with 1 LUN per target.
> >>
> >> FIO configuration
> >> -----------------
> >> [global]
> >> rw=read
> >> bsrange=4k-64k
> >> ioengine=libaio
> >> direct=1
> >> iodepth=4
> >> loops=20
> >>
> >> overall bandwidth (MB/s)
> >> ------------------------
> >>
> >> # of targets    single-queue    multi-queue, 4 VCPUs    multi-queue, 8 VCPUs
> >> 1               540             626                     599
> >> 2               795             965                     925
> >> 4               997             1376                    1500
> >> 8               1136            2130                    2060
> >> 16              1440            2269                    2474
> >> 24              1408            2179                    2436
> >> 32              1515            1978                    2319
> >>
> >> (These numbers for single-queue are with 4 VCPUs, but the impact of adding
> >> more VCPUs is very limited).
> >>
> >> avg bandwidth per LUN (MB/s)
> >> ----------------------------
> >>
> >> # of targets    single-queue    multi-queue, 4 VCPUs    multi-queue, 8 VCPUs
> >> 1               540             626                     599
> >> 2               397             482                     462
> >> 4               249             344                     375
> >> 8               142             266                     257
> >> 16              90              141                     154
> >> 24              58              90                      101
> >> 32              47              61                      72
> >
> > > Is there an explanation why 8x8 is slower than 4x8 in both cases?
>
> Regarding the "in both cases" part, it's because the second table has
> the same data as the first, but divided by the first column.
>
> In general, the "strangenesses" you find are probably within statistical
> noise or due to other effects such as host CPU utilization or contention
> on the big QEMU lock.
>
> Paolo
>
That's exactly what bothers me. If IOPS per unit of host CPU
goes down, then the win on a lightly loaded host will become a regression
on a loaded host.
We need to measure that.
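
To make the concern concrete, here is a minimal sketch of the metric in Python. It uses the posted bandwidth (MB/s) as a stand-in for IOPS, since that is what the table reports. The bandwidth values are the 8-target row of the table above; the host CPU utilization values are hypothetical placeholders, not measurements, and the function name is made up for illustration.

```python
def bandwidth_per_host_cpu(bandwidth_mb_s, host_cpu_percent):
    """Return MB/s delivered per 100% of one host CPU consumed."""
    if host_cpu_percent <= 0:
        raise ValueError("host CPU utilization must be positive")
    return bandwidth_mb_s / (host_cpu_percent / 100.0)

# Bandwidth: 8-target row of the table (single-queue vs multi-queue, 4 VCPUs).
# Host CPU: HYPOTHETICAL placeholder values; real figures must be measured.
single_queue = bandwidth_per_host_cpu(1136, host_cpu_percent=200)
multi_queue = bandwidth_per_host_cpu(2130, host_cpu_percent=500)

# If the per-CPU figure drops, a CPU-bound host would see a regression
# even though raw bandwidth went up on an idle host.
regression = multi_queue < single_queue
print(f"single-queue: {single_queue:.0f} MB/s per host CPU")
print(f"multi-queue:  {multi_queue:.0f} MB/s per host CPU")
print("regression on a loaded host" if regression else "no regression")
```

With these made-up CPU numbers the multiqueue case would be less efficient per host CPU despite the higher raw bandwidth; whether that actually happens depends on measured host utilization.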
> > 8x1 and 8x2 being slower than 4x1 and 4x2 is more or less expected, but 8x8
> > loses against 4x8 while 8x4 wins against 4x4 and 8x16 against 4x16.
> >
> > Eike
> >
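
The 4-VCPU versus 8-VCPU comparison discussed above can be checked directly against the quoted overall-bandwidth table. The sketch below (Python, with a hypothetical helper name) computes the percent change when going from 4 to 8 VCPUs for each target count. It only restates the posted numbers and says nothing about run-to-run variance, which was not reported.

```python
# Overall bandwidth (MB/s) from the quoted table:
# number of targets -> (multi-queue 4 VCPUs, multi-queue 8 VCPUs)
multi_queue = {
    1: (626, 599),
    2: (965, 925),
    4: (1376, 1500),
    8: (2130, 2060),
    16: (2269, 2474),
    24: (2179, 2436),
    32: (1978, 2319),
}

def relative_change(four_vcpus, eight_vcpus):
    """Percent change in bandwidth going from 4 to 8 VCPUs."""
    return 100.0 * (eight_vcpus - four_vcpus) / four_vcpus

changes = {t: relative_change(*bw) for t, bw in multi_queue.items()}
for targets, pct in changes.items():
    print(f"{targets:>2} targets: {pct:+.1f}%")
```

The 8-target row is the only one from 4 targets upward where 8 VCPUs come out behind, by roughly 3%.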