Message-ID: <5046F959.8020500@redhat.com>
Date: Wed, 05 Sep 2012 09:03:53 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: "Nicholas A. Bellinger" <nab@...ux-iscsi.org>
CC: linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
kvm@...r.kernel.org, rusty@...tcorp.com.au, jasowang@...hat.com,
mst@...hat.com, virtualization@...ts.linux-foundation.org,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
target-devel <target-devel@...r.kernel.org>,
Asias He <asias@...hat.com>
Subject: Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support
On 04/09/2012 22:11, Nicholas A. Bellinger wrote:
>>> As tgt->tgt_lock is taken in virtscsi_queuecommand_multi() before the
>>> atomic_inc_return(tgt->reqs) check, it seems like using atomic_dec() w/o
>>> smp_mb__after_atomic_dec or tgt_lock access here is not using atomic.h
>>> accessors properly, no..?
>>
>> No, only a single "thing" is being accessed, and there is no need to
>> order the decrement with respect to preceding or subsequent accesses to
>> other locations.
>>
>> In other words, tgt->reqs is already synchronized with itself, and that
>> is enough.
>>
>
> However, it's still my understanding that the use of atomic_dec() in the
> completion path means that smp_mb__after_atomic_dec() is a requirement for
> proper portable atomic.h code, no..? Otherwise tgt->reqs should be
> using something other than an atomic_t, right..?
Memory barriers aren't _always_ required, only when you need to order
accesses to multiple locations.
In this case, there is no other location that the
queuecommand/completion handlers need to synchronize against, so no
barrier is required. You can see plenty of atomic_inc/atomic_dec in the
code without a barrier afterwards (the typical case is the opposite of
this patch: a refcount increment needs no barrier, while a refcount
decrement uses atomic_dec_return).
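To make the distinction concrete, here is a minimal sketch (my own
illustration, not code from the patch; the my_target/my_ref types are
made up) of the two cases: a counter that is only compared against
itself can be decremented with a plain atomic_dec(), while a refcount
decrement that may free the object uses atomic_dec_return(), whose
value-returning form comes with full barrier semantics:

#include <linux/atomic.h>

struct my_target {
        atomic_t reqs;                  /* in-flight requests */
};

struct my_ref {
        atomic_t refcount;
        void (*release)(struct my_ref *ref);
};

/* Case 1: reqs only synchronizes with itself -- no barrier needed. */
static void my_target_complete(struct my_target *tgt)
{
        atomic_dec(&tgt->reqs);
}

/* Case 2: the decrement gates a free -- ordering matters. */
static void my_ref_put(struct my_ref *ref)
{
        /*
         * atomic_dec_return() implies a full memory barrier, so all
         * earlier stores to the object are visible before release().
         */
        if (atomic_dec_return(&ref->refcount) == 0)
                ref->release(ref);
}

In the first case the only thing any CPU ever does with reqs is
increment, decrement and compare it, so there is nothing else to order
against; in the second, the decrement must not be reordered with the
preceding updates to the object that is about to be released.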
>> virtio-scsi multiqueue has a performance benefit up to 20% (for a single
>> LUN) or 40% (on overall bandwidth across multiple LUNs). I doubt that a
>> single memory barrier can have that much impact. :)
>>
>
> I've no doubt that this series improves large-block, high-bandwidth
> throughput for virtio-scsi, but historically that has always been the easier
> workload to scale. ;)
This is with a mixed workload (random 4k-64k) and tmpfs backend on the host.
> Yes, I think Jens's new approach is providing some pretty significant
> gains for raw block drivers with extremely high packet (small block
> random I/O) workloads, esp. with hw block drivers that support genuine mq
> with hw num_queues > 1.
I need to look into it, to understand how the queue steering here can be
adapted to his code.
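For reference, the steering being discussed works roughly like this
(a paraphrased sketch from my reading of the patch, not the exact code;
the req_vq/req_vqs/num_queues names are approximate and the structures
are trimmed-down stand-ins for the driver's real ones):

#include <linux/types.h>
#include <linux/atomic.h>
#include <linux/spinlock.h>
#include <linux/smp.h>

struct virtio_scsi_vq {
        spinlock_t vq_lock;
        /* ... the underlying virtqueue ... */
};

struct virtio_scsi_target_state {
        spinlock_t tgt_lock;
        atomic_t reqs;                  /* commands in flight on this target */
        struct virtio_scsi_vq *req_vq;  /* queue the target is bound to */
};

struct virtio_scsi {
        u32 num_queues;
        struct virtio_scsi_vq *req_vqs; /* array of num_queues entries */
};

static struct virtio_scsi_vq *pick_req_vq(struct virtio_scsi *vscsi,
                                          struct virtio_scsi_target_state *tgt)
{
        struct virtio_scsi_vq *vq;
        unsigned long flags;

        spin_lock_irqsave(&tgt->tgt_lock, flags);
        if (atomic_inc_return(&tgt->reqs) == 1) {
                /* Target was idle: bind it to the submitting CPU's queue. */
                tgt->req_vq = &vscsi->req_vqs[smp_processor_id() %
                                              vscsi->num_queues];
        }
        vq = tgt->req_vq;
        spin_unlock_irqrestore(&tgt->tgt_lock, flags);

        return vq;
}

As quoted above, the completion path just does atomic_dec(&tgt->reqs);
once the count drops to zero the next command is free to pick a
different queue, which is why reqs never needs to be ordered against
any other location.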
>> Have you measured the host_lock to be a bottleneck in high-iops
>> benchmarks, even for a modern driver that does not hold it in
>> queuecommand? (Certainly it will become more important as the
>> virtio-scsi queuecommand becomes thinner and thinner).
>
> This is exactly why it would make such a good vehicle to re-architect the
> SCSI core. I'm thinking it can be the first sw LLD we attempt to get
> running on a (currently) future scsi-mq prototype.
Agreed.
Paolo