linux-kernel - Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120904110905.GA9119@redhat.com>
Date:	Tue, 4 Sep 2012 14:09:05 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Paolo Bonzini <pbonzini@...hat.com>
Cc:	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
	linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
	kvm@...r.kernel.org, rusty@...tcorp.com.au, jasowang@...hat.com,
	virtualization@...ts.linux-foundation.org,
	Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
	target-devel <target-devel@...r.kernel.org>
Subject: Re: [PATCH 5/5] virtio-scsi: introduce multiqueue support

On Tue, Sep 04, 2012 at 12:25:03PM +0200, Paolo Bonzini wrote:
> Il 04/09/2012 10:46, Michael S. Tsirkin ha scritto:
> >>>> +static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
> >>>> +				       struct scsi_cmnd *sc)
> >>>> +{
> >>>> +	struct virtio_scsi *vscsi = shost_priv(sh);
> >>>> +	struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >>>> +	unsigned long flags;
> >>>> +	u32 queue_num;
> >>>> +
> >>>> +	/* Using an atomic_t for tgt->reqs lets the virtqueue handler
> >>>> +	 * decrement it without taking the spinlock.
> >>>> +	 */
> > 
> > Above comment is not really helpful - reader can be safely assumed to
> > know what atomic_t is.
> 
> Sure, the comment explains that we use an atomic because _elsewhere_ the
> tgt_lock is not held while modifying reqs.
> 
> > Please delete, and replace with the text from commit log
> > that explains the heuristic used to select req_vq.
> 
> Ok.
> 
> > Also please add a comment near 'reqs' definition.
> > Something like "number of outstanding requests - used to detect idle
> > target".
> 
> Ok.
> 
> > 
> >>>> +	spin_lock_irqsave(&tgt->tgt_lock, flags);
> > 
> > Looks like this lock can be removed - req_vq is only
> > modified when target is idle and only used when it is
> > not idle.
> 
> If you have two incoming requests at the same time, req_vq is also
> modified when the target is not idle; that's the point of the lock.
> 
> Suppose tgt->reqs = 0 initially, and you have two processors/queues.
> Initially tgt->req_vq is queue #1.  If you have this:
> 
>     queuecommand on CPU #0         queuecommand #2 on CPU #1
>   --------------------------------------------------------------
>     atomic_inc_return(...) == 1
>                                    atomic_inc_return(...) == 2
>                                    virtscsi_queuecommand to queue #1
>     tgt->req_vq = queue #0
>     virtscsi_queuecommand to queue #0
> 
> then two requests are issued to different queues without a quiescent
> point in the middle.

What happens then? Does this break correctness?

> >>>> +	if (atomic_inc_return(&tgt->reqs) == 1) {
> >>>> +		queue_num = smp_processor_id();
> >>>> +		while (unlikely(queue_num >= vscsi->num_queues))
> >>>> +			queue_num -= vscsi->num_queues;
> >>>> +		tgt->req_vq = &vscsi->req_vqs[queue_num];
> >>>> +	}
> >>>> +	spin_unlock_irqrestore(&tgt->tgt_lock, flags);
> >>>> +	return virtscsi_queuecommand(vscsi, tgt, sc);
> >>>> +}
> >>>> +
> >>>> +
> > 
> > .....
> > 
> >>>> +static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
> >>>> +                                       struct scsi_cmnd *sc)
> >>>> +{
> >>>> +       struct virtio_scsi *vscsi = shost_priv(sh);
> >>>> +       struct virtio_scsi_target_state *tgt = vscsi->tgt[sc->device->id];
> >>>> +
> >>>> +       atomic_inc(&tgt->reqs);
> >>>> +       return virtscsi_queuecommand(vscsi, tgt, sc);
> >>>> +}
> >>>> +
> > 
> > Here, reqs is unused - why bother incrementing it?
> > A branch on completion would be cheaper IMHO.
> 
> Well, I could also let tgt->reqs go negative, but it would be a bit untidy.
> 
> Another alternative is to access the target's target_busy field with
> ACCESS_ONCE, and drop reqs altogether.  Too tricky to do this kind of
> micro-optimization so early, though.

So keep it simple and just check a flag.

> >> virtio-scsi multiqueue has a performance benefit up to 20%
> > 
> > To be fair, you could be running in single queue mode.
> > In that case extra atomics and indirection that this code
> > brings will just add overhead without benefits.
> > I don't know how significant would that be.
> 
> Not measurable in my experiments.
> 
> Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/