linux-kernel - Re: [PATCH 3/3] virtio_blk: implement blk_mq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YK8Ho3mC117M8GXS@T590>
Date:   Thu, 27 May 2021 10:44:51 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Stefan Hajnoczi <stefanha@...hat.com>
Cc:     virtualization@...ts.linux-foundation.org,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        Christoph Hellwig <hch@....de>,
        Jason Wang <jasowang@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Jens Axboe <axboe@...nel.dk>, slp@...hat.com,
        sgarzare@...hat.com, "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [PATCH 3/3] virtio_blk: implement blk_mq_ops->poll()

On Thu, May 20, 2021 at 03:13:05PM +0100, Stefan Hajnoczi wrote:
> Request completion latency can be reduced by using polling instead of
> irqs. Even Posted Interrupts or similar hardware support doesn't beat
> polling. The reason is that disabling virtqueue notifications saves
> critical-path CPU cycles on the host by skipping irq injection and in
> the guest by skipping the irq handler. So let's add blk_mq_ops->poll()
> support to virtio_blk.
> 
> The approach taken by this patch differs from the NVMe driver's
> approach. NVMe dedicates hardware queues to polling and submits
> REQ_HIPRI requests only on those queues. This patch does not require
> exclusive polling queues for virtio_blk. Instead, it switches between
> irqs and polling when one or more REQ_HIPRI requests are in flight on a
> virtqueue.
> 
> This is possible because toggling virtqueue notifications is cheap even
> while the virtqueue is running. NVMe cqs can't do this because irqs are
> only enabled/disabled at queue creation time.
> 
> This toggling approach requires no configuration. There is no need to
> dedicate queues ahead of time or to teach users and orchestration tools
> how to set up polling queues.

This approach looks good, and very neat thanks per-vq lock.

BTW, is there any virt-exit saved by disabling vq interrupt? I understand
there isn't since virt-exit may only be involved in remote completion
via sending IPI.

> 
> Possible drawbacks of this approach:
> 
> - Hardware virtio_blk implementations may find virtqueue_disable_cb()
>   expensive since it requires DMA. If such devices become popular then

You mean the hardware need to consider order between DMA completion and
interrupt notify? But it is disabling notify, guest just calls
virtqueue_get_buf() to see if there is buffer available, if not, it will be
polled again.

>   the virtio_blk driver could use a similar approach to NVMe when
>   VIRTIO_F_ACCESS_PLATFORM is detected in the future.
> 
> - If a blk_poll() thread is descheduled it not only hurts polling
>   performance but also delays completion of non-REQ_HIPRI requests on
>   that virtqueue since vq notifications are disabled.
> 
> Performance:
> 
> - Benchmark: fio ioengine=pvsync2 numjobs=4 direct=1
> - Guest: 4 vCPUs with one virtio-blk device (4 virtqueues)

4 jobs can consume up all 4 vCPUs. Just run a quick fio test with
'ioengine=io_uring --numjobs=1' on single vq, and IOPS can be improved
by ~20%(hipri=1 vs hipri=0) with the 3 patches, and the virtio-blk is
still backed on NVMe SSD.


Thanks, 
Ming