Date:   Tue, 6 Nov 2018 15:08:26 -0500
From:   Vitaly Mayatskih <v.mayatskih@...il.com>
To:     stefanha@...il.com
Cc:     Jason Wang <jasowang@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, Kevin Wolf <kwolf@...hat.com>,
        "Michael S . Tsirkin" <mst@...hat.com>, den@...tuozzo.com
Subject: Re: [PATCH 0/1] vhost: add vhost_blk driver

On Tue, Nov 6, 2018 at 10:40 AM Stefan Hajnoczi <stefanha@...il.com> wrote:

> Previously vhost_blk.ko implementations were basically the same thing as
> the QEMU x-data-plane=on (dedicated thread using Linux AIO), except they
> were using a kernel thread and maybe submitted bios.
>
> The performance differences weren't convincing enough that it seemed
> worthwhile maintaining another code path which loses live migration, I/O
> throttling, image file formats, etc (all the things that QEMU's block
> layer supports).
>
> Two changes since then:
>
> 1. x-data-plane=on has been replaced with a full trip down QEMU's block
> layer (-object iothread,id=iothread0 -device
> virtio-blk-pci,iothread=iothread0,...).  It's slower and not truly
> multiqueue (yet!).
>
> So from this perspective vhost_blk.ko might be more attractive again, at
> least until further QEMU block layer work eliminates the multiqueue and
> performance overheads.

Yes, this work is a direct consequence of the insufficient performance
of virtio-blk's host side. I'm working on a storage driver, but there is
no good way to feed all of that I/O into a single disk of a single VM.
The nature of the storage design dictates that the VM must see very
high IOPS. This is only one tiny use case of course, but the vhost/QEMU
change is small enough to be worth sharing.
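
For reference, the iothread path you describe above is set up along
these lines (the image path, ids and remaining options here are just
placeholders, not taken from any actual setup):

  qemu-system-x86_64 ... \
      -object iothread,id=iothread0 \
      -drive file=disk.img,if=none,id=drive0,format=raw,cache=none,aio=native \
      -device virtio-blk-pci,drive=drive0,iothread=iothread0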

> 2. SPDK has become available for users who want the best I/O performance
> and are willing to sacrifice CPU cores for polling.
>
> If you want better performance and don't care about QEMU block layer
> features, could you use SPDK?  People who are the target market for
> vhost_blk.ko would probably be willing to use SPDK and it already
> exists...

Yes. Though in my experience SPDK creates more problems than it solves
most of the time ;) What I find very compelling about using a plain
Linux block device is that it is really fast these days (blk-mq), and
the device mapper can be layered on top for even greater flexibility.
Device mapper is less than perfect performance-wise and will need some
work at some point for sure, but it can still push a few million IOPS
through. And it's all standard code with decades-old user APIs.
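
As a sketch of the kind of stacking I have in mind (the device names
and the size in 512-byte sectors below are made up for illustration),
a linear dm device on top of an NVMe namespace is just:

  # 100 GiB (209715200 512-byte sectors) mapped linearly onto /dev/nvme0n1
  dmsetup create vm0-disk --table '0 209715200 linear /dev/nvme0n1 0'

and the resulting /dev/mapper/vm0-disk can then be handed to the VM
like any other block device.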

In fact, the Linux kernel is so good now that our pure-software
solution can push I/O at rates up to the limits of fat hardware
(x00 GbE, a bunch of NVMe drives) without any apparent need for
hardware acceleration. And, without hardware dependencies, it is much
more flexible. The disk interface between the host and the VM was the
only major bottleneck.

> From the QEMU userspace perspective, I think the best way to integrate
> vhost_blk.ko is to transparently switch to it when possible.  If the
> user enables QEMU block layer features that are incompatible with
> vhost_blk.ko, then it should fall back to the QEMU block layer
> transparently.

Sounds like an excellent idea! I'll do that. Most of the vhost-blk
support in QEMU is boilerplate code anyway.

> I'm not keen on yet another code path with its own set of limitations
> and having to educate users about how to make the choice.  But if it can
> be integrated transparently as an "accelerator", then it could be
> valuable.

Understood. Agree.

Thanks!
-- 
wbr, Vitaly
