Date:   Tue, 6 Dec 2022 09:05:05 -0400
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     Christoph Hellwig <hch@....de>
Cc:     Lei Rao <lei.rao@...el.com>, kbusch@...nel.org, axboe@...com,
        kch@...dia.com, sagi@...mberg.me, alex.williamson@...hat.com,
        cohuck@...hat.com, yishaih@...dia.com,
        shameerali.kolothum.thodi@...wei.com, kevin.tian@...el.com,
        mjrosato@...ux.ibm.com, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org, kvm@...r.kernel.org,
        eddie.dong@...el.com, yadong.li@...el.com, yi.l.liu@...el.com,
        Konrad.wilk@...cle.com, stephen@...eticom.com, hang.yuan@...el.com
Subject: Re: [RFC PATCH 5/5] nvme-vfio: Add a document for the NVMe device

On Tue, Dec 06, 2022 at 07:26:04AM +0100, Christoph Hellwig wrote:
> all here).  In Linux the equivalent would be to implement a mdev driver
> that allows passing through the I/O queues to a guest, but it might

Definitely not - "mdev" drivers should be avoided as much as possible.

In this case Intel has a real PCI SRIOV VF to expose to the guest,
with a full VF RID. The proper VFIO abstraction is the variant PCI
driver, which is what this series does. We want to use variant PCI
drivers because they properly encapsulate all the PCI behaviors (MSI,
config space, regions, reset, etc) without requiring all of this to be
re-implemented in an mdev driver.
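
For reference, the skeleton of such a variant driver is quite small -
it is mostly a thin wrapper around vfio-pci-core. A minimal sketch,
loosely following the layout of drivers/vfio/pci/mlx5/main.c (all the
nvmevf_* names are made up here, and a real migration driver would
also wire up the migration ops on top of this):

/*
 * Hypothetical sketch of a VFIO variant PCI driver. Everything
 * device-specific is stubbed out; the rest is vfio-pci-core.
 */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/vfio_pci_core.h>

static int nvmevf_pci_open_device(struct vfio_device *core_vdev)
{
	struct vfio_pci_core_device *vdev = container_of(core_vdev,
			struct vfio_pci_core_device, vdev);
	int ret;

	ret = vfio_pci_core_enable(vdev);
	if (ret)
		return ret;

	/* device-specific setup (e.g. migration state) would go here */
	vfio_pci_core_finish_enable(vdev);
	return 0;
}

static const struct vfio_device_ops nvmevf_pci_ops = {
	.name = "nvme-vfio-pci",
	.init = vfio_pci_core_init_dev,
	.release = vfio_pci_core_release_dev,
	.open_device = nvmevf_pci_open_device,
	.close_device = vfio_pci_core_close_device,
	.ioctl = vfio_pci_core_ioctl,
	.device_feature = vfio_pci_core_ioctl_feature,
	.read = vfio_pci_core_read,
	.write = vfio_pci_core_write,
	.mmap = vfio_pci_core_mmap,
	.request = vfio_pci_core_request,
	.match = vfio_pci_core_match,
};

static int nvmevf_pci_probe(struct pci_dev *pdev,
			    const struct pci_device_id *id)
{
	struct vfio_pci_core_device *vdev;
	int ret;

	vdev = vfio_alloc_device(vfio_pci_core_device, vdev, &pdev->dev,
				 &nvmevf_pci_ops);
	if (IS_ERR(vdev))
		return PTR_ERR(vdev);

	dev_set_drvdata(&pdev->dev, vdev);
	ret = vfio_pci_core_register_device(vdev);
	if (ret) {
		vfio_put_device(&vdev->vdev);
		return ret;
	}
	return 0;
}

static void nvmevf_pci_remove(struct pci_dev *pdev)
{
	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);

	vfio_pci_core_unregister_device(vdev);
	vfio_put_device(&vdev->vdev);
}

static struct pci_driver nvmevf_pci_driver = {
	.name = KBUILD_MODNAME,
	/* .id_table would match the vendor's migration-capable VF */
	.probe = nvmevf_pci_probe,
	.remove = nvmevf_pci_remove,
	.err_handler = &vfio_pci_core_err_handlers,
	.driver_managed_dma = true,
};
module_pci_driver(nvmevf_pci_driver);
MODULE_LICENSE("GPL");

Everything else - config space, MSI, regions, reset - comes from
vfio-pci-core for free, which is exactly the point.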

mdev drivers should only be considered if a real PCI VF is not
available - eg because the device is doing "SIOV" or something.

We have several migration drivers in VFIO now following this general
pattern; from what I can see, they have done it broadly correctly from
a VFIO perspective.

> be a better idea to handle the device model emulation entirely in
> Qemu (or other userspace device models) and just find a way to expose
> enough of the I/O queues to userspace.

This is much closer to the VDPA model, which is basically some kernel
support to access the IO queues and a lot of SW in qemu to generate
the PCI device in the VM.

The approach has positives and negatives; we have done both in mlx5
devices and we have a preference toward the VFIO model. VDPA
specifically is very big and complicated compared to the VFIO
approach.

Overall, having fully functional PCI SRIOV VFs available lets more use
cases work than just "qemu to create a VM". qemu can always build a
VDPA-like thing by using VFIO and VFIO live migration to shift
control of the device between qemu and HW.
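
The kernel side of that handover is just the v2 migration uAPI. A
rough userspace sketch, assuming an already-open VFIO device fd
(mig_set_state/save_device_state are made-up helper names, error
handling trimmed):

/*
 * Drive the v2 migration uAPI: quiesce the device, then stream the
 * saved state out of the data_fd the kernel returns.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int mig_set_state(int device_fd, __u32 state, int *data_fd)
{
	/* feature header immediately followed by the mig_state payload */
	__u64 buf[(sizeof(struct vfio_device_feature) +
		   sizeof(struct vfio_device_feature_mig_state) + 7) / 8];
	struct vfio_device_feature *feature = (void *)buf;
	struct vfio_device_feature_mig_state *mig = (void *)feature->data;

	memset(buf, 0, sizeof(buf));
	feature->argsz = sizeof(buf);
	feature->flags = VFIO_DEVICE_FEATURE_SET |
			 VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
	mig->device_state = state;

	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature))
		return -errno;
	if (data_fd)
		*data_fd = mig->data_fd;
	return 0;
}

static int save_device_state(int device_fd, FILE *out)
{
	char buf[4096];
	ssize_t n;
	int data_fd;

	/* stop the device, then ask for the saved-state stream */
	if (mig_set_state(device_fd, VFIO_DEVICE_STATE_STOP, NULL))
		return -errno;
	if (mig_set_state(device_fd, VFIO_DEVICE_STATE_STOP_COPY, &data_fd))
		return -errno;

	/* read the opaque device state until EOF */
	while ((n = read(data_fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, out);
	close(data_fd);
	return n < 0 ? -errno : 0;
}

The destination side is the mirror image: set
VFIO_DEVICE_STATE_RESUMING and write the stream into the returned
data_fd before moving the device back to RUNNING.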

I don't think we know enough about this space at the moment to fix a
specification to one path or the other, so I hope the TPAR will settle
on something that can support both models in SW and people can try
things out.

Jason
