Message-ID: <20221207134644.GB21691@lst.de>
Date:   Wed, 7 Dec 2022 14:46:44 +0100
From:   Christoph Hellwig <hch@....de>
To:     Max Gurtovoy <mgurtovoy@...dia.com>
Cc:     Christoph Hellwig <hch@....de>, Jason Gunthorpe <jgg@...pe.ca>,
        Lei Rao <lei.rao@...el.com>, kbusch@...nel.org, axboe@...com,
        kch@...dia.com, sagi@...mberg.me, alex.williamson@...hat.com,
        cohuck@...hat.com, yishaih@...dia.com,
        shameerali.kolothum.thodi@...wei.com, kevin.tian@...el.com,
        mjrosato@...ux.ibm.com, linux-kernel@...r.kernel.org,
        linux-nvme@...ts.infradead.org, kvm@...r.kernel.org,
        eddie.dong@...el.com, yadong.li@...el.com, yi.l.liu@...el.com,
        Konrad.wilk@...cle.com, stephen@...eticom.com, hang.yuan@...el.com
Subject: Re: [RFC PATCH 1/5] nvme-pci: add function nvme_submit_vf_cmd to
 issue admin commands for VF driver.

On Wed, Dec 07, 2022 at 12:59:00PM +0200, Max Gurtovoy wrote:
> Why is it preferred that the migration SW will talk directly to the PF and 
> not via VFIO interface ?

It should never talk directly to any hardware, but through a kernel
interface, and that's probably vfio.  But that interface needs to be
centered around the controlling function for all the reasons I've
written down multiple times now.

> It's just an implementation detail.

No, it's not.  While you could come up with awkward ways to map how
the hardware interface must work to a completely contrary kernel
interface, that's just going to create the need for lots of boilerplate
code _and_ confuse users.  The function that is being migrated can
fundamentally not be in control of itself.  Any interface that pretends
it is, is broken and a long term nightmare for users and implementers.

> I feel like it's even sounds more reasonable to have a common API like we 
> have today to save_state/resume_state/quiesce_device/freeze_device and each 
> device implementation will translate this functionality to its own SPEC.

Absolutely.

> If I understand your direction is to have QEMU code to talk to 
> nvmecli/new_mlx5cli/my_device_cli to do that and I'm not sure it's needed.

No.

> The controlled device is not aware of any of the migration process. Only 
> the migration SW, system admin and controlling device.

Exactly.

> So in the source:
>
> 1. We enable SRIOV on the NVMe driver

Again.  Nothing in live migration is tied to SR-IOV at all.  SR-IOV
is just one way to get multiple functions.

> 2. We list all the secondary controllers: nvme1, nvme2, nvme3
>
> 3. We allow migrating nvme1, nvme2, nvme3 - now these VFs are migratable 
> (controlling to controlled).
>
> 4. We bind nvme1, nvme2, nvme3 to VFIO NVMe driver
>
> 5. We pass these functions to VM

And you need to pass the controlling function (or rather a handle for
it), because there is absolutely no sane way to discover that from
the controlled function, as it can't have that information by the
fact that it is being passed to unprivileged VMs.
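
As a rough illustration of the shape argued for above, here is a
minimal userspace sketch in C.  It is not an existing kernel or vfio
API: every name in it (controlling_fn, controlled_fn, ctrl_quiesce and
so on) is hypothetical.  The only point it shows is that each migration
operation takes the controlling function's handle plus an identifier
for the controlled function, and that the controlled function carries
no reference back to its controller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical and for illustration only. */
#define MAX_CONTROLLED 4

enum mig_state { MIG_RUNNING, MIG_QUIESCED, MIG_FROZEN };

/* A controlled function: note it holds no pointer to its controller. */
struct controlled_fn {
	int id;
	bool migratable;
	enum mig_state state;
};

/* The controlling function owns the migration state of its children. */
struct controlling_fn {
	struct controlled_fn children[MAX_CONTROLLED];
	size_t nr_children;
};

/* Find a controlled function by id; only possible via the controller. */
static struct controlled_fn *ctrl_lookup(struct controlling_fn *cf, int id)
{
	for (size_t i = 0; i < cf->nr_children; i++)
		if (cf->children[i].id == id)
			return &cf->children[i];
	return NULL;
}

/* Mark a controlled function as migratable. */
int ctrl_allow_migration(struct controlling_fn *cf, int id)
{
	struct controlled_fn *fn = ctrl_lookup(cf, id);

	if (!fn)
		return -ENOENT;
	fn->migratable = true;
	return 0;
}

/* Quiesce a controlled function; refused unless migration was allowed. */
int ctrl_quiesce(struct controlling_fn *cf, int id)
{
	struct controlled_fn *fn = ctrl_lookup(cf, id);

	if (!fn)
		return -ENOENT;
	if (!fn->migratable)
		return -EPERM;
	fn->state = MIG_QUIESCED;
	return 0;
}

/* Freeze a controlled function; it must have been quiesced first. */
int ctrl_freeze(struct controlling_fn *cf, int id)
{
	struct controlled_fn *fn = ctrl_lookup(cf, id);

	if (!fn)
		return -ENOENT;
	if (!fn->migratable)
		return -EPERM;
	if (fn->state != MIG_QUIESCED)
		return -EINVAL;
	fn->state = MIG_FROZEN;
	return 0;
}
```

In this sketch the caller always needs the controlling_fn handle to do
anything migration related, which is why that handle has to be passed
along explicitly and cannot be derived from the controlled function.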
