Message-ID: <5a496713-ae1d-11f2-1260-e4c1956e1eda@nvidia.com>
Date:   Wed, 20 Oct 2021 11:28:04 +0300
From:   Yishai Hadas <yishaih@...dia.com>
To:     Jason Gunthorpe <jgg@...dia.com>,
        Alex Williamson <alex.williamson@...hat.com>
CC:     <bhelgaas@...gle.com>, <saeedm@...dia.com>,
        <linux-pci@...r.kernel.org>, <kvm@...r.kernel.org>,
        <netdev@...r.kernel.org>, <kuba@...nel.org>, <leonro@...dia.com>,
        <kwankhede@...dia.com>, <mgurtovoy@...dia.com>, <maorg@...dia.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

On 10/20/2021 2:04 AM, Jason Gunthorpe wrote:
> On Tue, Oct 19, 2021 at 02:58:56PM -0600, Alex Williamson wrote:
>> I think that gives us this table:
>>
>> |   NDMA   | RESUMING |  SAVING  |  RUNNING |
>> +----------+----------+----------+----------+ ---
>> |     X    |     0    |     0    |     0    |  ^
>> +----------+----------+----------+----------+  |
>> |     0    |     0    |     0    |     1    |  |
>> +----------+----------+----------+----------+  |
>> |     X    |     0    |     1    |     0    |
>> +----------+----------+----------+----------+  NDMA value is either compatible
>> |     0    |     0    |     1    |     1    |  with existing behavior or don't
>> +----------+----------+----------+----------+  care due to redundancy vs
>> |     X    |     1    |     0    |     0    |  !_RUNNING/INVALID/ERROR
>> +----------+----------+----------+----------+
>> |     X    |     1    |     0    |     1    |  |
>> +----------+----------+----------+----------+  |
>> |     X    |     1    |     1    |     0    |  |
>> +----------+----------+----------+----------+  |
>> |     X    |     1    |     1    |     1    |  v
>> +----------+----------+----------+----------+ ---
>> |     1    |     0    |     0    |     1    |  ^
>> +----------+----------+----------+----------+  Desired new useful cases
>> |     1    |     0    |     1    |     1    |  v
>> +----------+----------+----------+----------+ ---
>>
>> Specifically, rows 1, 3, 5 with NDMA = 1 are valid states a user can
>> set which are simply redundant to the NDMA = 0 cases.
> It seems right
>
>> Row 6 remains invalid due to lack of support for pre-copy (_RESUMING
>> | _RUNNING) and therefore cannot be set by userspace.  Rows 7 & 8
>> are error states and cannot be set by userspace.
> I wonder, did Yishai's series capture this row 6 restriction? Yishai?


It seems so, via the below check, which includes the
!VFIO_DEVICE_STATE_VALID clause.

if (old_state == VFIO_DEVICE_STATE_ERROR ||
    !VFIO_DEVICE_STATE_VALID(state) ||
    (state & ~MLX5VF_SUPPORTED_DEVICE_STATES))
        return -EINVAL;

Which is:

#define VFIO_DEVICE_STATE_VALID(state) \
     (state & VFIO_DEVICE_STATE_RESUMING ? \
     (state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
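
For a quick sanity check of that macro against the table above, here is
a small userspace test (bit values copied from the v1 migration protocol
in include/uapi/linux/vfio.h):

#include <stdio.h>
#include <stdint.h>

#define VFIO_DEVICE_STATE_RUNNING	(1 << 0)
#define VFIO_DEVICE_STATE_SAVING	(1 << 1)
#define VFIO_DEVICE_STATE_RESUMING	(1 << 2)
#define VFIO_DEVICE_STATE_MASK		(VFIO_DEVICE_STATE_RUNNING | \
					 VFIO_DEVICE_STATE_SAVING | \
					 VFIO_DEVICE_STATE_RESUMING)

#define VFIO_DEVICE_STATE_VALID(state) \
	(state & VFIO_DEVICE_STATE_RESUMING ? \
	(state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)

int main(void)
{
	/* Row 5 of the table: _RESUMING alone. */
	uint32_t resuming = VFIO_DEVICE_STATE_RESUMING;
	/* Row 6: _RESUMING | _RUNNING (pre-copy). */
	uint32_t precopy = VFIO_DEVICE_STATE_RESUMING |
			   VFIO_DEVICE_STATE_RUNNING;

	/*
	 * The macro does not parenthesize 'state', so pass plain
	 * variables rather than compound expressions.
	 */
	printf("RESUMING         valid=%d\n", VFIO_DEVICE_STATE_VALID(resuming));
	printf("RESUMING|RUNNING valid=%d\n", VFIO_DEVICE_STATE_VALID(precopy));
	return 0;
}

This prints valid=1 for row 5 and valid=0 for row 6, so the
!VFIO_DEVICE_STATE_VALID(state) clause alone is what blocks row 6.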

>
>> Like other bits, setting the bit should be effective at the completion
>> of writing device state.  Therefore the device would need to flush any
>> outbound DMA queues before returning.
> Yes, the device commands are expected to achieve this.
>
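To illustrate that ordering with a minimal sketch (every sketch_* name
below is invented for illustration; the actual series implements this
through mlx5 device commands):

#include <linux/types.h>

#define SKETCH_STATE_NDMA (1 << 3)	/* the proposed 4th bit */

struct sketch_dev {
	u32 state;
};

/* Stubs standing in for the real "quiesce" and "flush" device commands. */
static int sketch_cmd_quiesce(struct sketch_dev *sdev) { return 0; }
static int sketch_cmd_flush_dma(struct sketch_dev *sdev) { return 0; }

static int sketch_set_state(struct sketch_dev *sdev, u32 new_state)
{
	int ret;

	/*
	 * NDMA must be effective when the state write completes, so stop
	 * the device from issuing new DMA and drain in-flight DMA before
	 * reporting success.
	 */
	if (new_state & SKETCH_STATE_NDMA) {
		ret = sketch_cmd_quiesce(sdev);
		if (ret)
			return ret;
		ret = sketch_cmd_flush_dma(sdev);
		if (ret)
			return ret;
	}

	sdev->state = new_state;
	return 0;
}
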
>> The question I was really trying to get to though is whether we have a
>> supportable interface without such an extension.  There's currently
>> only an experimental version of vfio migration support for PCI devices
>> in QEMU (afaik),
> If I recall correctly, this only matters if you have a VM that is causing
> migratable devices to interact with each other. So long as the devices
> are only interacting with the CPU this extra step is not strictly
> needed.
>
> So, single device cases can be fine as-is
>
> IMHO, in the multi-device case the VMM should probably demand this
> support from the migration drivers; otherwise it cannot know for sure
> whether it is safe.
>
> A config option could override the block if the admin knows there is no
> use case that causes devices to interact - e.g. two NVMe devices without
> CMB have no useful interaction.
>
>> so it seems like we could make use of the bus-master bit to fill
>> this gap in QEMU currently, before we claim non-experimental
>> support, but this new device agnostic extension would be required
>> for non-PCI device support (and PCI support should adopt it as
>> available).  Does that sound right?  Thanks,
> I don't think the bus master support is really a substitute: tripping
> bus master will stop DMA, but it will not do so in a clean way and is
> likely to be non-transparent to the VM's driver.
>
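For reference, the PCI-level alternative discussed above amounts to
clearing Bus Master Enable, e.g. with the kernel's pci_clear_master()
helper; a minimal sketch:

#include <linux/pci.h>

/*
 * pci_clear_master() clears PCI_COMMAND_MASTER and halts DMA abruptly:
 * nothing drains in-flight queue state and the guest driver gets no
 * notification, which is why it is not a clean substitute for NDMA.
 */
static void sketch_stop_dma_bluntly(struct pci_dev *pdev)
{
	pci_clear_master(pdev);
}
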
> The single-device-assigned case is a cleaner restriction, IMHO.
>
> Alternatively we can add the 4th bit and insist that migration drivers
> support all the states. I'm just unsure what other HW can do; I get the
> feeling people have been designing to the migration description in the
> header file for a while, and this is a new idea.
>
> Jason

Just to be sure: we are referring here to future functionality around
this extra 4th bit, but it doesn't require any change in the submitted
code, right?

The below code uses the (state & ~MLX5VF_SUPPORTED_DEVICE_STATES) clause,
which rejects any use of an unsupported bit such as this one.

if (old_state == VFIO_DEVICE_STATE_ERROR ||
    !VFIO_DEVICE_STATE_VALID(state) ||
    (state & ~MLX5VF_SUPPORTED_DEVICE_STATES))
        return -EINVAL;
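
For context, the supported-states mask in the series covers only the
three existing bits, presumably along these lines (reconstructed here as
a sketch, not a verbatim quote of the patch):

#define MLX5VF_SUPPORTED_DEVICE_STATES \
	(VFIO_DEVICE_STATE_RUNNING | \
	 VFIO_DEVICE_STATE_SAVING | \
	 VFIO_DEVICE_STATE_RESUMING)

Any newly defined 4th bit, NDMA included, therefore falls outside the
mask and the write fails with -EINVAL until the driver opts in.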

Yishai
