lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e216c772-1921-c728-b91a-56b41048cbfa@nvidia.com>
Date:   Wed, 29 Sep 2021 13:57:13 +0300
From:   Max Gurtovoy <mgurtovoy@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>
CC:     Leon Romanovsky <leon@...nel.org>,
        Doug Ledford <dledford@...hat.com>,
        Yishai Hadas <yishaih@...dia.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Kirti Wankhede <kwankhede@...dia.com>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <linux-pci@...r.kernel.org>,
        <linux-rdma@...r.kernel.org>, <netdev@...r.kernel.org>,
        Saeed Mahameed <saeedm@...dia.com>,
        Cornelia Huck <cohuck@...hat.com>
Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state
 transition validity


On 9/28/2021 10:19 PM, Alex Williamson wrote:
> On Mon, 27 Sep 2021 20:12:39 -0300
> Jason Gunthorpe <jgg@...pe.ca> wrote:
>
>> On Mon, Sep 27, 2021 at 04:46:48PM -0600, Alex Williamson wrote:
>>>> +	enum { MAX_STATE = VFIO_DEVICE_STATE_RESUMING };
>>>> +	static const u8 vfio_from_state_table[MAX_STATE + 1][MAX_STATE + 1] = {
>>>> +		[VFIO_DEVICE_STATE_STOP] = {
>>>> +			[VFIO_DEVICE_STATE_RUNNING] = 1,
>>>> +			[VFIO_DEVICE_STATE_RESUMING] = 1,
>>>> +		},
>>> Our state transition diagram is pretty weak on reachable transitions
>>> out of the _STOP state, why do we select only these two as valid?
>> I have no particular opinion on specific states here, however adding
>> more states means more stuff for drivers to implement and more risk
>> driver writers will mess up this uAPI.
> It looks like state transitions were largely discussed in v9 and v10 of
> the migration proposals:
>
> https://lore.kernel.org/all/1573578220-7530-2-git-send-email-kwankhede@nvidia.com/
> https://lore.kernel.org/all/1576527700-21805-2-git-send-email-kwankhede@nvidia.com/
>
> I'm not seeing that we really excluded many transitions there.
>
>> So only on those grounds I'd suggest to keep this to the minimum
>> needed instead of the maximum logically possible..
>>
>> Also, probably the FSM comment from the uapi header file should be
>> moved into a function comment above this function?
> It's not clear this function shouldn't be anything more than:
>
> 	if (new_state > MAX_STATE || old_state > MAX_STATE)
> 		return false;	/* exited via device reset, */
> 				/* entered via transition fault */
>
> 	return true;
>
> That's still only 5 fully interconnected states to work between, and
> potentially a 6th if we decide _RESUMING|_RUNNING is valid for a device
> supporting post-copy.
>
> In defining the device state, we tried to steer away from defining it
> in terms of the QEMU migration API, but rather as a set of controls
> that could be used to support that API to leave us some degree of
> independence that QEMU implementation might evolve.

The state machine is not related to QEMU specifically.

The state machine defines an agreement between user application (let's 
say QEMU) and VFIO.

If a user application would like to move, for example, from RESUMING to 
SAVING state, then the kernel should fail. I don't that there is a 
device that can support it.

If you prefer we check this inside our mlx5 vfio driver, we can do it. 
But we think that this is a common logic according to the defined FSM.

Do you prefer code duplication in vendor vfio-pci drivers ?

> To that extent, it actually seems easier for a device implementation to
> focus on bit definition rather than the state machine node.
>
> I'd also vote that any clarification of state validity and transitions
> belongs in the uAPI header and a transition test function should
> reference that header as the source of truth, rather than the other way
> around.  Thanks,

Yes, I guess this is possible.

>
> Alex
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ