Message-ID: <73aa389d-7ef6-5563-0109-a4d6750756df@amd.com>
Date: Fri, 4 Aug 2023 10:34:18 -0700
From: Brett Creeley <bcreeley@....com>
To: Jason Gunthorpe <jgg@...dia.com>, Brett Creeley <brett.creeley@....com>
Cc: kvm@...r.kernel.org, netdev@...r.kernel.org, alex.williamson@...hat.com,
 yishaih@...dia.com, shameerali.kolothum.thodi@...wei.com,
 kevin.tian@...el.com, simon.horman@...igine.com, shannon.nelson@....com
Subject: Re: [PATCH v13 vfio 6/7] vfio/pds: Add support for firmware recovery



On 8/4/2023 10:18 AM, Jason Gunthorpe wrote:
> On Tue, Jul 25, 2023 at 02:40:24PM -0700, Brett Creeley wrote:
>> It's possible that the device firmware crashes and is able to recover
>> due to some configuration and/or other issue. If a live migration
>> is in progress while the firmware crashes, the live migration will
>> fail. However, the VF PCI device should still be functional post
>> crash recovery and subsequent migrations should go through as
>> expected.
>>
>> When the pds_core device notices that firmware crashes it sends an
>> event to all its client drivers. When the pds_vfio driver receives
>> this event while migration is in progress it will request a deferred
>> reset on the next migration state transition. This state transition
>> will report failure as well as any subsequent state transition
>> requests from the VMM/VFIO. Based on uapi/vfio.h the only way out of
>> VFIO_DEVICE_STATE_ERROR is by issuing VFIO_DEVICE_RESET. Once this
>> reset is done, the migration state will be reset to
>> VFIO_DEVICE_STATE_RUNNING and migration can be performed.
> 
> Have you actually tested this? Does the qemu side respond properly if
> this happens during a migration?
> 
> Jason

Yes, this has actually been tested. It's not necessarily clean as far as
the log messages go because the driver may still be getting requests
(e.g. dirty log requests), but the noise should be okay because this is
a very rare event.

QEMU does respond properly and in the manner I mentioned above.

Thanks,

Brett
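
For readers following the thread, the recovery flow described in the quoted patch description can be sketched as a small stand-alone model. This is only an illustration, not the pds_vfio code: the `model_*` names, the `mig_state` enum, and the rule that "migration in progress" means "not in the running state" are all assumptions made for this sketch. The enum values stand in for the VFIO_DEVICE_STATE_* constants from uapi/vfio.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VFIO_DEVICE_STATE_* values in uapi/vfio.h. */
enum mig_state {
	MIG_STATE_ERROR,
	MIG_STATE_STOP,
	MIG_STATE_RUNNING,
	MIG_STATE_STOP_COPY,
};

/* Hypothetical per-device state; not the real pds_vfio structure. */
struct model_dev {
	enum mig_state state;
	bool deferred_reset;	/* set when firmware crashes mid-migration */
};

void model_init(struct model_dev *d)
{
	assert(d != NULL);
	d->state = MIG_STATE_RUNNING;
	d->deferred_reset = false;
}

/*
 * Firmware-crash notification from the core driver.
 * Assumption: "migration in progress" is modelled as any state other
 * than RUNNING or ERROR.
 */
void model_fw_crash_event(struct model_dev *d)
{
	if (d->state != MIG_STATE_RUNNING && d->state != MIG_STATE_ERROR)
		d->deferred_reset = true;
}

/*
 * Migration state transition request from the VMM/VFIO.
 * Returns 0 on success, -1 on failure. A pending deferred reset makes
 * this transition fail and moves the device to the error state; every
 * later request also fails until the device is reset.
 */
int model_set_state(struct model_dev *d, enum mig_state next)
{
	if (d->deferred_reset || d->state == MIG_STATE_ERROR) {
		d->deferred_reset = false;
		d->state = MIG_STATE_ERROR;
		return -1;
	}
	d->state = next;
	return 0;
}

/* Models VFIO_DEVICE_RESET: the only way out of the error state. */
void model_reset(struct model_dev *d)
{
	d->deferred_reset = false;
	d->state = MIG_STATE_RUNNING;
}
```

In this model, a crash event received during a migration causes the next `model_set_state()` call and all later ones to fail, and only `model_reset()` returns the device to the running state so that a new migration can start.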
