Date: Thu, 9 May 2024 09:37:51 +0000
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
To: Alex Williamson <alex.williamson@...hat.com>, liulongfang
	<liulongfang@...wei.com>
CC: "jgg@...dia.com" <jgg@...dia.com>, Jonathan Cameron
	<jonathan.cameron@...wei.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linuxarm@...neuler.org" <linuxarm@...neuler.org>
Subject: RE: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
 the XQC address



> -----Original Message-----
> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Wednesday, May 8, 2024 7:00 PM
> To: liulongfang <liulongfang@...wei.com>
> Cc: jgg@...dia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@...wei.com>; Jonathan Cameron
> <jonathan.cameron@...wei.com>; kvm@...r.kernel.org;
> linux-kernel@...r.kernel.org; linuxarm@...neuler.org
> Subject: Re: [PATCH v6 2/5] hisi_acc_vfio_pci: modify the register location of
> the XQC address

[...]
 
> > HiSilicon accelerator devices can run general workloads after live
> > migration completes. These workloads are executed through the user-mode
> > driver and only need the SQE and CQE.
> >
> > The device can also provide kernel-mode services inside the VM through
> > the crypto subsystem. These services require the EQE.
> >
> > Finally, if the device becomes abnormal, the driver needs to perform a
> > device reset, and in that case the AEQE is used.
> >
> > A complete live migration implementation therefore has to keep the
> > device working in all of these scenarios, so this data still needs to
> > be migrated.
> 
> Ok, I had jumped to an in-kernel host driver in reference to "kernel
> mode" rather than a guest kernel.  Migrating with bad data only affects
> the current configuration of the device; reloading the guest driver to
> update these registers, or resetting the device, would restore proper
> operation, correct?

Yes. I talked to Longfang: the device RAS will trigger a reset, and the
device will function normally after the reset.

> 
> But I think this still isn't really a complete solution.  We know
> there's a bug in the migration data stream, so not only should we fix
> the data stream, but we should also take measures to prevent loading a
> known bad data stream.  AIUI migration of this device while running in
> kernel mode (i.e. a kernel driver within a guest VM) is broken.
> Therefore, the least we can do in a new kernel, knowing that there was
> previously a bug in the migration data stream, is to fail to load that
> migration data, because it risks leaving the device broken after
> migration.  Shouldn't we then also increment a migration version field
> in the data stream to block migrations that risk this breakage, or,
> barring that, change the magic data field to prevent the migration?
> Thanks,

Ok. We could add a new ACC_DEV_MAGIC_V2 and reject the migration in
vf_qm_check_match(). The only concern is that this will completely block
migration from an old kernel to a new kernel, and since the device can be
recovered after a reset anyway, that may be too restrictive.
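A minimal user-space sketch of the magic-based rejection proposed above, assuming a heavily simplified migration header. The numeric magic values, ACC_DEV_MAGIC_V1, struct acc_mig_hdr and acc_check_mig_magic() are all hypothetical illustrations; the real constants and the real vf_qm_check_match() logic in hisi_acc_vfio_pci are not quoted in this thread.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical placeholder values. The real magic constants live in the
 * hisi_acc_vfio_pci driver and are not shown in this thread.
 */
#define ACC_DEV_MAGIC_V1 0x4141434344455631ULL /* placeholder: old, buggy stream */
#define ACC_DEV_MAGIC_V2 0x4141434344455632ULL /* placeholder: fixed XQC layout */

/* Hypothetical, heavily simplified stand-in for the migration data header. */
struct acc_mig_hdr {
	uint64_t acc_magic;
};

/*
 * Sketch of the check proposed for vf_qm_check_match(): accept only streams
 * tagged with the new magic. Anything else, including V1 streams produced
 * by an older kernel, is refused so the destination never loads the
 * known-bad XQC data.
 */
static int acc_check_mig_magic(const struct acc_mig_hdr *hdr)
{
	if (hdr->acc_magic != ACC_DEV_MAGIC_V2) {
		fprintf(stderr, "migration stream magic 0x%llx not supported\n",
			(unsigned long long)hdr->acc_magic);
		return -EINVAL;
	}
	return 0;
}
```

The trade-off raised above is visible here: a strict equality check on the magic rejects every stream from an older kernel, including streams from guests that never used the EQE/AEQE paths and would have migrated fine.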

Thanks,
Shameer
