Date:   Thu, 20 Apr 2023 17:42:27 -0700
From:   Brett Creeley <bcreeley@....com>
To:     Jason Gunthorpe <jgg@...dia.com>,
        Brett Creeley <brett.creeley@....com>
Cc:     kvm@...r.kernel.org, netdev@...r.kernel.org,
        alex.williamson@...hat.com, yishaih@...dia.com,
        shameerali.kolothum.thodi@...wei.com, kevin.tian@...el.com,
        shannon.nelson@....com, drivers@...sando.io,
        simon.horman@...igine.com
Subject: Re: [PATCH v8 vfio 3/7] vfio/pds: register with the pds_core PF

On 4/14/2023 5:43 AM, Jason Gunthorpe wrote:
> On Tue, Apr 04, 2023 at 12:01:37PM -0700, Brett Creeley wrote:
>> @@ -30,13 +34,23 @@ pds_vfio_pci_probe(struct pci_dev *pdev,
>>
>>        dev_set_drvdata(&pdev->dev, &pds_vfio->vfio_coredev);
>>        pds_vfio->pdev = pdev;
>> +     pds_vfio->pdsc = pdsc_get_pf_struct(pdev);
> 
> This should not be a void *, it has a type, looks like it is 'struct
> pdsc *' - comment applies to all the places in both series that
> dropped the type here.

Will fix.

> 
> Jason


Hey Jason,

Thanks for the responses/feedback.

For some reason Shannon and I didn't get any of your recent responses in 
our inboxes except this one. We're not really sure why... Due to this, 
I'm replying to all of your responses in this thread.

 >> +	union pds_core_adminq_cmd cmd = { 0 };

 > These should all be = {}, adding the 0 is a subtly different thing in

Will fix.
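
A small userspace sketch of why the empty initializer tends to be preferred for unions. The quoted review comment is cut off above, so the rationale given here is an assumption rather than Jason's wording: `= {}` (standard in C23 and long accepted by GCC and Clang as an extension) zero-initializes the whole object, whereas `= { 0 }` formally initializes only the first member, and some recent compilers may leave the remaining bytes of a union unspecified. `union demo_cmd` and `demo_cmd_is_all_zero()` are hypothetical names, not part of pds_core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a command union: a small first member
 * overlaid with a larger one. Not the real pds_core layout. */
union demo_cmd {
	unsigned char opcode;
	unsigned char words[64];
};

/* Returns 1 if every byte of an empty-initialized union is zero. */
int demo_cmd_is_all_zero(void)
{
	union demo_cmd cmd = {};	/* empty initializer: whole object zeroed */
	const unsigned char *p = (const unsigned char *)&cmd;
	size_t i;

	/* The union is as large as its largest member. */
	assert(sizeof(cmd) == sizeof(cmd.words));

	for (i = 0; i < sizeof(cmd); i++)
		if (p[i] != 0)
			return 0;
	return 1;
}
```

With `= { 0 }` only `opcode` is named by the initializer, so a check like this would be relying on compiler behaviour rather than on the initializer itself.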

 >> +int
 >> +pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio)
 >> +{
 >> +	struct pds_lm_suspend_cmd cmd = {
 >> +		.opcode = PDS_LM_CMD_SUSPEND,
 >> +		.vf_id = cpu_to_le16(pds_vfio->vf_id),
 >> +	};
 >> +	struct pds_lm_suspend_comp comp = {0};
 >> +	struct pci_dev *pdev = pds_vfio->pdev;
 >> +	int err;
 >> +
 >> +	dev_dbg(&pdev->dev, "vf%u: Suspend device\n", pds_vfio->vf_id);
 >> +
 >> +	err = pds_client_adminq_cmd(pds_vfio,
 >> +				    (union pds_core_adminq_cmd *)&cmd,
 >> +				    sizeof(cmd),
 >> +				    (union pds_core_adminq_comp *)&comp,
 >> +				    PDS_AQ_FLAG_FASTPOLL);

 > These casts to a union are really weird, why isn't the union the type
 > on the stack?

Yeah, this is an artifact of initial development that allowed us to 
completely de-couple pds_lm.h from pds_adminq.h, but there don't seem to 
be any conflicts when including pds_lm.h in pds_adminq.h. So, it 
will be fixed in the next revision.
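
Roughly, once the live-migration command struct is a member of the adminq union, the union itself can be the type on the stack and the casts go away. The following is a userspace sketch only: every `demo_*` / `DEMO_*` name and the opcode value are hypothetical stand-ins for the pds_core types, and the `cpu_to_le16()` conversion from the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CMD_SUSPEND 0x12	/* hypothetical opcode value */

/* Stand-in for struct pds_lm_suspend_cmd. */
struct demo_suspend_cmd {
	uint8_t  opcode;
	uint8_t  rsvd;
	uint16_t vf_id;
};

/* Stand-in for union pds_core_adminq_cmd with the LM command as a member. */
union demo_adminq_cmd {
	uint8_t opcode;
	struct demo_suspend_cmd suspend;
	uint8_t raw[64];
};

/* Stand-in for the adminq post routine: takes the union, so no cast. */
static uint8_t demo_post_cmd(const union demo_adminq_cmd *cmd)
{
	assert(cmd != NULL);
	return cmd->opcode;	/* opcode sits at offset 0 of every member */
}

uint8_t demo_suspend(uint16_t vf_id)
{
	/* The union is the stack type; the member is picked by designator. */
	union demo_adminq_cmd cmd = {
		.suspend = {
			.opcode = DEMO_CMD_SUSPEND,
			.vf_id = vf_id,
		},
	};

	return demo_post_cmd(&cmd);
}
```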

 >> +
 >> +	/* alloc sgl */
 >> +	sgl = dma_alloc_coherent(dev, lm_file->num_sge *
 >> +				 sizeof(struct pds_lm_sg_elem),
 >> +				 &lm_file->sgl_addr, GFP_KERNEL);

 > Do you really need a coherent allocation for this?

I don't think it is needed. I will look into this and fix it if 
dma_alloc_coherent() isn't needed.

 >> +#define PDS_VFIO_LM_FILENAME	"pds_vfio_lm"

 > This doesn't need a define, it is typical to write the pseudo filename
 > in the only anon_inode_getfile()

Yeah, we technically only use it in one spot, so it makes sense to just 
have the string inlined to the anon_inode_getfile() call. This also 
allows us to get rid of the const char *name argument in 
pds_vfio_get_lm_file(). Will fix in the next revision.

 >> +static struct pds_vfio_lm_file *
 >> +pds_vfio_get_lm_file(const char *name, const struct file_operations *fops,
 >> +		     int flags, u64 size)
 >> +{
 >> +	struct pds_vfio_lm_file *lm_file = NULL;
 >> +	unsigned long long npages;
 >> +	struct page **pages;
 >> +	int err = 0;
 >> +
 >> +	if (!size)
 >> +		return NULL;
 >> +
 >> +	/* Alloc file structure */
 >> +	lm_file = kzalloc(sizeof(*lm_file), GFP_KERNEL);
 >> +	if (!lm_file)
 >> +		return NULL;
 >> +
 >> +	/* Create file */
 >> +	lm_file->filep = anon_inode_getfile(name, fops, lm_file, flags);
 >> +	if (!lm_file->filep)
 >> +		goto err_get_file;
 >> +
 >> +	stream_open(lm_file->filep->f_inode, lm_file->filep);
 >> +	mutex_init(&lm_file->lock);
 >> +
 >> +	lm_file->size = size;
 >> +
 >> +	/* Allocate memory for file pages */
 >> +	npages = DIV_ROUND_UP_ULL(lm_file->size, PAGE_SIZE);
 >> +
 >> +	pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
 >> +	if (!pages)
 >> +		goto err_alloc_pages;
 >> +
 >> +	for (unsigned long long i = 0; i < npages; i++) {
 >> +		pages[i] = alloc_page(GFP_KERNEL);
 >> +		if (!pages[i])
 >> +			goto err_alloc_page;
 >> +	}
 >> +
 >> +	lm_file->pages = pages;
 >> +	lm_file->npages = npages;
 >> +	lm_file->alloc_size = npages * PAGE_SIZE;
 >> +
 >> +	/* Create scatterlist of file pages to use for DMA mapping later */
 >> +	err = sg_alloc_table_from_pages(&lm_file->sg_table, pages, npages,
 >> +					0, size, GFP_KERNEL);
 >> +	if (err)
 >> +		goto err_alloc_sg_table;

 > This is the same basic thing the mlx5 driver does, you should move the
 > mlx5 code into some common place and just re-use it here.

I looked at the mlx5 code and even though the two drivers are doing the 
same basic thing, IMHO it doesn't seem like a straightforward task as 
the mlx5 code seems to have some device/driver specifics mixed in. I'd 
prefer not to refactor/commonize this bit of code at this point 
in time.

However, it does seem like a good future improvement once things quiet 
down after getting this initial series merged.

 >> diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h
 >> index 10557e8dc829..3f55861ffc7c 100644
 >> --- a/drivers/vfio/pci/pds/vfio_dev.h
 >> +++ b/drivers/vfio/pci/pds/vfio_dev.h
 >> @@ -7,10 +7,20 @@
 >>  #include <linux/pci.h>
 >>  #include <linux/vfio_pci_core.h>
 >>
 >> +#include "lm.h"
 >> +
 >>  struct pds_vfio_pci_device {
 >>  	struct vfio_pci_core_device vfio_coredev;
 >>  	struct pci_dev *pdev;
 >>  	void *pdsc;
 >> +	struct device *coredev;

 > Why? If this is just &pdev->dev it doesn't need to be in the struct
 > And pdev is just vfio_coredev->pdev, don't need to duplicate it either

This was actually pds_core's device structure. I have removed this 
in my local tree and instead use pci_physfn() to get pds_core's 
struct device. Will be fixed in the next revision.

 >> +static void
 >> +pds_vfio_recovery_work(struct work_struct *work)
 >> +{
 >> +	struct pds_vfio_pci_device *pds_vfio =
 >> +		container_of(work, struct pds_vfio_pci_device, work);
 >> +	bool deferred_reset_needed = false;
 >> +
 >> +	/* Documentation states that the kernel migration driver must not
 >> +	 * generate asynchronous device state transitions outside of
 >> +	 * manipulation by the user or the VFIO_DEVICE_RESET ioctl.
 >> +	 *
 >> +	 * Since recovery is an asynchronous event received from the device,
 >> +	 * initiate a deferred reset. Only issue the deferred reset if a
 >> +	 * migration is in progress, which will cause the next step of the
 >> +	 * migration to fail. Also, if the device is in a state that will
 >> +	 * be set to VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM is
 >> +	 * shutdown and device is in VFIO_DEVICE_STATE_STOP) as that will clear
 >> +	 * the VFIO_DEVICE_STATE_ERROR when the VM starts back up.
 >> +	 */
 >> +	mutex_lock(&pds_vfio->state_mutex);
 >> +	if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING &&
 >> +	     pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
 >> +	    (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
 >> +	     pds_vfio_dirty_is_enabled(pds_vfio)))
 >> +		deferred_reset_needed = true;
 >> +	mutex_unlock(&pds_vfio->state_mutex);
 >> +
 >> +	/* On the next user initiated state transition, the device will
 >> +	 * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's
 >> +	 * responsibility to reset the device.
 >> +	 *
 >> +	 * If a VFIO_DEVICE_RESET is requested post recovery and before the next
 >> +	 * state transition, then the deferred reset state will be set to
 >> +	 * VFIO_DEVICE_STATE_RUNNING.
 >> +	 */
 >> +	if (deferred_reset_needed)
 >> +		pds_vfio_deferred_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
 >> +}

 > Why is this a work? it is threaded on a blocking_notifier_chain so it
 > can call the mutex?

I think the work item can be dropped and the contents of the work 
function can be moved into the notifier callback. I will fix this in the 
next revision.

 > Why is the locking like this, can't you just call
 > pds_vfio_deferred_reset() under the mutex?

It was done to avoid any lock ordering issues with 
pds_vfio_state_mutex_unlock() or pds_vfio_reset().
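
For reference, the decision made under state_mutex in the quoted pds_vfio_recovery_work() reduces to a predicate on the device state and the dirty-tracking flag, with the reset itself issued only after the mutex is dropped. A userspace sketch of just that predicate follows; the `demo_*` / `DEMO_*` names are hypothetical stand-ins for the VFIO_DEVICE_STATE_* values and for pds_vfio_dirty_is_enabled(), and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VFIO_DEVICE_STATE_* values. */
enum demo_state {
	DEMO_STATE_ERROR,
	DEMO_STATE_STOP,
	DEMO_STATE_RUNNING,
	DEMO_STATE_STOP_COPY,
	DEMO_STATE_RESUMING,
};

/* Mirrors the condition in the quoted patch: a deferred reset is wanted
 * when the device is in neither RUNNING nor ERROR, or when it is RUNNING
 * with dirty tracking enabled. */
bool demo_deferred_reset_needed(enum demo_state state, bool dirty_enabled)
{
	/* Only the states listed above are meaningful in this sketch. */
	assert(state >= DEMO_STATE_ERROR && state <= DEMO_STATE_RESUMING);

	if (state != DEMO_STATE_RUNNING && state != DEMO_STATE_ERROR)
		return true;
	if (state == DEMO_STATE_RUNNING && dirty_enabled)
		return true;
	return false;
}
```

In the patch the result is stored in a local flag while the mutex is held, and pds_vfio_deferred_reset() is called after the unlock, which is what avoids the lock ordering concern.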

 >> Add Kconfig entries and pds_vfio.rst. Also, add an entry in the
 >> MAINTAINERS file for this new driver.
 >>
 >> It's not clear where documentation for vendor specific VFIO
 >> drivers should live, so just re-use the current amd
 >> ethernet location.

 > It would be nice to make a kdoc section for vfio.

It seems like there are already vfio docs in Documentation/driver-api/, 
but the kdoc added in this patch is slightly different since it's vendor 
specific. Which of the following locations makes the most sense?

[1] Documentation/vfio/<vendor>/<vendor_kdoc>
- Documentation/vfio/amd/pds_vfio.rst

[2] Documentation/vfio/vendor-drivers/<vendor_kdoc>
- Documentation/vfio/vendor-drivers/pds_vfio.rst

Thanks again for the time and feedback,

Brett


