[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DM6PR11MB2841C1643C0414031941D191DD130@DM6PR11MB2841.namprd11.prod.outlook.com>
Date: Thu, 20 Feb 2020 18:48:04 +0000
From: "Ertman, David M" <david.m.ertman@...el.com>
To: Jason Gunthorpe <jgg@...pe.ca>,
"Kirsher, Jeffrey T" <jeffrey.t.kirsher@...el.com>
CC: "davem@...emloft.net" <davem@...emloft.net>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"nhorman@...hat.com" <nhorman@...hat.com>,
"sassmann@...hat.com" <sassmann@...hat.com>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
"Bowers, AndrewX" <andrewx.bowers@...el.com>
Subject: RE: [RFC PATCH v4 02/25] ice: Create and register virtual bus for
RDMA
> -----Original Message-----
> From: Jason Gunthorpe <jgg@...pe.ca>
> Sent: Friday, February 14, 2020 12:40 PM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@...el.com>
> Cc: davem@...emloft.net; gregkh@...uxfoundation.org; Ertman, David M
> <david.m.ertman@...el.com>; netdev@...r.kernel.org; linux-
> rdma@...r.kernel.org; nhorman@...hat.com; sassmann@...hat.com;
> Nguyen, Anthony L <anthony.l.nguyen@...el.com>; Bowers, AndrewX
> <andrewx.bowers@...el.com>
> Subject: Re: [RFC PATCH v4 02/25] ice: Create and register virtual bus for
> RDMA
>
> On Wed, Feb 12, 2020 at 11:14:01AM -0800, Jeff Kirsher wrote:
> > +/**
> > + * ice_init_peer_devices - initializes peer devices
> > + * @pf: ptr to ice_pf
> > + *
> > + * This function initializes peer devices on the virtual bus.
> > + */
> > +int ice_init_peer_devices(struct ice_pf *pf)
> > +{
> > + struct ice_vsi *vsi = pf->vsi[0];
> > + struct pci_dev *pdev = pf->pdev;
> > + struct device *dev = &pdev->dev;
> > + int status = 0;
> > + int i;
> > +
> > + /* Reserve vector resources */
> > + status = ice_reserve_peer_qvector(pf);
> > + if (status < 0) {
> > + dev_err(dev, "failed to reserve vectors for peer drivers\n");
> > + return status;
> > + }
> > + for (i = 0; i < ARRAY_SIZE(ice_peers); i++) {
> > + struct ice_peer_dev_int *peer_dev_int;
> > + struct ice_peer_drv_int *peer_drv_int;
> > + struct iidc_qos_params *qos_info;
> > + struct iidc_virtbus_object *vbo;
> > + struct msix_entry *entry = NULL;
> > + struct iidc_peer_dev *peer_dev;
> > + struct virtbus_device *vdev;
> > + int j;
> > +
> > + /* structure layout needed for container_of's looks like:
> > + * ice_peer_dev_int (internal only ice peer superstruct)
> > + * |--> iidc_peer_dev
> > + * |--> *ice_peer_drv_int
> > + *
> > + * iidc_virtbus_object (container_of parent for vdev)
> > + * |--> virtbus_device
> > + * |--> *iidc_peer_dev (pointer from internal struct)
> > + *
> > + * ice_peer_drv_int (internal only peer_drv struct)
> > + */
> > + peer_dev_int = devm_kzalloc(dev, sizeof(*peer_dev_int),
> > + GFP_KERNEL);
> > + if (!peer_dev_int)
> > + return -ENOMEM;
> > +
> > + vbo = kzalloc(sizeof(*vbo), GFP_KERNEL);
> > + if (!vbo) {
> > + devm_kfree(dev, peer_dev_int);
> > + return -ENOMEM;
> > + }
> > +
> > + peer_drv_int = devm_kzalloc(dev, sizeof(*peer_drv_int),
> > + GFP_KERNEL);
>
> To me, this looks like a lifetime mess. All these devm allocations
> against the parent object are being referenced through the vbo with a
> different kref lifetime. The whole thing has very unclear semantics
> who should be cleaning up on error
Will cover this at the end after addressing your following points =)
In my reply, I am going to refer to the kernel object that is registering the
virtbus_device(s) as KO_device and the kernel object that is registering
the virtbus_driver(s) as KO_driver.
>
> > + if (!peer_drv_int) {
> > + devm_kfree(dev, peer_dev_int);
> > + kfree(vbo);
>
> ie here we free two things
At this point in the init flow for KO_device, there has only been kallocs done,
no device has been registered with virtbus. So, only memory cleanup is
required.
>
> > + return -ENOMEM;
> > + }
> > +
> > + pf->peers[i] = peer_dev_int;
> > + vbo->peer_dev = &peer_dev_int->peer_dev;
> > + peer_dev_int->peer_drv_int = peer_drv_int;
> > + peer_dev_int->peer_dev.vdev = &vbo->vdev;
> > +
> > + /* Initialize driver values */
> > + for (j = 0; j < IIDC_EVENT_NBITS; j++)
> > + bitmap_zero(peer_drv_int->current_events[j].type,
> > + IIDC_EVENT_NBITS);
> > +
> > + mutex_init(&peer_dev_int->peer_dev_state_mutex);
> > +
> > + peer_dev = &peer_dev_int->peer_dev;
> > + peer_dev->peer_ops = NULL;
> > + peer_dev->hw_addr = (u8 __iomem *)pf->hw.hw_addr;
> > + peer_dev->peer_dev_id = ice_peers[i].id;
> > + peer_dev->pf_vsi_num = vsi->vsi_num;
> > + peer_dev->netdev = vsi->netdev;
> > +
> > + peer_dev_int->ice_peer_wq =
> > + alloc_ordered_workqueue("ice_peer_wq_%d",
> WQ_UNBOUND,
> > + i);
> > + if (!peer_dev_int->ice_peer_wq)
> > + return -ENOMEM;
>
> Here we free nothing
This is a miss on my part. At this point we should keep consistent and free the memory
that has been allocated as we unwind.
>
> > +
> > + peer_dev->pdev = pdev;
> > + qos_info = &peer_dev->initial_qos_info;
> > +
> > + /* setup qos_info fields with defaults */
> > + qos_info->num_apps = 0;
> > + qos_info->num_tc = 1;
> > +
> > + for (j = 0; j < IIDC_MAX_USER_PRIORITY; j++)
> > + qos_info->up2tc[j] = 0;
> > +
> > + qos_info->tc_info[0].rel_bw = 100;
> > + for (j = 1; j < IEEE_8021QAZ_MAX_TCS; j++)
> > + qos_info->tc_info[j].rel_bw = 0;
> > +
> > + /* for DCB, override the qos_info defaults. */
> > + ice_setup_dcb_qos_info(pf, qos_info);
> > +
> > + /* make sure peer specific resources such as msix_count and
> > + * msix_entries are initialized
> > + */
> > + switch (ice_peers[i].id) {
> > + case IIDC_PEER_RDMA_ID:
> > + if (test_bit(ICE_FLAG_IWARP_ENA, pf->flags)) {
> > + peer_dev->msix_count = pf-
> >num_rdma_msix;
> > + entry = &pf->msix_entries[pf-
> >rdma_base_vector];
> > + }
> > + break;
> > + default:
> > + break;
> > + }
> > +
> > + peer_dev->msix_entries = entry;
> > + ice_peer_state_change(peer_dev_int,
> ICE_PEER_DEV_STATE_INIT,
> > + false);
> > +
> > + vdev = &vbo->vdev;
> > + vdev->name = ice_peers[i].name;
> > + vdev->release = ice_peer_vdev_release;
> > + vdev->dev.parent = &pdev->dev;
> > +
> > + status = virtbus_dev_register(vdev);
> > + if (status) {
> > + virtbus_dev_unregister(vdev);
> > + vdev = NULL;
>
> Here we double unregister and free nothing.
>
> You need to go through all of this really carefully and make some kind
> of sane lifetime model and fix all the error unwinding :(
Thanks for catching this. A failure in virtbus_register_device() does
*not* require a call virtbus_unregister_device. The failure path for the
register function handles this. Also, we need to remain consistent with freeing
on unwind.
>
> Why doesn't the release() function of vbo trigger the free of all this
> peer related stuff?
>
> Use a sane design model of splitting into functions to allocate single
> peices of memory, goto error unwind each function, and build things up
> properly.
>
> Jason
I am going to add this to the documentation to record the following information.
The KO_device is responsible for allocating the memory for the virtbus_device
and keeping it viable for the lifetime of the KO_device. KO_device will call
virtbus_register_device to start using the virtbus_device, and KO_device is
responslble for calling virtbus_unregister_device either on KO_device's exit
path (remove/shutdown) or when it is done using the virtbus subsystem.
The KO_driver is responsible for allocating the memory for the virtbus_driver
and keeping it viable for the lifetime of the KO_driver. KO_driver will call
virtbus_register_driver to start using the virtbus_driver, and KO_driver is
responsible for calling virtbus_unregister_driver either on KO_driver's exit
path (remove/shutdown) or when it is done using the virtbus subsystem.
The premise is that the KO_device and KO_driver can load and unload multiple
times and they can reconnect to each other through the virtbus on each
occurrence of their reloads. So one example of a flow looks like the following:
- KO_device loads (order of KO_device and KO_driver loading is irrelevant)
- KO_device allocates memory for virtbus_device(s) it expects to use and
any backing memory it is going to use to interact with KO_driver.
- KO_device performs virtbus_register_device() which is the *only* place
a device_initialize() is performed for virtbus_device.
- KO_driver loads
- KO_driver allocates memory for virtbus_driver(s) it expects to use and
any backing memory it expects to use to interact with KO_device
- KO_driver performs virtbus_register_driver()
- virtbus matches virtbus_device and virtbus_driver and calls the
virtbus_drivers's probe()
- KO_driver and KO_device interact with each other however they choose to do so.
- KO_device (for example) receives a call to its remove callback
- KO_device's unload path severs any interaction the KO_device and KO_driver
were having - implementation dependant
- KO_device's unload path is required to perform a call to
virtbus_unregister_device(). virtbus_unregister_device() is the *only*
place a put_device() is performed.
- KO_device's unload path frees memory associated with the virtbus_device
- vitbus calls KO_drivers's .remove callback defined for the virtbus_driver
So, the lifespan of the virtbus_device is controlled by KO_device and the
lifespan of virtbus_driver is controlled by KO_driver.
It is required for the KO's to "allocate -> register -> unregister -> free"
virtbus objects.
-DaveE
Powered by blists - more mailing lists