Message-ID: <CAG7k0ErF+e2vMUYRuh2EBjWmE7iqdOMS1CQv-7r18T1mVbK1aA@mail.gmail.com>
Date: Tue, 30 Apr 2024 10:03:07 +0100
From: Ross Lagerwall <ross.lagerwall@...rix.com>
To: Paul Menzel <pmenzel@...gen.mpg.de>
Cc: netdev@...r.kernel.org, Tony Nguyen <anthony.l.nguyen@...el.com>,
Javi Merino <javi.merino@...nel.org>, intel-wired-lan@...ts.osuosl.org
Subject: Re: [Intel-wired-lan] [PATCH v2] ice: Fix enabling SR-IOV with Xen
On Mon, Apr 29, 2024 at 2:04 PM Paul Menzel <pmenzel@...gen.mpg.de> wrote:
>
> Dear Ross,
>
>
> Thank you for your patch.
>
> Am 29.04.24 um 14:49 schrieb Ross Lagerwall:
> > When the PCI functions are created, Xen is informed about them and
> > caches the number of MSI-X entries each function has. However, the
> > number of MSI-X entries is not set until after the hardware has been
> > configured and the VFs have been started. This prevents
> > PCI-passthrough from working because Xen rejects mapping MSI-X
> > interrupts to domains because it thinks the MSI-X interrupts don't
> > exist.
>
> Thank you for this great problem description. Is there any log
> message shown that you could paste, so people can find this commit
> when searching for the log message?
When this issue occurs, QEMU repeatedly reports:
msi_msix_setup: Error: Mapping of MSI-X (err: 22, vec: 0, entry 0x1)
>
> Do you have a minimal test case, so the maintainers can reproduce this
> to test the fix?
Testing this requires setting up Xen, which I wouldn't expect anyone to
do unless they already have an environment set up.
In any case, a "minimal" test would be something like:
1. Set up Xen with dom0 and another VM running Linux.
2. Pass through a VF to the VM. See that QEMU reports the above message
and the VF is not usable within the VM.
3. Rebuild the dom0 kernel with the attached patch.
4. Pass through a VF to the VM. See that the VF is usable within the
VM.
>
> > Fix this by moving the call to pci_enable_sriov() later so that the
> > number of MSI-X entries is set correctly in hardware by the time Xen
> > reads it.
>
> It’d be great if you could be more specific on “later”, and why this is
> the correct place.
"later" in this case means after ice_start_vfs() since it is at that
point that the hardware sets the number of MSI-X entries.
I expect that a maintainer or someone with more knowledge of the
hardware could explain why the hardware only sets the number of MSI-X
entries at this point.
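To make the ordering concrete, here is a condensed sketch of the call
flow in ice_ena_vfs() after this patch. It is illustrative only
(locking, error handling and unrelated steps are omitted), not the
exact driver code:

    ice_set_per_vf_res(pf, num_vfs);     /* size per-VF queues/vectors */
    ice_create_vf_entries(pf, num_vfs);  /* allocate the VF table entries */
    ice_start_vfs(pf);                   /* hardware sets each VF's MSI-X entry count here */
    pci_enable_sriov(pf->pdev, num_vfs); /* VFs appear on the bus; Xen now caches the correct count */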
>
> > Signed-off-by: Ross Lagerwall <ross.lagerwall@...rix.com>
> > Signed-off-by: Javi Merino <javi.merino@...nel.org>
> > ---
> >
> > In v2:
> > * Fix cleanup if pci_enable_sriov() fails.
> >
> > drivers/net/ethernet/intel/ice/ice_sriov.c | 23 +++++++++++++---------
> > 1 file changed, 14 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
> > index a958fcf3e6be..bc97493046a8 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_sriov.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
> > @@ -864,6 +864,8 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
> > int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
> > struct device *dev = ice_pf_to_dev(pf);
> > struct ice_hw *hw = &pf->hw;
> > + struct ice_vf *vf;
> > + unsigned int bkt;
> > int ret;
> >
> > pf->sriov_irq_bm = bitmap_zalloc(total_vectors, GFP_KERNEL);
> > @@ -877,24 +879,20 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
> > set_bit(ICE_OICR_INTR_DIS, pf->state);
> > ice_flush(hw);
> >
> > - ret = pci_enable_sriov(pf->pdev, num_vfs);
> > - if (ret)
> > - goto err_unroll_intr;
> > -
> > mutex_lock(&pf->vfs.table_lock);
> >
> > ret = ice_set_per_vf_res(pf, num_vfs);
> > if (ret) {
> > dev_err(dev, "Not enough resources for %d VFs, err %d. Try with fewer number of VFs\n",
> > num_vfs, ret);
> > - goto err_unroll_sriov;
> > + goto err_unroll_intr;
> > }
> >
> > ret = ice_create_vf_entries(pf, num_vfs);
> > if (ret) {
> > dev_err(dev, "Failed to allocate VF entries for %d VFs\n",
> > num_vfs);
> > - goto err_unroll_sriov;
> > + goto err_unroll_intr;
> > }
> >
> > ice_eswitch_reserve_cp_queues(pf, num_vfs);
> > @@ -905,6 +903,10 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
> > goto err_unroll_vf_entries;
> > }
> >
> > + ret = pci_enable_sriov(pf->pdev, num_vfs);
> > + if (ret)
> > + goto err_unroll_start_vfs;
> > +
> > clear_bit(ICE_VF_DIS, pf->state);
> >
> > /* rearm global interrupts */
> > @@ -915,12 +917,15 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
> >
> > return 0;
> >
> > +err_unroll_start_vfs:
> > + ice_for_each_vf(pf, bkt, vf) {
> > + ice_dis_vf_mappings(vf);
> > + ice_vf_vsi_release(vf);
> > + }
>
> Why wasn’t this needed with `pci_enable_sriov()` done earlier?
Previously, ice_start_vfs() was the last call in this function that
could fail. That is no longer the case, so when pci_enable_sriov()
fails, it needs to undo what was done in ice_start_vfs().
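To illustrate, a rough sketch of the new failure path (the body of the
new label is taken from the hunk above; the behaviour of the
pre-existing labels is summarized from their names rather than quoted):

    ret = pci_enable_sriov(pf->pdev, num_vfs);
    if (ret)
            goto err_unroll_start_vfs;
    ...
    err_unroll_start_vfs:
            /* undo ice_start_vfs(): drop each VF's interrupt/queue
             * mappings and release its VSI
             */
            ice_for_each_vf(pf, bkt, vf) {
                    ice_dis_vf_mappings(vf);
                    ice_vf_vsi_release(vf);
            }
            /* falls through to the existing cleanup, which frees the VF
             * entries and restores interrupt state
             */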
Thanks,
Ross