Message-ID: <145c27e6-75e6-b94b-18d8-237de8cade9d@linux.ibm.com>
Date: Tue, 25 Jan 2022 10:44:43 -0500
From: Matthew Rosato <mjrosato@...ux.ibm.com>
To: Pierre Morel <pmorel@...ux.ibm.com>, linux-s390@...r.kernel.org
Cc: alex.williamson@...hat.com, cohuck@...hat.com,
schnelle@...ux.ibm.com, farman@...ux.ibm.com,
borntraeger@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com,
gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
frankja@...ux.ibm.com, david@...hat.com, imbrenda@...ux.ibm.com,
vneethv@...ux.ibm.com, oberpar@...ux.ibm.com, freude@...ux.ibm.com,
thuth@...hat.com, pasic@...ux.ibm.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 19/30] KVM: s390: pci: provide routines for
enabling/disabling interrupt forwarding
On 1/25/22 7:41 AM, Pierre Morel wrote:
>
>
> On 1/14/22 21:31, Matthew Rosato wrote:
...
>> +/* Modify PCI: Register floating adapter interruption forwarding */
>> +static int kvm_zpci_set_airq(struct zpci_dev *zdev)
>> +{
>> + u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, ZPCI_MOD_FC_REG_INT);
>> + struct zpci_fib fib = {0};
>
> I prefer {} instead of {0}; even though it does the same thing, {0} looks wrong to me.
>
OK
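i.e. simply:

	struct zpci_fib fib = {};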
...
>> +int kvm_s390_pci_aif_enable(struct zpci_dev *zdev, struct zpci_fib *fib,
>> + bool assist)
>> +{
>> + struct page *aibv_page, *aisb_page = NULL;
>> + unsigned int msi_vecs, idx;
>> + struct zpci_gaite *gaite;
>> + unsigned long bit;
>> + struct kvm *kvm;
>> + phys_addr_t gaddr;
>> + int rc = 0;
>> +
>> + /*
>> + * Interrupt forwarding is only applicable if the device is already
>> + * enabled for interpretation
>> + */
>> + if (zdev->gd == 0)
>> + return -EINVAL;
>> +
>> + kvm = zdev->kzdev->kvm;
>> + msi_vecs = min_t(unsigned int, fib->fmt0.noi, zdev->max_msi);
>> +
>> + /* Replace AIBV address */
>> + idx = srcu_read_lock(&kvm->srcu);
>> + aibv_page = gfn_to_page(kvm, gpa_to_gfn((gpa_t)fib->fmt0.aibv));
>> + srcu_read_unlock(&kvm->srcu, idx);
>> + if (is_error_page(aibv_page)) {
>> + rc = -EIO;
>> + goto out;
>> + }
>> + gaddr = page_to_phys(aibv_page) + (fib->fmt0.aibv & ~PAGE_MASK);
>> + fib->fmt0.aibv = gaddr;
>> +
>> + /* Pin the guest AISB if one was specified */
>> + if (fib->fmt0.sum == 1) {
>> + idx = srcu_read_lock(&kvm->srcu);
>> + aisb_page = gfn_to_page(kvm, gpa_to_gfn((gpa_t)fib->fmt0.aisb));
>> + srcu_read_unlock(&kvm->srcu, idx);
>> + if (is_error_page(aisb_page)) {
>> + rc = -EIO;
>> + goto unpin1;
>> + }
>> + }
>> +
>> + /* AISB must be allocated before we can fill in GAITE */
>> + mutex_lock(&aift->lock);
>> + bit = airq_iv_alloc_bit(aift->sbv);
>> + if (bit == -1UL)
>> + goto unpin2;
>> + zdev->aisb = bit;
>
> aisb here is the aisb offset right?
Yes
> Then maybe add a comment, since in the GAIT and in fmt0 aisb is an address.
Sure, good point
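Something like this, perhaps (exact wording TBD):

	/*
	 * zdev->aisb is the AISB bit offset within the summary bit
	 * vector; the aisb fields in the GAITE and in the fmt0 FIB
	 * are addresses.
	 */
	zdev->aisb = bit;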
>
>> + zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA |
>> + AIRQ_IV_BITLOCK |
>> + AIRQ_IV_GUESTVEC,
>> + (unsigned long *)fib->fmt0.aibv);
>
> phys_to_virt ?
Ugh, yep -- we just put the physical address into fib->fmt0.aibv a few
lines earlier via page_to_phys.
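So for v3 this would become something along these lines (untested
sketch):

	zdev->aibv = airq_iv_create(msi_vecs, AIRQ_IV_DATA |
				    AIRQ_IV_BITLOCK |
				    AIRQ_IV_GUESTVEC,
				    phys_to_virt(fib->fmt0.aibv));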
>
>> +
>> + spin_lock_irq(&aift->gait_lock);
>> + gaite = (struct zpci_gaite *)aift->gait + (zdev->aisb *
>> + sizeof(struct zpci_gaite));
>> +
>> + /* If assist not requested, host will get all alerts */
>> + if (assist)
>> + gaite->gisa = (u32)(u64)&kvm->arch.sie_page2->gisa;
>
> virt_to_phys ?
Yes
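i.e. something like (sketch):

	gaite->gisa = (u32)virt_to_phys(&kvm->arch.sie_page2->gisa);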
>
>> + else
>> + gaite->gisa = 0;
>> +
>> + gaite->gisc = fib->fmt0.isc;
>> + gaite->count++;
>> + gaite->aisbo = fib->fmt0.aisbo;
>> + gaite->aisb = virt_to_phys(page_address(aisb_page) + (fib->fmt0.aisb &
>> + ~PAGE_MASK));
>> + aift->kzdev[zdev->aisb] = zdev->kzdev;
>> + spin_unlock_irq(&aift->gait_lock);
>> +
>> + /* Update guest FIB for re-issue */
>> + fib->fmt0.aisbo = zdev->aisb & 63;
>> + fib->fmt0.aisb = virt_to_phys(aift->sbv->vector + (zdev->aisb /
>> + 64) * 8);
>> + fib->fmt0.isc = kvm_s390_gisc_register(kvm, gaite->gisc);
>> +
>> + /* Save some guest fib values in the host for later use */
>> + zdev->kzdev->fib.fmt0.isc = fib->fmt0.isc;
>> + zdev->kzdev->fib.fmt0.aibv = fib->fmt0.aibv;
>> + mutex_unlock(&aift->lock);
>> +
>> + /* Issue the clp to setup the irq now */
>> + rc = kvm_zpci_set_airq(zdev);
>> + return rc;
>> +
>> +unpin2:
>> + mutex_unlock(&aift->lock);
>> + if (fib->fmt0.sum == 1) {
>> + gaddr = page_to_phys(aisb_page);
>> + kvm_release_pfn_dirty(gaddr >> PAGE_SHIFT);
>> + }
>> +unpin1:
>> + kvm_release_pfn_dirty(fib->fmt0.aibv >> PAGE_SHIFT);
>> +out:
>> + return rc;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_s390_pci_aif_enable);
>> +
>> +int kvm_s390_pci_aif_disable(struct zpci_dev *zdev)
>> +{
>> + struct kvm_zdev *kzdev = zdev->kzdev;
>> + struct zpci_gaite *gaite;
>> + int rc;
>> + u8 isc;
>> +
>> + if (zdev->gd == 0)
>> + return -EINVAL;
>> +
>> + /* Even if the clear fails due to an error, clear the GAITE */
>> + rc = kvm_zpci_clear_airq(zdev);
>
> Having a look at kvm_zpci_clear_airq(), the only possible error seems
> to be when an error recovery is in progress.
> The errors returned by the instruction for a wrong FH (the function
> does not exist anymore) or for interrupt vectors that are already
> deregistered are treated as success by the function.
>
> How can we be sure that we have no conflict with a recovery in
> progress? Shouldn't we in this case let the recovery process handle
> the function and stop here?
Hmm -- so I think for a userspace-initiated call to this routine, yes.
We could then assume recovery takes care of things. However, we also
call this routine from vfio-pci core when closing the device...
So let's look at how this would work -- the current recovery action for
passthrough is always PCI_ERS_RESULT_DISCONNECT. Disconnecting the
device will trigger vfio-pci to close its device, which in turn will
trigger vfio_pci_zdev_release(), which will also call
kvm_s390_pci_aif_disable() as part of cleanup. In that case we want to
clear the GAITE anyway, even if kvm_zpci_clear_airq(zdev) fails,
because we know the device is definitely going away.
I think I need some sort of input to this routine that indicates we
must clean up regardless (bool force or something), which would only be
specified by the call from vfio_pci_zdev_release().
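Roughly like this (just a sketch -- the parameter name and exact
placement are still to be decided):

	int kvm_s390_pci_aif_disable(struct zpci_dev *zdev, bool force)
	{
		...
		/* On forced cleanup, clear the GAITE even if this fails */
		rc = kvm_zpci_clear_airq(zdev);
		if (rc && !force)
			return rc;
		...
	}

where vfio_pci_zdev_release() passes force == true and the
userspace-initiated path passes false (letting recovery handle it).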
>
> Doesn't the aift lock mutex being taken after, and not before, the
> clear_airq open the door to a race condition with the recovery?
Good point.
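So the mutex_lock(&aift->lock) probably needs to move before the clear,
something like (sketch only -- the interaction with recovery still
needs more thought):

	mutex_lock(&aift->lock);
	rc = kvm_zpci_clear_airq(zdev);
	if (rc && !force) {
		mutex_unlock(&aift->lock);
		return rc;
	}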
>
>> +
>> + mutex_lock(&aift->lock);
>> + if (zdev->kzdev->fib.fmt0.aibv == 0)
>> + goto out;
>> + spin_lock_irq(&aift->gait_lock);
>> + gaite = (struct zpci_gaite *)aift->gait + (zdev->aisb *
>> + sizeof(struct zpci_gaite));
>> + isc = gaite->gisc;
>> + gaite->count--;
>> + if (gaite->count == 0) {
>> + /* Release guest AIBV and AISB */
>> + kvm_release_pfn_dirty(kzdev->fib.fmt0.aibv >> PAGE_SHIFT);
>> + if (gaite->aisb != 0)
>> + kvm_release_pfn_dirty(gaite->aisb >> PAGE_SHIFT);
>> + /* Clear the GAIT entry */
>> + gaite->aisb = 0;
>> + gaite->gisc = 0;
>> + gaite->aisbo = 0;
>> + gaite->gisa = 0;
>> + aift->kzdev[zdev->aisb] = 0;
>> + /* Clear zdev info */
>> + airq_iv_free_bit(aift->sbv, zdev->aisb);
>> + airq_iv_release(zdev->aibv);
>> + zdev->aisb = 0;
>> + zdev->aibv = NULL;
>> + }
>> + spin_unlock_irq(&aift->gait_lock);
>> + kvm_s390_gisc_unregister(kzdev->kvm, isc);
>
> Don't we need to check the return value?
> And maybe to report it to the caller?
Well, actually, I think we really need to look at the
kvm_s390_gisc_register() call during aif_enable() -- I unconditionally
assigned its result to the fib when in fact it can also return a
negative error value (which I never check for). So I will rearrange the
code in aif_enable() to make that call earlier, using a local variable,
and bail out of aif_enable() on error if it fails.
kvm_s390_gisc_register() can return two of the possible errors, which
are shared with gisc_unregister() -- so with that change we will detect
these errors (not using GISA, bad guest ISC) at aif_enable time.
That means gisc_unregister() should really only be able to hit the 3rd
error (guest ISC is not registered). And if for some reason we hit that
error at disable time, well, that's weird and unexpected (s390dbf?),
but as far as userspace is concerned the GAITE is cleared and the gisc
is unregistered, so I think we still want to return success to
userspace. But we must do the checking at gisc_register() time and fail
for the other cases there.
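For aif_enable() I'm thinking something along these lines (sketch;
'hisc' is just an illustrative local name and the right error-path
label still needs sorting out):

	int hisc;
	...
	/* Register the guest ISC up front so we can fail early */
	hisc = kvm_s390_gisc_register(kvm, fib->fmt0.isc);
	if (hisc < 0) {
		rc = hisc;
		goto unpin2;	/* or whichever unpin label applies here */
	}
	...
	/* Later, when updating the guest FIB for re-issue */
	fib->fmt0.isc = hisc;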