Message-ID: <cebcc3de-e332-6381-f450-a6a26ef88182@linux.ibm.com>
Date: Wed, 19 Jan 2022 19:25:10 +0100
From: Pierre Morel <pmorel@...ux.ibm.com>
To: Matthew Rosato <mjrosato@...ux.ibm.com>, linux-s390@...r.kernel.org
Cc: alex.williamson@...hat.com, cohuck@...hat.com,
schnelle@...ux.ibm.com, farman@...ux.ibm.com,
borntraeger@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com,
gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
frankja@...ux.ibm.com, david@...hat.com, imbrenda@...ux.ibm.com,
vneethv@...ux.ibm.com, oberpar@...ux.ibm.com, freude@...ux.ibm.com,
thuth@...hat.com, pasic@...ux.ibm.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 21/30] KVM: s390: pci: handle refresh of PCI
translations
On 1/19/22 17:39, Matthew Rosato wrote:
> On 1/19/22 4:29 AM, Pierre Morel wrote:
>>
>>
>> On 1/14/22 21:31, Matthew Rosato wrote:
> ...
>>> +static int dma_table_shadow(struct kvm_vcpu *vcpu, struct zpci_dev
>>> *zdev,
>>> + dma_addr_t dma_addr, size_t size)
>>> +{
>>> + unsigned int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>>> + struct kvm_zdev *kzdev = zdev->kzdev;
>>> + unsigned long *entry, *gentry;
>>> + int i, rc = 0, rc2;
>>> +
>>> + if (!nr_pages || !kzdev)
>>> + return -EINVAL;
>>> +
>>> + mutex_lock(&kzdev->ioat.lock);
>>> + if (!zdev->dma_table || !kzdev->ioat.head[0]) {
>>> + rc = -EINVAL;
>>> + goto out_unlock;
>>> + }
>>> +
>>> + for (i = 0; i < nr_pages; i++) {
>>> + gentry = dma_walk_guest_cpu_trans(vcpu, &kzdev->ioat,
>>> dma_addr);
>>> + if (!gentry)
>>> + continue;
>>> + entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
>>> +
>>> + if (!entry) {
>>> + rc = -ENOMEM;
>>> + goto out_unlock;
>>> + }
>>> +
>>> + rc2 = dma_shadow_cpu_trans(vcpu, entry, gentry);
>>> + if (rc2 < 0) {
>>> + rc = -EIO;
>>> + goto out_unlock;
>>> + }
>>> + dma_addr += PAGE_SIZE;
>>> + rc += rc2;
>>> + }
>>> +
>>
>> In case of error, shouldn't we invalidate the shadow tables entries we
>> did validate until the error?
>
> Hmm, I don't think this is strictly necessary - the status returned
> should indicate the specified DMA range is now in an indeterminate state
> (putting the onus on the guest to take corrective action via a global
> refresh).
>
> In fact I think I screwed that up below in kvm_s390_pci_refresh_trans,
> the fabricated status should always be KVM_S390_RPCIT_INS_RES.
OK
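
To make sure we mean the same thing, here is a minimal userspace sketch of
the simplified error handling. The helper name rpcit_fabricate_status() is
hypothetical and does not exist in the patch; only the value of
KVM_S390_RPCIT_INS_RES is taken from the quoted pci.h hunk. It assumes that
any negative rc from dma_table_shadow() leaves the range indeterminate.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from the quoted pci.h hunk. */
#define KVM_S390_RPCIT_INS_RES 0x10

/*
 * Hypothetical helper, not part of the patch: map the result of the
 * shadow operation to the status presented to the guest.
 * Returns 1 if a status was fabricated, 0 if *status was left untouched.
 */
static int rpcit_fabricate_status(int rc, uint8_t *status)
{
	assert(status != NULL);

	/* Success (0) or "host RPCIT needed" (> 0): nothing to fabricate. */
	if (rc >= 0)
		return 0;

	/*
	 * Any failure is reported as insufficient resources, so the guest
	 * treats the DMA range as indeterminate and does a global refresh.
	 */
	*status = KVM_S390_RPCIT_INS_RES;
	return 1;
}
```

With this, -ENOMEM and -EIO both yield the same status, and the switch
statement is no longer needed.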
>
>>
>>> +out_unlock:
>>> + mutex_unlock(&kzdev->ioat.lock);
>>> + return rc;
>>> +}
>>> +
>>> +int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long
>>> req,
>>> + unsigned long start, unsigned long size,
>>> + u8 *status)
>>> +{
>>> + struct zpci_dev *zdev;
>>> + u32 fh = req >> 32;
>>> + int rc;
>>> +
>>> + /* Make sure this is a valid device associated with this guest */
>>> + zdev = get_zdev_by_fh(fh);
>>> + if (!zdev || !zdev->kzdev || zdev->kzdev->kvm != vcpu->kvm) {
>>> + *status = 0;
>>
>> Wouldn't it be interesting to add some debug information here.
>> When would this appear?
>
> Yes, I agree -- One of the follow-ons I'd like to add after this series
> is s390dbf entries; this seems like a good spot for one.
>
> As to when this could happen; it should not under normal circumstances,
> but consider something like arbitrary function handles coming from the
> intercepted guest instruction. We need to ensure that the specified
> function 1) exists and 2) is associated with the guest issuing the refresh.
>
>>
>> Also if we have this error this looks like we have a VM problem,
>> shouldn't we treat this in QEMU and return -EOPNOTSUPP ?
>>
>
> Well, I'm not sure if we can really tell where the problem is (it could
> for example indicate a misbehaving guest, or a bug in our KVM tracking
> of hostdevs).
>
> The guest chose the function handle, and if we got here then that means
> it doesn't indicate that it's an emulated device, which means either we
> are using the assist and KVM should handle the intercept or we are not
> and userspace should handle it. But in both of those cases, there
> should be a host device and it should be associated with the guest.
That is right if we cannot find a zdev associated with the function handle,
but the two other error cases (no kzdev, or a kzdev belonging to another
guest) are KVM or QEMU errors, AFAIU.
>
> I think if we decide to throw this to userspace in this event, QEMU
> needs some extra code to handle it (basically, if QEMU receives the
> intercept and the device is neither emulated nor using intercept mode
> then we must treat as an invalid handle as this intercept should have
> been handled by KVM)
I do not want to start a discussion on this. I think we can leave it like
this for now and come back to it when we have a good idea of how to handle
it.
Maybe just add a /* TODO */ comment.
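
Something along these lines, as a standalone sketch only. The struct
definitions are minimal stand-ins for the kernel ones, and the helper name
check_zdev_owner() is hypothetical; the condition and the return values
mirror the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures; only the fields used here. */
struct kvm { int id; };
struct kvm_zdev { struct kvm *kvm; };
struct zpci_dev { struct kvm_zdev *kzdev; };

/*
 * Hypothetical helper: check that the device found for the function
 * handle exists and belongs to the guest issuing the refresh.
 */
static int check_zdev_owner(const struct zpci_dev *zdev,
			    const struct kvm *kvm, unsigned char *status)
{
	assert(status != NULL);

	/*
	 * TODO: a missing zdev most likely means a bad handle from the
	 * guest, while a missing kzdev or a foreign kvm points to a KVM
	 * or QEMU problem. Decide later whether those two cases should
	 * be passed to userspace instead.
	 */
	if (!zdev || !zdev->kzdev || zdev->kzdev->kvm != kvm) {
		*status = 0;
		return -EINVAL;
	}

	return 0;
}
```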
>
>
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* Only proceed if the device is using the assist */
>>> + if (zdev->kzdev->ioat.head[0] == 0)
>>> + return -EOPNOTSUPP;
>>> +
>>> + rc = dma_table_shadow(vcpu, zdev, start, size);
>>> + if (rc < 0) {
>>> + /*
>>> + * If errors encountered during shadow operations, we must
>>> + * fabricate status to present to the guest
>>> + */
>>> + switch (rc) {
>>> + case -ENOMEM:
>>> + *status = KVM_S390_RPCIT_INS_RES;
>>> + break;
>>> + default:
>>> + *status = KVM_S390_RPCIT_ERR;
>>> + break;
>
> As mentioned above I think this switch statement should go away and
> instead always set KVM_S390_RPCIT_INS_RES when rc < 0.
>
>>> + }
>>> + } else if (rc > 0) {
>>> + /* Host RPCIT must be issued */
>>> + rc = zpci_refresh_trans((u64) zdev->fh << 32, start, size,
>>> + status);
>>> + }
>>> + zdev->kzdev->rpcit_count++;
>>> +
>>> + return rc;
>>> +}
>>> +
>>> /* Modify PCI: Register floating adapter interruption forwarding */
>>> static int kvm_zpci_set_airq(struct zpci_dev *zdev)
>>> {
>>> @@ -620,6 +822,8 @@ EXPORT_SYMBOL_GPL(kvm_s390_pci_attach_kvm);
>>> int kvm_s390_pci_init(void)
>>> {
>>> + int rc;
>>> +
>>> aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
>>> if (!aift)
>>> return -ENOMEM;
>>> @@ -627,5 +831,7 @@ int kvm_s390_pci_init(void)
>>> spin_lock_init(&aift->gait_lock);
>>> mutex_init(&aift->lock);
>>> - return 0;
>>> + rc = zpci_get_mdd(&aift->mdd);
>>> +
>>> + return rc;
>>> }
>>> diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
>>> index 54355634df82..bb2be7fc3934 100644
>>> --- a/arch/s390/kvm/pci.h
>>> +++ b/arch/s390/kvm/pci.h
>>> @@ -18,6 +18,9 @@
>>> #define KVM_S390_PCI_DTSM_MASK 0x40
>>> +#define KVM_S390_RPCIT_INS_RES 0x10
>>> +#define KVM_S390_RPCIT_ERR 0x28
>>> +
>>> struct zpci_gaite {
>>> u32 gisa;
>>> u8 gisc;
>>> @@ -33,6 +36,7 @@ struct zpci_aift {
>>> struct kvm_zdev **kzdev;
>>> spinlock_t gait_lock; /* Protects the gait, used during AEN
>>> forward */
>>> struct mutex lock; /* Protects the other structures in aift */
>>> + u32 mdd;
>>> };
>>> extern struct zpci_aift *aift;
>>> @@ -47,7 +51,9 @@ static inline struct kvm
>>> *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
>>> int kvm_s390_pci_aen_init(u8 nisc);
>>> void kvm_s390_pci_aen_exit(void);
>>> -
>>> +int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long
>>> req,
>>> + unsigned long start, unsigned long end,
>>> + u8 *status);
>>> int kvm_s390_pci_init(void);
>>> #endif /* __KVM_S390_PCI_H */
>>>
>>
>
--
Pierre Morel
IBM Lab Boeblingen