linux-kernel - Re: [PATCH v2 21/30] KVM: s390: pci: handle refresh of PCI translations

Message-ID: <cebcc3de-e332-6381-f450-a6a26ef88182@linux.ibm.com>
Date:   Wed, 19 Jan 2022 19:25:10 +0100
From:   Pierre Morel <pmorel@...ux.ibm.com>
To:     Matthew Rosato <mjrosato@...ux.ibm.com>, linux-s390@...r.kernel.org
Cc:     alex.williamson@...hat.com, cohuck@...hat.com,
        schnelle@...ux.ibm.com, farman@...ux.ibm.com,
        borntraeger@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com,
        gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
        frankja@...ux.ibm.com, david@...hat.com, imbrenda@...ux.ibm.com,
        vneethv@...ux.ibm.com, oberpar@...ux.ibm.com, freude@...ux.ibm.com,
        thuth@...hat.com, pasic@...ux.ibm.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 21/30] KVM: s390: pci: handle refresh of PCI
 translations



On 1/19/22 17:39, Matthew Rosato wrote:
> On 1/19/22 4:29 AM, Pierre Morel wrote:
>>
>>
>> On 1/14/22 21:31, Matthew Rosato wrote:
> ...
>>> +static int dma_table_shadow(struct kvm_vcpu *vcpu, struct zpci_dev *zdev,
>>> +                dma_addr_t dma_addr, size_t size)
>>> +{
>>> +    unsigned int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>>> +    struct kvm_zdev *kzdev = zdev->kzdev;
>>> +    unsigned long *entry, *gentry;
>>> +    int i, rc = 0, rc2;
>>> +
>>> +    if (!nr_pages || !kzdev)
>>> +        return -EINVAL;
>>> +
>>> +    mutex_lock(&kzdev->ioat.lock);
>>> +    if (!zdev->dma_table || !kzdev->ioat.head[0]) {
>>> +        rc = -EINVAL;
>>> +        goto out_unlock;
>>> +    }
>>> +
>>> +    for (i = 0; i < nr_pages; i++) {
>>> +        gentry = dma_walk_guest_cpu_trans(vcpu, &kzdev->ioat, dma_addr);
>>> +        if (!gentry)
>>> +            continue;
>>> +        entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
>>> +
>>> +        if (!entry) {
>>> +            rc = -ENOMEM;
>>> +            goto out_unlock;
>>> +        }
>>> +
>>> +        rc2 = dma_shadow_cpu_trans(vcpu, entry, gentry);
>>> +        if (rc2 < 0) {
>>> +            rc = -EIO;
>>> +            goto out_unlock;
>>> +        }
>>> +        dma_addr += PAGE_SIZE;
>>> +        rc += rc2;
>>> +    }
>>> +
>>
>> In case of error, shouldn't we invalidate the shadow tables entries we 
>> did validate until the error?
> 
> Hmm, I don't think this is strictly necessary - the status returned 
> should indicate the specified DMA range is now in an indeterminate state 
> (putting the onus on the guest to take corrective action via a global 
> refresh).
> 
> In fact I think I screwed that up below in kvm_s390_pci_refresh_trans, 
> the fabricated status should always be KVM_S390_RPCIT_INS_RES.

OK
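
To make the agreed simplification concrete, here is a minimal userspace sketch of the status handling once the switch statement is dropped. The helper name rpcit_fabricate_status() is hypothetical and does not appear in the patch; only the two status values are taken from the pci.h hunk. It assumes that any negative return code from the shadow operation means the DMA range is indeterminate and must be reported as insufficient resources.

```c
#include <errno.h>
#include <stdint.h>

/* Status values as defined in the patch's arch/s390/kvm/pci.h hunk. */
#define KVM_S390_RPCIT_INS_RES 0x10
#define KVM_S390_RPCIT_ERR     0x28

/*
 * Hypothetical helper, not part of the patch: decide which status to
 * present to the guest after the shadow operation returned rc.
 *
 * Any failure (rc < 0) leaves the DMA range in an indeterminate state,
 * so the status is always KVM_S390_RPCIT_INS_RES, which puts the onus
 * on the guest to issue a global refresh.
 *
 * Returns 1 if a status was fabricated, 0 if *status was left untouched.
 */
static int rpcit_fabricate_status(int rc, uint8_t *status)
{
	if (rc >= 0)
		return 0;

	*status = KVM_S390_RPCIT_INS_RES;
	return 1;
}
```

With this shape, -ENOMEM and -EIO from dma_table_shadow() are indistinguishable to the guest, which matches the idea that a partially shadowed range does not need to be rolled back.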

> 
>>
>>> +out_unlock:
>>> +    mutex_unlock(&kzdev->ioat.lock);
>>> +    return rc;
>>> +}
>>> +
>>> +int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
>>> +                   unsigned long start, unsigned long size,
>>> +                   u8 *status)
>>> +{
>>> +    struct zpci_dev *zdev;
>>> +    u32 fh = req >> 32;
>>> +    int rc;
>>> +
>>> +    /* Make sure this is a valid device associated with this guest */
>>> +    zdev = get_zdev_by_fh(fh);
>>> +    if (!zdev || !zdev->kzdev || zdev->kzdev->kvm != vcpu->kvm) {
>>> +        *status = 0;
>>
>> Wouldn't it be interesting to add some debug information here.
>> When would this appear?
> 
> Yes, I agree -- One of the follow-ons I'd like to add after this series 
> is s390dbf entries; this seems like a good spot for one.
> 
> As to when this could happen; it should not under normal circumstances, 
> but consider something like arbitrary function handles coming from the 
> intercepted guest instruction.  We need to ensure that the specified 
> function 1) exists and 2) is associated with the guest issuing the refresh.
> 
>>
>> Also if we have this error this looks like we have a VM problem, 
>> shouldn't we treat this in QEMU and return -EOPNOTSUPP ?
>>
> 
> Well, I'm not sure if we can really tell where the problem is (it could 
> for example indicate a misbehaving guest, or a bug in our KVM tracking 
> of hostdevs).
> 
> The guest chose the function handle, and if we got here then that means 
> it doesn't indicate that it's an emulated device, which means either we 
> are using the assist and KVM should handle the intercept or we are not 
> and userspace should handle it.  But in both of those cases, there 
> should be a host device and it should be associated with the guest.

That is right if we cannot find a zdev associated with the function
handle (zdev = F(fh)), but the two other errors are KVM or QEMU errors,
AFAIU.
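
As an illustration of the distinction, the three conditions tested in the patch could be told apart as in the sketch below. This is a standalone userspace sketch, not proposed kernel code: the structures are simplified stand-ins for the kernel ones, and the enum and the function check_fh_owner() are hypothetical names introduced only for this example.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, for illustration only. */
struct kvm { int id; };
struct kvm_zdev { struct kvm *kvm; };
struct zpci_dev { struct kvm_zdev *kzdev; };

/* Hypothetical classification of the three failure cases. */
enum fh_check {
	FH_OK,
	FH_NO_ZDEV,	/* no device for this handle: possibly a bad guest handle */
	FH_NO_KZDEV,	/* device has no KVM state: looks like a KVM or QEMU error */
	FH_WRONG_KVM,	/* device belongs to another guest: KVM or QEMU error */
};

/*
 * Same tests, in the same order, as the combined condition in
 * kvm_s390_pci_refresh_trans(), but reporting which one failed.
 */
static enum fh_check check_fh_owner(const struct zpci_dev *zdev,
				    const struct kvm *kvm)
{
	if (!zdev)
		return FH_NO_ZDEV;
	if (!zdev->kzdev)
		return FH_NO_KZDEV;
	if (zdev->kzdev->kvm != kvm)
		return FH_WRONG_KVM;
	return FH_OK;
}
```

Separating the cases like this would also give a natural place for distinct debug entries later on.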

> 
> I think if we decide to throw this to userspace in this event, QEMU 
> needs some extra code to handle it (basically, if QEMU receives the 
> intercept and the device is neither emulated nor using intercept mode 
> then we must treat as an invalid handle as this intercept should have 
> been handled by KVM)

I do not want to start a discussion on this. I think we can leave it
like this at first and come back to it when we have a good idea of how
to handle it.
Maybe just add a /* TODO */


> 
> 
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* Only proceed if the device is using the assist */
>>> +    if (zdev->kzdev->ioat.head[0] == 0)
>>> +        return -EOPNOTSUPP;
>>> +
>>> +    rc = dma_table_shadow(vcpu, zdev, start, size);
>>> +    if (rc < 0) {
>>> +        /*
>>> +         * If errors encountered during shadow operations, we must
>>> +         * fabricate status to present to the guest
>>> +         */
>>> +        switch (rc) {
>>> +        case -ENOMEM:
>>> +            *status = KVM_S390_RPCIT_INS_RES;
>>> +            break;
>>> +        default:
>>> +            *status = KVM_S390_RPCIT_ERR;
>>> +            break;
> 
> As mentioned above I think this switch statement should go away and 
> instead always set KVM_S390_RPCIT_INS_RES when rc < 0.
> 
>>> +        }
>>> +    } else if (rc > 0) {
>>> +        /* Host RPCIT must be issued */
>>> +        rc = zpci_refresh_trans((u64) zdev->fh << 32, start, size,
>>> +                    status);
>>> +    }
>>> +    zdev->kzdev->rpcit_count++;
>>> +
>>> +    return rc;
>>> +}
>>> +
>>>   /* Modify PCI: Register floating adapter interruption forwarding */
>>>   static int kvm_zpci_set_airq(struct zpci_dev *zdev)
>>>   {
>>> @@ -620,6 +822,8 @@ EXPORT_SYMBOL_GPL(kvm_s390_pci_attach_kvm);
>>>   int kvm_s390_pci_init(void)
>>>   {
>>> +    int rc;
>>> +
>>>       aift = kzalloc(sizeof(struct zpci_aift), GFP_KERNEL);
>>>       if (!aift)
>>>           return -ENOMEM;
>>> @@ -627,5 +831,7 @@ int kvm_s390_pci_init(void)
>>>       spin_lock_init(&aift->gait_lock);
>>>       mutex_init(&aift->lock);
>>> -    return 0;
>>> +    rc = zpci_get_mdd(&aift->mdd);
>>> +
>>> +    return rc;
>>>   }
>>> diff --git a/arch/s390/kvm/pci.h b/arch/s390/kvm/pci.h
>>> index 54355634df82..bb2be7fc3934 100644
>>> --- a/arch/s390/kvm/pci.h
>>> +++ b/arch/s390/kvm/pci.h
>>> @@ -18,6 +18,9 @@
>>>   #define KVM_S390_PCI_DTSM_MASK 0x40
>>> +#define KVM_S390_RPCIT_INS_RES 0x10
>>> +#define KVM_S390_RPCIT_ERR 0x28
>>> +
>>>   struct zpci_gaite {
>>>       u32 gisa;
>>>       u8 gisc;
>>> @@ -33,6 +36,7 @@ struct zpci_aift {
>>>       struct kvm_zdev **kzdev;
>>>       spinlock_t gait_lock; /* Protects the gait, used during AEN forward */
>>>       struct mutex lock; /* Protects the other structures in aift */
>>> +    u32 mdd;
>>>   };
>>>   extern struct zpci_aift *aift;
>>> @@ -47,7 +51,9 @@ static inline struct kvm *kvm_s390_pci_si_to_kvm(struct zpci_aift *aift,
>>>   int kvm_s390_pci_aen_init(u8 nisc);
>>>   void kvm_s390_pci_aen_exit(void);
>>> -
>>> +int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
>>> +                   unsigned long start, unsigned long end,
>>> +                   u8 *status);
>>>   int kvm_s390_pci_init(void);
>>>   #endif /* __KVM_S390_PCI_H */
>>>
>>
> 

-- 
Pierre Morel
IBM Lab Boeblingen