linux-kernel - Re: [PATCH v2 21/30] KVM: s390: pci: handle refresh of PCI translations

Message-ID: <dcfe1661-0779-6c50-6126-e9d0609e3911@linux.ibm.com>
Date:   Thu, 20 Jan 2022 10:47:30 +0100
From:   Pierre Morel <pmorel@...ux.ibm.com>
To:     Matthew Rosato <mjrosato@...ux.ibm.com>, linux-s390@...r.kernel.org
Cc:     alex.williamson@...hat.com, cohuck@...hat.com,
        schnelle@...ux.ibm.com, farman@...ux.ibm.com,
        borntraeger@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com,
        gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
        frankja@...ux.ibm.com, david@...hat.com, imbrenda@...ux.ibm.com,
        vneethv@...ux.ibm.com, oberpar@...ux.ibm.com, freude@...ux.ibm.com,
        thuth@...hat.com, pasic@...ux.ibm.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 21/30] KVM: s390: pci: handle refresh of PCI
 translations



On 1/19/22 21:02, Matthew Rosato wrote:
> On 1/19/22 1:25 PM, Pierre Morel wrote:
>>
>>
>> On 1/19/22 17:39, Matthew Rosato wrote:
>>> On 1/19/22 4:29 AM, Pierre Morel wrote:
>>>>
>>>>
>>>> On 1/14/22 21:31, Matthew Rosato wrote:
>>> ...
>>>>> +static int dma_table_shadow(struct kvm_vcpu *vcpu, struct zpci_dev *zdev,
>>>>> +                dma_addr_t dma_addr, size_t size)
>>>>> +{
>>>>> +    unsigned int nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
>>>>> +    struct kvm_zdev *kzdev = zdev->kzdev;
>>>>> +    unsigned long *entry, *gentry;
>>>>> +    int i, rc = 0, rc2;
>>>>> +
>>>>> +    if (!nr_pages || !kzdev)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    mutex_lock(&kzdev->ioat.lock);
>>>>> +    if (!zdev->dma_table || !kzdev->ioat.head[0]) {
>>>>> +        rc = -EINVAL;
>>>>> +        goto out_unlock;
>>>>> +    }
>>>>> +
>>>>> +    for (i = 0; i < nr_pages; i++) {
>>>>> +        gentry = dma_walk_guest_cpu_trans(vcpu, &kzdev->ioat, dma_addr);
>>>>> +        if (!gentry)
>>>>> +            continue;
>>>>> +        entry = dma_walk_cpu_trans(zdev->dma_table, dma_addr);
>>>>> +
>>>>> +        if (!entry) {
>>>>> +            rc = -ENOMEM;
>>>>> +            goto out_unlock;
>>>>> +        }
>>>>> +
>>>>> +        rc2 = dma_shadow_cpu_trans(vcpu, entry, gentry);
>>>>> +        if (rc2 < 0) {
>>>>> +            rc = -EIO;
>>>>> +            goto out_unlock;
>>>>> +        }
>>>>> +        dma_addr += PAGE_SIZE;
>>>>> +        rc += rc2;
>>>>> +    }
>>>>> +
>>>>
>>>> In case of error, shouldn't we invalidate the shadow table entries
>>>> that we validated before the error?
>>>
>>> Hmm, I don't think this is strictly necessary - the status returned 
>>> should indicate the specified DMA range is now in an indeterminate 
>>> state (putting the onus on the guest to take corrective action via a 
>>> global refresh).
>>>
>>> In fact I think I screwed that up below in 
>>> kvm_s390_pci_refresh_trans, the fabricated status should always be 
>>> KVM_S390_RPCIT_INS_RES.
>>
>> OK
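
To illustrate the approach discussed above (no rollback on error, with the whole range reported as indeterminate), here is a minimal user-space sketch. It is not the patch code: `refresh_range`, `shadow_fn`, `fail_at`, `STATUS_OK` and `STATUS_INS_RES` are hypothetical names, and `STATUS_INS_RES` is only a placeholder for KVM_S390_RPCIT_INS_RES, whose real value is not shown in this thread.

```c
#include <stddef.h>

/* Hypothetical placeholders, not the real status encodings. */
#define STATUS_OK       0
#define STATUS_INS_RES  1   /* stand-in for KVM_S390_RPCIT_INS_RES */

/* Hypothetical per-page shadow step: < 0 on failure, else entries updated. */
typedef int (*shadow_fn)(size_t page, void *ctx);

/*
 * Shadow nr_pages pages. On the first failure, stop without undoing the
 * pages already shadowed and report a single "indeterminate" status, so
 * that the caller (the guest, in the real code) has to issue a global
 * refresh.
 */
static int refresh_range(size_t nr_pages, shadow_fn shadow, void *ctx,
                         unsigned char *status)
{
    int updated = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        int rc = shadow(i, ctx);

        if (rc < 0) {
            /* Pages [0, i) remain shadowed; the range is indeterminate. */
            *status = STATUS_INS_RES;
            return -1;
        }
        updated += rc;
    }

    *status = STATUS_OK;
    return updated;
}

/* Example step that fails at the page index stored in *ctx. */
static int fail_at(size_t page, void *ctx)
{
    size_t bad = *(size_t *)ctx;

    return page == bad ? -1 : 1;
}
```

The point of the design is that the caller sees the same status whichever page failed, so it never needs to know how far the loop got.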
>>
>>>
>>>>
>>>>> +out_unlock:
>>>>> +    mutex_unlock(&kzdev->ioat.lock);
>>>>> +    return rc;
>>>>> +}
>>>>> +
>>>>> +int kvm_s390_pci_refresh_trans(struct kvm_vcpu *vcpu, unsigned long req,
>>>>> +                   unsigned long start, unsigned long size,
>>>>> +                   u8 *status)
>>>>> +{
>>>>> +    struct zpci_dev *zdev;
>>>>> +    u32 fh = req >> 32;
>>>>> +    int rc;
>>>>> +
>>>>> +    /* Make sure this is a valid device associated with this guest */
>>>>> +    zdev = get_zdev_by_fh(fh);
>>>>> +    if (!zdev || !zdev->kzdev || zdev->kzdev->kvm != vcpu->kvm) {
>>>>> +        *status = 0;
>>>>
>>>> Wouldn't it be useful to add some debug information here?
>>>> When would this happen?
>>>
>>> Yes, I agree -- One of the follow-ons I'd like to add after this 
>>> series is s390dbf entries; this seems like a good spot for one.
>>>
>>> As to when this could happen; it should not under normal 
>>> circumstances, but consider something like arbitrary function handles 
>>> coming from the intercepted guest instruction.  We need to ensure 
>>> that the specified function 1) exists and 2) is associated with the 
>>> guest issuing the refresh.
>>>
>>>>
>>>> Also, if we get this error it looks like we have a VM problem.
>>>> Shouldn't we handle this in QEMU and return -EOPNOTSUPP?
>>>>
>>>
>>> Well, I'm not sure if we can really tell where the problem is (it 
>>> could for example indicate a misbehaving guest, or a bug in our KVM 
>>> tracking of hostdevs).
>>>
>>> The guest chose the function handle, and if we got here then that 
>>> means it doesn't indicate that it's an emulated device, which means 
>>> either we are using the assist and KVM should handle the intercept or 
>>> we are not and userspace should handle it.  But in both of those 
>>> cases, there should be a host device and it should be associated with 
>>> the guest.
>>
>> That is right if we cannot find an associated zdev = F(fh),
>> but the two other errors are KVM or QEMU errors AFAIU.
> 
> I don't think we know for sure for any of the cases...  For a 
> well-behaved guest I agree with your assessment.  However, the guest 
> decides what fh to put into its refresh instruction and so a misbehaving 
> guest could just pick arbitrary numbers for fh and circumstantially 
> match some other host device.  What if the guest just decided to try 
> every single possible fh number in a loop with a refresh instruction? 
> That's neither KVM nor QEMU's fault but can trip each of these cases.
> 
> Consider the different cases:
> 
> !zdev - Either the guest provided a bogus fh, or KVM provided a bad fh via
> the VFIO ioctl which QEMU then fed into CLP, or KVM provided the right fh
> via ioctl but QEMU clobbered it when providing it to the guest via CLP.
> 
> !zdev->kzdev - Either the guest provided a bogus fh that just so
> happened to match a host fh that has no KVM association, or KVM or QEMU
> screwed up somewhere (as above or because we failed to make the KVM
> association somehow)
> 
> kzdev->kvm != vcpu->kvm - Pretty much the same as above, but the
> matching device is actually in use by some other guest.  Again it's
> possible that a misbehaving guest 'got lucky' with an arbitrary fh that
> happened to match a host fh with an existing KVM association -- or more
> likely that KVM or QEMU screwed up somewhere.

OK, I understand, and you are right. My error was to assume that
get_zdev_by_fh() returns a zdev associated with an FH that is valid for
the guest, while it actually returns a zdev associated with an FH that
is valid for the host.

If the comment had been placed after the get_zdev_by_fh() call and
before the test, I probably would not have made this mistake.
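
As a rough illustration of that comment placement and of the three failure cases listed above, here is a self-contained user-space model. It is only a sketch: the structures are simplified stand-ins for the kernel ones, and `lookup_host_fh`, `check_fh` and `enum fh_check` are hypothetical names that do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct kvm { int id; };
struct kvm_zdev { struct kvm *kvm; };
struct zpci_dev { uint32_t fh; struct kvm_zdev *kzdev; };

enum fh_check {
    FH_OK,
    FH_NO_HOST_DEV,   /* !zdev */
    FH_NO_KVM_ASSOC,  /* !zdev->kzdev */
    FH_OTHER_GUEST,   /* kzdev->kvm != vcpu->kvm */
};

/* Models get_zdev_by_fh(): matches any *host* function handle. */
static struct zpci_dev *lookup_host_fh(struct zpci_dev *devs, size_t n,
                                       uint32_t fh)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].fh == fh)
            return &devs[i];
    return NULL;
}

static enum fh_check check_fh(struct zpci_dev *devs, size_t n, uint32_t fh,
                              const struct kvm *guest)
{
    struct zpci_dev *zdev = lookup_host_fh(devs, n, fh);

    /*
     * The lookup above succeeds for any host device. Make sure the
     * result is a valid device associated with this guest.
     */
    if (!zdev)
        return FH_NO_HOST_DEV;
    if (!zdev->kzdev)
        return FH_NO_KVM_ASSOC;
    if (zdev->kzdev->kvm != guest)
        return FH_OTHER_GUEST;

    return FH_OK;
}
```

Splitting the combined condition into three checks would also give a natural place for the debug entries mentioned earlier, one per case.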

> 
>>
>>>
>>> I think if we decide to throw this to userspace in this event, QEMU 
>>> needs some extra code to handle it (basically, if QEMU receives the 
>>> intercept and the device is neither emulated nor using intercept mode 
>>> then we must treat as an invalid handle as this intercept should have 
>>> been handled by KVM)
>>
>> I do not want to start a discussion on this. I think we can leave it
>> like this at first and come back to it when we have a good idea of how
>> to handle this.
>> Maybe just add a /* TODO */
> 
> OK, sure.  In any of the above cases, we are certainly done in KVM
> anyway.  Whether there's value in passing it on to userspace vs
> immediately giving an error, let's think about it.

No, I no longer think we should.
Sorry for the wrong idea.

-- 
Pierre Morel
IBM Lab Boeblingen