lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f118fc9-4f6f-dd67-a291-be78152e47fd@oracle.com>
Date:   Wed, 16 Aug 2023 16:43:52 -0700
From:   Si-Wei Liu <si-wei.liu@...cle.com>
To:     Jason Wang <jasowang@...hat.com>
Cc:     eperezma@...hat.com, gal@...dia.com, linux-kernel@...r.kernel.org,
        mst@...hat.com, virtualization@...ts.linux-foundation.org,
        xuanzhuo@...ux.alibaba.com
Subject: Re: [PATCH RFC 4/4] vhost-vdpa: introduce IOTLB_PERSIST backend
 feature bit



On 8/15/2023 6:48 PM, Jason Wang wrote:
> On Wed, Aug 16, 2023 at 6:31 AM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>>
>>
>> On 8/14/2023 7:25 PM, Jason Wang wrote:
>>> On Tue, Aug 15, 2023 at 9:45 AM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@...cle.com>
>>>> ---
>>>>    drivers/vhost/vdpa.c             | 16 +++++++++++++++-
>>>>    include/uapi/linux/vhost_types.h |  2 ++
>>>>    2 files changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>>> index 62b0a01..75092a7 100644
>>>> --- a/drivers/vhost/vdpa.c
>>>> +++ b/drivers/vhost/vdpa.c
>>>> @@ -406,6 +406,14 @@ static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
>>>>           return ops->resume;
>>>>    }
>>>>
>>>> +static bool vhost_vdpa_has_persistent_map(const struct vhost_vdpa *v)
>>>> +{
>>>> +       struct vdpa_device *vdpa = v->vdpa;
>>>> +       const struct vdpa_config_ops *ops = vdpa->config;
>>>> +
>>>> +       return (!ops->set_map && !ops->dma_map) || ops->reset_map;
>>> So this means the IOTLB/IOMMU mappings have already been decoupled
>>> from the vdpa reset.
>> Not in the sense of API, it' been coupled since day one from the
>> implementations of every on-chip IOMMU parent driver, namely mlx5_vdpa
>> and vdpa_sim. Because of that, later on the (improper) support for
>> virtio-vdpa, from commit 6f5312f80183 ("vdpa/mlx5: Add support for
>> running with virtio_vdpa") and 6c3d329e6486 ("vdpa_sim: get rid of DMA
>> ops") misused the .reset() op to realize 1:1 mapping, rendering strong
>> coupling between device reset and reset of iotlb mappings. This series
>> try to rectify that implementation deficiency, while keep userspace
>> continuing to work with older kernel behavior.
>>
>>>    So it should have been noticed by the userspace.
>> Yes, userspace had noticed this no-chip IOMMU discrepancy since day one
>> I suppose. Unfortunately there's already code in userspace with this
>> assumption in mind that proactively tears down and sets up iotlb mapping
>> around vdpa device reset...
>>> I guess we can just fix the simulator and mlx5 then we are fine?
>> Only IF we don't care about running new QEMU on older kernels with
>> flawed on-chip iommu behavior around reset. But that's a big IF...
> So what I meant is:
>
> Userspace doesn't know whether the vendor specific mappings (set_map)
> are required or not. And in the implementation of vhost_vdpa, if
> platform IOMMU is used, the mappings are decoupled from the reset. So
> if the Qemu works with parents with platform IOMMU it means Qemu can
> work if we just decouple vendor specific mappings from the parents
> that uses set_map.
I was aware of this, and if you may notice I don't even offer a way 
backward to retain/emulate the flawed vhost-iotlb reset behavior for 
older userspace - I consider it more of a bug in .set_map driver 
implementation of its own rather than what the vhost-vdpa iotlb 
abstraction wishes to expose to userspace in the first place.

If you ever look into QEMU's vhost_vdpa_reset_status() function, you may 
see memory_listener_unregister() will be called to evict all of the 
existing iotlb mappings right after vhost_vdpa_reset_device() across 
device reset, and later on at vhost_vdpa_dev_start(), 
memory_listener_register() will set up all iotlb mappings again. In an 
ideal world without this on-chip iommu deficiency QEMU should not have 
to behave this way - this is what I mentioned earlier that userspace had 
already noticed the discrepancy and it has to "proactively tear down and 
set up iotlb mapping around vdpa device reset". Apparently from 
functionality perspective this trick works completely fine with platform 
IOMMU, however, it's sub-optimal in the performance perspective.

We can't simply fix QEMU by moving this memory_listener_unregister() 
call out of the reset path unconditionally, as we don't want to break 
the already-functioning older kernel even though it's suboptimal in 
performance. Instead, to keep new QEMU continuing to work on top of the 
existing or older kernels, QEMU has to check this IOTLB_PERSIST feature 
flag to decide whether it is safe not to bother flushing and setting up 
iotlb across reset. For the platform IOMMU case, vdpa parent driver 
won't implement either the .set_map or .dma_map op, so it should be 
covered in the vhost_vdpa_has_persistent_map() check I suppose.


Thanks,
-Siwei
> Thanks
>
>> Regards,
>> -Siwei
>>> Thanks
>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ