[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <b83ad43c-2b5a-4288-aa8e-71ebdeb420eb@oracle.com>
Date: Thu, 19 Oct 2023 22:57:31 -0700
From: Si-Wei Liu <si-wei.liu@...cle.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Eugenio Perez Martin <eperezma@...hat.com>, mst@...hat.com,
xuanzhuo@...ux.alibaba.com, dtatulea@...dia.com,
virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial
state in .release
On 10/19/2023 9:11 PM, Jason Wang wrote:
> On Fri, Oct 20, 2023 at 6:28 AM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>>
>>
>> On 10/19/2023 7:39 AM, Eugenio Perez Martin wrote:
>>> On Thu, Oct 19, 2023 at 10:27 AM Jason Wang <jasowang@...hat.com> wrote:
>>>> On Thu, Oct 19, 2023 at 2:47 PM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>>>>>
>>>>> On 10/18/2023 7:53 PM, Jason Wang wrote:
>>>>>> On Wed, Oct 18, 2023 at 4:49 PM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>>>>>>> On 10/18/2023 12:00 AM, Jason Wang wrote:
>>>>>>>>> Unfortunately, it's a must to stick to ABI. I agree it's a mess but we
>>>>>>>>> don't have a better choice. Or we can fail the probe if userspace
>>>>>>>>> doesn't ack this feature.
>>>>>>>> Antoher idea we can just do the following in vhost_vdpa reset?
>>>>>>>>
>>>>>>>> config->reset()
>>>>>>>> if (IOTLB_PERSIST is not set) {
>>>>>>>> config->reset_map()
>>>>>>>> }
>>>>>>>>
>>>>>>>> Then we don't have the burden to maintain them in the parent?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>> Please see my earlier response in the other email, thanks.
>>>>>>>
>>>>>>> ----------------%<----------------%<----------------
>>>>>>>
>>>>>>> First, the ideal fix would be to leave this reset_vendor_mappings()
>>>>>>> emulation code on the individual driver itself, which already has the
>>>>>>> broken behavior.
>>>>>> So the point is, not about whether the existing behavior is "broken"
>>>>>> or not.
>>>>> Hold on, I thought earlier we all agreed upon that the existing behavior
>>>>> of vendor driver self-clearing maps during .reset violates the vhost
>>>>> iotlb abstraction and also breaks the .set_map/.dma_map API. This is
>>>>> 100% buggy driver implementation itself that we should discourage or
>>>>> eliminate as much as possible (that's part of the goal for this series),
>>>> I'm not saying it's not an issue, what I'm saying is, if the fix
>>>> breaks another userspace, it's a new bug in the kernel. See what Linus
>>>> said in [1]
>>>>
>>>> "If a change results in user programs breaking, it's a bug in the kernel."
>>>>
>>>>> but here you seem to go existentialism and suggests the very opposite
>>>>> that every .set_map/.dma_map driver implementation, regardless being the
>>>>> current or the new/upcoming, should unconditionally try to emulate the
>>>>> broken reset behavior for the sake of not breaking older userspace.
>>>> Such "emulation" is not done at the parent level. New parents just
>>>> need to implement reset_map() or not. everything could be done inside
>>>> vhost-vDPA as pseudo code that is shown above.
>>>>
>>>>> Set
>>>>> aside the criteria and definition for how userspace can be broken, can
>>>>> we step back to the original question why we think it's broken, and what
>>>>> we can do to promote good driver implementation instead of discuss the
>>>>> implementation details?
>>>> I'm not sure I get the point of this question. I'm not saying we don't
>>>> need to fix, what I am saying is that such a fix must be done in a
>>>> negotiable way. And it's better if parents won't get any burden. It
>>>> can just decide to implement reset_map() or not.
>>>>
>>>>> Reading the below response I found my major
>>>>> points are not heard even if written for quite a few times.
>>>> I try my best to not ignore any important things, but I can't promise
>>>> I will not miss any. I hope the above clarifies my points.
>>>>
>>>>> It's not
>>>>> that I don't understand the importance of not breaking old userspace, I
>>>>> appreciate your questions and extra patience, however I do feel the
>>>>> "broken" part is very relevant to our discussion here.
>>>>> If it's broken (in the sense of vhost IOTLB API) that you agree, I think
>>>>> we should at least allow good driver implementations; and when you think
>>>>> about the possibility of those valid good driver cases
>>>>> (.set_map/.dma_map implementations that do not clear maps in .reset),
>>>>> you might be able to see why it's coded the way as it is now.
>>>>>
>>>>>> It's about whether we could stick to the old behaviour without
>>>>>> too much cost. And I believe we could.
>>>>>>
>>>>>> And just to clarify here, reset_vendor_mappings() = config->reset_map()
>>>>>>
>>>>>>> But today there's no backend feature negotiation
>>>>>>> between vhost-vdpa and the parent driver. Do we want to send down the
>>>>>>> acked_backend_features to parent drivers?
>>>>>> There's no need to do that with the above code, or anything I missed here?
>>>>>>
>>>>>> config->reset()
>>>>>> if (IOTLB_PERSIST is not set) {
>>>>>> config->reset_map()
>>>>>> }
>>>>> Implementation issue: this implies reset_map() has to be there for every
>>>>> .set_map implementations, but vendor driver implementation for custom
>>>>> IOMMU could well implement DMA ops by itself instead of .reset_map. This
>>>>> won't work for every set_map driver (think about the vduse case).
>>>> Well let me do it once again, reset_map() is not mandated:
>>>>
>>>> config->reset()
>>>> if (IOTLB_PERSIST is not set) {
>>>> if (config->reset_map)
>>>> config->reset_map()
>>> To avoid new parent drivers
>> I am afraid it's not just new parent drivers, but any well behaved
>> driver today may well break userspace if go with this forced emulation
>> code, if they have to implement reset_map for some reason (e.g. restored
>> to 1:1 passthrough mapping or other default state in mapping). For new
>> userspace and user driver we can guard against it using the
>> IOTLB_PERSIST flag, but the above code would get a big chance to break
>> setup with good driver and older userspace in practice.
>>
>> And .reset_map implementation doesn't necessarily need to clear maps.
>> For e.g. IOMMU API compliant driver that only needs simple DMA model for
>> passthrough, all .reset_map has to do is toggle to 1:1 mapping mode to
>> the default/initial state without taking care of maps, as
>> vhost_vdpa_unmap(0, -1ULL) earlier should have done the map cleaning job
>> already.
> Ok, finally, it takes me a while to understand the issue :)
>
> Actually, there are two things:
>
> 1) stick the IOTLB mappings across the reset
> 2) reset the vendor specific mappings to whatever the parent think
> it's comfort (like 1:1)
Yep, maybe I need to document this expectation more clearly, but I found
it a bit hard to describe what 2) is really about (tried to avoid the
specifics, like 1:1, as that wording seems not so welcomed).
>
> So I think my suggestion doesn't work.
>
>>
>>> to have this behavior if they need to
>>> implement reset_map,
>>>
>>> What if we add a new callback like "config->buggy_virtio_reset_map",
>>> different from regular reset_map callback at cleanup?
>> Right, separating out the need for old behavior emulation from
>> .reset_map is much cleaner, and this is what individual broken driver
>> has to maintain without penalizing other good drivers. Good to see what
>> I said earlier is heard.
>>
>>> Only mlx5 and
>>> vdpa_sim need to implement it, with a big warning, and new parent
>>> drivers can trust they'll never have the old bad behavior.
>> Let's see what Jason will say about it and try to converge on this
>> first, I think he seemed to imply that this is part of ABI that every
>> driver has to make compromise for. I'd better get it ack'ed before
>> proceeding to the rest.
> Thanks for your patience.
>
> I think we have some choices:
>
> (All of the below can work, but we need to choose the best)
>
> 1) module parameter: this turns out to be hard for the management as
> it requires the subtle knowledge of a specific user space which turns
> out to be hard
> 2) buggy_virtio_reset_map: seems like somehow a pollution of the
> config ops, I think we can do this only if we have other choice
> 3) set_backend_features: I understand the concern that we should not
> propagate the vhost level feature to parent, the reason is most of
> them are irrelevant to the parent. I think the right way is to
> introduce get_parent_features()/set_parent_features() then we can
> choose to map some parent feature to vhost like
> (ENALBE_AFTER_DRIVER_OK and IOTLB_PERSIST)
> 4) piggy backing whether we need to clean vendor specific IOTLB in
> config->reset(bool clean_map)
Both 3) and 4) should work with me. For 3) we already have the get_*()
part, though I'm not sure if worth to bother introducing the set_*()
API; today I think most parent drivers work directly with the driver
config op, instead of backend feature flag . 4) is actually closer to
what I had in mind (was thinking of a flag for future extension instead
of bool). But the document has to make it very clear that the use of
clean_map is limited to backward compatibility for old behavior, and new
driver should not bother to implement (as it violates the
.set_map/.dma_map IOMMU API).
Thanks,
-Siwei
>
> Siwei, Eugenio, what's your opinion here?
>
> Thanks
>
>> Thanks,
>> -Siwei
>>
>>>> }
>>>>
>>>> Did you see any issue with VDUSE in this case?
>>>>
>>>>> But this is not the the point I was making. I think if you agree this is
>>>>> purely buggy driver implementation of its own, we should try to isolate
>>>>> this buggy behavior to individual driver rather than overload vhost-vdpa
>>>>> or vdpa core's role to help implement the emulation of broken driver
>>>>> behavior.
>>>> As I pointed out, if it is not noticeable in the userspace, that's
>>>> fine but it's not.
>>>>
>>>>> I don't get why .reset is special here, the abuse of .reset to
>>>>> manipulate mapping could also happen in other IOMMU unrelated driver
>>>>> entries like in .suspend, or in queue_reset.
>>>> Who can abuse reset here? It is totally under the control of
>>>> vhost-vDPA and it's not visible to uAPI. And we can fully control the
>>>> behaviour of vhost-vDPA.
>>>>
>>>>> If someday userspace is
>>>>> found coded around similar buggy driver implementation in other driver
>>>>> ops, do we want to follow and duplicate the same emulation in vdpa core
>>>>> as the precedent is already set here around .reset?
>>>> I think so, have you seen the links I give you? If you want to go
>>>> through the one from Linus thread[1], you can see the one that unbreak
>>>> virtio-IOMMU[2]:
>>>>
>>>> 1) Someday, we spot invalidate with size 0 is a bug
>>>> 2) We fix this bug by not allowing this
>>>> 3) But virtio-IOMMU userspace find that size 0 actually clean all the
>>>> IOTLB so it depends on the behaviour
>>>> 4) So the virtio-IOMMU userspace find it can't work after 2)
>>>> 5) Then we recover the behaviour before 2) via [2]
>>>>
>>>> Another example is the IOTLB_MSG_V2, V1 suffers from in-stable ABI in
>>>> 32bit archs, most of the userspace survives since it never runs on
>>>> 32bit archs. The fix is to introduce a V2 but we will stick to V1 by
>>>> default if V2 is not acknowledged by the userspace.
>>>>
>>>> I think the above 2 examples are sufficient for us to understand the
>>>> case. If not, I can help to clarify more since I'm involved in those 2
>>>> fixes.
>>>>
>>>>> The buggy driver can fail in a lot of other ways indefinitely during
>>>>> reset, if there's a buggy driver that's already broken the way as how it
>>>>> is and happens to survive with all userspace apps, we just don't care
>>>>> and let it be.
>>>> Without IOTLB_PRESIST it doesn't break. With IOTLB_PERSIST and if the
>>>> reset_map() is done unconditionally, it can break. That's my point.
>>>>
>>>>> There's no way we can enumerate all those buggy behaviors
>>>>> in .reset_map itself, it's overloading that driver API too much.
>>>> If it is not noticeable by userspace, we can do any fix at will. But
>>>> it is not, we don't have another choice. Especially considering the
>>>> cost is rather low.
>>>>
>>>>>>> Second, IOTLB_PERSIST is needed but not sufficient. Due to lack of
>>>>>>> backend feature negotiation in parent driver, if vhost-vdpa has to
>>>>>>> provide the old-behaviour emulation for compatibility on driver's
>>>>>>> behalf, it needs to be done per-driver basis. There could be good
>>>>>>> on-chip or vendor IOMMU implementation which doesn't clear the IOTLB in
>>>>>>> .reset, and vendor specific IOMMU doesn't have to provide .reset_map,
>>>>>> Then we just don't offer IOTLB_PRESIST, isn't this by design?
>>>>> Think about the vduse case, it can work with DMA ops directly so doesn't
>>>>> have to implement .reset_map, unless for some specific good reason.
>>>>> Because it's a conforming and valid/good driver implementation, we may
>>>>> still allow it to advertise IOTLB_PERSIST to userspace.
>>>> I would like to know why this can't work in this case:
>>>>
>>>> config->reset()
>>>> if (IOTLB_PERSIST is not set) {
>>>> if (config->reset_map)
>>>> config->reset_map()
>>>> }
>>>>
>>>>> Which belongs to
>>>>> the 3rd bullet below:
>>>>>
>>>>> https://lore.kernel.org/virtualization/1696928580-7520-4-git-send-email-si-wei.liu@oracle.com/
>>>>>
>>>>> There are 3 cases that backend may claim this feature bit on:
>>>>>
>>>>> - parent device that has to work with platform IOMMU
>>>>> - parent device with on-chip IOMMU that has the expected
>>>>> .reset_map support in driver
>>>>> - parent device with vendor specific IOMMU implementation
>>>>> that explicitly declares the specific backend feature
>>>>>
>>>>>>> we
>>>>>>> should allow these good driver implementations rather than
>>>>>>> unconditionally stick to some specific problematic behavior for every
>>>>>>> other good driver.
>>>>>> Then you can force reset_map() with set_map() that is what I suggest
>>>>>> in another thread, no?
>>>>> This is exactly what I was afraid of that broken behavior emulation may
>>>>> become a dangerous slippery slope - in principle we should encourage
>>>>> good driver implementation, as they can work totally fine with older
>>>>> userspace. Why do they have to bother emulating broken behavior just
>>>>> because some other driver's misbehaving?
>>>> Please read the link [1], Linus has explained it.
>>>>
>>>>> And what's the boundary for
>>>>> this hack, do drivers backed by platform IOMMU even have to emulate (if
>>>>> not why not, and is there substantial difference in between)?
>>>> The boundary is whether the behaviour change could be noticed but
>>>> userspace. And I've shown you it's not a burden with the pseudo codes.
>>>> If not, please explain why.
>>>>
>>>>> After
>>>>> getting through all of this, do you still believe everything is just as
>>>>> easy and simple as what thought to be?
>>>> The truth is that bugs exist everywhere. We can't promise there's no
>>>> bug when developing an uAPI or subsystem. For kernel code, the bug
>>>> that touches uAPI might be fixed in a way that doesn't break existing
>>>> userspace. If you look at how downstream to maintain kABI, you will be
>>>> supersized furtherly.
>>>>
>>>>> Btw, I thought I was expecting but still haven't got the clear answers
>>>>> to what was the goal to do all this, we spent a lot of time trying to
>>>>> unbreak userspace,
>>>> The code is pretty simple. But yes, the time spent on justifying it
>>>> might take some time. That's the community. People need time to
>>>> understand each other's points.
>>>>
>>>>> but looks to me as if we were trying every possible
>>>>> way to break userspace
>>>> How could my suggestions break a userspace?
>>>>
>>>>> or try to approximate to the same brokenness
>>>>> mlx5_vdpa may have caused to the userspace. What we will get eventually
>>>>> from these lengthy discussions?
>>>> Siwei, I'd really suggest you read the link I gave you. You may get
>>>> the answer. What's more, It doesn't cost too much then we know for
>>>> sure there would not be any issue, why not choose the hard way?
>>>>
>>>>> On the other hand, if you think it from
>>>>> vhost-vdpa user perspective, you'll clearly see there's just a couple of
>>>>> ways to unbreak userspace from the internal broken map which is out of
>>>>> sync with vhost-vdpa iotlb after device reset.
>>>> Patches are more than welcomed.
>>>>
>>>>> If this brokenness was
>>>>> something universally done from the vhost-vdpa layer itself, I'd feel
>>>>> it's more of a shared problem, but this is not the case I see it here.
>>>>> While the long standing mlx5_vdpa/vdpa_sim issue is 100% misuse of
>>>>> .reset op in a wrong way per IOMMU API definition. Why leaving this
>>>>> discrepancy to the individual driver is not even an option, I'm still
>>>>> not sure?
>>>> Sorry? I start with a switch in the driver, and then I try to avoid
>>>> that. And it seems you don't want a burden on the driver as well.
>>>> Where did you see I say we can't do that in the driver? What I
>>>> disagree with is to use a module parameter.
>>>>
>>>> Even if I fail, it doesn't mean we can't do that in the driver code.
>>>> If you read the link[1] you can see the offending commit is a change
>>>> in uvcvideo driver.
>>>>
>>>> Thanks
>>>>
>>>>> Thanks,
>>>>> -Siwei
>>>>>
>>>>>>> Then we need a set of device flags (backend_features
>>>>>>> bit again?) to indicate the specific driver needs upper layer's help on
>>>>>>> old-behaviour emulation.
>>>>>>>
>>>>>>> Last but not least, I'm not sure how to properly emulate
>>>>>>> reset_vendor_mappings() from vhost-vdpa layer. If a vendor driver has no
>>>>>>> .reset_map op implemented, or if .reset_map has a slightly different
>>>>>>> implementation than what it used to reset the iotlb in the .reset op,
>>>>>> See above, for reset_vendor_mappings() I meant config->reset_map() exactly.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>> then this either becomes effectively dead code if no one ends up using,
>>>>>>> or the vhost-vdpa emulation is helpless and limited in scope, unable to
>>>>>>> cover all the cases.
>>>>>>>
>>>>>>> ----------------%<----------------%<----------------
>>>>>>>
Powered by blists - more mailing lists