lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEs0W1joaNh0-a27Nekxn8V8CmGgr99p+r60dA6sQeys5g@mail.gmail.com>
Date:   Tue, 17 Oct 2023 10:35:25 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Si-Wei Liu <si-wei.liu@...cle.com>
Cc:     Eugenio Perez Martin <eperezma@...hat.com>, mst@...hat.com,
        xuanzhuo@...ux.alibaba.com, dtatulea@...dia.com,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial
 state in .release

On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
>
>
>
> On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:
> > On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@...hat.com> wrote:
> >> On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
> >>>
> >>>
> >>> On 10/12/2023 8:01 PM, Jason Wang wrote:
> >>>> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@...cle.com> wrote:
> >>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation
> >>>>> may need to restore iotlb mapping to the initial or default state
> >>>>> using the .reset_map op, as it's desirable for some parent devices
> >>>>> to solely manipulate mappings by its own, independent of virtio device
> >>>>> state. For instance, device reset does not cause mapping go away on
> >>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa
> >>>>> is going away, give them a chance to reset iotlb back to the initial
> >>>>> state in vhost_vdpa_cleanup().
> >>>>>
> >>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@...cle.com>
> >>>>> ---
> >>>>>    drivers/vhost/vdpa.c | 16 ++++++++++++++++
> >>>>>    1 file changed, 16 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >>>>> index 851535f..a3f8160 100644
> >>>>> --- a/drivers/vhost/vdpa.c
> >>>>> +++ b/drivers/vhost/vdpa.c
> >>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
> >>>>>           return vhost_vdpa_alloc_as(v, asid);
> >>>>>    }
> >>>>>
> >>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
> >>>>> +{
> >>>>> +       struct vdpa_device *vdpa = v->vdpa;
> >>>>> +       const struct vdpa_config_ops *ops = vdpa->config;
> >>>>> +
> >>>>> +       if (ops->reset_map)
> >>>>> +               ops->reset_map(vdpa, asid);
> >>>>> +}
> >>>>> +
> >>>>>    static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> >>>>>    {
> >>>>>           struct vhost_vdpa_as *as = asid_to_as(v, asid);
> >>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
> >>>>>
> >>>>>           hlist_del(&as->hash_link);
> >>>>>           vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
> >>>>> +       /*
> >>>>> +        * Devices with vendor specific IOMMU may need to restore
> >>>>> +        * iotlb to the initial or default state which is not done
> >>>>> +        * through device reset, as the IOTLB mapping manipulation
> >>>>> +        * could be decoupled from the virtio device life cycle.
> >>>>> +        */
> >>>> Should we do this according to whether IOTLB_PRESIST is set?
> >>> Well, in theory this seems like so but it's unnecessary code change
> >>> actually, as that is the way how vDPA parent behind platform IOMMU works
> >>> today, and userspace doesn't break as of today. :)
> >> Well, this is one question I've ever asked before. You have explained
> >> that one of the reason that we don't break userspace is that they may
> >> couple IOTLB reset with vDPA reset as well. One example is the Qemu.
> >>
> >>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set
> >>> it doesn't necessarily mean the iotlb will definitely be destroyed
> >>> across reset (think about the platform IOMMU case), so userspace today
> >>> is already tolerating enough with either good or bad IOMMU.

I'm confused, how to define tolerating here? For example, if it has
tolerance, why bother?

> >>This code of
> >>> not checking IOTLB_PERSIST being set is intentional, there's no point to
> >>> emulate bad IOMMU behavior even for older userspace (with improper
> >>> emulation to be done it would result in even worse performance).

I can easily imagine a case:

The old Qemu that works only with a setup like mlx5_vdpa. If we do
this without a negotiation, IOTLB will not be clear but the Qemu will
try to re-program the IOTLB after reset. Which will break?

1) stick the exact old behaviour with just one line of check
2) audit all the possible cases to avoid a one line of code

1) seems much easier than 2)

> >> For two reasons:
> >>
> >> 1) backend features need acked by userspace this is by design
> >> 2) keep the odd behaviour seems to be more safe as we can't audit
> >> every userspace program
> >>
> > The old behavior (without flag ack) cannot be trusted already, as:

Possibly but the point is to unbreak userspace no matter how weird the
behaviour we've ever had.

> > * Devices using platform IOMMU (in other words, not implementing
> > neither .set_map nor .dma_map) does not unmap memory at virtio reset.
> > * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do
> > reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb
> > called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset,
> > now all backends work the same as far as I know., which was (and is)
> > the way devices using the platform IOMMU works.
> >
> > The difference in behavior did not matter as QEMU unmaps all the
> > memory unregistering the memory listener at vhost_vdpa_dev_start(...,
> > started = false),
> Exactly. It's not just QEMU, but any (older) userspace manipulates
> mappings through the vhost-vdpa iotlb interface has to unmap all
> mappings to workaround the vdpa parent driver bug.

Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI.

> If they don't do
> explicit unmap, it would cause state inconsistency between vhost-vdpa
> and parent driver, then old mappings can't be restored, and new mapping
> can be added to iotlb after vDPA reset. There's no point to preserve
> this broken and inconsistent behavior between vhost-vdpa and parent
> driver, as userspace doesn't care at all!

It's a userspace notice change so we can't fix it silently:

https://lkml.org/lkml/2012/12/23/75

Another example which is related to vhost-vDPA:

https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@redhat.com/T/

Thanks

>
> > but the backend acknowledging this feature flag
> > allows QEMU to make sure it is safe to skip this unmap & map in the
> > case of vhost stop & start cycle.
> >
> > In that sense, this feature flag is actually a signal for userspace to
> > know that the bug has been solved.
> Right, I couldn't say it better than you do, thanks! The feature flag is
> more of an unusual means to indicating kernel bug having been fixed,
> rather than introduce a new feature or new kernel behavior ending up in
> change of userspace's expectation.
>
> > Not offering it indicates that
> > userspace cannot trust the kernel will retain the maps.
> >
> > Si-Wei or Dragos, please correct me if I've missed something. Feel
> > free to use the text in case you find more clear in doc or patch log.
> Sure, will do, thank you! Will post v2 adding these to the log.
>
> Thanks,
> -Siwei
>
>
>
> >
> > Thanks!
> >
> >> Thanks
> >>
> >>> I think
> >>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100%
> >>> certainty of persistent iotlb mapping not getting lost across vdpa reset.
> >>>
> >>> Thanks,
> >>> -Siwei
> >>>
> >>> [1]
> >>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@oracle.com/
> >>> [2]
> >>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@oracle.com/
> >>>>    Otherwise
> >>>> we may break old userspace.
> >>>>
> >>>> Thanks
> >>>>
> >>>>> +       vhost_vdpa_reset_map(v, asid);
> >>>>>           kfree(as);
> >>>>>
> >>>>>           return 0;
> >>>>> --
> >>>>> 1.8.3.1
> >>>>>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ