Open Source and information security mailing list archives
 
Message-ID: <CAJaqyWejR8M1sgNtJmWbDGKp2rMZO2rHZP_syqqJxVMiHfXLUQ@mail.gmail.com>
Date:   Wed, 1 Jun 2022 12:48:27 +0200
From:   Eugenio Perez Martin <eperezma@...hat.com>
To:     Parav Pandit <parav@...dia.com>
Cc:     Jason Wang <jasowang@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "virtualization@...ts.linux-foundation.org" 
        <virtualization@...ts.linux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "martinh@...inx.com" <martinh@...inx.com>,
        Stefano Garzarella <sgarzare@...hat.com>,
        "martinpo@...inx.com" <martinpo@...inx.com>,
        "lvivier@...hat.com" <lvivier@...hat.com>,
        "pabloc@...inx.com" <pabloc@...inx.com>,
        Eli Cohen <elic@...dia.com>,
        Dan Carpenter <dan.carpenter@...cle.com>,
        Xie Yongji <xieyongji@...edance.com>,
        Christophe JAILLET <christophe.jaillet@...adoo.fr>,
        Zhang Min <zhang.min9@....com.cn>,
        Wu Zongyong <wuzongyong@...ux.alibaba.com>,
        "lulu@...hat.com" <lulu@...hat.com>,
        Zhu Lingshan <lingshan.zhu@...el.com>,
        "Piotr.Uminski@...el.com" <Piotr.Uminski@...el.com>,
        Si-Wei Liu <si-wei.liu@...cle.com>,
        "ecree.xilinx@...il.com" <ecree.xilinx@...il.com>,
        "gautam.dawar@....com" <gautam.dawar@....com>,
        "habetsm.xilinx@...il.com" <habetsm.xilinx@...il.com>,
        "tanuj.kamde@....com" <tanuj.kamde@....com>,
        "hanand@...inx.com" <hanand@...inx.com>,
        "dinang@...inx.com" <dinang@...inx.com>,
        Longpeng <longpeng2@...wei.com>
Subject: Re: [PATCH v4 0/4] Implement vdpasim stop operation

On Tue, May 31, 2022 at 10:26 PM Parav Pandit <parav@...dia.com> wrote:
>
>
>
> > From: Eugenio Perez Martin <eperezma@...hat.com>
> > Sent: Friday, May 27, 2022 3:55 AM
> >
> > On Fri, May 27, 2022 at 4:26 AM Jason Wang <jasowang@...hat.com> wrote:
> > >
> > > On Thu, May 26, 2022 at 8:54 PM Parav Pandit <parav@...dia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Eugenio Pérez <eperezma@...hat.com>
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will
> > > > > offer that backend feature and userspace can effectively stop the
> > > > > device.
> > > > >
> > > > > This is a must before getting virtqueue indexes (base) for live
> > > > > migration, since the device could modify them after userland gets
> > > > > them. There are individual ways to perform that action for some
> > > > > devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but
> > > > > there was no way to perform it for any vhost device (and, in
> > > > > particular, vhost-vdpa).
> > > > >
> > > > > After the ioctl returns with stop != 0, the device MUST finish any
> > > > > pending operations, like in-flight requests. It must also preserve
> > > > > all the necessary state (the virtqueue vring base plus the possible
> > > > > device-specific state) that is required for restoring it in the
> > > > > future. The device must not change its configuration after that
> > > > > point.
> > > > >
> > > > > After the ioctl returns with stop == 0, the device can continue
> > > > > processing buffers as long as the typical conditions are met (vq is
> > > > > enabled, DRIVER_OK status bit is set, etc.).
> > > >
> > > > Just to be clear, we are adding a new vdpa-level ioctl() that doesn't map to
> > > > any mechanism in the virtio spec.
> > >
> > > We try to provide forward compatibility with VIRTIO_CONFIG_S_STOP. That
> > > means it is expected to implement at least a subset of
> > > VIRTIO_CONFIG_S_STOP.
> > >
> >
> > Appending a link to the proposal, just for reference [1].
> >
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the device
> > > > instead of driving it through the driver_ok?
> > >
> >
> > Parav, I'm not sure I follow you here.
> >
> > By the proposal, the resume of the device is (from qemu's POV):
> > 1. Configure all data vqs and cvq (addr, num, ...).
> > 2. Enable only CVQ, not data vqs.
> > 3. Send DRIVER_OK.
> > 4. Wait for all buffers of CVQ to be used.
> > 5. Enable all other data vqs (individual ioctl at the moment).
> >
> > Where can we fit the resume (as "stop(false)") here? If the device is stopped
> > (as if we send stop(true) before DRIVER_OK), we don't read CVQ first. If we
> > send it right after (or instead of) DRIVER_OK, data buffers can reach data vqs
> > before RSS is configured.
> >
> It doesn't make sense with the currently proposed way of using cvq to replay the config.

The stop/resume part is not intended to restore the config through the
CVQ. The stop call is issued to be able to retrieve the vq state
(base, in vhost terminology). The symmetric operation (resume) was
added on demand; it was never intended to be part of either the
config restore or the virtqueue state restore workflow.
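To make the stop semantics from the cover letter concrete, here is a minimal toy model, assuming a single virtqueue whose "base" is its last_avail_idx. SimVq, SimDevice, kick(), stop() and get_vq_base() are hypothetical names for illustration, not the vdpa_sim or vhost-vdpa code. On stop(True) the device drains pending buffers and then freezes, so the base read afterwards stays valid even if the driver keeps publishing buffers.

```python
from dataclasses import dataclass


@dataclass
class SimVq:
    """Hypothetical model of one simulated virtqueue (not vdpa_sim's structs)."""
    last_avail_idx: int = 0  # the "base" that userspace reads back
    avail_idx: int = 0       # index the driver has published so far


@dataclass
class SimDevice:
    vq: SimVq
    stopped: bool = False

    def kick(self) -> None:
        """Consume every published buffer, unless the device is stopped."""
        if self.stopped:
            return
        while self.vq.last_avail_idx != self.vq.avail_idx:
            # 16-bit ring index, wrapping like the virtio avail index
            self.vq.last_avail_idx = (self.vq.last_avail_idx + 1) & 0xFFFF

    def stop(self, stop: bool) -> None:
        """Hypothetical stop(true/false): drain in-flight work, then freeze."""
        if stop:
            self.kick()  # finish pending requests before stopping
            self.stopped = True
        else:
            self.stopped = False

    def get_vq_base(self) -> int:
        return self.vq.last_avail_idx


dev = SimDevice(SimVq())
dev.vq.avail_idx = 3   # driver published 3 buffers
dev.stop(True)         # device drains them, then stops
base = dev.get_vq_base()
dev.vq.avail_idx = 5   # more buffers arrive while stopped
dev.kick()             # ignored: the device is stopped
print(base, dev.get_vq_base())
```

Calling stop(False) afterwards lets the next kick() consume the two new buffers. That is the "resume" direction, which, as noted above, plays no part in the restore workflow.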

The configuration restore workflow was modelled after device
initialization, so that each part needed to add as little as possible
and only qemu needed to be changed. From the device POV, there is no
need to learn new tricks for this: support for .set_vq_ready and
.get_vq_ready is already present in every vdpa backend driver in the kernel.
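The five-step restore ordering quoted earlier can be sketched as a toy device, assuming a queue is serviced only when it is ready and DRIVER_OK is set. ToyNetDevice, restore(), the "set_rss" command string and the single-queue routing are all hypothetical simplifications, not qemu or kernel code. The second device shows the problem raised above: if a data vq runs before the CVQ has been processed, packets are handled without the RSS configuration.

```python
from dataclasses import dataclass, field

VIRTIO_CONFIG_S_DRIVER_OK = 4  # DRIVER_OK status bit value from the virtio spec
CVQ = "cvq"                    # hypothetical key for the control virtqueue


@dataclass
class ToyNetDevice:
    """Hypothetical device: a queue is serviced only if ready and DRIVER_OK is set."""
    data_vqs: int
    status: int = 0
    ready: set = field(default_factory=set)
    cvq_pending: list = field(default_factory=list)  # control commands to replay
    rx_pending: list = field(default_factory=list)   # packets waiting for a data vq
    rss_applied: bool = False
    delivered: list = field(default_factory=list)    # (vq, packet, rss_applied)

    def set_vq_ready(self, vq, ready=True):
        if ready:
            self.ready.add(vq)
        else:
            self.ready.discard(vq)
        self._run()

    def set_status(self, status):
        self.status = status
        self._run()

    def _run(self):
        if not self.status & VIRTIO_CONFIG_S_DRIVER_OK:
            return
        if CVQ in self.ready:
            while self.cvq_pending:
                if self.cvq_pending.pop(0) == "set_rss":
                    self.rss_applied = True
        data = sorted(vq for vq in self.ready if vq != CVQ)
        while data and self.rx_pending:
            # Toy routing: everything goes to the first ready data vq.
            self.delivered.append((data[0], self.rx_pending.pop(0), self.rss_applied))


def restore(dev, cvq_cmds):
    # Step 1 (addr, num, ... for every vq) is not modelled here.
    dev.cvq_pending.extend(cvq_cmds)           # commands replaying the saved config
    dev.set_vq_ready(CVQ)                      # step 2: enable only the CVQ
    dev.set_status(VIRTIO_CONFIG_S_DRIVER_OK)  # step 3: DRIVER_OK
    # Step 4: the toy device is synchronous, so "wait" reduces to a check.
    assert not dev.cvq_pending
    for vq in range(dev.data_vqs):             # step 5: enable the data vqs
        dev.set_vq_ready(vq)


good = ToyNetDevice(data_vqs=2, rx_pending=["pkt0"])
restore(good, ["set_rss"])

# Enabling a data vq before the CVQ has been processed: the packet is
# delivered with the default (pre-RSS) configuration.
bad = ToyNetDevice(data_vqs=1, rx_pending=["pkt0"], cvq_pending=["set_rss"])
bad.set_vq_ready(0)
bad.set_status(VIRTIO_CONFIG_S_DRIVER_OK)
print(good.delivered, bad.delivered)
```

The good path uses only set_vq_ready and the status write, the operations a backend already implements for normal initialization. It needs no stop/resume call.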

> We need to continue with the currently proposed temporary method, which is subsequently to be replaced with the optimized flow, as we discussed.

Back then, you noted that enabling each data vq individually after
DRIVER_OK is slow on mlx5 devices. The solution was to batch these
enable calls in the kernel, with no growth of the vdpa uAPI. The
proposed solution did not involve the resume operation.
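The batching idea can be illustrated as follows: userspace keeps issuing one enable per data vq, and only the driver changes how many device commands result. This is a sketch under assumptions. FirmwareLink, EagerDriver, BatchingDriver and the flush() hook are hypothetical, and the real kernel flush point is not specified in this thread. In particular, this is not the mlx5_vdpa implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FirmwareLink:
    """Stands in for the slow device command channel; records each round trip."""
    commands: list = field(default_factory=list)

    def send(self, vqs):
        self.commands.append(tuple(vqs))


@dataclass
class EagerDriver:
    """One device command per set_vq_ready() call."""
    fw: FirmwareLink

    def set_vq_ready(self, idx):
        self.fw.send([idx])

    def flush(self):
        pass  # nothing is deferred


@dataclass
class BatchingDriver:
    """Hypothetical driver that defers enables and sends them in one command."""
    fw: FirmwareLink
    pending: list = field(default_factory=list)

    def set_vq_ready(self, idx):
        self.pending.append(idx)  # same per-vq entry point, no new uAPI

    def flush(self):
        # Kernel-internal flush point (hypothetical); not a new userspace call.
        if self.pending:
            self.fw.send(self.pending)
            self.pending = []


def enable_data_vqs(driver, n):
    for idx in range(n):
        driver.set_vq_ready(idx)  # userspace issues one call per data vq
    driver.flush()  # the kernel would trigger this itself; called here to end the sketch


eager = EagerDriver(FirmwareLink())
batched = BatchingDriver(FirmwareLink())
enable_data_vqs(eager, 8)
enable_data_vqs(batched, 8)
print(len(eager.fw.commands), len(batched.fw.commands))
```

With 8 data vqs, the eager driver issues 8 device commands and the batching driver issues one. The per-vq call sequence seen by the vdpa layer is identical in both cases.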

After that, you proposed in this thread "Why can't we use this ioctl()
to indicate driver to start/stop the device instead of driving it
through the driver_ok?". As I understand it, that is a mistake, since it
requires the device, the vdpa layer, etc. to learn new tricks. It also
requires qemu to duplicate the initialization layer (which is currently
shared by device start and config restore). But I may not have seen the
whole picture and may be missing advantages of using the resume call for
this workflow. Can you describe the workflow you have in mind? How does
that new workflow affect this proposal?

I'm OK with changing the proposal as long as we obtain a net gain.

Thanks!
