Message-ID: <aea1681c-e04d-4678-b161-6dbd2b13b82e@oracle.com>
Date: Wed, 17 Jan 2024 15:31:36 -0500
From: Steven Sistare <steven.sistare@...cle.com>
To: Jason Wang <jasowang@...hat.com>
Cc: virtualization@...ts.linux-foundation.org, linux-kernel@...r.kernel.org,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Si-Wei Liu <si-wei.liu@...cle.com>,
        Eugenio Perez Martin <eperezma@...hat.com>,
        Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
        Dragos Tatulea <dtatulea@...dia.com>,
        Eli Cohen <elic@...dia.com>,
        Xie Yongji <xieyongji@...edance.com>
Subject: Re: [RFC V1 00/13] vdpa live update

On 1/10/2024 9:55 PM, Jason Wang wrote:
> On Thu, Jan 11, 2024 at 4:40 AM Steve Sistare <steven.sistare@...cle.com> wrote:
>>
>> Live update is a technique wherein an application saves its state, exec's
>> to an updated version of itself, and restores its state.  Clients of the
>> application experience a brief suspension of service, on the order of
>> 100's of milliseconds, but are otherwise unaffected.
>>
>> Define and implement interfaces that allow vdpa devices to be preserved
>> across fork or exec, to support live update for applications such as qemu.
>> The device must be suspended during the update, but its dma mappings are
>> preserved, so the suspension is brief.
>>
>> The VHOST_NEW_OWNER ioctl transfers device ownership and pinned memory
>> accounting from one process to another.
>>
>> The VHOST_BACKEND_F_NEW_OWNER backend capability indicates that
>> VHOST_NEW_OWNER is supported.
>>
>> The VHOST_IOTLB_REMAP message type updates a dma mapping with its userland
>> address in the new process.
>>
>> The VHOST_BACKEND_F_IOTLB_REMAP backend capability indicates that
>> VHOST_IOTLB_REMAP is supported and required.  Some devices do not
>> require it, because the userland address of each dma mapping is discarded
>> after being translated to a physical address.
>>
>> Here is a pseudo-code sequence for performing live update, based on
>> suspend + reset because resume is not yet available.  The vdpa device
>> descriptor, fd, remains open across the exec.
>>
>>   ioctl(fd, VHOST_VDPA_SUSPEND)
>>   ioctl(fd, VHOST_VDPA_SET_STATUS, 0)
>>   exec
> 
> Is there a userspace implementation as a reference?

I have working patches for qemu that use these ioctls, but they depend on other
qemu cpr patches that are still a work in progress and have not been posted yet.
I'm working on that.

>>   ioctl(fd, VHOST_NEW_OWNER)
>>
>>   issue ioctls to re-create vrings
>>
>>   if VHOST_BACKEND_F_IOTLB_REMAP
>>       foreach dma mapping
>>           write(fd, {VHOST_IOTLB_REMAP, new_addr})
> 
> I think I need to understand the advantages of this approach. For
> example, why it is better than
> 
> ioctl(VHOST_RESET_OWNER)
> exec
> 
> ioctl(VHOST_SET_OWNER)
> 
> for each dma mapping
>      ioctl(VHOST_IOTLB_UPDATE)

That is slower.  VHOST_RESET_OWNER unbinds physical pages, and VHOST_IOTLB_UPDATE
rebinds them.  That costs multiple seconds for large memories, and the cost is
incurred while the virtual machine is paused for live update.  For comparison, the
total pause time for live update with vfio interfaces is ~100 milliseconds.
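
To make the cost argument concrete, here is a toy model, not kernel code.  It
only counts page pin and unpin operations for the two sequences.  Every name in
it (toy_dev, toy_pause_cost, and the toy_* helpers) is hypothetical, and it
assumes the pause cost is proportional to the number of pages pinned or unpinned
while the guest is paused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy device: tracks pinned pages and pin/unpin work only. */
struct toy_dev {
    size_t pinned_pages;   /* pages currently pinned for dma */
    size_t pin_ops;        /* pages pinned since the counter was cleared */
    size_t unpin_ops;      /* pages unpinned since the counter was cleared */
};

/* Models VHOST_IOTLB_UPDATE: creating a mapping pins its pages. */
static void toy_iotlb_update(struct toy_dev *d, size_t pages)
{
    d->pinned_pages += pages;
    d->pin_ops += pages;
}

/* Models VHOST_RESET_OWNER: every pinned page is unbound. */
static void toy_reset_owner(struct toy_dev *d)
{
    d->unpin_ops += d->pinned_pages;
    d->pinned_pages = 0;
}

/* Models the proposed VHOST_NEW_OWNER: pages stay pinned, only ownership
 * and accounting move, so no per-page work is counted here. */
static void toy_new_owner(struct toy_dev *d)
{
    (void)d;
}

/* Models the proposed VHOST_IOTLB_REMAP: only the userland address of an
 * existing mapping changes, so no pages are pinned or unpinned. */
static void toy_iotlb_remap(struct toy_dev *d, size_t pages)
{
    assert(pages <= d->pinned_pages);
    (void)d;
    (void)pages;
}

/* Returns the number of page pin + unpin operations done during the pause. */
size_t toy_pause_cost(int preserve, size_t nmaps, size_t pages_per_map)
{
    struct toy_dev d = {0};
    size_t i;

    /* Initial setup happens before the pause, so it is not counted. */
    for (i = 0; i < nmaps; i++)
        toy_iotlb_update(&d, pages_per_map);
    d.pin_ops = 0;

    if (preserve) {
        toy_new_owner(&d);
        for (i = 0; i < nmaps; i++)
            toy_iotlb_remap(&d, pages_per_map);
    } else {
        toy_reset_owner(&d);
        for (i = 0; i < nmaps; i++)
            toy_iotlb_update(&d, pages_per_map);
    }
    return d.pin_ops + d.unpin_ops;
}
```

In this model, toy_pause_cost(0, nmaps, pages) grows as 2 * nmaps * pages, while
toy_pause_cost(1, nmaps, pages) stays at zero regardless of memory size.  The
real costs depend on the device and iommu, so treat this as an illustration only.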

However, the interaction with userland is so similar that the same code paths can be used.
In my qemu prototype, after cpr exec's new qemu:
  - vhost_vdpa_set_owner() calls VHOST_NEW_OWNER instead of VHOST_SET_OWNER
  - vhost_vdpa_dma_map() sets type VHOST_IOTLB_REMAP instead of VHOST_IOTLB_UPDATE
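
As a rough sketch of that substitution (this is not the qemu prototype itself),
the choice can be reduced to two small helpers.  The helper names and the
after_exec and backend_needs_remap flags are hypothetical.  The requests are
returned as names rather than numeric codes, because the values of the proposed
VHOST_NEW_OWNER and VHOST_IOTLB_REMAP are defined by the patches and are not
reproduced here.  Returning NULL when the backend does not advertise
VHOST_BACKEND_F_IOTLB_REMAP is my reading of the pseudo-code in the cover letter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: which ownership request to issue.
 * after_exec is true in the new process created by live update. */
const char *owner_request_name(bool after_exec)
{
    return after_exec ? "VHOST_NEW_OWNER" : "VHOST_SET_OWNER";
}

/* Hypothetical helper: which iotlb message type to send for a dma mapping.
 * Returns NULL when no message is needed, i.e. after exec on a backend
 * that does not set VHOST_BACKEND_F_IOTLB_REMAP (assumption, see above). */
const char *map_msg_type_name(bool after_exec, bool backend_needs_remap)
{
    if (!after_exec)
        return "VHOST_IOTLB_UPDATE";   /* normal start: create the mapping */
    if (backend_needs_remap)
        return "VHOST_IOTLB_REMAP";    /* mapping preserved: update address */
    return NULL;
}
```

The point of keeping the decision this small is that the surrounding code that
walks the memory map and issues the requests stays the same in both cases.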

- Steve

