Date:   Wed, 23 May 2018 20:27:48 +0200
From:   David Hildenbrand <david@...hat.com>
To:     linux-mm@...ck.org
Cc:     linux-kernel@...r.kernel.org,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Cornelia Huck <cohuck@...hat.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Halil Pasic <pasic@...ux.ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Jason Wang <jasowang@...hat.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Len Brown <lenb@...nel.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Michal Hocko <mhocko@...e.com>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vlastimil Babka <vbabka@...e.cz>, KVM <kvm@...r.kernel.org>,
        "virtualization@...ts.linux-foundation.org" 
        <virtualization@...ts.linux-foundation.org>,
        "virtio-dev@...ts.oasis-open.org" <virtio-dev@...ts.oasis-open.org>,
        "qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
        qemu-s390x <qemu-s390x@...gnu.org>
Subject: Re: [PATCH RFCv2 0/4] virtio-mem: paravirtualized memory

On 23.05.2018 20:24, David Hildenbrand wrote:
> This is the Linux driver side of virtio-mem. Compared to the QEMU side,
> it is in a pretty complete and clean state.
> 
> virtio-mem is a paravirtualized mechanism for adding/removing memory to/from
> a VM. We can do this at a 4MB granularity right now. In Linux, all
> memory is added to ZONE_NORMAL, so unplugging cannot be guaranteed -
> but it will be more likely to succeed than unplugging 128MB+ chunks.
> We might implement some optimizations in that area in the future that will
> make memory unplug more reliable.
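To put the granularity in numbers, here is a minimal C sketch. The helper `blocks_per_section()` is hypothetical and not part of the driver. It computes how many 4MB blocks make up one 128MB x86 memory section, which is the unit that would otherwise have to be unplugged as a whole.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)

/* Hypothetical helper: number of fixed-size virtio-mem blocks that make up
 * one memory section. Sizes are in bytes; the section size is assumed to
 * be an exact multiple of the block size. */
static uint64_t blocks_per_section(uint64_t section_size, uint64_t block_size)
{
	assert(block_size != 0);
	assert(section_size % block_size == 0);
	return section_size / block_size;
}
```

For a 128MB x86 section this yields 32 blocks. For a 2GB section it would be 512.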
> 
> For now, this is an easy way to give a VM access to more memory and
> eventually to remove some memory again. I am testing it on x86 and
> s390x (so far only under QEMU TCG).
> 
> This is the follow-up to [1], but the concept, user interface and
> virtio protocol have been heavily changed. I am only including the important
> parts in this cover letter (because otherwise nobody will read it). Please
> feel free to ask if there are any questions.
> 
> This series is based on [4] and shows how the changes from [4] are used;
> [4] contains further information. Also have a look at the description of
> patch 4 in this series.
> 
> This work is the result of Andrea Arcangeli's initial idea of having the
> host enforce guest access restrictions to memory inflated in virtio-balloon
> using userfaultfd, which turned out to be problematic to implement. That's
> how I came up with virtio-mem.
> 
> --------------------------------------------------------------------------
> 1. High level concept
> --------------------------------------------------------------------------
> 
> Each virtio-mem device owns a memory region in the physical address space.
> The guest is allowed to plug and online up to 'requested_size' of memory;
> it will not be allowed to plug more than that. Unplugged memory will
> be protected by configurable mechanisms (e.g. random discard, userfaultfd
> protection, etc.). virtio-mem is designed so that a guest must never
> assume that it can even read unplugged memory. This is a big difference
> from classical balloon drivers.
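The plugging rule can be sketched in C as follows. This is a minimal illustration under assumptions: `struct vmem_state` and `vmem_may_plug()` are hypothetical names, and the real driver and the virtio_mem.h UAPI header may track this state differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified device state; not the layout used by the
 * actual driver or by include/uapi/linux/virtio_mem.h. */
struct vmem_state {
	uint64_t block_size;     /* plug/unplug granularity, e.g. 4 MiB */
	uint64_t requested_size; /* set by the hypervisor */
	uint64_t plugged_size;   /* memory currently plugged by the guest */
};

/* Returns true if plugging nr_blocks more blocks keeps the guest within
 * requested_size; the guest must never plug beyond that limit. */
static bool vmem_may_plug(const struct vmem_state *s, uint64_t nr_blocks)
{
	assert(s->block_size != 0);
	if (s->plugged_size > s->requested_size)
		return false; /* already above the request: only unplug allowed */
	/* Compare in block units to avoid overflowing nr_blocks * block_size. */
	return nr_blocks <= (s->requested_size - s->plugged_size) / s->block_size;
}
```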
> 
> The usable memory region might grow over time, so not all parts of the
> device memory region might be usable from the start. This is an
> optimization to allow a smarter implementation in the hypervisor (smaller
> dirty bitmaps, smaller memory regions, ...).
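Below is a minimal sketch of what a growing usable region means for the guest. It assumes a hypothetical `struct vmem_layout` in which only the first `usable_region_size` bytes of the device region may currently be plugged. The field names are illustrative and are not taken from the virtio_mem.h header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one device's layout: the device owns
 * [addr, addr + region_size), but only the first usable_region_size bytes
 * may currently be plugged. */
struct vmem_layout {
	uint64_t addr;
	uint64_t region_size;
	uint64_t usable_region_size; /* may grow over time, never above region_size */
};

/* True if the block starting at block_addr lies completely inside the
 * currently usable part of the device region. */
static bool vmem_block_usable(const struct vmem_layout *l, uint64_t block_addr,
			      uint64_t block_size)
{
	assert(l->usable_region_size <= l->region_size);
	if (block_addr < l->addr)
		return false;
	uint64_t offset = block_addr - l->addr;
	return offset <= l->usable_region_size &&
	       block_size <= l->usable_region_size - offset;
}
```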
> 
> When the device driver starts up, it will query 'requested_size' and start
> to add memory to the system. This memory is not announced via e.g. ACPI,
> so unmodified systems will not silently try to use unplugged memory that
> they are not supposed to touch.
> 
> Updates of 'requested_size' indicate hypervisor requests to plug or
> unplug memory.
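Reacting to such an update boils down to comparing the requested size with the currently plugged size. The helper `vmem_block_delta()` below is a minimal, hypothetical C sketch of that comparison and is not taken from drivers/virtio/virtio_mem.c. It assumes both sizes are multiples of the block size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of blocks the driver would have to plug
 * (positive result) or unplug (negative result) so that the plugged size
 * matches requested_size. Both sizes are assumed to be block-aligned. */
static int64_t vmem_block_delta(uint64_t requested_size, uint64_t plugged_size,
				uint64_t block_size)
{
	assert(block_size != 0);
	assert(requested_size % block_size == 0);
	assert(plugged_size % block_size == 0);
	return (int64_t)(requested_size / block_size) -
	       (int64_t)(plugged_size / block_size);
}
```

Take the values from the unplug attempt in section 4: requested-size 1073741824, size 6169821184 and block-size 4194304. With these, the helper returns -1215. That means 1215 blocks were still plugged beyond the request when that output was taken.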
> 
> As each virtio-mem device can belong to a NUMA node, we can easily
> plug/unplug memory per NUMA node. And of course, we can have several
> independent virtio-mem devices for a VM.
> 
> The idea is *not* to add new virtio-mem devices when hotplugging memory,
> the idea is to resize (grow/shrink) virtio-mem devices.
> 
> --------------------------------------------------------------------------
> 2. Benefits
> --------------------------------------------------------------------------
> 
> Guest side:
> - Increase memory usable by Linux in 4MB steps (vs. section sizes like 128MB
>   on x86 or 2GB on some arm systems, if I'm not mistaken)
> - Remove struct pages once all 4MB chunks of a section are offline (in
>   contrast to all balloon drivers where this never happens)
> - Don't fragment memory, while still being able to unplug smaller chunks
>   than ordinary DIMM sizes.
> - Memory hotplug support for architectures that have no proper interface
>   (e.g. s390x lacks the external notification part) or where QEMU/Linux
>   support would be complicated to implement.
> - Automatic management of onlining/offlining in the device driver -
>   no manual interaction from an admin/tool necessary.
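The check behind "remove struct pages once all 4MB chunks of a section are offline" can be sketched in C. This minimal version assumes a hypothetical bitmap with one bit per 4MB block, where a set bit means the block is online. The real driver's bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping: one bit per virtio-mem block; bit i of
 * online[i / 64] is set while block i is online. */

/* Returns true if no block of section `sec` is online, i.e. the point at
 * which the section's struct pages could be removed again. */
static bool section_all_offline(const uint64_t *online, size_t sec,
				size_t blocks_per_section)
{
	assert(blocks_per_section != 0);
	size_t first = sec * blocks_per_section;

	for (size_t i = first; i < first + blocks_per_section; i++) {
		if (online[i / 64] & (UINT64_C(1) << (i % 64)))
			return false;
	}
	return true;
}
```

Assume 32 blocks per 128MB section. A single online block (say block 33) keeps section 1 populated, while sections 0 and 2 report as fully offline.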
> 
> QEMU side:
> - Resizing (plug/unplug) has a single interface - in contrast to a mixture
>   of ACPI and virtio-balloon. See the example below.
> - Migration works out of the box - no need to specify new DIMMs or new
>   sizes on the migration target. It simply works.
> - We can resize in arbitrary steps and sizes (in contrast to e.g. ACPI,
>   where we have to know upfront at which granularity we will later want to
>   remove memory, or even how much memory we will eventually want to add to
>   our guest)
> - One interface to rule them (architectures) all :)
> 
> --------------------------------------------------------------------------
> 3. Reboot handling
> --------------------------------------------------------------------------
> 
> After a reboot, all memory is unplugged. This allows the hypervisor
> to see whether support for virtio-mem is available in the freshly booted
> system. This way we could charge only for the actually "plugged" memory
> size. It also avoids having to detect already plugged memory in the guest.
> 
> For example, we can notify management layers on every size change of a
> virtio-mem device, so they can track how much memory a VM has plugged.
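Below is a minimal C sketch of these reboot semantics from the device's point of view. `struct vmem_dev`, its tiny fixed-size block array and the three functions are all hypothetical and serve only to illustrate two points. First, a reset unplugs everything. Second, a non-zero plugged size afterwards can only come from a guest that runs a virtio-mem driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMEM_MAX_BLOCKS 64 /* hypothetical, kept tiny for the sketch */

struct vmem_dev {
	uint64_t block_size;
	unsigned char plugged[VMEM_MAX_BLOCKS]; /* 1 = block plugged */
	uint64_t plugged_size;
};

/* On reboot the device forgets every plugged block. */
static void vmem_reset(struct vmem_dev *d)
{
	memset(d->plugged, 0, sizeof(d->plugged));
	d->plugged_size = 0;
}

/* Guest driver plugs one block; fails on a bad index or double plug. */
static bool vmem_plug_block(struct vmem_dev *d, unsigned int idx)
{
	assert(d->block_size != 0);
	if (idx >= VMEM_MAX_BLOCKS || d->plugged[idx])
		return false;
	d->plugged[idx] = 1;
	d->plugged_size += d->block_size;
	return true;
}

/* After a reset, only an active guest driver can make this non-zero. */
static bool vmem_guest_driver_active(const struct vmem_dev *d)
{
	return d->plugged_size != 0;
}
```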
> 
> --------------------------------------------------------------------------
> 4. Example
> --------------------------------------------------------------------------
> 
> (not including resizable memory regions on the QEMU side yet, so don't
>  focus on that part - it will consume a lot of memory right now for e.g.
>  dirty bitmaps and memory slot tracking data)
> 
> Start QEMU with two virtio-mem devices that provide little memory initially.
> 	$ qemu-system-x86_64 -m 4G,maxmem=504G \
> 		-smp sockets=2,cores=2 \
> 		[...]
> 		-object memory-backend-ram,id=mem0,size=256G \
> 		-device virtio-mem-pci,id=vm0,memdev=mem0,node=0,size=4160M \
> 		-object memory-backend-ram,id=mem1,size=256G \
> 		-device virtio-mem-pci,id=vm1,memdev=mem1,node=1,size=3G
> 
> Query the configuration ('size' tells us the guest driver is active):
> 	(qemu) info memory-devices
> 	info memory-devices
> 	Memory device [virtio-mem]: "vm0"
> 	  phys-addr: 0x140000000
> 	  node: 0
> 	  requested-size: 4362076160
> 	  size: 4362076160
> 	  max-size: 274877906944
> 	  block-size: 4194304
> 	  memdev: /objects/mem0
> 	Memory device [virtio-mem]: "vm1"
> 	  phys-addr: 0x4140000000
> 	  node: 1
> 	  requested-size: 3221225472
> 	  size: 3221225472
> 	  max-size: 274877906944
> 	  block-size: 4194304
> 	  memdev: /objects/mem1
> 
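The numbers in this output are consistent with the command line:

- 4160M is 4362076160 bytes and 3G is 3221225472 bytes.
- max-size is 256GiB, the size of each memory backend.
- block-size is 4MiB.
- vm1 starts exactly where the 256GiB region of vm0 ends.

The small C check below confirms this. The values are copied from the output; the `MiB`/`GiB` macros and the `region_end()` helper exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Values copied from the `info memory-devices` output above. */
static const uint64_t vm0_phys_addr = UINT64_C(0x140000000);
static const uint64_t vm1_phys_addr = UINT64_C(0x4140000000);
static const uint64_t vmem_max_size = UINT64_C(274877906944);
static const uint64_t vmem_block_size = UINT64_C(4194304);

/* Hypothetical helper: first address past a device region. */
static uint64_t region_end(uint64_t phys_addr, uint64_t size)
{
	assert(size <= UINT64_MAX - phys_addr);
	return phys_addr + size;
}
```

So vm0 starts at 5GiB, and vm1 starts at 261GiB, directly after vm0's 256GiB region.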
> Change the size of a virtio-mem device:
> 	(qemu) memory-device-resize vm0 40960
> 	memory-device-resize vm0 40960
> 	...
> 	(qemu) info memory-devices
> 	info memory-devices
> 	Memory device [virtio-mem]: "vm0"
> 	  phys-addr: 0x140000000
> 	  node: 0
> 	  requested-size: 42949672960
> 	  size: 42949672960
> 	  max-size: 274877906944
> 	  block-size: 4194304
> 	  memdev: /objects/mem0
> 	...
> 
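Judging from the output, the `memory-device-resize` argument is interpreted in MiB: 40960 becomes 42949672960 bytes (40GiB). Below is a minimal sketch of that conversion, with hypothetical helper names. The alignment check reflects an assumption that a requested size should be a multiple of block-size. Neither the cover letter nor the QEMU prototype states this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption inferred from the output above: the resize argument is a
 * size in MiB. */
static uint64_t resize_arg_to_bytes(uint64_t mib)
{
	assert(mib <= (UINT64_MAX >> 20)); /* avoid overflow when shifting */
	return mib << 20;
}

/* Assumption for this sketch: a requested size is only fully pluggable
 * if it is a multiple of the device block size. */
static bool is_block_aligned(uint64_t size, uint64_t block_size)
{
	assert(block_size != 0);
	return size % block_size == 0;
}
```

Both requests in this example (40960 and 1024) are multiples of 4MiB.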
> Try to unplug memory (KASAN active in the guest - a lot of memory wasted):
> 	(qemu) memory-device-resize vm0 1024
> 	memory-device-resize vm0 1024
> 	...
> 	(qemu) info memory-devices
> 	info memory-devices
> 	Memory device [virtio-mem]: "vm0"
> 	  phys-addr: 0x140000000
> 	  node: 0
> 	  requested-size: 1073741824
> 	  size: 6169821184
> 	  max-size: 274877906944
> 	  block-size: 4194304
> 	  memdev: /objects/mem0
> 	...
> 
> For now, I am only sharing the Linux driver side. The current code can be
> found at [2]. The QEMU side is still heavily WIP; the current QEMU
> prototype can be found at [3].
> 
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html
> [2] https://github.com/davidhildenbrand/linux/tree/virtio-mem
> [3] https://github.com/davidhildenbrand/qemu/tree/virtio-mem
> [4] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1698014.html
> 
> David Hildenbrand (4):
>   ACPI: NUMA: export pxm_to_node
>   s390: mm: support removal of memory
>   s390: numa: implement memory_add_physaddr_to_nid()
>   virtio-mem: paravirtualized memory
> 
>  arch/s390/mm/init.c             |   18 +-
>  arch/s390/numa/numa.c           |   12 +
>  drivers/acpi/numa.c             |    1 +
>  drivers/virtio/Kconfig          |   15 +
>  drivers/virtio/Makefile         |    1 +
>  drivers/virtio/virtio_mem.c     | 1040 +++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h |    1 +
>  include/uapi/linux/virtio_mem.h |  134 ++++
>  8 files changed, 1216 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/virtio/virtio_mem.c
>  create mode 100644 include/uapi/linux/virtio_mem.h
> 

CC-ing some further mailing lists.

-- 

Thanks,

David / dhildenb
