Message-ID: <20201128101604.GC557259@kernel.org>
Date: Sat, 28 Nov 2020 12:16:04 +0200
From: Mike Rapoport <rppt@...nel.org>
To: "Catangiu, Adrian Costin" <acatan@...zon.com>
Cc: Dmitry Safonov <0x7f454c46@...il.com>,
Alexander Graf <graf@...zon.de>,
Christian Borntraeger <borntraeger@...ibm.com>,
"Jason A. Donenfeld" <Jason@...c4.com>,
Jann Horn <jannh@...gle.com>, Willy Tarreau <w@....eu>,
"MacCarthaigh, Colm" <colmmacc@...zon.com>,
Andy Lutomirski <luto@...nel.org>,
"Theodore Y. Ts'o" <tytso@....edu>,
Eric Biggers <ebiggers@...nel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
kernel list <linux-kernel@...r.kernel.org>,
"Woodhouse, David" <dwmw@...zon.co.uk>,
"bonzini@....org" <bonzini@....org>,
"Singh, Balbir" <sblbir@...zon.com>,
"Weiss, Radu" <raduweis@...zon.com>,
"oridgar@...il.com" <oridgar@...il.com>,
"ghammer@...hat.com" <ghammer@...hat.com>,
Jonathan Corbet <corbet@....net>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Michael S. Tsirkin" <mst@...hat.com>,
Qemu Developers <qemu-devel@...gnu.org>,
KVM list <kvm@...r.kernel.org>,
Michal Hocko <mhocko@...nel.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Pavel Machek <pavel@....cz>,
Linux API <linux-api@...r.kernel.org>,
"mpe@...erman.id.au" <mpe@...erman.id.au>,
linux-s390 <linux-s390@...r.kernel.org>,
"areber@...hat.com" <areber@...hat.com>,
Pavel Emelyanov <ovzxemul@...il.com>,
Andrey Vagin <avagin@...il.com>,
Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
"gil@...l.com" <gil@...l.com>,
"asmehra@...hat.com" <asmehra@...hat.com>,
"dgunigun@...hat.com" <dgunigun@...hat.com>,
"vijaysun@...ibm.com" <vijaysun@...ibm.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [PATCH v3] drivers/virt: vmgenid: add vm generation id driver
Hi Adrian,
Usually each version of a patch is posted as a separate e-mail thread.
On Fri, Nov 27, 2020 at 08:26:02PM +0200, Catangiu, Adrian Costin wrote:
> - Background
>
> The VM Generation ID is a feature defined by Microsoft (paper:
> http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> multiple hypervisor vendors.
>
> The feature is required in virtualized environments by apps that work
> with local copies/caches of world-unique data such as random values,
> uuids, monotonically increasing counters, etc.
> Such apps can be negatively affected by VM snapshotting when the VM
> is either cloned or returned to an earlier point in time.
>
> The VM Generation ID is a simple concept meant to alleviate the issue
> by providing a unique ID that changes each time the VM is restored
> from a snapshot. The hw provided UUID value can be used to
> differentiate between VMs or different generations of the same VM.
>
> - Problem
>
> The VM Generation ID is exposed through an ACPI device by multiple
> hypervisor vendors but neither the vendors or upstream Linux have no
> default driver for it leaving users to fend for themselves.
>
> Furthermore, simply finding out about a VM generation change is only
> the starting point of a process to renew internal states of possibly
> multiple applications across the system. This process could benefit
> from a driver that provides an interface through which orchestration
> can be easily done.
>
> - Solution
>
> This patch is a driver that exposes a monotonic incremental Virtual
> Machine Generation u32 counter via a char-dev FS interface. The FS
> interface provides sync and async VmGen counter updates notifications.
> It also provides VmGen counter retrieval and confirmation mechanisms.
>
> The generation counter and the interface through which it is exposed
> are available even when there is no acpi device present.
>
> When the device is present, the hw provided UUID is not exposed to
> userspace, it is internally used by the driver to keep accounting for
> the exposed VmGen counter. The counter starts from zero when the
> driver is initialized and monotonically increments every time the hw
> UUID changes (the VM generation changes).
> On each hw UUID change, the new hypervisor-provided UUID is also fed
> to the kernel RNG.
>
> If there is no acpi vmgenid device present, the generation changes are
> not driven by hw vmgenid events but can be driven by software through
> a dedicated driver ioctl.
>
> This patch builds on top of Or Idgar <oridgar@...il.com>'s proposal
> https://lkml.org/lkml/2018/3/1/498
>
> - Future improvements
>
> Ideally we would want the driver to register itself based on devices'
> _CID and not _HID, but unfortunately I couldn't find a way to do that.
> The problem is that ACPI device matching is done by
> '__acpi_match_device()' which exclusively looks at
> 'acpi_hardware_id *hwid'.
>
> There is a path for platform devices to match on _CID when _HID is
> 'PRP0001' - but this is not the case for the Qemu vmgenid device.
>
> Guidance and help here would be greatly appreciated.
>
> Signed-off-by: Adrian Catangiu <acatan@...zon.com>
>
> ---
Please put the history in descending order next time:
v2 -> v3:
...
v1 -> v2:
...
> v1 -> v2:
>
> - expose to userspace a monotonically increasing u32 Vm Gen Counter
> instead of the hw VmGen UUID
> - since the hw/hypervisor-provided 128-bit UUID is not public
> anymore, add it to the kernel RNG as device randomness
> - insert driver page containing Vm Gen Counter in the user vma in
> the driver's mmap handler instead of using a fault handler
> - turn driver into a misc device driver to auto-create /dev/vmgenid
> - change ioctl arg to avoid leaking kernel structs to userspace
> - update documentation
> - various nits
> - rebase on top of linus latest
>
> v2 -> v3:
>
> - separate the core driver logic and interface, from the ACPI device.
> The ACPI vmgenid device is now one possible backend.
> - fix issue when timeout=0 in VMGENID_WAIT_WATCHERS
> - add locking to avoid races between fs ops handlers and hw irq
> driven generation updates
> - change VMGENID_WAIT_WATCHERS ioctl so if the current caller is
> outdated or a generation change happens while waiting (thus making
> current caller outdated), the ioctl returns -EINTR to signal the
> user to handle event and retry. Fixes blocking on oneself.
> - add VMGENID_FORCE_GEN_UPDATE ioctl conditioned by
> CAP_CHECKPOINT_RESTORE capability, through which software can force
> generation bump.
> ---
> Documentation/virt/vmgenid.rst | 240 +++++++++++++++++++++++
> drivers/virt/Kconfig | 17 ++
> drivers/virt/Makefile | 1 +
> drivers/virt/vmgenid.c | 435 +++++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/vmgenid.h | 14 ++
> 5 files changed, 707 insertions(+)
> create mode 100644 Documentation/virt/vmgenid.rst
> create mode 100644 drivers/virt/vmgenid.c
> create mode 100644 include/uapi/linux/vmgenid.h
>
> diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
> new file mode 100644
> index 0000000..b6a9f8d
> --- /dev/null
> +++ b/Documentation/virt/vmgenid.rst
> @@ -0,0 +1,240 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +VMGENID
> +============
The "==" line should be the same length as the title, I think.
> +
> +The VM Generation ID is a feature defined by Microsoft (paper:
> +http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> +multiple hypervisor vendors.
> +
> +The feature is required in virtualized environments by apps that work
Please spell out 'applications' in full.
> +with local copies/caches of world-unique data such as random values,
> +uuids, monotonically increasing counters, etc.
UUIDs
> +Such apps can be negatively affected by VM snapshotting when the VM
^applications
> +is either cloned or returned to an earlier point in time.
> +
> +The VM Generation ID is a simple concept meant to alleviate the issue
> +by providing a unique ID that changes each time the VM is restored
> +from a snapshot. The hw provided UUID value can be used to
^hardware (and below as well)
> +differentiate between VMs or different generations of the same VM.
> +
> +The VM Generation ID is exposed through an ACPI device by multiple
> +hypervisor vendors. The driver for it lives at
> +``drivers/virt/vmgenid.c``
> +
> +The ``vmgenid`` driver exposes a monotonic incremental Virtual
> +Machine Generation u32 counter via a char-dev FS interface that
> +provides sync and async VmGen counter updates notifications. It also
> +provides VmGen counter retrieval and confirmation mechanisms.
It would be nice to mention the name of the chardev here :)
> +This counter and the interface through which it is exposed are
> +available even when there is no acpi device present.
> +
> +When the device is present, the hw provided UUID is not exposed to
> +userspace, it is internally used by the driver to keep accounting for
> +the exposed VmGen counter. The counter starts from zero when the
> +driver is initialized and monotonically increments every time the hw
> +UUID changes (the VM generation changes).
> +On each hw UUID change, the new UUID is also fed to the kernel RNG.
> +
> +If there is no acpi vmgenid device present, the generation changes are
> +not driven by hw vmgenid events and thus should be driven by software
> +through a dedicated driver ioctl.
> +
> +Driver interface:
> +
> +``open()``:
> + When the device is opened, a copy of the current Vm-Gen-Id (counter)
> + is associated with the open file descriptor. The driver now tracks
> + this file as an independent *watcher*. The driver tracks how many
> + watchers are aware of the latest Vm-Gen-Id counter and how many of
> + them are *outdated*; outdated being those that have lived through
> + a Vm-Gen-Id change but not yet confirmed the new generation counter.
> +
> +``read()``:
> + Read is meant to provide the *new* VM generation counter when a
> + generation change takes place. The read operation blocks until the
> + associated counter is no longer up to date - until HW vm gen id
> + changes - at which point the new counter is provided/returned.
> + Nonblocking ``read()`` uses ``EAGAIN`` to signal that there is no
> + *new* counter value available. The generation counter is considered
> + *new* for each open file descriptor that hasn't confirmed the new
> + value, following a generation change. Therefore, once a generation
> + change takes place, all ``read()`` calls will immediately return the
> + new generation counter and will continue to do so until the
> + new value is confirmed back to the driver through ``write()``.
> + Partial reads are not allowed - read buffer needs to be at least
> + ``sizeof(unsigned)`` in size.
> +
> +``write()``:
> + Write is used to confirm the up-to-date Vm Gen counter back to the
> + driver.
> + Following a VM generation change, all existing watchers are marked
> + as *outdated*. Each file descriptor will maintain the *outdated*
> + status until a ``write()`` confirms the up-to-date counter back to
> + the driver.
> + Partial writes are not allowed - write buffer should be exactly
> + ``sizeof(unsigned)`` in size.
> +
> +``poll()``:
> + Poll is implemented to allow polling for generation counter updates.
> + Such updates result in ``EPOLLIN`` polling status until the new
> + up-to-date counter is confirmed back to the driver through a
> + ``write()``.
> +
> +``ioctl()``:
> + The driver also adds support for tracking count of open file
> + descriptors that haven't acknowledged a generation counter update.
> + This is exposed through two IOCTLs:
> +
> + - VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of
> + *outdated* watchers - number of file descriptors that were open
> + during a VM generation change, and which have not yet confirmed the
> + new generation counter.
> + - VMGENID_WAIT_WATCHERS: blocks until there are no more *outdated*
> + watchers, or if a ``timeout`` argument is provided, until the
> + timeout expires.
> + If the current caller is *outdated* or a generation change happens
> + while waiting (thus making current caller *outdated*), the ioctl
> + returns ``-EINTR`` to signal the user to handle event and retry.
> + - VMGENID_FORCE_GEN_UPDATE: forces a generation counter bump. Can only
> + be used by processes with CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN
> + capabilities.
> +
> +``mmap()``:
> + The driver supports ``PROT_READ, MAP_SHARED`` mmaps of a single page
> + in size. The first 4 bytes of the mapped page will contain an
> + up-to-date copy of the VM generation counter.
> + The mapped memory can be used as a low-latency generation counter
> + probe mechanism in critical sections - see examples.
> +
> +``close()``:
> + Removes the file descriptor as a Vm generation counter watcher.
> +
> +Example application workflows
> +-----------------------------
> +
> +1) Watchdog thread simplified example::
> +
> + void watchdog_thread_handler(int *thread_active)
> + {
> + unsigned genid;
> + int fd = open("/dev/vmgenid", O_RDWR | O_CLOEXEC, S_IRUSR |
> S_IWUSR);
> +
> + do {
> + // read new gen ID - blocks until VM generation changes
> + read(fd, &genid, sizeof(genid));
> +
> + // because of VM generation change, we need to rebuild world
> + reseed_app_env();
> +
> + // confirm we're done handling gen ID update
> + write(fd, &genid, sizeof(genid));
> + } while (atomic_read(thread_active));
> +
> + close(fd);
> + }
> +
> +2) ASYNC simplified example::
> +
> + void handle_io_on_vmgenfd(int vmgenfd)
> + {
> + unsigned genid;
> +
> + // read new gen ID - we need it to confirm we've handled update
> + read(fd, &genid, sizeof(genid));
> +
> + // because of VM generation change, we need to rebuild world
> + reseed_app_env();
> +
> + // confirm we're done handling the gen ID update
> + write(fd, &genid, sizeof(genid));
> + }
> +
> + int main() {
> + int epfd, vmgenfd;
> + struct epoll_event ev;
> +
> + epfd = epoll_create(EPOLL_QUEUE_LEN);
> +
> + vmgenfd = open("/dev/vmgenid",
> + O_RDWR | O_CLOEXEC | O_NONBLOCK,
> + S_IRUSR | S_IWUSR);
> +
> + // register vmgenid for polling
> + ev.events = EPOLLIN;
> + ev.data.fd = vmgenfd;
> + epoll_ctl(epfd, EPOLL_CTL_ADD, vmgenfd, &ev);
> +
> + // register other parts of your app for polling
> + // ...
> +
> + while (1) {
> + // wait for something to do...
> + int nfds = epoll_wait(epfd, events,
> + MAX_EPOLL_EVENTS_PER_RUN,
> + EPOLL_RUN_TIMEOUT);
> + if (nfds < 0) die("Error in epoll_wait!");
> +
> + // for each ready fd
> + for(int i = 0; i < nfds; i++) {
> + int fd = events[i].data.fd;
> +
> + if (fd == vmgenfd)
> + handle_io_on_vmgenfd(vmgenfd);
> + else
> + handle_some_other_part_of_the_app(fd);
> + }
> + }
> +
> + return 0;
> + }
> +
> +3) Mapped memory polling simplified example::
> +
> + /*
> + * app/library function that provides cached secrets
> + */
> + char * safe_cached_secret(app_data_t *app)
> + {
> + char *secret;
> + volatile unsigned *const genid_ptr = get_vmgenid_mapping(app);
> + again:
> + secret = __cached_secret(app);
> +
> + if (unlikely(*genid_ptr != app->cached_genid)) {
> + // rebuild world then confirm the genid update (thru write)
> + rebuild_caches(app);
> +
> + app->cached_genid = *genid_ptr;
> + ack_vmgenid_update(app);
> +
> + goto again;
> + }
> +
> + return secret;
> + }
> +
> +4) Orchestrator simplified example::
> +
> + /*
> + * orchestrator - manages multiple apps and libraries used by a service
> + * and tries to make sure all sensitive components gracefully handle
> + * VM generation changes.
> + * Following function is called on detection of a VM generation change.
> + */
> + int handle_vmgen_update(int vmgen_fd, unsigned new_gen_id)
> + {
> + // pause until all components have handled event
> + pause_service();
> +
> + // confirm *this* watcher as up-to-date
> + write(vmgen_fd, &new_gen_id, sizeof(unsigned));
> +
> + // wait for all *others* for at most 5 seconds.
> + ioctl(vmgen_fd, VMGENID_WAIT_WATCHERS, 5000);
> +
> + // all apps on the system have rebuilt worlds
> + resume_service();
> + }
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 80c5f9c1..5d5f37b 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -13,6 +13,23 @@ menuconfig VIRT_DRIVERS
>
> if VIRT_DRIVERS
>
> +config VMGENID
> + tristate "Virtual Machine Generation ID driver"
> + depends on ACPI
I think this is not needed. We have /dev/vmgenid regardless of the ACPI
device for the container use case, and we may have different HW emulation
for s390 and PowerPC.
> + default N
> + help
> + This is a Virtual Machine Generation ID driver which provides
> + a virtual machine generation counter. The driver exposes FS ops
> + on /dev/vmgenid through which it can provide information and
> + notifications on VM generation changes that happen on snapshots
> + or cloning.
> + This enables applications and libraries that store or cache
> + sensitive information, to know that they need to regenerate it
> + after process memory has been exposed to potential copying.
> +
> + To compile this driver as a module, choose M here: the
> + module will be called vmgenid.
> +
> config FSL_HV_MANAGER
> tristate "Freescale hypervisor management driver"
> depends on FSL_SOC
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index f28425c..889be01 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -4,6 +4,7 @@
> #
>
> obj-$(CONFIG_FSL_HV_MANAGER) += fsl_hypervisor.o
> +obj-$(CONFIG_VMGENID) += vmgenid.o
> obj-y += vboxguest/
>
> obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves/
> diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> new file mode 100644
> index 0000000..c4d4683
> --- /dev/null
> +++ b/drivers/virt/vmgenid.c
> @@ -0,0 +1,435 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual Machine Generation ID driver
> + *
> + * Copyright (C) 2018 Red Hat Inc. All rights reserved.
> + *
> + * Copyright (C) 2020 Amazon. All rights reserved.
> + *
> + * Authors:
> + * Adrian Catangiu <acatan@...zon.com>
> + * Or Idgar <oridgar@...il.com>
> + * Gal Hammer <ghammer@...hat.com>
> + *
> + */
> +#include <linux/acpi.h>
> +#include <linux/kernel.h>
> +#include <linux/miscdevice.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/poll.h>
> +#include <linux/random.h>
> +#include <linux/uuid.h>
> +#include <linux/vmgenid.h>
> +
> +#define DEV_NAME "vmgenid"
> +ACPI_MODULE_NAME(DEV_NAME);
> +
> +struct acpi_data {
> + uuid_t uuid;
> + void *uuid_iomap;
> +};
> +
> +struct driver_data {
I'd suggest vmgenid_data
> + unsigned long map_buf;
We use tab=8 for indentation. Please run your patch through
scripts/checkpatch.pl to make sure it conforms to the coding style.
> + wait_queue_head_t read_waitq;
> + atomic_t generation_counter;
> +
> + unsigned int watchers;
> + atomic_t outdated_watchers;
> + wait_queue_head_t outdated_waitq;
> + spinlock_t lock;
> +
> + struct acpi_data *acpi_data;
> +};
> +struct driver_data driver_data;
static
> +
> +struct file_data {
> + unsigned int acked_gen_counter;
> +};
> +
> +static int equals_gen_counter(unsigned int counter)
> +{
> + return counter == atomic_read(&driver_data.generation_counter);
> +}
> +
> +static void vmgenid_bump_generation(void)
> +{
> + unsigned long flags;
> + int counter;
> +
> + spin_lock_irqsave(&driver_data.lock, flags);
> + counter = atomic_inc_return(&driver_data.generation_counter);
> + *((int *) driver_data.map_buf) = counter;
> + atomic_set(&driver_data.outdated_watchers, driver_data.watchers);
> +
> + wake_up_interruptible(&driver_data.read_waitq);
> + wake_up_interruptible(&driver_data.outdated_waitq);
> + spin_unlock_irqrestore(&driver_data.lock, flags);
> +}
> +
> +static void vmgenid_put_outdated_watchers(void)
> +{
> + if (atomic_dec_and_test(&driver_data.outdated_watchers))
> + wake_up_interruptible(&driver_data.outdated_waitq);
> +}
> +
> +static int vmgenid_open(struct inode *inode, struct file *file)
> +{
> + struct file_data *fdata = kzalloc(sizeof(struct file_data),
> GFP_KERNEL);
> + unsigned long flags;
> +
> + if (!fdata)
> + return -ENOMEM;
> +
> + spin_lock_irqsave(&driver_data.lock, flags);
> + fdata->acked_gen_counter =
> atomic_read(&driver_data.generation_counter);
> + ++driver_data.watchers;
> + spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> + file->private_data = fdata;
> +
> + return 0;
> +}
> +
> +static int vmgenid_close(struct inode *inode, struct file *file)
> +{
> + struct file_data *fdata = file->private_data;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&driver_data.lock, flags);
> + if (!equals_gen_counter(fdata->acked_gen_counter))
> + vmgenid_put_outdated_watchers();
> + --driver_data.watchers;
> + spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> + kfree(fdata);
> +
> + return 0;
> +}
> +
> +static ssize_t
> +vmgenid_read(struct file *file, char __user *ubuf, size_t nbytes,
Please keep the function name on the same line as the return type and
wrap the parameters onto the next line.
> loff_t *ppos)
> +{
> + struct file_data *fdata = file->private_data;
> + ssize_t ret;
> + int gen_counter;
> +
> + if (nbytes == 0)
> + return 0;
> + /* disallow partial reads */
> + if (nbytes < sizeof(gen_counter))
> + return -EINVAL;
> +
> + if (equals_gen_counter(fdata->acked_gen_counter)) {
> + if (file->f_flags & O_NONBLOCK)
> + return -EAGAIN;
> + ret = wait_event_interruptible(
> + driver_data.read_waitq,
> + !equals_gen_counter(fdata->acked_gen_counter)
> + );
> + if (ret)
> + return ret;
> + }
> +
> + gen_counter = atomic_read(&driver_data.generation_counter);
> + ret = copy_to_user(ubuf, &gen_counter, sizeof(gen_counter));
> + if (ret)
> + return -EFAULT;
> +
> + return sizeof(gen_counter);
> +}
> +
> +static ssize_t vmgenid_write(struct file *file, const char __user *ubuf,
> + size_t count, loff_t *ppos)
> +{
> + struct file_data *fdata = file->private_data;
> + unsigned int new_acked_gen;
> + unsigned long flags;
> +
> + /* disallow partial writes */
> + if (count != sizeof(new_acked_gen))
> + return -EINVAL;
> + if (copy_from_user(&new_acked_gen, ubuf, count))
> + return -EFAULT;
> +
> + spin_lock_irqsave(&driver_data.lock, flags);
> + /* wrong gen-counter acknowledged */
> + if (!equals_gen_counter(new_acked_gen)) {
> + spin_unlock_irqrestore(&driver_data.lock, flags);
> + return -EINVAL;
> + }
> + if (!equals_gen_counter(fdata->acked_gen_counter)) {
> + fdata->acked_gen_counter = new_acked_gen;
> + vmgenid_put_outdated_watchers();
> + }
> + spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> + return (ssize_t)count;
> +}
> +
> +static __poll_t
> +vmgenid_poll(struct file *file, poll_table *wait)
> +{
> + __poll_t mask = 0;
> + struct file_data *fdata = file->private_data;
> +
> + if (!equals_gen_counter(fdata->acked_gen_counter))
> + return EPOLLIN | EPOLLRDNORM;
> +
> + poll_wait(file, &driver_data.read_waitq, wait);
> +
> + if (!equals_gen_counter(fdata->acked_gen_counter))
> + mask = EPOLLIN | EPOLLRDNORM;
> +
> + return mask;
> +}
> +
> +static long vmgenid_ioctl(struct file *file,
> + unsigned int cmd, unsigned long arg)
> +{
> + struct file_data *fdata = file->private_data;
> + unsigned long timeout_ns;
> + ktime_t until;
> + int ret = 0;
> +
> + switch (cmd) {
> + case VMGENID_GET_OUTDATED_WATCHERS:
> + ret = atomic_read(&driver_data.outdated_watchers);
> + break;
> + case VMGENID_WAIT_WATCHERS:
> + timeout_ns = arg * NSEC_PER_MSEC;
> + until = timeout_ns ? ktime_set(0, timeout_ns) : KTIME_MAX;
> +
> + ret = wait_event_interruptible_hrtimeout(
> + driver_data.outdated_waitq,
> + (!atomic_read(&driver_data.outdated_watchers) ||
> + !equals_gen_counter(fdata->acked_gen_counter)),
> + until
> + );
> + if (atomic_read(&driver_data.outdated_watchers))
> + ret = -EINTR;
> + else
> + ret = 0;
> + break;
> + case VMGENID_FORCE_GEN_UPDATE:
> + if (!checkpoint_restore_ns_capable(current_user_ns()))
> + return -EACCES;
> + vmgenid_bump_generation();
> + break;
> + default:
> + ret = -EINVAL;
> + break;
> + }
> + return ret;
> +}
> +
> +static int vmgenid_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct file_data *fdata = file->private_data;
> +
> + if (vma->vm_pgoff != 0 || vma_pages(vma) > 1)
> + return -EINVAL;
> +
> + if ((vma->vm_flags & VM_WRITE) != 0)
> + return -EPERM;
> +
> + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags &= ~VM_MAYWRITE;
> + vma->vm_private_data = fdata;
> +
> + return vm_insert_page(vma, vma->vm_start,
> + virt_to_page(driver_data.map_buf));
> +}
> +
> +static const struct file_operations fops = {
> + .owner = THIS_MODULE,
> + .mmap = vmgenid_mmap,
> + .open = vmgenid_open,
> + .release = vmgenid_close,
> + .read = vmgenid_read,
> + .write = vmgenid_write,
> + .poll = vmgenid_poll,
> + .unlocked_ioctl = vmgenid_ioctl,
> +};
> +
> +struct miscdevice vmgenid_misc = {
static
> + .minor = MISC_DYNAMIC_MINOR,
> + .name = "vmgenid",
> + .fops = &fops,
> +};
> +
> +static int vmgenid_acpi_map(struct acpi_data *priv, acpi_handle handle)
> +{
> + int i;
> + phys_addr_t phys_addr;
> + struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
> + acpi_status status;
> + union acpi_object *pss;
> + union acpi_object *element;
> +
> + status = acpi_evaluate_object(handle, "ADDR", NULL, &buffer);
> + if (ACPI_FAILURE(status)) {
> + ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
> + return -ENODEV;
> + }
> + pss = buffer.pointer;
> + if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2)
> + return -EINVAL;
> +
> + phys_addr = 0;
> + for (i = 0; i < pss->package.count; i++) {
> + element = &(pss->package.elements[i]);
> + if (element->type != ACPI_TYPE_INTEGER)
> + return -EINVAL;
> + phys_addr |= element->integer.value << i * 32;
> + }
> +
> + priv->uuid_iomap = acpi_os_map_memory(phys_addr, sizeof(uuid_t));
> + if (!priv->uuid_iomap) {
> + pr_err("Could not map memory at 0x%llx, size %u\n",
> + phys_addr,
> + (u32) sizeof(uuid_t));
> + return -ENOMEM;
> + }
> +
> + memcpy_fromio(&priv->uuid, priv->uuid_iomap, sizeof(uuid_t));
> +
> + return 0;
> +}
> +
> +static int vmgenid_acpi_add(struct acpi_device *device)
> +{
> + int ret;
> +
> + if (!device)
> + return -EINVAL;
> +
> + driver_data.acpi_data = kzalloc(sizeof(struct acpi_data), GFP_KERNEL);
> + if (!driver_data.acpi_data) {
> + pr_err("vmgenid: failed to allocate acpi_data\n");
> + return -ENOMEM;
> + }
> + device->driver_data = &driver_data;
> +
> + ret = vmgenid_acpi_map(driver_data.acpi_data, device->handle);
> + if (ret < 0) {
> + pr_err("vmgenid: failed to map acpi device\n");
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + kfree(driver_data.acpi_data);
> + driver_data.acpi_data = NULL;
> +
> + return ret;
> +}
> +
> +static int vmgenid_acpi_remove(struct acpi_device *device)
> +{
> + struct acpi_data *priv;
> +
> + if (!device || !acpi_driver_data(device))
> + return -EINVAL;
> +
> + device->driver_data = NULL;
> + priv = driver_data.acpi_data;
> + driver_data.acpi_data = NULL;
> +
> + if (priv && priv->uuid_iomap)
> + acpi_os_unmap_memory(priv->uuid_iomap, sizeof(uuid_t));
> + kfree(priv);
> +
> + return 0;
> +}
> +
> +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
> +{
> + struct acpi_data *priv;
> + uuid_t old_uuid;
> +
> + if (!device || !acpi_driver_data(device)) {
> + pr_err("VMGENID notify with NULL private data\n");
> + return;
> + }
> + priv = driver_data.acpi_data;
> +
> + /* update VM Generation UUID */
> + old_uuid = priv->uuid;
> + memcpy_fromio(&priv->uuid, priv->uuid_iomap, sizeof(uuid_t));
> +
> + if (memcmp(&old_uuid, &priv->uuid, sizeof(uuid_t))) {
> + /* HW uuid updated */
> + vmgenid_bump_generation();
> + add_device_randomness(&priv->uuid, sizeof(uuid_t));
> + }
> +}
> +
> +static const struct acpi_device_id vmgenid_ids[] = {
> + {"QEMUVGID", 0},
> + {"", 0},
> +};
> +
> +static struct acpi_driver acpi_vmgenid_driver = {
> + .name = "vm_generation_id",
> + .ids = vmgenid_ids,
> + .owner = THIS_MODULE,
> + .ops = {
> + .add = vmgenid_acpi_add,
> + .remove = vmgenid_acpi_remove,
> + .notify = vmgenid_acpi_notify,
> + }
> +};
> +
> +static int __init vmgenid_init(void)
> +{
> + int ret;
> +
> + driver_data.map_buf = get_zeroed_page(GFP_KERNEL);
> + if (!driver_data.map_buf)
> + return -ENOMEM;
> +
> + atomic_set(&driver_data.generation_counter, 0);
> + atomic_set(&driver_data.outdated_watchers, 0);
> + init_waitqueue_head(&driver_data.read_waitq);
> + init_waitqueue_head(&driver_data.outdated_waitq);
> + spin_lock_init(&driver_data.lock);
> + driver_data.acpi_data = NULL;
> +
> + ret = misc_register(&vmgenid_misc);
> + if (ret < 0) {
> + pr_err("misc_register() failed for vmgenid\n");
> + goto err;
> + }
> +
> + ret = acpi_bus_register_driver(&acpi_vmgenid_driver);
> + if (ret < 0)
> + pr_warn("No vmgenid acpi device found\n");
I think this needs to be reworked to support a no-ACPI version. For
instance, we can call something like this here:

	ret = vmgenid_hw_register();

and have

#ifdef CONFIG_ACPI
static int vmgenid_hw_register(void)
{
	return acpi_bus_register_driver(&acpi_vmgenid_driver);
}
#else
static int vmgenid_hw_register(void)
{
	return 0;
}
#endif
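
The exit path would probably want the same treatment, since
vmgenid_exit() calls acpi_bus_unregister_driver() unconditionally. Below
is a rough userspace sketch of the pattern, not kernel code: the
vmgenid_hw_unregister() name is my assumption, and the struct and the two
acpi_bus_*() functions under CONFIG_ACPI are mock stand-ins so that the
snippet builds on its own.

```c
/*
 * Userspace sketch of the stub pattern; not kernel code.
 * Everything ACPI-related under CONFIG_ACPI is a mock stand-in for the
 * real ACPI API so that this file builds on its own.
 * vmgenid_hw_unregister() is a hypothetical name, not from the patch.
 */
#ifdef CONFIG_ACPI
struct acpi_driver {
	const char *name;
};

static struct acpi_driver acpi_vmgenid_driver = {
	.name = "vm_generation_id",
};

/* Mock: the real function registers the driver with the ACPI bus. */
static int acpi_bus_register_driver(struct acpi_driver *drv)
{
	return drv ? 0 : -1;
}

/* Mock: the real function removes the driver from the ACPI bus. */
static void acpi_bus_unregister_driver(struct acpi_driver *drv)
{
	(void)drv;
}

static int vmgenid_hw_register(void)
{
	return acpi_bus_register_driver(&acpi_vmgenid_driver);
}

static void vmgenid_hw_unregister(void)
{
	acpi_bus_unregister_driver(&acpi_vmgenid_driver);
}
#else
/* No hardware backend: nothing to register, so report success. */
static int vmgenid_hw_register(void)
{
	return 0;
}

/* No hardware backend: nothing to unregister. */
static void vmgenid_hw_unregister(void)
{
}
#endif
```

With something like this, vmgenid_init() and vmgenid_exit() could call
the two helpers and would not need any #ifdef of their own.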
> +
> + return 0;
> +
> +err:
> + free_pages(driver_data.map_buf, 0);
> + driver_data.map_buf = 0;
> +
> + return ret;
> +}
> +
> +static void __exit vmgenid_exit(void)
> +{
> + acpi_bus_unregister_driver(&acpi_vmgenid_driver);
> +
> + misc_deregister(&vmgenid_misc);
> + free_pages(driver_data.map_buf, 0);
> + driver_data.map_buf = 0;
> +}
> +
> +module_init(vmgenid_init);
> +module_exit(vmgenid_exit);
> +
> +MODULE_AUTHOR("Adrian Catangiu");
> +MODULE_DESCRIPTION("Virtual Machine Generation ID");
> +MODULE_LICENSE("GPL");
> +MODULE_VERSION("0.1");
> diff --git a/include/uapi/linux/vmgenid.h b/include/uapi/linux/vmgenid.h
> new file mode 100644
> index 0000000..9316b00
> --- /dev/null
> +++ b/include/uapi/linux/vmgenid.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +
> +#ifndef _UAPI_LINUX_VMGENID_H
> +#define _UAPI_LINUX_VMGENID_H
> +
> +#include <linux/ioctl.h>
> +
> +#define VMGENID_IOCTL 0x2d
> +#define VMGENID_GET_OUTDATED_WATCHERS _IO(VMGENID_IOCTL, 1)
> +#define VMGENID_WAIT_WATCHERS _IO(VMGENID_IOCTL, 2)
> +#define VMGENID_FORCE_GEN_UPDATE _IO(VMGENID_IOCTL, 3)
> +
> +#endif /* _UAPI_LINUX_VMGENID_H */
> +
> --
> 2.7.4
>
--
Sincerely yours,
Mike.