Message-ID: <5e2b6e72-0b7e-45fa-8a5d-d7a2eff9a5b4@amazon.com>
Date: Thu, 19 Sep 2024 01:02:12 +0200
From: Alexander Graf <graf@...zon.com>
To: "Jason A. Donenfeld" <Jason@...c4.com>, "Michael S. Tsirkin"
	<mst@...hat.com>, <virtio-dev@...ts.oasis-open.org>, <qemu-devel@...gnu.org>
CC: Lennart Poettering <mzxreary@...inter.de>, <linux-kernel@...r.kernel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Babis Chalios
	<bchalios@...zon.es>, Theodore Ts'o <tytso@....edu>, "Cali, Marco"
	<xmarcalx@...zon.co.uk>, Arnd Bergmann <arnd@...db.de>, "rostedt@...dmis.org"
	<rostedt@...dmis.org>, Christian Brauner <brauner@...nel.org>, Paolo Bonzini
	<pbonzini@...hat.com>, Sean Christopherson <seanjc@...gle.com>,
	<jann@...jh.net>, Michael Kelley <mhklinux@...look.com>
Subject: Re: vm events, userspace, the vmgenid driver, and the future [was:
 the uevent revert thread]


On 19.09.24 00:27, Jason A. Donenfeld wrote:
> [broadened subject line and added relevant parties to cc list]
>
> On Tue, Sep 17, 2024 at 10:55:20PM +0200, Alexander Graf wrote:
>> What is still open are user space applications that require event based
>> notification on VM clone events - and *only* VM clone events. This
>> mostly caters for tools like systemd which need to execute policy - such
>> as generating randomly generated MAC addresses - in the event a VM was
>> cloned.
>>
>> That's the use case this patch "vmgenid: emit uevent when VMGENID
>> updates" is about and I think the best path forward is to just revert
>> the revert. A uevent from the device driver is a well established, well
>> fitting Linux mechanism for that type of notification.
> The thing that worries me is that vmgenid is just some weird random
> microsoft acpi driver. It's one sort of particular device, and not a
> very good one at that. There's still room for virtio/qemu to improve on
> it with their own thing, or for vbox or whatever else to have their
> version, and xen theirs, and so forth. That is to say, I'm not sure that
> this virtual hardware is *the* way of doing it.


I agree, but given that it's been a few years and nobody else has really
come up with a different device, the current semantics for the scope of
what the device does seem close to "good enough". So I don't expect a
lot of innovation here. And if there is innovation, as you point out, it
will bring different semantics that will then require user space
changes anyway.


> Even in terms of the entropy stuff (which I know you no longer care
> about, but I do), mst's original virtio-rng draft mentioned reporting
> events beyond just VM forks, extending it generically to any kind of
> entropy reduction situation. For example, migration or suspend or
> whatever might be interesting things to trigger. Heck, one could imagine
> those coming through vmgenid at some point, which would then change the
> semantics you're after for systemd.


If they come through vmgenid, it would need to gain a new type of event 
at which point the uevent notification would also change.

I'm also not sure why live migration would trigger either a VM clone or
any RNG-relevant event. And suspend is something we already have the
machinery to detect.


> Even in terms of reporting exclusively about external VM events, there's
> a subtle thing to consider between clones/forks and rollbacks, as well
> as migrations. Vmgenid kind of lumps it all together, and hopefully the


It's the opposite: VMGenID is concerned exclusively with clones. It
doesn't care about rollbacks. It doesn't care about migrations. Its
value effectively changes when you clone a VM, and only then.


> hypervisor notifies in a way consistent with what userspace was hoping
> to learn about. (Right now, maybe we're doing what Hyper-V does, maybe,
> but also maybe not; it's kind of loose.) So at some point, there's a
> question about the limitations of vmgenid and the possible extensions of
> it, or whether this will come in a different driver or virtual hardware,
> and how.


To me a lot of this is too vague to be actionable. Unless someone comes
in with real use cases that depend on other kinds of events, it sounds
to me like the one scenario that vmgenid covers is what system level
user space cares about. If in a few years we realize that we need 3
different types of events, we can start looking at ways to funnel those
in a more abstract way. Until then, because we don't know what these
events will be, we can't even design an API that would address them.

Keep in mind that we're not really talking here about building a generic
API for any random user space application. We only want to give system
software the ability to reason about system events. IMHO any more
abstract layer to funnel several of these different events to downstream
user space (if we ever care) would be a user space problem to solve,
for example with a dbus event.


> Right now, this is mostly unexplored. The virtio-rng avenue was largest
> step in terms of exploring this problem space, but there are obviously a
> few directions to go, depending on what your primary concern is.
>
> But all of that makes me think that exposing the particulars of this
> virtual hardware driver to userspace is not the best option, or at least
> not an option to rush into (or to trick Greg into), and will both limit


I'm pretty sure I never tricked Greg into anything :)


> what we can do with it later, and potentially burden userspace with
> having to check multiple different things with confusing interactions
> down the road. So I think it's worth stepping back a bit and thinking


This interface is effectively only available to udev/systemd type
software. Any abstraction above that should be on them. And if we
eventually decide that we need a better interface for generic user
space, we can still build it.
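
As a rough illustration of what such a udev-style consumer could look
like, here is a minimal Python sketch that listens on the kernel uevent
netlink socket and filters for the vmgenid change event. It is not
systemd's or udev's actual implementation. The NEW_VMGENID=1 environment
key is an assumption about how the driver tags its uevent, and the
function names are made up for this example.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # netlink protocol number from <linux/netlink.h>
KERNEL_UEVENT_GROUP = 1      # multicast group the kernel broadcasts uevents on


def parse_uevent(data: bytes) -> dict:
    """Split a raw kernel uevent ("action@devpath\\0KEY=VALUE\\0...") into a dict."""
    fields = [f for f in data.split(b"\0") if f]
    event = {}
    for field in fields[1:]:  # fields[0] is the "action@devpath" summary
        key, sep, value = field.decode("utf-8", "replace").partition("=")
        if sep:
            event[key] = value
    return event


def is_vm_clone_event(event: dict) -> bool:
    # ASSUMPTION: the driver marks its "change" uevent with NEW_VMGENID=1.
    return event.get("ACTION") == "change" and event.get("NEW_VMGENID") == "1"


def listen_for_clones(on_clone) -> None:
    """Block on the uevent socket (Linux only) and call on_clone(event) per clone."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, KERNEL_UEVENT_GROUP))  # pid 0: let the kernel pick the port id
    while True:
        event = parse_uevent(sock.recv(8192))
        if is_vm_clone_event(event):
            on_clone(event)
```

Something like listen_for_clones(lambda ev: print("VM was cloned")) would
then run policy on each clone. In practice a udev rule matching the same
key is probably the more natural place for this.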


> about what we actually want from this and what those semantics should
> be.
>
> I'd also love to hear from the QEMU guys on this and get their input. To
> that end, I've added qemu and virtio mailing lists, as well as mst.
>
> Also, I'd be interested to learn specifically what you (Amazon) want
> this for and what the larger picture there is. I get the systemd case,
> but I'm under the assumption you've got a different project in your
> woods.


The purpose for Amazon here is to accelerate serverless compute VMs [1].

We want to snapshot a VM post-init, before it receives any operation.
Then resume it, initiate logic to resanitize itself and serve the
request. The reason we want this particular vmgenid interface is so that
we can create a notion of "resanitization" in user space at all. Once we
have the event, systemd can start establishing service actions based on
it, which will lead the user space ecosystem to grow interfaces to
say "sanitize yourself", which we can then also invoke in VM post-init -
probably without systemd :).
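
To make the "sanitize yourself" idea a bit more concrete, here is a
minimal Python sketch of a hook registry that an application could
expose. It only illustrates the shape of such an interface. The names
on_vm_clone, resanitize and rotate_session_key are hypothetical and do
not correspond to any existing systemd or Lambda API.

```python
import secrets
from typing import Callable, List

_hooks: List[Callable[[], None]] = []


def on_vm_clone(hook: Callable[[], None]) -> Callable[[], None]:
    """Decorator: register a function to run after the VM has been cloned."""
    _hooks.append(hook)
    return hook


def resanitize() -> List[str]:
    """Run every registered hook and return the names of the hooks that failed."""
    failed = []
    for hook in _hooks:
        try:
            hook()
        except Exception:
            # One broken hook should not stop the remaining ones from running.
            failed.append(hook.__name__)
    return failed


# Example state that must not be shared between clones of the same snapshot.
state = {"session_key": secrets.token_hex(16)}


@on_vm_clone
def rotate_session_key() -> None:
    state["session_key"] = secrets.token_hex(16)
```

Whatever receives the clone notification (a systemd service action, or
our own post-init logic) would then call resanitize() once per clone.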

We built such event logic for Java today [2], but we would like to
expand beyond that. And that will become an unmaintainable mess without
viable ecosystem support, so we may as well enable "normal" VM clones
with the same logic. Given that pretty much all hypervisors (including
QEMU) out there already implement vmgenid, it seems to be the de facto
standard for exactly this notification.


Alex

[1] https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html
[2] https://docs.aws.amazon.com/lambda/latest/dg/snapstart-runtime-hooks.html




