Message-ID: <47137806-9162-0f60-e830-1a3731595c8c@amazon.com>
Date:   Thu, 10 Mar 2022 12:18:04 +0100
From:   Alexander Graf <graf@...zon.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
CC:     LKML <linux-kernel@...r.kernel.org>,
        KVM list <kvm@...r.kernel.org>,
        "QEMU Developers" <qemu-devel@...gnu.org>,
        <linux-hyperv@...r.kernel.org>,
        "Linux Crypto Mailing List" <linux-crypto@...r.kernel.org>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        <adrian@...ity.io>, Laszlo Ersek <lersek@...hat.com>,
        Daniel P. Berrangé <berrange@...hat.com>,
        "Dominik Brodowski" <linux@...inikbrodowski.net>,
        Jann Horn <jannh@...gle.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        "Brown, Len" <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Linux PM <linux-pm@...r.kernel.org>,
        Colm MacCarthaigh <colmmacc@...zon.com>,
        Theodore Ts'o <tytso@....edu>, Arnd Bergmann <arnd@...db.de>
Subject: Re: propagating vmgenid outward and upward


On 09.03.22 23:02, Jason A. Donenfeld wrote:
> Hi Alex,
>
> On Wed, Mar 9, 2022 at 3:10 AM Alexander Graf <graf@...zon.com> wrote:
>>> The vmgenid driver basically works, though it is racy, because that ACPI
>>> notification can arrive after the system is already running again. This
>>
>> I believe enough people already pointed out that this assumption is
>> incorrect. The thing that is racy about VMGenID is the interrupt-based
>> notification.
> I'm having a hard time figuring out what's different between your
> statement and mine. I said that the race is due to the notification.
> You said that the race is due to the notification. What subtle thing
> am I missing here that would lead you to say that my assumption is
> incorrect? Or did you just misread?


The subtle difference is that you don't need to rely on the notification 
to learn about the world switch. If you instead read the VMGenID 
explicitly, without waiting for the notification, you're guaranteed to 
know whether you were cloned. That means the actual VMGenID interface is 
not inherently racy; only the notification part is.

So you *can* build a race-free VMGenID mechanism. You just can't build a 
race-free *event-based* VMGenID mechanism if you want to allow cloning 
at arbitrary points in time.
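
To make that concrete, here is a minimal sketch of the pattern I mean. 
read_vmgenid() and the two helpers are made-up stand-ins for whatever 
mechanism ends up exposing the 16-byte ID and doing the actual work; 
nothing here is an existing interface.

/* Sketch only: read_vmgenid() is hypothetical, not an existing API. */
#include <stdint.h>
#include <string.h>

struct vmgenid {
        uint8_t bytes[16];
};

extern void read_vmgenid(struct vmgenid *id);    /* hypothetical accessor */
extern void perform_crypto_operation(void);      /* placeholder for the real work */
extern void reseed_rng(void);                    /* placeholder */

void crypto_transaction(void)
{
        struct vmgenid before, after;

        for (;;) {
                read_vmgenid(&before);

                /* Generate the ephemeral key, encrypt, etc. */
                perform_crypto_operation();

                read_vmgenid(&after);
                if (memcmp(&before, &after, sizeof(before)) == 0)
                        break;

                /*
                 * The ID changed, so we may have been cloned mid-operation:
                 * abort the result, reseed and replay before anything hits
                 * the wire.
                 */
                reseed_rng();
        }
}

With that structure the notification is only an optimization to avoid 
the explicit reads; correctness doesn't depend on it.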


>
>> The actual identifier is updated before the VM resumes
>> from its clone operation, so if you match on that you will know whether
>> you are in a new or old world. And that is enough to create
>> transactions: Save the identifier before a "crypto transaction",
>> validate before you finish, if they don't match, abort, reseed and replay.
> Right. But more than just transactions, it's useful for preventing key
> reuse vulnerabilities, in which case you store the current identifier
> just before an ephemeral key is generated, and then subsequently check
> that the identifier hasn't changed before transmitting anything
> related to that key.
>
>> If you follow the logic at the beginning of the mail, you can create
>> something race-free if you consume the hardware VMGenID counter. You
>> cannot make it race-free if you rely on the interrupt mechanism.
> Yes, as mentioned and discussed in depth before. However, your use of
> the word "counter" is problematic. Vmgenid is not a counter. It's a
> unique identifier. That means you can't compare it with a single-word
> comparison but have to compare all of the 16 bytes. That seems
> potentially expensive. It's for that reason that I suggested
> augmenting the vmgenid spec with an additional word-sized _counter_
> that could be mapped into the kernel and into userspace.


I think Michael's benchmarks show quite well that it's not all that 
expensive. We're talking about two 64-bit compares within the same cache 
line. You already realized that we don't need them to be atomic, just 
properly barriered.
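
Roughly what that compare looks like from the consumer side (a sketch 
only; where exactly the two words live depends on whatever mapping we 
end up exposing):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The 16-byte ID viewed as two 64-bit words in one cache line. Plain
 * acquire loads are enough -- no atomic RMW, no locking -- as long as
 * the reads are ordered against the operation they bracket.
 */
static bool vmgenid_differs(_Atomic uint64_t id[2], const uint64_t snapshot[2])
{
        uint64_t lo = atomic_load_explicit(&id[0], memory_order_acquire);
        uint64_t hi = atomic_load_explicit(&id[1], memory_order_acquire);

        return lo != snapshot[0] || hi != snapshot[1];
}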

So what if we created a vsyscall that takes a buffer of "up to 16 
bytes"? If we later realize that an additional page with a 4-byte 
counter is a viable performance optimization, we can work with MS to add 
that to the spec. But the user-space interface would stay identical.
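
Something like this is the interface shape I'm thinking of (the name and 
semantics are made up for illustration; nothing like it exists today):

#include <stddef.h>

/*
 * Hypothetical interface shape only.
 * The caller passes its last-seen generation ID (up to 16 bytes).
 * Returns 0 if it still matches the current one, 1 if it changed; in
 * the latter case the current ID is written back into the buffer.
 * Whether this is backed by the 16-byte UUID or by a future 4-byte
 * counter page is invisible to the caller.
 */
int vmgenid_check(void *id_buf, size_t len);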


>
>> So following that train of thought, if you expose the hardware VMGenID
>> to user space, you could allow user space to act race free based on
>> VMGenID. That means consumers of user space RNGs could validate whether
>> the ID is identical between the beginning of the crypto operation and
>> the end.
> Right.
>
>> However, there are more complicated cases as well. What do you do with
>> Samba for example? It needs to generate a new SID after the clone.
>> That's a super heavy operation. Do you want to have smbd constantly poll
>> on the VMGenID just to see whether it needs to kick off some
>> administrative actions?
> Were it a single word-sized integer, mapped into memory, that wouldn't
> be much of a problem at all. It could constantly read this before and
> after every operation. The problem is that it's 16 bytes and
> understandably applications don't want to deal with that clunkiness.


I don't think applications should be in the business of mapping 
arbitrary locations of vDSO space to match on them. We need to build an 
interface that is fast and flexible enough that they can just say "here 
is a buffer, tell me if the ID changed". That's practically all you 
need: on init you run it once, and later on you invoke it every time 
between finishing a crypto operation and putting it on the wire.
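
With the hypothetical vmgenid_check() sketched above, the application 
side would look roughly like this (all the helpers are placeholders):

#include <stddef.h>
#include <stdint.h>

struct conn;

extern int vmgenid_check(void *id_buf, size_t len);     /* hypothetical, see above */
extern void encrypt_for_wire(struct conn *c, const void *buf, size_t len);
extern void reseed_local_rng(void);
extern int put_on_wire(struct conn *c);

static uint8_t cached_id[16];

void init_once(void)
{
        /* Prime the cached ID once at startup. */
        vmgenid_check(cached_id, sizeof(cached_id));
}

int send_encrypted(struct conn *c, const void *buf, size_t len)
{
        for (;;) {
                encrypt_for_wire(c, buf, len);

                /* Check between finishing the crypto op and the send. */
                if (vmgenid_check(cached_id, sizeof(cached_id)) == 0)
                        break;

                /* The ID changed underneath us: reseed and redo. */
                reseed_local_rng();
        }

        return put_on_wire(c);
}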


>
>> In that case, all we would need from the kernel is an easily readable
>> GenID that changes
> Actually, no, you need even less than that. All that's required is a
> sysfs/procfs file that can be poll()'d on. It doesn't need to have any
> content. When poll() returns readable, the VM has been forked. Then
> userspace rngs and other things like that can call getrandom() to
> receive a fresh value to mix into whatever their operation is. Since
> all we're talking about here is _event notification_, all we need is
> that event, which is what poll() provides.
>
>> I'm also not a super big fan of putting all that logic into systemd. It
>> means applications need to create their own notification mechanisms to
>> pass that cloning notification into actual processes. Don't we have any
>> mechanism that applications and libraries could use to natively get an
>> event when the GenID changes?
> Yes. poll() can do this. For the purposes of discussion, I've posted
> an implementation of this idea here:
> https://lore.kernel.org/lkml/20220309215907.77526-1-Jason@zx2c4.com/
>
> What I'm sort of leaning toward is doing something like that patch,
> and then later if vmgenid ever grows an additional word-sized counter,
> moving to explore the race-free approach. Given the amount of
> programming required to actually implement the race-free approach
> (transactions and careful study of each case), the poll() file
> approach might be a medium-grade compromise for the time being.
> Evidently that's what Microsoft decided too.


I agree with the slightly racy compromise and that it's a step in the 
right direction. Doing this is a no-brainer IMHO, and I like the 
proc-based poll approach.
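
The consumer side of that is pleasantly small, roughly as below. The 
path is a placeholder -- the real file name depends on your patch -- and 
whether the poll needs rearming by re-reading depends on the final 
interface:

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

extern void handle_clone_event(void);   /* regenerate SIDs, reseed, ... */

int watch_vm_fork(void)
{
        int fd = open("/proc/vmgenid_notify", O_RDONLY);   /* placeholder path */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        if (fd < 0)
                return -1;

        for (;;) {
                if (poll(&pfd, 1, -1) < 0)
                        break;

                /* poll() returned readable: the VM has been cloned. */
                handle_clone_event();

                /* Rearm; whether this is needed depends on the final interface. */
                lseek(fd, 0, SEEK_SET);
        }

        close(fd);
        return 0;
}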

I have an additional problem with the poll-based path that you might 
have an idea for. In addition to the clone notification, I'd need to 
know at which point everyone who was listening for a clone notification 
has finished acting on it. If I spawn a tiny VM to do "work", I want to 
know when it's safe to hand requests to it. How do I find out when that 
point in time is?

As far as the race-free approach goes, I wouldn't get hung up on 
matching against a 4-byte counter vs. a 16-byte UUID. Outside of FUD 
that this might have a performance impact (4-byte reads have an impact 
too!), there's nothing that would keep us from implementing that 
interface in addition to poll today.

I'm happy to see all of these things evolve incrementally though. We can 
start with the poll interface and then later implement a vsyscall that 
allows transactions in hot paths.


Alex





