linux-kernel - Re: propagating vmgenid outward and upward

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHmME9rTMDkE7UA3_wg87mrDVYps+YaHw+dZwF0EbM0zC4pQQw@mail.gmail.com>
Date:   Wed, 9 Mar 2022 15:02:04 -0700
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Alexander Graf <graf@...zon.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        KVM list <kvm@...r.kernel.org>,
        QEMU Developers <qemu-devel@...gnu.org>,
        linux-hyperv@...r.kernel.org,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        adrian@...ity.io, Laszlo Ersek <lersek@...hat.com>,
        Daniel P. Berrangé <berrange@...hat.com>,
        Dominik Brodowski <linux@...inikbrodowski.net>,
        Jann Horn <jannh@...gle.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        "Brown, Len" <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Linux PM <linux-pm@...r.kernel.org>,
        Colm MacCarthaigh <colmmacc@...zon.com>,
        "Theodore Ts'o" <tytso@....edu>, Arnd Bergmann <arnd@...db.de>
Subject: Re: propagating vmgenid outward and upward

Hi Alex,

On Wed, Mar 9, 2022 at 3:10 AM Alexander Graf <graf@...zon.com> wrote:
> > The vmgenid driver basically works, though it is racy, because that ACPI
> > notification can arrive after the system is already running again. This
>
>
> I believe enough people already pointed out that this assumption is
> incorrect. The thing that is racy about VMGenID is the interrupt based
> notification.

I'm having a hard time figuring out what's different between your
statement and mine. I said that the race is due to the notification.
You said that the race is due to the notification. What subtle thing
am I missing here that would lead you to say that my assumption is
incorrect? Or did you just misread?

> The actual identifier is updated before the VM resumes
> from its clone operation, so if you match on that you will know whether
> you are in a new or old world. And that is enough to create
> transactions: Save the identifier before a "crypto transaction",
> validate before you finish, if they don't match, abort, reseed and replay.

Right. But more than just transactions, it's useful to preventing key
reuse vulnerabilities, in which case, you store the current identifier
just before an ephemeral key is generated, and then subsequently check
to see that the identifier hasn't changed before transmitting anything
related to that key.

> If you follow the logic at the beginning of the mail, you can create
> something race free if you consume the hardware VMGenID counter. You can
> not make it race free if you rely on the interrupt mechanism.

Yes, as mentioned and discussed in depth before. However, your use of
the word "counter" is problematic. Vmgenid is not a counter. It's a
unique identifier. That means you can't compare it with a single word
comparison but have to compare all of the 16 bytes. That seems
potentially expensive. It's for that reason that I suggested
augmenting the vmgenid spec with an additional word-sized _counter_
that could be mapped into the kernels and into userspaces.

> So following that train of thought, if you expose the hardware VMGenID
> to user space, you could allow user space to act race free based on
> VMGenID. That means consumers of user space RNGs could validate whether
> the ID is identical between the beginning of the crypto operation and
> the end.

Right.

> However, there are more complicated cases as well. What do you do with
> Samba for example? It needs to generate a new SID after the clone.
> That's a super heavy operation. Do you want to have smbd constantly poll
> on the VMGenID just to see whether it needs to kick off some
> administrative actions?

Were it a single word-sized integer, mapped into memory, that wouldn't
be much of a problem at all. It could constantly read this before and
after every operation. The problem is that it's 16 bytes and
understandably applications don't want to deal with that clunkiness.

> In that case, all we would need from the kernel is an easily readable
> GenID that changes

Actually, no, you need even less than that. All that's required is a
sysfs/procfs file that can be poll()'d on. It doesn't need to have any
content. When poll() returns readable, the VM has been forked. Then
userspace rngs and other things like that can call getrandom() to
receive a fresh value to mix into whatever their operation is. Since
all we're talking about here is _event notification_, all we need is
that event, which is what poll() provides.

> I'm also not a super big fan of putting all that logic into systemd. It
> means applications need to create their own notification mechanisms to
> pass that cloning notification into actual processes. Don't we have any
> mechanism that applications and libraries could use to natively get an
> event when the GenID changes?

Yes. poll() can do this. For the purposes of discussion, I've posted
an implementation of this idea here:
https://lore.kernel.org/lkml/20220309215907.77526-1-Jason@zx2c4.com/

What I'm sort of leaning toward is doing something like that patch,
and then later if vmgenid ever grows an additional word-sized counter,
moving to explore the race-free approach. Given the amount of
programming required to actually implement the race-free approach
(transactions and careful study of each case), the poll() file
approach might be a medium-grade compromise for the time being.
Evidently that's what Microsoft decided too.

Jason