linux-kernel - Re: propagating vmgenid outward and upward

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220302074503-mutt-send-email-mst@kernel.org>
Date:   Wed, 2 Mar 2022 07:58:33 -0500
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Laszlo Ersek <lersek@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        KVM list <kvm@...r.kernel.org>,
        QEMU Developers <qemu-devel@...gnu.org>,
        linux-hyperv@...r.kernel.org,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        Alexander Graf <graf@...zon.com>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        adrian@...ity.io,
        Daniel P. Berrangé <berrange@...hat.com>,
        Dominik Brodowski <linux@...inikbrodowski.net>,
        Jann Horn <jannh@...gle.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        "Brown, Len" <len.brown@...el.com>, Pavel Machek <pavel@....cz>,
        Linux PM <linux-pm@...r.kernel.org>,
        Colm MacCarthaigh <colmmacc@...zon.com>,
        Theodore Ts'o <tytso@....edu>, Arnd Bergmann <arnd@...db.de>
Subject: Re: propagating vmgenid outward and upward

On Wed, Mar 02, 2022 at 12:26:27PM +0100, Jason A. Donenfeld wrote:
> Hey Michael,
> 
> Thanks for the benchmark.
> 
> On Wed, Mar 2, 2022 at 9:30 AM Michael S. Tsirkin <mst@...hat.com> wrote:
> > So yes, the overhead is higher by 50% which seems a lot but it's from a
> > very small number, so I don't see why it's a show stopper, it's not by a
> > factor of 10 such that we should sacrifice safety by default. Maybe a
> > kernel flag that removes the read replacing it with an interrupt will
> > do.
> >
> > In other words, premature optimization is the root of all evil.
> 
> Unfortunately I don't think it's as simple as that for several reasons.
> 
> First, I'm pretty confident a beefy Intel machine can mostly hide
> non-dependent comparisons in the memory access and have the problem
> mostly go away. But this is much less the case on, say, an in-order
> MIPS32r2, which isn't just "some crappy ISA I'm using for the sake of
> argument," but actually the platform on which a lot of networking and
> WireGuard stuff runs, so I do care about it. There, we have 4
> reads/comparisons which can't pipeline nearly as well.

Sure. Want to try running some benchmarks on that platform?
Presumably you have access to such a box, right?


> There's also the atomicity aspect, which I think makes your benchmark
> not quite accurate. Those 16 bytes could change between the first and
> second word (or between the Nth and N+1th word for N<=3 on 32-bit).
> What if in that case the word you read second doesn't change, but the
> word you read first did? So then you find yourself having to do a
> hi-lo-hi dance.
> And then consider the 32-bit case, where that's even
> more annoying. This is just one of those things that comes up when you
> compare the semantics of a "large unique ID" and "word-sized counter",
> as general topics. (My suggestion is that vmgenid provide both.)

I don't see how this matters for any applications at all. Feel free to
present a case that would be race free with a word but not a 16
byte value, I could not imagine one. It's human to err of course.

>
> Finally, there's a slightly storage aspect, where adding 16 bytes to a
> per-key struct is a little bit heavier than adding 4 bytes and might
> bust a cache line without sufficient care, care which always has some
> cost in one way or another.
> 
> So I just don't know if it's realistic to impose a 16-byte per-packet
> comparison all the time like that. I'm familiar with WireGuard
> obviously, but there's also cifs and maybe even wifi and bluetooth,
> and who knows what else, to care about too. Then there's the userspace
> discussion. I can't imagine a 16-byte hotpath comparison being
> accepted as implementable.

I think this hinges on benchmarking results. Want to start with
my silly benchmark at least? If you can't measure an order of
magnitude gain then I think any effect on wireguard will be in the
noise.


> > And I feel if linux
> > DTRT and reads the 16 bytes then hypervisor vendors will be motivated to
> > improve and add a 4 byte unique one. As long as linux is interrupt
> > driven there's no motivation for change.
> 
> I reeeeeally don't want to get pulled into the politics of this on the
> hypervisor side. I assume an improved thing would begin with QEMU and
> Firecracker or something collaborating because they're both open
> source and Amazon people seem interested.

I think it would begin with a benchmark showing there's even any
measureable performance to be gained by switching the semantics.

> And then pressure builds for
> Microsoft and VMware to do it on their side. And then we get this all
> nicely implemented in the kernel. In the meantime, though, I'm not
> going to refuse to address the problem entirely just because the
> virtual hardware is less than perfect; I'd rather make the most with
> what we've got while still being somewhat reasonable from an
> implementation perspective.
> 
> Jason

Right but given you are trading security off for performance, it matters
a lot what the performance gain is.

-- 
MST