linux-kernel - Re: [RFC v2-fix 1/1] x86/tdx: Handle in-kernel MMIO

Message-ID: <3a037a43-435b-fc28-63d0-48e543cddfdd@linux.intel.com>
Date:   Tue, 18 May 2021 13:28:04 -0700
From:   Andi Kleen <ak@...ux.intel.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Dave Hansen <dave.hansen@...el.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Tony Luck <tony.luck@...el.com>,
        Kirill Shutemov <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan <knsathya@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Raj Ashok <ashok.raj@...el.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC v2-fix 1/1] x86/tdx: Handle in-kernel MMIO


On 5/18/2021 11:22 AM, Sean Christopherson wrote:
> On Tue, May 18, 2021, Andi Kleen wrote:
>>> The extra bytes for .altinstructions is very different than the extra bytes for
>>> the code itself.  The .altinstructions section is freed after init, so yes it
>>> bloats the kernel size a bit, but the runtime footprint is unaffected by the
>>> patching metadata.
>>>
>>> IIRC, patching read/write{b,w,l,q}() can be done with 3 bytes of .text overhead.
>>>
>>> The other option to explore is to hook/patch IO_COND(), which can be done with
>>> negligible overhead because the helpers that use IO_COND() are not inlined.  In a
>>> TDX guest, redirecting IO_COND() to a paravirt helper would likely cover the
>>> majority of IO/MMIO since virtio-pci exclusively uses the IO_COND() wrappers.
>>> And if there are TDX VMMs that want to deploy virtio-mmio, hooking
>>> drivers/virtio/virtio_mmio.c directly would be a viable option.
>> Yes but what's the point of all that?
> Patching IO_COND() is relatively low effort.  With some clever refactoring, I
> suspect the net lines of code added would be less than 10.  That seems like a
> worthwhile effort to avoid millions of faults over the lifetime of the guest.

AFAIK IO_COND is only for iomap users. But most drivers don't even use 
iomap. virtio doesn't for example, and that's really the only case we 
currently care about.
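
For reference, a rough userspace sketch of the dispatch IO_COND() does for
the iomap helpers. This is not the kernel code: the constants are quoted from
memory of the generic lib/iomap.c and should be treated as assumptions, and
io_classify(), model_ioread8() and the counters are hypothetical names made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, as recalled from the generic lib/iomap.c. */
#define PIO_OFFSET   0x10000UL
#define PIO_MASK     0x0ffffUL
#define PIO_RESERVED 0x40000UL

enum io_kind { IO_BAD, IO_PIO, IO_MMIO };

/* Hypothetical counters standing in for the real port/MMIO accessors. */
static unsigned long pio_reads, mmio_reads, bad_accesses;
static unsigned long last_port;

/* Classify an iomap cookie by its numeric value, as IO_COND() does. */
static enum io_kind io_classify(uintptr_t cookie)
{
	if (cookie >= PIO_RESERVED)
		return IO_MMIO;		/* a real MMIO address */
	if (cookie > PIO_OFFSET)
		return IO_PIO;		/* an encoded I/O port */
	return IO_BAD;
}

/* Model of an ioread8()-style helper: out of line, one dispatch point. */
static uint8_t model_ioread8(uintptr_t cookie)
{
	switch (io_classify(cookie)) {
	case IO_MMIO:
		mmio_reads++;		/* kernel: readb(addr); a hook would sit here */
		return 0;
	case IO_PIO:
		last_port = cookie & PIO_MASK;
		pio_reads++;		/* kernel: inb(port) */
		return 0;
	default:
		bad_accesses++;		/* kernel: bad_io_access() */
		return 0xff;
	}
}
```

The sketch only shows that the MMIO branch is a single out-of-line spot for
callers that go through ioread*/iowrite*. Drivers calling readl()/writel()
directly never reach it.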

Also, millions of faults over the lifetime of a guest is nothing for a CPU.

The only case where I can see it making sense is the virtio (and vmbus)
doorbells. Everything else should be slow path anyway.

But doing that now would be premature optimization and that's usually a 
bad idea. If it's a problem we can fix it later.


>
>> Even if it's only 3 bytes we still have a lot of MMIO all over the kernel
>> which never needs it.
>>
>> And I don't even see what TDX (or SEV which already does the decoding and
>> has been merged) would get out of it. We handle all the #VEs just fine. And
>> the instruction handling code is fairly straightforward too.
>>
>> Besides instruction decoding works fine for all the existing hypervisors.
>> All we really want to do is to do the same thing as KVM would do.
> Heh, trust me, you don't want to do the same thing KVM does :-)

We want the same behavior.

Yes, probably not the same code.


-Andi