lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 29 Apr 2022 10:48:06 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Kai Huang <kai.huang@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        KVM list <kvm@...r.kernel.org>,
        Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        "Brown, Len" <len.brown@...el.com>,
        "Luck, Tony" <tony.luck@...el.com>,
        Rafael J Wysocki <rafael.j.wysocki@...el.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andi Kleen <ak@...ux.intel.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Isaku Yamahata <isaku.yamahata@...el.com>
Subject: Re: [PATCH v3 00/21] TDX host kernel support

On Fri, Apr 29, 2022 at 10:18 AM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 4/29/22 08:18, Dan Williams wrote:
> > Yes, I want to challenge the idea that all core-mm memory must be TDX
> > capable. Instead, this feels more like something that wants a
> > hugetlbfs / dax-device like capability to ask the kernel to gather /
> > set-aside the enumerated TDX memory out of all the general purpose
> > memory it knows about and then VMs use that ABI to get access to
> > convertible memory. Trying to ensure that all page allocator memory is
> > TDX capable feels too restrictive with all the different ways pfns can
> > get into the allocator.
>
> The KVM users are the problem here.  They use a variety of ABIs to get
> memory and then hand it to KVM.  KVM basically just consumes the
> physical addresses from the page tables.
>
> Also, there's no _practical_ problem here today.  I can't actually think
> of a case where any memory that ends up in the allocator on today's TDX
> systems is not TDX capable.
>
> Tomorrow's systems are going to be the problem.  They'll (presumably)
> have a mix of CXL devices that will have varying capabilities.  Some
> will surely lack the metadata storage for checksums and TD-owner bits.
> TDX use will be *safe* on those systems: if you take this code and run
> it on one tomorrow's systems, it will notice the TDX-incompatible memory
> and will disable TDX.
>
> The only way around this that I can see is to introduce ABI today that
> anticipates the needs of the future systems.  We could require that all
> the KVM memory be "validated" before handing it to TDX.  Maybe a new
> syscall that says: "make sure this mapping works for TDX".  It could be
> new sysfs ABI which specifies which NUMA nodes contain TDX-capable memory.

Yes, node-id seems the only reasonable handle that can be used, and it
does not seem too onerous for a KVM user to have to set a node policy
preferring all the TDX / confidential-computing capable nodes.

> But, neither of those really help with, say, a device-DAX mapping of
> TDX-*IN*capable memory handed to KVM.  The "new syscall" would just
> throw up its hands and leave users with the same result: TDX can't be
> used.  The new sysfs ABI for NUMA nodes wouldn't clearly apply to
> device-DAX because they don't respect the NUMA policy ABI.

They do have "target_node" attributes to associate node specific
metadata, and could certainly express target_node capabilities in its
own ABI. Then it's just a matter of making pfn_to_nid() do the right
thing so KVM kernel side can validate the capabilities of all inbound
pfns.

> I'm open to ideas here.  If there's a viable ABI we can introduce to
> train TDX users today that will work tomorrow too, I'm all for it.

In general, expressing NUMA node perf and node capabilities is
something Linux needs to get better at. HMAT data for example still
exists as sideband information ignored by numactl, but it feels
inevitable that perf and capability details become more of a first
class citizen for applications that have these mem-allocation-policy
constraints in the presence of disparate memory types.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ