linux-kernel - RE: [PATCH 3/6] x86/tdx: Support vmalloc() for tdx_enc_status

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <SA1PR21MB133536EA0C26DFE0168E2F98BF0C9@SA1PR21MB1335.namprd21.prod.outlook.com>
Date:   Wed, 23 Nov 2022 23:51:11 +0000
From:   Dexuan Cui <decui@...rosoft.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
CC:     "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "arnd@...db.de" <arnd@...db.de>, "bp@...en8.de" <bp@...en8.de>,
        "brijesh.singh@....com" <brijesh.singh@....com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "jane.chu@...cle.com" <jane.chu@...cle.com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        KY Srinivasan <kys@...rosoft.com>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "luto@...nel.org" <luto@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "sathyanarayanan.kuppuswamy@...ux.intel.com" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "seanjc@...gle.com" <seanjc@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "wei.liu@...nel.org" <wei.liu@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 3/6] x86/tdx: Support vmalloc() for
 tdx_enc_status_changed()

> From: Kirill A. Shutemov <kirill@...temov.name>
> Sent: Monday, November 21, 2022 4:24 PM
> 
> On Mon, Nov 21, 2022 at 11:51:48AM -0800, Dexuan Cui wrote:
> > When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> > allocates buffers using vzalloc(), and needs to share the buffers with the
> > host OS by calling set_memory_decrypted(), which is not working for
> > vmalloc() yet. Add the support by handling the pages one by one.
> 
> Why do you use vmalloc here in the first place?

We changed to vmalloc() long ago, mainly for 2 reasons:

1) __alloc_pages() only allows us to allocate up to 4MB of contiguous pages, but
we need a 16MB buffer in the Hyper-V vNIC driver for better performance.

2) Due to memory fragmentation, we have seen that the page allocator can fail
to allocate 2 contigous pages, though the system has a lot of free memory. We
need to support Hyper-V vNIC hot addition, so we changed to vmalloc. See

b679ef73edc2 ("hyperv: Add support for physically discontinuous receive buffer")
06b47aac4924 ("Drivers: net-next: hyperv: Increase the size of the sendbuf region")

> Will you also adjust direct mapping to have shared bit set?
> 
> If not, we will have problems with load_unaligned_zeropad() when it will
> access shared pages via non-shared mapping.
> 
> If direct mapping is adjusted, we will get direct mapping fragmentation.

load_unaligned_zeropad() was added 10 years ago by Linus in
e419b4cc5856 ("vfs: make word-at-a-time accesses handle a non-existing page") 
so this seemingly-strange usage is legitimate.

Sorry I don't know how to adjust direct mapping. Do you mean I should do
something like the below in tdx_enc_status_changed_for_vmalloc() for
every 'start_va':
  pa = slow_virt_to_phys(start_va);
  set_memory_decrypted(phys_to_virt(pa), 1);
?

But IIRC this issue is not specific to vmalloc()? e.g. we get 1 page by
__get_free_pages(GFP_KERNEL, 0) or kmalloc(PAGE_SIZE, GFP_KERNEL)
and we call set_memory_decrypted() for the page. How can we make
sure the callers of load_unaligned_zeropad() can't access the page
via non-shared mapping?

It looks like you have a patchset to address the issue (it looks like it
hasn't been merged into the mainline?) ?
https://lwn.net/ml/linux-kernel/20220614120231.48165-11-kirill.shutemov@linux.intel.com/

BTW, I'll drop tdx_enc_status_changed_for_vmalloc() and try to enhance the
existing tdx_enc_status() to support both direct mapping and vmalloc().

> Maybe tap into swiotlb buffer using DMA API?

I doubt the Hyper-V vNIC driver here can call dma_alloc_coherent() to
get a 16MB buffer from swiotlb buffers. I'm looking at dma_alloc_coherent() ->
dma_alloc_attrs() -> dma_direct_alloc(), which typically calls 
__dma_direct_alloc_pages() to allocate congituous memory pages (which
can't exceed the 4MB limit. Note there is no virtual IOMMU in a guest on Hyper-V).

It looks like we can't force dma_direct_alloc() to call dma_direct_alloc_no_mapping(),
because even if we set the DMA_ATTR_NO_KERNEL_MAPPING flag,
force_dma_unencrypted() is still always true for a TDX guest.