lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 24 Nov 2022 10:51:25 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Dexuan Cui <decui@...rosoft.com>
Cc:     "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "arnd@...db.de" <arnd@...db.de>, "bp@...en8.de" <bp@...en8.de>,
        "brijesh.singh@....com" <brijesh.singh@....com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "hpa@...or.com" <hpa@...or.com>,
        "jane.chu@...cle.com" <jane.chu@...cle.com>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        KY Srinivasan <kys@...rosoft.com>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "luto@...nel.org" <luto@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "sathyanarayanan.kuppuswamy@...ux.intel.com" 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        "seanjc@...gle.com" <seanjc@...gle.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "wei.liu@...nel.org" <wei.liu@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/6] x86/tdx: Support vmalloc() for
 tdx_enc_status_changed()

On Wed, Nov 23, 2022 at 11:51:11PM +0000, Dexuan Cui wrote:
> > From: Kirill A. Shutemov <kirill@...temov.name>
> > Sent: Monday, November 21, 2022 4:24 PM
> > 
> > On Mon, Nov 21, 2022 at 11:51:48AM -0800, Dexuan Cui wrote:
> > > When a TDX guest runs on Hyper-V, the hv_netvsc driver's netvsc_init_buf()
> > > allocates buffers using vzalloc(), and needs to share the buffers with the
> > > host OS by calling set_memory_decrypted(), which is not working for
> > > vmalloc() yet. Add the support by handling the pages one by one.
> > 
> > Why do you use vmalloc here in the first place?
> 
> We changed to vmalloc() long ago, mainly for 2 reasons:
> 
> 1) __alloc_pages() only allows us to allocate up to 4MB of contiguous pages, but
> we need a 16MB buffer in the Hyper-V vNIC driver for better performance.
> 
> 2) Due to memory fragmentation, we have seen that the page allocator can fail
> to allocate 2 contigous pages, though the system has a lot of free memory. We
> need to support Hyper-V vNIC hot addition, so we changed to vmalloc. See
> 
> b679ef73edc2 ("hyperv: Add support for physically discontinuous receive buffer")
> 06b47aac4924 ("Drivers: net-next: hyperv: Increase the size of the sendbuf region")
> 
> > Will you also adjust direct mapping to have shared bit set?
> > 
> > If not, we will have problems with load_unaligned_zeropad() when it will
> > access shared pages via non-shared mapping.
> > 
> > If direct mapping is adjusted, we will get direct mapping fragmentation.
> 
> load_unaligned_zeropad() was added 10 years ago by Linus in
> e419b4cc5856 ("vfs: make word-at-a-time accesses handle a non-existing page") 
> so this seemingly-strange usage is legitimate.
> 
> Sorry I don't know how to adjust direct mapping. Do you mean I should do
> something like the below in tdx_enc_status_changed_for_vmalloc() for
> every 'start_va':
>   pa = slow_virt_to_phys(start_va);
>   set_memory_decrypted(phys_to_virt(pa), 1);
> ?
> 
> But IIRC this issue is not specific to vmalloc()? e.g. we get 1 page by
> __get_free_pages(GFP_KERNEL, 0) or kmalloc(PAGE_SIZE, GFP_KERNEL)
> and we call set_memory_decrypted() for the page. How can we make
> sure the callers of load_unaligned_zeropad() can't access the page
> via non-shared mapping?

__get_free_pages() and kmalloc() returns pointer to the page in the direct
mapping. set_memory_decrypted() adjust direct mapping to have the shared
bit set. Everything is fine.

> It looks like you have a patchset to address the issue (it looks like it
> hasn't been merged into the mainline?) ?
> https://lwn.net/ml/linux-kernel/20220614120231.48165-11-kirill.shutemov@linux.intel.com/

It addresses similar, but different issue. It is only relevant for
unaccepted memory support.

> BTW, I'll drop tdx_enc_status_changed_for_vmalloc() and try to enhance the
> existing tdx_enc_status() to support both direct mapping and vmalloc().
> 
> > Maybe tap into swiotlb buffer using DMA API?
> 
> I doubt the Hyper-V vNIC driver here can call dma_alloc_coherent() to
> get a 16MB buffer from swiotlb buffers. I'm looking at dma_alloc_coherent() ->
> dma_alloc_attrs() -> dma_direct_alloc(), which typically calls 
> __dma_direct_alloc_pages() to allocate congituous memory pages (which
> can't exceed the 4MB limit. Note there is no virtual IOMMU in a guest on Hyper-V).
> 
> It looks like we can't force dma_direct_alloc() to call dma_direct_alloc_no_mapping(),
> because even if we set the DMA_ATTR_NO_KERNEL_MAPPING flag,
> force_dma_unencrypted() is still always true for a TDX guest.

The point is not in reaching dma_direct_alloc_no_mapping(). The idea is
allocate from existing swiotlb that already has shared bit set in direct
mapping.

vmap area that maps pages allocated from swiotlb also should work fine.

To be honest, I don't understand DMA API well enough. I need to experiment
with it to see what works for the case.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ