linux-kernel - Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc124c3d-a316-d36e-3ae4-21674280f55d@gmail.com>
Date:   Thu, 2 Sep 2021 19:21:18 +0800
From:   Tianyu Lan <ltykernel@...il.com>
To:     Christoph Hellwig <hch@....de>,
        Michael Kelley <mikelley@...rosoft.com>
Cc:     KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        "wei.liu@...nel.org" <wei.liu@...nel.org>,
        Dexuan Cui <decui@...rosoft.com>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "will@...nel.org" <will@...nel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>,
        "hpa@...or.com" <hpa@...or.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "luto@...nel.org" <luto@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
        "boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
        "jgross@...e.com" <jgross@...e.com>,
        "sstabellini@...nel.org" <sstabellini@...nel.org>,
        "joro@...tes.org" <joro@...tes.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
        "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        "arnd@...db.de" <arnd@...db.de>,
        "m.szyprowski@...sung.com" <m.szyprowski@...sung.com>,
        "robin.murphy@....com" <robin.murphy@....com>,
        "brijesh.singh@....com" <brijesh.singh@....com>,
        "thomas.lendacky@....com" <thomas.lendacky@....com>,
        Tianyu Lan <Tianyu.Lan@...rosoft.com>,
        "pgonda@...gle.com" <pgonda@...gle.com>,
        "martin.b.radev@...il.com" <martin.b.radev@...il.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
        "rppt@...nel.org" <rppt@...nel.org>,
        "hannes@...xchg.org" <hannes@...xchg.org>,
        "aneesh.kumar@...ux.ibm.com" <aneesh.kumar@...ux.ibm.com>,
        "krish.sadhukhan@...cle.com" <krish.sadhukhan@...cle.com>,
        "saravanand@...com" <saravanand@...com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
        "rientjes@...gle.com" <rientjes@...gle.com>,
        "ardb@...nel.org" <ardb@...nel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        vkuznets <vkuznets@...hat.com>,
        "parri.andrea@...il.com" <parri.andrea@...il.com>,
        "dave.hansen@...el.com" <dave.hansen@...el.com>
Subject: Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support



On 9/2/2021 3:59 PM, Christoph Hellwig wrote:
> On Tue, Aug 31, 2021 at 05:16:19PM +0000, Michael Kelley wrote:
>> As a quick overview, I think there are four places where the
>> shared_gpa_boundary must be applied to adjust the guest physical
>> address that is used.  Each requires mapping a corresponding
>> virtual address range.  Here are the four places:
>>
>> 1)  The so-called "monitor pages" that are a core communication
>> mechanism between the guest and Hyper-V.  These are two single
>> pages, and the mapping is handled by calling memremap() for
>> each of the two pages.  See Patch 7 of Tianyu's series.
> 
> Ah, interesting.
> 
>> 3)  The network driver send and receive buffers.  vmap_phys_range()
>> should work here.
> 
> Actually it won't.  The problem with these buffers is that they are
> physically non-contiguous allocations.  We really have two sensible
> options:
> 
>   1) use vmap_pfn as in the current series.  But in that case I think
>      we should get rid of the other mapping created by vmalloc.  I
>      though a bit about finding a way to apply the offset in vmalloc
>      itself, but I think it would be too invasive to the normal fast
>      path.  So the other sub-option would be to allocate the pages
>      manually (maybe even using high order allocations to reduce TLB
>      pressure) and then remap them

Agree. In such case, the map for memory below shared_gpa_boundary is not 
necessary. allocate_pages() is limited by MAX_ORDER and needs to be 
called repeatedly to get enough memory.

>   2) do away with the contiguous kernel mapping entirely.  This means
>      the simple memcpy calls become loops over kmap_local_pfn.  As
>      I just found out for the send side that would be pretty easy,
>      but the receive side would be more work.  We'd also need to check
>      the performance implications.

kmap_local_pfn() requires pfn with backing struct page and this doesn't 
work pfn above shared_gpa_boundary.
> 
>> 4) The swiotlb memory used for bounce buffers.  vmap_phys_range()
>> should work here as well.
> 
> Or memremap if it works for 1.

Now use vmap_pfn() and the hv map function is reused in the netvsc driver.

> 
>> Case #2 above does unusual mapping.  The ring buffer consists of a ring
>> buffer header page, followed by one or more pages that are the actual
>> ring buffer.  The pages making up the actual ring buffer are mapped
>> twice in succession.  For example, if the ring buffer has 4 pages
>> (one header page and three ring buffer pages), the contiguous
>> virtual mapping must cover these seven pages:  0, 1, 2, 3, 1, 2, 3.
>> The duplicate contiguous mapping allows the code that is reading
>> or writing the actual ring buffer to not be concerned about wrap-around
>> because writing off the end of the ring buffer is automatically
>> wrapped-around by the mapping.  The amount of data read or
>> written in one batch never exceeds the size of the ring buffer, and
>> after a batch is read or written, the read or write indices are adjusted
>> to put them back into the range of the first mapping of the actual
>> ring buffer pages.  So there's method to the madness, and the
>> technique works pretty well.  But this kind of mapping is not
>> amenable to using vmap_phys_range().
> 
> Hmm.  Can you point me to where this is mapped?  Especially for the
> classic non-isolated case where no vmap/vmalloc mapping is involved
> at all?
> 

This is done via vmap() in the hv_ringbuffer_init()

182/* Initialize the ring buffer. */
183int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
184                       struct page *pages, u32 page_cnt, u32 
max_pkt_size)
185{
186        int i;
187        struct page **pages_wraparound;
188
189        BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
190
191        /*
192         * First page holds struct hv_ring_buffer, do wraparound 
mapping for
193         * the rest.
194         */
195        pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct 
page *),
196                                   GFP_KERNEL);
197        if (!pages_wraparound)
198                return -ENOMEM;
199
/* prepare to wrap page array */
200        pages_wraparound[0] = pages;
201        for (i = 0; i < 2 * (page_cnt - 1); i++)
202                pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
203
/* map */
204        ring_info->ring_buffer = (struct hv_ring_buffer *)
205                vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, 
PAGE_KERNEL);
206
207        kfree(pages_wraparound);
208
209
210        if (!ring_info->ring_buffer)
211                return -ENOMEM;
212
213        ring_info->ring_buffer->read_index =
214                ring_info->ring_buffer->write_index = 0;