Message-ID: <dc124c3d-a316-d36e-3ae4-21674280f55d@gmail.com>
Date: Thu, 2 Sep 2021 19:21:18 +0800
From: Tianyu Lan <ltykernel@...il.com>
To: Christoph Hellwig <hch@....de>,
Michael Kelley <mikelley@...rosoft.com>
Cc: KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"will@...nel.org" <will@...nel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>,
"hpa@...or.com" <hpa@...or.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"luto@...nel.org" <luto@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
"boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
"jgross@...e.com" <jgross@...e.com>,
"sstabellini@...nel.org" <sstabellini@...nel.org>,
"joro@...tes.org" <joro@...tes.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"arnd@...db.de" <arnd@...db.de>,
"m.szyprowski@...sung.com" <m.szyprowski@...sung.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"brijesh.singh@....com" <brijesh.singh@....com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
"pgonda@...gle.com" <pgonda@...gle.com>,
"martin.b.radev@...il.com" <martin.b.radev@...il.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"rppt@...nel.org" <rppt@...nel.org>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"aneesh.kumar@...ux.ibm.com" <aneesh.kumar@...ux.ibm.com>,
"krish.sadhukhan@...cle.com" <krish.sadhukhan@...cle.com>,
"saravanand@...com" <saravanand@...com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"rientjes@...gle.com" <rientjes@...gle.com>,
"ardb@...nel.org" <ardb@...nel.org>,
"iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
vkuznets <vkuznets@...hat.com>,
"parri.andrea@...il.com" <parri.andrea@...il.com>,
"dave.hansen@...el.com" <dave.hansen@...el.com>
Subject: Re: [PATCH V4 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support
On 9/2/2021 3:59 PM, Christoph Hellwig wrote:
> On Tue, Aug 31, 2021 at 05:16:19PM +0000, Michael Kelley wrote:
>> As a quick overview, I think there are four places where the
>> shared_gpa_boundary must be applied to adjust the guest physical
>> address that is used. Each requires mapping a corresponding
>> virtual address range. Here are the four places:
>>
>> 1) The so-called "monitor pages" that are a core communication
>> mechanism between the guest and Hyper-V. These are two single
>> pages, and the mapping is handled by calling memremap() for
>> each of the two pages. See Patch 7 of Tianyu's series.
>
> Ah, interesting.
>
>> 3) The network driver send and receive buffers. vmap_phys_range()
>> should work here.
>
> Actually it won't. The problem with these buffers is that they are
> physically non-contiguous allocations. We really have two sensible
> options:
>
> 1) use vmap_pfn as in the current series. But in that case I think
> we should get rid of the other mapping created by vmalloc. I
> thought a bit about finding a way to apply the offset in vmalloc
> itself, but I think it would be too invasive to the normal fast
> path. So the other sub-option would be to allocate the pages
> manually (maybe even using high order allocations to reduce TLB
> pressure) and then remap them
Agreed. In that case, the mapping for memory below shared_gpa_boundary is
not necessary. alloc_pages() is limited by MAX_ORDER and needs to be
called repeatedly to get enough memory.
> 2) do away with the contiguous kernel mapping entirely. This means
> the simple memcpy calls become loops over kmap_local_pfn. As
> I just found out for the send side that would be pretty easy,
> but the receive side would be more work. We'd also need to check
> the performance implications.
kmap_local_pfn() requires a pfn with a backing struct page, so it doesn't
work for pfns above shared_gpa_boundary.
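To illustrate the alloc_pages() point under option 1: because a single
call can only return a power-of-two block up to some maximum order, a
large buffer has to be assembled from several calls. The user-space
sketch below only computes how a page count would be split into such
chunks. The function name split_into_orders() and its max_order
parameter are hypothetical and not part of the series; no kernel
allocator is called here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split nr_pages into power-of-two chunks of at most 2^max_order pages.
 * The order of each chunk is stored in orders[] (at most cap entries are
 * written); the return value is the total number of chunks required.
 * Hypothetical helper for illustration only.
 */
size_t split_into_orders(size_t nr_pages, unsigned int max_order,
			 unsigned int *orders, size_t cap)
{
	size_t n = 0;

	assert(max_order < 8 * sizeof(size_t));

	while (nr_pages > 0) {
		unsigned int order = max_order;

		/* Largest order whose chunk still fits in what remains. */
		while (((size_t)1 << order) > nr_pages)
			order--;

		if (n < cap)
			orders[n] = order;
		n++;
		nr_pages -= (size_t)1 << order;
	}

	return n;
}
```

For example, 11 pages with max_order 2 would be requested as chunks of
4, 4, 2 and 1 pages. Each chunk would correspond to one allocation
call, and the resulting pages would then be remapped into one
contiguous virtual range.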
>
>> 4) The swiotlb memory used for bounce buffers. vmap_phys_range()
>> should work here as well.
>
> Or memremap if it works for 1.
The series now uses vmap_pfn() here, and the hv map function is reused in
the netvsc driver.
>
>> Case #2 above does unusual mapping. The ring buffer consists of a ring
>> buffer header page, followed by one or more pages that are the actual
>> ring buffer. The pages making up the actual ring buffer are mapped
>> twice in succession. For example, if the ring buffer has 4 pages
>> (one header page and three ring buffer pages), the contiguous
>> virtual mapping must cover these seven pages: 0, 1, 2, 3, 1, 2, 3.
>> The duplicate contiguous mapping allows the code that is reading
>> or writing the actual ring buffer to not be concerned about wrap-around
>> because writing off the end of the ring buffer is automatically
>> wrapped-around by the mapping. The amount of data read or
>> written in one batch never exceeds the size of the ring buffer, and
>> after a batch is read or written, the read or write indices are adjusted
>> to put them back into the range of the first mapping of the actual
>> ring buffer pages. So there's method to the madness, and the
>> technique works pretty well. But this kind of mapping is not
>> amenable to using vmap_phys_range().
>
> Hmm. Can you point me to where this is mapped? Especially for the
> classic non-isolated case where no vmap/vmalloc mapping is involved
> at all?
>
This is done via vmap() in hv_ringbuffer_init():

/* Initialize the ring buffer. */
int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
		       struct page *pages, u32 page_cnt, u32 max_pkt_size)
{
	int i;
	struct page **pages_wraparound;

	BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));

	/*
	 * First page holds struct hv_ring_buffer, do wraparound mapping for
	 * the rest.
	 */
	pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
				   GFP_KERNEL);
	if (!pages_wraparound)
		return -ENOMEM;

	/* prepare to wrap page array */
	pages_wraparound[0] = pages;
	for (i = 0; i < 2 * (page_cnt - 1); i++)
		pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];

	/* map */
	ring_info->ring_buffer = (struct hv_ring_buffer *)
		vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);

	kfree(pages_wraparound);

	if (!ring_info->ring_buffer)
		return -ENOMEM;

	ring_info->ring_buffer->read_index =
		ring_info->ring_buffer->write_index = 0;
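
The index arithmetic in the loop above can be checked in isolation. The
following user-space sketch reproduces only the page ordering, using
plain indices instead of struct page pointers. The function name
build_wraparound_indices() is hypothetical and does not exist in the
kernel; it is meant as an illustration of the layout, not as the
driver's code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill out[] with the source page index used for each slot of the
 * wraparound mapping: slot 0 is the header page, followed by the data
 * pages 1..page_cnt-1 twice in succession. out[] must have room for
 * 2 * page_cnt - 1 entries. Returns the number of slots filled.
 * Hypothetical helper for illustration only.
 */
size_t build_wraparound_indices(size_t page_cnt, size_t *out)
{
	size_t i;

	/* One header page plus at least one data page. */
	assert(page_cnt >= 2);

	out[0] = 0;
	for (i = 0; i < 2 * (page_cnt - 1); i++)
		out[i + 1] = i % (page_cnt - 1) + 1;

	return 2 * page_cnt - 1;
}
```

With page_cnt = 4 this fills seven slots with 0, 1, 2, 3, 1, 2, 3,
which matches the layout Michael described: an access that runs past
the last data page lands on the second copy of the first data page, so
the reader or writer does not need to handle wrap-around itself.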