Message-ID: <CY4PR21MB158655D98947D66C1C8B80B3D7C19@CY4PR21MB1586.namprd21.prod.outlook.com>
Date: Fri, 20 Aug 2021 19:22:14 +0000
From: Michael Kelley <mikelley@...rosoft.com>
To: Tianyu Lan <ltykernel@...il.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>,
"hpa@...or.com" <hpa@...or.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"luto@...nel.org" <luto@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
"boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
"jgross@...e.com" <jgross@...e.com>,
"sstabellini@...nel.org" <sstabellini@...nel.org>,
"joro@...tes.org" <joro@...tes.org>,
"will@...nel.org" <will@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
"arnd@...db.de" <arnd@...db.de>, "hch@....de" <hch@....de>,
"m.szyprowski@...sung.com" <m.szyprowski@...sung.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"brijesh.singh@....com" <brijesh.singh@....com>,
"ardb@...nel.org" <ardb@...nel.org>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
"pgonda@...gle.com" <pgonda@...gle.com>,
"martin.b.radev@...il.com" <martin.b.radev@...il.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"rppt@...nel.org" <rppt@...nel.org>,
"sfr@...b.auug.org.au" <sfr@...b.auug.org.au>,
"saravanand@...com" <saravanand@...com>,
"krish.sadhukhan@...cle.com" <krish.sadhukhan@...cle.com>,
"aneesh.kumar@...ux.ibm.com" <aneesh.kumar@...ux.ibm.com>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"rientjes@...gle.com" <rientjes@...gle.com>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"tj@...nel.org" <tj@...nel.org>
CC: "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
vkuznets <vkuznets@...hat.com>,
"parri.andrea@...il.com" <parri.andrea@...il.com>,
"dave.hansen@...el.com" <dave.hansen@...el.com>
Subject: RE: [PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc
driver
From: Tianyu Lan <ltykernel@...il.com> Sent: Friday, August 20, 2021 11:04 AM
>
> On 8/21/2021 12:08 AM, Michael Kelley wrote:
> >>>> }
> >>> The whole approach here is to do dma remapping on each individual page
> >>> of the I/O buffer. But wouldn't it be possible to use dma_map_sg() to map
> >>> each scatterlist entry as a unit? Each scatterlist entry describes a range of
> >>> physically contiguous memory. After dma_map_sg(), the resulting dma
> >>> address must also refer to a physically contiguous range in the swiotlb
> >>> bounce buffer memory. So at the top of the "for" loop over the scatterlist
> >>> entries, do dma_map_sg() if we're in an isolated VM. Then compute the
> >>> hvpfn value based on the dma address instead of sg_page(). But everything
> >>> else is the same, and the inner loop for populating the pfn_array is unmodified.
> >>> Furthermore, the dma_range array that you've added is not needed, since
> >>> scatterlist entries already have a dma_address field for saving the mapped
> >>> address, and dma_unmap_sg() uses that field.
> >> I don't use dma_map_sg() here in order to avoid introducing one more
> >> loop (i.e., the one inside dma_map_sg()). We already have a loop to
> >> populate cmd_request->dma_range[], so we do the dma map in the same loop.
> >>
> > I'm not seeing where the additional loop comes from. Storvsc
> > already has a loop through the sgl entries. Retain that loop and call
> > dma_map_sg() with nents set to 1. Then the sequence is
> > dma_map_sg() --> dma_map_sg_attrs() --> dma_direct_map_sg() -->
> > dma_direct_map_page(). The latter function will call swiotlb_map()
> > to map all pages of the sgl entry as a single operation.
>
> After dma_map_sg(), we still need to go through the scatter list again
> to populate payload->range.pfn_array. We could go through the scatter
> list just once if dma_map_sg() accepted a callback to run while
> walking the list.

Here's some code for what I'm suggesting (not even compile tested).
The only change is what's in the "if" clause of the SNP test. dma_map_sg()
is called with the nents parameter set to one so that it only
processes one sgl entry each time it is called, and doesn't walk the
entire sgl. Arguably, we don't even need the SNP test and the else
clause -- just always do what's in the if clause.

The corresponding code in storvsc_on_channel_callback would also
have to be changed. And we still have to set the min_align_mask
so swiotlb will preserve any offset. Storvsc already has things set up
so that higher levels ensure there are no holes between sgl entries,
and that needs to stay true.

	if (sg_count) {
		unsigned int hvpgoff, hvpfns_to_add;
		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
		u64 hvpfn;
		int nents;

		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
			payload_sz = (hvpg_count * sizeof(u64) +
				      sizeof(struct vmbus_packet_mpb_array));
			payload = kzalloc(payload_sz, GFP_ATOMIC);
			if (!payload)
				return SCSI_MLQUEUE_DEVICE_BUSY;
		}

		payload->range.len = length;
		payload->range.offset = offset_in_hvpg;

		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
			/*
			 * Init values for the current sgl entry. hvpgoff
			 * and hvpfns_to_add are in units of Hyper-V size
			 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
			 * case also handles values of sgl->offset that are
			 * larger than PAGE_SIZE. Such offsets are handled
			 * even on other than the first sgl entry, provided
			 * they are a multiple of PAGE_SIZE.
			 */
			hvpgoff = HVPFN_DOWN(sgl->offset);

			if (hv_isolation_type_snp()) {
				nents = dma_map_sg(dev->device, sgl, 1,
						   scmnd->sc_data_direction);
				if (nents != 1)
					/*
					 * Return busy so higher levels will
					 * retry. (A real implementation would
					 * also free payload and unmap any
					 * previously mapped entries here.)
					 */
					return SCSI_MLQUEUE_DEVICE_BUSY;
				hvpfn = HVPFN_DOWN(sg_dma_address(sgl)) + hvpgoff;
			} else {
				hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
			}

			hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
					hvpgoff;

			/*
			 * Fill the next portion of the PFN array with
			 * sequential Hyper-V PFNs for the contiguous physical
			 * memory described by the sgl entry. The end of the
			 * last sgl should be reached at the same time that
			 * the PFN array is filled.
			 */
			while (hvpfns_to_add--)
				payload->range.pfn_array[i++] = hvpfn++;
		}
	}
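
For the min_align_mask piece, here's a sketch of what I have in mind
(also untested; the exact placement in storvsc_probe() is my
assumption):

	/*
	 * Hypothetical addition to storvsc_probe(): tell the DMA layer
	 * that swiotlb bounce buffers must preserve the offset within
	 * a Hyper-V page, so payload->range.offset remains valid after
	 * the buffer is bounced.
	 */
	if (dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1))
		return -EINVAL;

And the unmap on the completion path would look roughly like this,
assuming the scatterlist is still reachable via scsi_sglist(scmnd)
at that point (again just a sketch, not tested):

	/*
	 * Hypothetical unmap on I/O completion: undo the per-entry
	 * dma_map_sg() calls made when the request was queued.
	 */
	if (hv_isolation_type_snp()) {
		struct scatterlist *sg;
		int j;

		scsi_for_each_sg(scmnd, sg, scsi_sg_count(scmnd), j)
			dma_unmap_sg(dev->device, sg, 1,
				     scmnd->sc_data_direction);
	}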