Message-ID: <MWHPR21MB1593CB6AD9521190CA0AB0DED7CE9@MWHPR21MB1593.namprd21.prod.outlook.com>
Date: Thu, 2 Sep 2021 02:08:38 +0000
From: Michael Kelley <mikelley@...rosoft.com>
To: Tianyu Lan <ltykernel@...il.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"will@...nel.org" <will@...nel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>, "x86@...nel.org" <x86@...nel.org>,
"hpa@...or.com" <hpa@...or.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"luto@...nel.org" <luto@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>,
"boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
"jgross@...e.com" <jgross@...e.com>,
"sstabellini@...nel.org" <sstabellini@...nel.org>,
"joro@...tes.org" <joro@...tes.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>,
"jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
"martin.petersen@...cle.com" <martin.petersen@...cle.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"arnd@...db.de" <arnd@...db.de>, "hch@....de" <hch@....de>,
"m.szyprowski@...sung.com" <m.szyprowski@...sung.com>,
"robin.murphy@....com" <robin.murphy@....com>,
"brijesh.singh@....com" <brijesh.singh@....com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
"pgonda@...gle.com" <pgonda@...gle.com>,
"martin.b.radev@...il.com" <martin.b.radev@...il.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"rppt@...nel.org" <rppt@...nel.org>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"aneesh.kumar@...ux.ibm.com" <aneesh.kumar@...ux.ibm.com>,
"krish.sadhukhan@...cle.com" <krish.sadhukhan@...cle.com>,
"saravanand@...com" <saravanand@...com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"rientjes@...gle.com" <rientjes@...gle.com>,
"ardb@...nel.org" <ardb@...nel.org>
CC: "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
vkuznets <vkuznets@...hat.com>,
"parri.andrea@...il.com" <parri.andrea@...il.com>,
"dave.hansen@...el.com" <dave.hansen@...el.com>
Subject: RE: [PATCH V4 13/13] hv_storvsc: Add Isolation VM support for storvsc
driver
From: Tianyu Lan <ltykernel@...il.com> Sent: Friday, August 27, 2021 10:21 AM
>
Per previous comment, the Subject line tag should be "scsi: storvsc: "
> In Isolation VMs, all memory shared with the host needs to be marked
> visible to the host via a hypercall. vmbus_establish_gpadl() already
> does this for the storvsc rx/tx ring buffers, but the page buffers
> used by vmbus_sendpacket_mpb_desc() still need to be handled. Use the
> DMA API (dma_map_sg()) to map this memory when sending/receiving
> packets and use the returned swiotlb bounce buffer DMA addresses. In
> Isolation VMs, the swiotlb bounce buffer is marked visible to the
> host and swiotlb force mode is enabled.
>
> Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 so that
> the original data offset is preserved in the bounce buffer.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@...rosoft.com>
> ---
> Changes since v3:
> * Replace dma_map_page() with dma_map_sg()
> * Use for_each_sg() to populate payload->range.pfn_array.
> * Remove storvsc_dma_map macro
> ---
> drivers/hv/vmbus_drv.c | 1 +
> drivers/scsi/storvsc_drv.c | 41 +++++++++++++++-----------------------
> include/linux/hyperv.h | 1 +
> 3 files changed, 18 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index f068e22a5636..270d526fd9de 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2124,6 +2124,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
> hv_debug_add_dev_dir(child_device_obj);
>
> child_device_obj->device.dma_mask = &vmbus_dma_mask;
> + child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
> return 0;
>
> err_kset_unregister:
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 328bb961c281..4f1793be1fdc 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -21,6 +21,8 @@
> #include <linux/device.h>
> #include <linux/hyperv.h>
> #include <linux/blkdev.h>
> +#include <linux/dma-mapping.h>
> +
> #include <scsi/scsi.h>
> #include <scsi/scsi_cmnd.h>
> #include <scsi/scsi_host.h>
> @@ -1312,6 +1314,9 @@ static void storvsc_on_channel_callback(void *context)
> continue;
> }
> request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
> + if (scsi_sg_count(scmnd))
> + dma_unmap_sg(&device->device, scsi_sglist(scmnd),
> + scsi_sg_count(scmnd), scmnd->sc_data_direction);
Use scsi_dma_unmap(), which does exactly what you have written
above. :-)
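Something like this (untested sketch; scsi_dma_unmap() checks
scsi_sg_count() internally, and since storvsc_probe() calls
scsi_add_host(host, &device->device), shost->dma_dev should already
resolve to the same device you are passing to dma_unmap_sg() here):

		/* replaces the open-coded dma_unmap_sg() above */
		scsi_dma_unmap(scmnd);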
> }
>
> storvsc_on_receive(stor_device, packet, request);
> @@ -1725,7 +1730,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> struct hv_host_device *host_dev = shost_priv(host);
> struct hv_device *dev = host_dev->dev;
> struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
> - int i;
> struct scatterlist *sgl;
> unsigned int sg_count;
> struct vmscsi_request *vm_srb;
> @@ -1807,10 +1811,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> payload_sz = sizeof(cmd_request->mpb);
>
> if (sg_count) {
> - unsigned int hvpgoff, hvpfns_to_add;
> unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
> unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
> - u64 hvpfn;
> + struct scatterlist *sg;
> + unsigned long hvpfn, hvpfns_to_add;
> + int j, i = 0;
>
> if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
>
> @@ -1824,31 +1829,16 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
> payload->range.len = length;
> payload->range.offset = offset_in_hvpg;
>
> + if (dma_map_sg(&dev->device, sgl, sg_count,
> + scmnd->sc_data_direction) == 0)
> + return SCSI_MLQUEUE_DEVICE_BUSY;
>
> - for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
> - /*
> - * Init values for the current sgl entry. hvpgoff
> - * and hvpfns_to_add are in units of Hyper-V size
> - * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
> - * case also handles values of sgl->offset that are
> - * larger than PAGE_SIZE. Such offsets are handled
> - * even on other than the first sgl entry, provided
> - * they are a multiple of PAGE_SIZE.
> - */
Any reason not to keep this comment? It's still correct and
mentions important cases that must be handled.
> - hvpgoff = HVPFN_DOWN(sgl->offset);
> - hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
> - hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
> - hvpgoff;
> + for_each_sg(sgl, sg, sg_count, j) {
There's a subtle issue here in that the number of entries in the
mapped sgl might not be the same as the number of entries prior
to the mapping. A change in the count probably never happens for
the direct DMA mapping being done here, but let's code to be
correct in the general case. Either refetch sg_count from the return
value of dma_map_sg(), or arrange to use something like
for_each_sgtable_dma_sg().
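A sketch of the first option (the sg_mapped name is just illustrative):

		struct scatterlist *sg;
		unsigned long hvpfn, hvpfns_to_add;
		int sg_mapped, j, i = 0;

		/* dma_map_sg() returns the number of mapped entries */
		sg_mapped = dma_map_sg(&dev->device, sgl, sg_count,
				       scmnd->sc_data_direction);
		if (sg_mapped == 0)
			return SCSI_MLQUEUE_DEVICE_BUSY;

		/* iterate over the mapped count, not the original sg_count */
		for_each_sg(sgl, sg, sg_mapped, j) {
			...
		}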
> + hvpfns_to_add = HVPFN_UP(sg_dma_len(sg));
This simplification in calculating hvpfns_to_add is not correct. Consider
the case of one sgl entry specifying a buffer of 3 Kbytes that starts at a
2K offset in the first page and runs over into the second page. This case
can happen when the physical memory for the two pages is contiguous
due to random happenstance, due to huge pages, or due to being on an
architecture like ARM64 where the guest page size may be larger than
the Hyper-V page size.
In this case, we need two Hyper-V PFNs because the buffer crosses a
Hyper-V page boundary. But the above will calculate only one PFN.
The original algorithm handles this case correctly.
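The per-entry computation needs to take the starting offset within the
Hyper-V page into account, something like (untested sketch):

		hvpfn = HVPFN_DOWN(sg_dma_address(sg));
		hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
					 sg_dma_len(sg)) - hvpfn;

which correctly yields two PFNs for the 3 Kbyte buffer at a 2K offset
described above.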
> + hvpfn = HVPFN_DOWN(sg_dma_address(sg));
>
> - /*
> - * Fill the next portion of the PFN array with
> - * sequential Hyper-V PFNs for the continguous physical
> - * memory described by the sgl entry. The end of the
> - * last sgl should be reached at the same time that
> - * the PFN array is filled.
> - */
Any reason not to keep this comment? It's still correct.
> while (hvpfns_to_add--)
> - payload->range.pfn_array[i++] = hvpfn++;
> + payload->range.pfn_array[i++] = hvpfn++;
> }
> }
>
> @@ -1992,6 +1982,7 @@ static int storvsc_probe(struct hv_device *device,
> stor_device->vmscsi_size_delta = sizeof(struct vmscsi_win8_extension);
> spin_lock_init(&stor_device->lock);
> hv_set_drvdata(device, stor_device);
> + dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
>
> stor_device->port_number = host->host_no;
> ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size, is_fc);
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 139a43ad65a1..8f39893f8ccf 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1274,6 +1274,7 @@ struct hv_device {
>
> struct vmbus_channel *channel;
> struct kset *channels_kset;
> + struct device_dma_parameters dma_parms;
>
> /* place holder to keep track of the dir for hv device in debugfs */
> struct dentry *debug_dir;
> --
> 2.25.1