lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <365a1c5b-53fe-cd2c-11a1-9678dde0c5@outlook.office365.com>
Date:   Fri, 27 Aug 2021 09:36:36 +0800 (CST)
From:   Colin Xu <colin.xu@...el.com>
To:     Colin Xu <colin.xu@...el.com>
cc:     Alex Williamson <alex.williamson@...hat.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, zhenyuw@...ux.intel.com,
        hang.yuan@...ux.intel.com, swee.yee.fonn@...el.com,
        fred.gao@...el.com
Subject: Re: [PATCH] vfio/pci: Add OpRegion 2.0 Extended VBT support.

Hi Alex,

In addition to the background that devices on market may still need 
OpRegion 2.0 support in vfio-pci, do you have other comments to the patch 
body?

On Tue, 17 Aug 2021, Colin Xu wrote:

> On Mon, 16 Aug 2021, Alex Williamson wrote:
>
>>  On Fri, 13 Aug 2021 10:13:29 +0800
>>  Colin Xu <colin.xu@...el.com> wrote:
>>
>>>  Due to historical reason, some legacy shipped system doesn't follow
>>>  OpRegion 2.1 spec but still stick to OpRegion 2.0, in which the extended
>>>  VBT is not contigious after OpRegion in physical address, but any
>>>  location pointed by RVDA via absolute address. Thus it's impossible
>>>  to map a contigious range to hold both OpRegion and extended VBT as 2.1.
>>>
>>>  Since the only difference between OpRegion 2.0 and 2.1 is where extended
>>>  VBT is stored: For 2.0, RVDA is the absolute address of extended VBT
>>>  while for 2.1, RVDA is the relative address of extended VBT to OpRegion
>>>  baes, and there is no other difference between OpRegion 2.0 and 2.1,
>>>  it's feasible to amend OpRegion support for these legacy system (before
>>>  upgrading the system firmware), by kazlloc a range to shadown OpRegion
>>>  from the beginning and stitch VBT after closely, patch the shadow
>>>  OpRegion version from 2.0 to 2.1, and patch the shadow RVDA to relative
>>>  address. So that from the vfio igd OpRegion r/w ops view, only OpRegion
>>>  2.1 is exposed regardless the underneath host OpRegion is 2.0 or 2.1
>>>  if the extended VBT exists. vfio igd OpRegion r/w ops will return either
>>>  shadowed data (OpRegion 2.0) or directly from physical address
>>>  (OpRegion 2.1+) based on host OpRegion version and RVDA/RVDS. The shadow
>>>  mechanism makes it possible to support legacy systems on the market.
>>
>>  Which systems does this enable?  There's a suggestion above that these
>>  systems could update firmware to get OpRegion v2.1 support, why
>>  shouldn't we ask users to do that instead?  When we added OpRegion v2.1
>>  support we were told that v2.0 support was essentially non-existent,
>>  why should we add code to support and old spec with few users for such
>>  a niche use case?
> Hi Alex, there was some mis-alignment with the BIOS owner that we were told 
> the 2.0 system doesn't for retail but only for internal development. However 
> in other projects we DO see the retail market has such systems, including NUC 
> NUC6CAYB, some APL industrial PC used in RT system, and some customized APL 
> motherboard by commercial virtualization solution. We immediately contact the 
> BIOS owner to ask for a clarification and they admit it. These system won't 
> get updated BIOS for OpRegion update but still under warranty. That's why the 
> OpRegion 2.0 support is still needed.
>
>> 
>>> Cc:  Zhenyu Wang <zhenyuw@...ux.intel.com>
>>> Cc:  Hang Yuan <hang.yuan@...ux.intel.com>
>>> Cc:  Swee Yee Fonn <swee.yee.fonn@...el.com>
>>> Cc:  Fred Gao <fred.gao@...el.com>
>>>  Signed-off-by: Colin Xu <colin.xu@...el.com>
>>>  ---
>>>   drivers/vfio/pci/vfio_pci_igd.c | 117 ++++++++++++++++++++------------
>>>   1 file changed, 75 insertions(+), 42 deletions(-)
>>>
>>>  diff --git a/drivers/vfio/pci/vfio_pci_igd.c
>>>  b/drivers/vfio/pci/vfio_pci_igd.c
>>>  index 228df565e9bc..22b9436a3044 100644
>>>  --- a/drivers/vfio/pci/vfio_pci_igd.c
>>>  +++ b/drivers/vfio/pci/vfio_pci_igd.c
>>>  @@ -48,7 +48,10 @@ static size_t vfio_pci_igd_rw(struct vfio_pci_device
>>>  *vdev, char __user *buf,
>>>   static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
>>>   				 struct vfio_pci_region *region)
>>>  {
>>>  -	memunmap(region->data);
>>>  +	if (is_ioremap_addr(region->data))
>>>  +		memunmap(region->data);
>>>  +	else
>>>  +		kfree(region->data);
>>>   }
>>>
>>>  static const struct vfio_pci_regops vfio_pci_igd_regops = {
>>>  @@ -59,10 +62,11 @@ static const struct vfio_pci_regops
>>>  vfio_pci_igd_regops = {
>>>   static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
>>>   {
>>>  	__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
>>>  -	u32 addr, size;
>>>  -	void *base;
>>>  +	u32 addr, size, rvds = 0;
>>>  +	void *base, *opregionvbt;
>>>    int ret;
>>>    u16 version;
>>>  +	u64 rvda = 0;
>>>
>>>    ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
>>>    if (ret)
>>>  @@ -89,66 +93,95 @@ static int vfio_pci_igd_opregion_init(struct
>>>  vfio_pci_device *vdev)
>>>    size *= 1024; /* In KB */
>>>
>>>  	/*
>>>  -	 * Support opregion v2.1+
>>>  -	 * When VBT data exceeds 6KB size and cannot be within mailbox #4,
>>>  then
>>>  -	 * the Extended VBT region next to opregion is used to hold the VBT
>>>  data.
>>>  -	 * RVDA (Relative Address of VBT Data from Opregion Base) and RVDS
>>>  -	 * (Raw VBT Data Size) from opregion structure member are used to
>>>  hold the
>>>  -	 * address from region base and size of VBT data. RVDA/RVDS are not
>>>  -	 * defined before opregion 2.0.
>>>  +	 * OpRegion and VBT:
>>>  +	 * When VBT data doesn't exceed 6KB, it's stored in Mailbox #4.
>>>  +	 * When VBT data exceeds 6KB size, Mailbox #4 is no longer large
>>>  enough
>>>  +	 * to hold the VBT data, the Extended VBT region is introduced since
>>>  +	 * OpRegion 2.0 to hold the VBT data. Since OpRegion 2.0, RVDA/RVDS
>>>  are
>>>  +	 * introduced to define the extended VBT data location and size.
>>>  +	 * OpRegion 2.0: RVDA defines the absolute physical address of the
>>>  +	 *   extended VBT data, RVDS defines the VBT data size.
>>>  +	 * OpRegion 2.1 and above: RVDA defines the relative address of the
>>>  +	 *   extended VBT data to OpRegion base, RVDS defines the VBT data
>>>  size.
>>>  	 *
>>>  -	 * opregion 2.1+: RVDA is unsigned, relative offset from
>>>  -	 * opregion base, and should point to the end of opregion.
>>>  -	 * otherwise, exposing to userspace to allow read access to
>>>  everything between
>>>  -	 * the OpRegion and VBT is not safe.
>>>  -	 * RVDS is defined as size in bytes.
>>>  -	 *
>>>  -	 * opregion 2.0: rvda is the physical VBT address.
>>>  -	 * Since rvda is HPA it cannot be directly used in guest.
>>>  -	 * And it should not be practically available for end user,so it is
>>>  not supported.
>>>  +	 * Due to the RVDA difference in OpRegion VBT (also the only diff
>>>  between
>>>  +	 * 2.0 and 2.1), while for OpRegion 2.1 and above it's possible to
>>>  map
>>>  +	 * a contigious memory to expose OpRegion and VBT r/w via the vfio
>>>  +	 * region, for OpRegion 2.0 shadow and amendment mechanism is used to
>>>  +	 * expose OpRegion and VBT r/w properly. So that from r/w ops view,
>>>  only
>>>  +	 * OpRegion 2.1 is exposed regardless underneath Region is 2.0 or
>>>  2.1.
>>>    */
>>>  	version = le16_to_cpu(*(__le16 *)(base + OPREGION_VERSION));
>>>  -	if (version >= 0x0200) {
>>>  -		u64 rvda;
>>>  -		u32 rvds;
>>>
>>>  +	if (version >= 0x0200) {
>>>     rvda = le64_to_cpu(*(__le64 *)(base + OPREGION_RVDA));
>>>     rvds = le32_to_cpu(*(__le32 *)(base + OPREGION_RVDS));
>>>  +
>>>  +		/* The extended VBT is valid only when RVDA/RVDS are
>>>  non-zero. */
>>>  		if (rvda && rvds) {
>>>  -			/* no support for opregion v2.0 with physical VBT
>>>  address */
>>>  -			if (version == 0x0200) {
>>>  +			size += rvds;
>>>  +		}
>>>  +
>>>  +		/* The extended VBT must follows OpRegion for OpRegion 2.1+
>>>  */
>>>  +		if (rvda != size && version > 0x0200) {
>>
>>  But we already added rvds to size, this is not compatible with the
>>  previous code that required rvda == size BEFORE adding rvds.
>>
>>>  +			memunmap(base);
>>>  +			pci_err(vdev->pdev,
>>>  +				"Extended VBT does not follow opregion on
>>>  version 0x%04x\n",
>>>  +				version);
>>>  +			return -EINVAL;
>>>  +		}
>>>  +	}
>>>  +
>>>  +	if (size != OPREGION_SIZE) {
>>>  +		/* Allocate memory for OpRegion and extended VBT for 2.0 */
>>>  +		if (rvda && rvds && version == 0x0200) {
>>>  +			void *vbt_base;
>>>  +
>>>  +			vbt_base = memremap(rvda, rvds, MEMREMAP_WB);
>>>  +			if (!vbt_base) {
>>>  				memunmap(base);
>>>  -				pci_err(vdev->pdev,
>>>  -					"IGD assignment does not support
>>>  opregion v2.0 with an extended VBT region\n");
>>>  -				return -EINVAL;
>>>  +				return -ENOMEM;
>>>      }
>>>
>>>  -			if (rvda != size) {
>>>  +			opregionvbt = kzalloc(size, GFP_KERNEL);
>>>  +			if (!opregionvbt) {
>>>  				memunmap(base);
>>>  -				pci_err(vdev->pdev,
>>>  -					"Extended VBT does not follow
>>>  opregion on version 0x%04x\n",
>>>  -					version);
>>>  -				return -EINVAL;
>>>  +				memunmap(vbt_base);
>>>  +				return -ENOMEM;
>>>      }
>>>
>>>  -			/* region size for opregion v2.0+: opregion and VBT
>>>  size. */
>>>  -			size += rvds;
>>>  +			/* Stitch VBT after OpRegion noncontigious */
>>>  +			memcpy(opregionvbt, base, OPREGION_SIZE);
>>>  +			memcpy(opregionvbt + OPREGION_SIZE, vbt_base, rvds);
>>>  +
>>>  +			/* Patch OpRegion 2.0 to 2.1 */
>>>  +			*(__le16 *)(opregionvbt + OPREGION_VERSION) = 0x0201;
>>>  +			/* Patch RVDA to relative address after OpRegion */
>>>  +			*(__le64 *)(opregionvbt + OPREGION_RVDA) =
>>>  OPREGION_SIZE;
>>
>>  AIUI, the OpRegion is a two-way channel between the IGD device/system
>>  BIOS and the driver, numerous fields are writable by the driver.  Now
>>  the driver writes to a shadow copy of the OpRegion table.  What
>>  completes the write to the real OpRegion table for consumption by the
>>  device/BIOS?  Likewise, what updates the fields that are written by the
>>  device/BIOS for consumption by the driver?
>>
>>  If a shadow copy of the OpRegion detached from the physical table is
>>  sufficient here, why wouldn't we always shadow the OpRegion and prevent
>>  all userspace writes from touching the real version?  Thanks,
>>
>>  Alex
>>
>>>  +
>>>  +			memunmap(vbt_base);
>>>  +			memunmap(base);
>>>  +
>>>  +			/* Register shadow instead of map as vfio_region */
>>>  +			base = opregionvbt;
>>>  +		/* Remap OpRegion + extended VBT for 2.1+ */
>>>  +		} else {
>>>  +			memunmap(base);
>>>  +			base = memremap(addr, size, MEMREMAP_WB);
>>>  +			if (!base)
>>>  +				return -ENOMEM;
>>>    	}
>>>    }
>>>
>>>  -	if (size != OPREGION_SIZE) {
>>>  -		memunmap(base);
>>>  -		base = memremap(addr, size, MEMREMAP_WB);
>>>  -		if (!base)
>>>  -			return -ENOMEM;
>>>  -	}
>>>  -
>>>    ret = vfio_pci_register_dev_region(vdev,
>>>     PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
>>>     VFIO_REGION_SUBTYPE_INTEL_IGD_OPREGION,
>>>     &vfio_pci_igd_regops, size, VFIO_REGION_INFO_FLAG_READ, base);
>>>  	if (ret) {
>>>  -		memunmap(base);
>>>  +		if (is_ioremap_addr(base))
>>>  +			memunmap(base);
>>>  +		else
>>>  +			kfree(base);
>>>    	return ret;
>>>    }
>>> 
>> 
>> 
>
> --
> Best Regards,
> Colin Xu
>
>

--
Best Regards,
Colin Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ