lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e89107da-ac99-4d3a-9527-a4df9986e120@kernel.org>
Date: Wed, 25 Sep 2024 07:06:41 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Vidya Sagar <vidyas@...dia.com>, corbet@....net, bhelgaas@...gle.com,
 galshalom@...dia.com, leonro@...dia.com, jgg@...dia.com, treding@...dia.com,
 jonathanh@...dia.com
Cc: mmoshrefjava@...dia.com, shahafs@...dia.com, vsethi@...dia.com,
 sdonthineni@...dia.com, jan@...dia.com, tdave@...dia.com,
 linux-doc@...r.kernel.org, linux-pci@...r.kernel.org,
 linux-kernel@...r.kernel.org, kthota@...dia.com, mmaddireddy@...dia.com,
 sagar.tv@...il.com
Subject: Re: [PATCH V4] PCI: Extend ACS configurability

On 25. 06. 24, 17:31, Vidya Sagar wrote:
> PCIe ACS settings control the level of isolation and the possible P2P
> paths between devices. With greater isolation the kernel will create
> smaller iommu_groups and with less isolation there is more HW that
> can achieve P2P transfers. From a virtualization perspective all
> devices in the same iommu_group must be assigned to the same VM as
> they lack security isolation.
> 
> There is no way for the kernel to automatically know the correct
> ACS settings for any given system and workload. Existing command line
> options (ex:- disable_acs_redir) allow only for large scale change,
> disabling all isolation, but this is not sufficient for more complex cases.
> 
> Add a kernel command-line option 'config_acs' to directly control all the
> ACS bits for specific devices, which allows the operator to setup the
> right level of isolation to achieve the desired P2P configuration.
> The definition is future proof, when new ACS bits are added to the spec
> the open syntax can be extended.
> 
> ACS needs to be setup early in the kernel boot as the ACS settings
> effect how iommu_groups are formed. iommu_group formation is a one
> time event during initial device discovery, changing ACS bits after
> kernel boot can result in an inaccurate view of the iommu_groups
> compared to the current isolation configuration.
> 
> ACS applies to PCIe Downstream Ports and multi-function devices.
> The default ACS settings are strict and deny any direct traffic
> between two functions. This results in the smallest iommu_group the
> HW can support. Frequently these values result in slow or
> non-working P2PDMA.
> 
> ACS offers a range of security choices controlling how traffic is
> allowed to go directly between two devices. Some popular choices:
>    - Full prevention
>    - Translated requests can be direct, with various options
>    - Asymmetric direct traffic, A can reach B but not the reverse
>    - All traffic can be direct
> Along with some other less common ones for special topologies.
> 
> The intention is that this option would be used with expert knowledge
> of the HW capability and workload to achieve the desired
> configuration.

Hi,

this breaks ACS on some platforms (in 6.11). See:
https://bugzilla.suse.com/show_bug.cgi?id=1229019

When starting a KVM:
> "Error starting domain: internal error: QEMU unexpectedly closed the monitor (vm=‘win10’): qxl_send_events: spice-server bug: guest stopped, ignoring
> 2024-08-08T01:45:51.824474Z qemu-system-x86_64: -device {“driver”:“vfio-pci”,“host”:“0000:0b:00.0”,“id”:“hostdev0”,“bus”:“pci.2”,“addr”:“0x0”}: vfio 0000:0b:00.0: group 83 is not viable
> Please ensure all devices within the iommu_group are bound to their vfio bus driver."



We backported the commit to 6.4, there they see:
> The good kernel 6.4.0-150600.23.14 log has this:
> 
> [    0.618918] pci 0000:00:1c.0: Intel PCH root port ACS workaround enabled
> [    0.619153] pci 0000:00:1c.4: Intel PCH root port ACS workaround enabled
> 
> the bad one 6.4.0-150600.23.22 does not.

But the same difference is with 6.10 vs 6.11.

Any ideas? It looks like workarounds are not applied anymore.

> Signed-off-by: Vidya Sagar <vidyas@...dia.com>
> ---
> v4:
> * Changed commit message (Courtesy: Jason) to provide more details
> 
> v3:
> * Fixed a documentation issue reported by kernel test bot
> 
> v2:
> * Refactored the code as per Jason's suggestion
> 
>   .../admin-guide/kernel-parameters.txt         |  22 +++
>   drivers/pci/pci.c                             | 148 +++++++++++-------
>   2 files changed, 112 insertions(+), 58 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 500cfa776225..42d0f6fd40d0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4619,6 +4619,28 @@
>   				bridges without forcing it upstream. Note:
>   				this removes isolation between devices and
>   				may put more devices in an IOMMU group.
> +		config_acs=
> +				Format:
> +				=<ACS flags>@<pci_dev>[; ...]
> +				Specify one or more PCI devices (in the format
> +				specified above) optionally prepended with flags
> +				and separated by semicolons. The respective
> +				capabilities will be enabled, disabled or unchanged
> +				based on what is specified in flags.
> +				ACS Flags is defined as follows
> +				bit-0 : ACS Source Validation
> +				bit-1 : ACS Translation Blocking
> +				bit-2 : ACS P2P Request Redirect
> +				bit-3 : ACS P2P Completion Redirect
> +				bit-4 : ACS Upstream Forwarding
> +				bit-5 : ACS P2P Egress Control
> +				bit-6 : ACS Direct Translated P2P
> +				Each bit can be marked as
> +				‘0‘ – force disabled
> +				‘1’ – force enabled
> +				‘x’ – unchanged.
> +				Note: this may remove isolation between devices
> +				and may put more devices in an IOMMU group.
>   		force_floating	[S390] Force usage of floating interrupts.
>   		nomio		[S390] Do not use MIO instructions.
>   		norid		[S390] ignore the RID field and force use of
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 55078c70a05b..6661932afe59 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -946,30 +946,67 @@ void pci_request_acs(void)
>   }
>   
>   static const char *disable_acs_redir_param;
> +static const char *config_acs_param;
>   
> -/**
> - * pci_disable_acs_redir - disable ACS redirect capabilities
> - * @dev: the PCI device
> - *
> - * For only devices specified in the disable_acs_redir parameter.
> - */
> -static void pci_disable_acs_redir(struct pci_dev *dev)
> +struct pci_acs {
> +	u16 cap;
> +	u16 ctrl;
> +	u16 fw_ctrl;
> +};
> +
> +static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
> +			     const char *p, u16 mask, u16 flags)
>   {
> +	char *delimit;
>   	int ret = 0;
> -	const char *p;
> -	int pos;
> -	u16 ctrl;
>   
> -	if (!disable_acs_redir_param)
> +	if (!p)
>   		return;
>   
> -	p = disable_acs_redir_param;
>   	while (*p) {
> +		if (!mask) {
> +			/* Check for ACS flags */
> +			delimit = strstr(p, "@");
> +			if (delimit) {
> +				int end;
> +				u32 shift = 0;
> +
> +				end = delimit - p - 1;
> +
> +				while (end > -1) {
> +					if (*(p + end) == '0') {
> +						mask |= 1 << shift;
> +						shift++;
> +						end--;
> +					} else if (*(p + end) == '1') {
> +						mask |= 1 << shift;
> +						flags |= 1 << shift;
> +						shift++;
> +						end--;
> +					} else if ((*(p + end) == 'x') || (*(p + end) == 'X')) {
> +						shift++;
> +						end--;
> +					} else {
> +						pci_err(dev, "Invalid ACS flags... Ignoring\n");
> +						return;
> +					}
> +				}
> +				p = delimit + 1;
> +			} else {
> +				pci_err(dev, "ACS Flags missing\n");
> +				return;
> +			}
> +		}
> +
> +		if (mask & ~(PCI_ACS_SV | PCI_ACS_TB | PCI_ACS_RR | PCI_ACS_CR |
> +			    PCI_ACS_UF | PCI_ACS_EC | PCI_ACS_DT)) {
> +			pci_err(dev, "Invalid ACS flags specified\n");
> +			return;
> +		}
> +
>   		ret = pci_dev_str_match(dev, p, &p);
>   		if (ret < 0) {
> -			pr_info_once("PCI: Can't parse disable_acs_redir parameter: %s\n",
> -				     disable_acs_redir_param);
> -
> +			pr_info_once("PCI: Can't parse acs command line parameter\n");
>   			break;
>   		} else if (ret == 1) {
>   			/* Found a match */
> @@ -989,56 +1026,38 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
>   	if (!pci_dev_specific_disable_acs_redir(dev))
>   		return;
>   
> -	pos = dev->acs_cap;
> -	if (!pos) {
> -		pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
> -		return;
> -	}
> -
> -	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +	pci_dbg(dev, "ACS mask  = 0x%X\n", mask);
> +	pci_dbg(dev, "ACS flags = 0x%X\n", flags);
>   
> -	/* P2P Request & Completion Redirect */
> -	ctrl &= ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC);
> +	/* If mask is 0 then we copy the bit from the firmware setting. */
> +	caps->ctrl = (caps->ctrl & ~mask) | (caps->fw_ctrl & mask);
> +	caps->ctrl |= flags;
>   
> -	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> -
> -	pci_info(dev, "disabled ACS redirect\n");
> +	pci_info(dev, "Configured ACS to 0x%x\n", caps->ctrl);
>   }
>   
>   /**
>    * pci_std_enable_acs - enable ACS on devices using standard ACS capabilities
>    * @dev: the PCI device
> + * @caps: default ACS controls
>    */
> -static void pci_std_enable_acs(struct pci_dev *dev)
> +static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
>   {
> -	int pos;
> -	u16 cap;
> -	u16 ctrl;
> -
> -	pos = dev->acs_cap;
> -	if (!pos)
> -		return;
> -
> -	pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> -	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> -
>   	/* Source Validation */
> -	ctrl |= (cap & PCI_ACS_SV);
> +	caps->ctrl |= (caps->cap & PCI_ACS_SV);
>   
>   	/* P2P Request Redirect */
> -	ctrl |= (cap & PCI_ACS_RR);
> +	caps->ctrl |= (caps->cap & PCI_ACS_RR);
>   
>   	/* P2P Completion Redirect */
> -	ctrl |= (cap & PCI_ACS_CR);
> +	caps->ctrl |= (caps->cap & PCI_ACS_CR);
>   
>   	/* Upstream Forwarding */
> -	ctrl |= (cap & PCI_ACS_UF);
> +	caps->ctrl |= (caps->cap & PCI_ACS_UF);
>   
>   	/* Enable Translation Blocking for external devices and noats */
>   	if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
> -		ctrl |= (cap & PCI_ACS_TB);
> -
> -	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +		caps->ctrl |= (caps->cap & PCI_ACS_TB);
>   }
>   
>   /**
> @@ -1047,23 +1066,33 @@ static void pci_std_enable_acs(struct pci_dev *dev)
>    */
>   static void pci_enable_acs(struct pci_dev *dev)
>   {
> -	if (!pci_acs_enable)
> -		goto disable_acs_redir;
> +	struct pci_acs caps;
> +	int pos;
> +
> +	pos = dev->acs_cap;
> +	if (!pos)
> +		return;
>   
> -	if (!pci_dev_specific_enable_acs(dev))
> -		goto disable_acs_redir;
> +	pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
> +	caps.fw_ctrl = caps.ctrl;
>   
> -	pci_std_enable_acs(dev);
> +	/* If an iommu is present we start with kernel default caps */
> +	if (pci_acs_enable) {
> +		if (pci_dev_specific_enable_acs(dev))
> +			pci_std_enable_acs(dev, &caps);
> +	}
>   
> -disable_acs_redir:
>   	/*
> -	 * Note: pci_disable_acs_redir() must be called even if ACS was not
> -	 * enabled by the kernel because it may have been enabled by
> -	 * platform firmware.  So if we are told to disable it, we should
> -	 * always disable it after setting the kernel's default
> -	 * preferences.
> +	 * Always apply caps from the command line, even if there is no iommu.
> +	 * Trust that the admin has a reason to change the ACS settings.
>   	 */
> -	pci_disable_acs_redir(dev);
> +	__pci_config_acs(dev, &caps, disable_acs_redir_param,
> +			 PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC,
> +			 ~(PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_EC));
> +	__pci_config_acs(dev, &caps, config_acs_param, 0, 0);
> +
> +	pci_write_config_word(dev, pos + PCI_ACS_CTRL, caps.ctrl);
>   }
>   
>   /**
> @@ -6740,6 +6769,8 @@ static int __init pci_setup(char *str)
>   				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
>   			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
>   				disable_acs_redir_param = str + 18;
> +			} else if (!strncmp(str, "config_acs=", 11)) {
> +				config_acs_param = str + 11;
>   			} else {
>   				pr_err("PCI: Unknown option `%s'\n", str);
>   			}
> @@ -6764,6 +6795,7 @@ static int __init pci_realloc_setup_params(void)
>   	resource_alignment_param = kstrdup(resource_alignment_param,
>   					   GFP_KERNEL);
>   	disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
> +	config_acs_param = kstrdup(config_acs_param, GFP_KERNEL);
>   
>   	return 0;
>   }

-- 
js
suse labs


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ