lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 17 Dec 2013 15:12:11 -0700
From:	Alex Williamson <alex.williamson@...hat.com>
To:	Bjorn Helgaas <bhelgaas@...gle.com>
Cc:	linux-pci@...r.kernel.org, ddutile@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] pci: Add Virtual Channel to save/restore support

On Mon, 2013-12-16 at 17:48 -0700, Bjorn Helgaas wrote:
> On Tue, Dec 10, 2013 at 11:48:45AM -0700, Alex Williamson wrote:
> > While we don't really have any infrastructure for making use of VC
> > support, the system BIOS can configure the topology to non-default
> > VC values prior to boot.  This may be due to silicon bugs, desire to
> > reserve traffic classes, or perhaps just BIOS bugs.  When we reset
> > devices, the VC configuration may return to default values, which can
> > be incompatible with devices upstream.  For instance, Nvidia GRID
> > cards provide a PCIe switch and some number of GPUs, all supporting
> > VC.  The power-on default for VC is to support TC0-7 across VC0,
> > however some platforms will only enable TC0/VC0 mapping across the
> > topology.  When we do a secondary bus reset on the downstream switch
> > port, the GPU is reset to a TC0-7/VC0 mapping while the opposite end
> > of the link only enables TC0/VC0.  If the GPU attempts to use TC1-7,
> > it fails.
> > 
> > This patch attempts to provide complete support for VC save/restore,
> > even beyond the minimally required use case above.  This includes
> > save/restore and reload of the arbitration table, save/restore and
> > reload of the port arbitration tables, and re-enabling of the
> > channels for VC, VC9, and MFVC capabilities.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@...hat.com>
> > ---
> >  drivers/pci/pci.c |  387 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> Wow, that's a lot of code just to support resetting a device.  I wish I had
> a better idea, but I don't.

I know, me too.  I tried to keep the code as compact as I could.  If you
look at it on a per capability basis, since this supports VC/VC9/MFVC,
it's not so bad ;)

>   I know we have similar save/restore code in
> pci.c already, but would it make sense to put this in a vc.c to keep pci.c
> from growing without bound?

Yep, I can do that.

> >  1 file changed, 387 insertions(+)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index a898e4e..3a1d060 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -969,6 +969,367 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
> >  	pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
> >  }
> >  
> > +/**
> > + * pci_vc_save_restore_dwords - Save or restore a series of dwords
> > + * @dev: device
> > + * @pos: starting config space position
> > + * @buf: buffer to save to or restore from
> > + * @dwords: number of dwords to save/restore
> > + * @save: whether to save or restore
> > + */
> > +static void pci_vc_save_restore_dwords(struct pci_dev *dev, int pos,
> > +				       u32 *buf, int dwords, bool save)
> 
> Nothing VC-specific here, so maybe remove "_vc_" from the name.

Sure, but if I make a vc.c there's no reason not to move this function
there so retaining _vc_ avoids polluting the common pci namespace.

> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < dwords; i++, buf++) {
> > +		if (save)
> > +			pci_read_config_dword(dev, pos + (i * 4), buf);
> > +		else
> > +			pci_write_config_dword(dev, pos + (i * 4), *buf);
> > +	}
> > +}
> > +
> > +/**
> > + * pci_vc_load_port_arb_table - load and wait for VC arbitration table
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + *
> > + * Signal a VC arbitration table to load and wait for completion.
> > + */
> > +static void pci_vc_load_arb_table(struct pci_dev *dev, int pos)
> > +{
> > +	u32 ctrl;
> > +
> > +	pci_read_config_dword(dev, pos + PCI_VC_PORT_CTRL, &ctrl);
> > +	pci_write_config_dword(dev, pos + PCI_VC_PORT_CTRL, ctrl | 1);
> > +	if (pci_wait_for_pending(dev, pos + PCI_VC_PORT_STATUS, 1))
> 
> Comment doesn't match function name.  The spec "Request hardware to apply
> new values" language would be a useful clue that this doesn't actually
> *write* the table; before reading the spec, I initially looked for a
> buffer containing the arbitration table.  It'd be nice to have #defines
> for the bits here, i.e., PCI_VC_PORT_CTRL_LOAD_ARB or similar.  The

Ok

> spec says these are 16-bit registers; shouldn't these be "word" accesses?

The control registers are 32-bit, the status registers are 16-bit.
pci_wait_for_pending uses word access.

> > +		return;
> > +
> > +	dev_err(&dev->dev, "VC arbitration table failed to load\n");
> > +}
> > +
> > +/**
> > + * pci_vc_load_port_arb_table - Load and wait for VC port arbitration table
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + * @res: VC res number, ie. VCn (0-7)
> > + *
> > + * Signal a VC port arbitration table to load and wait for completion.
> > + */
> > +static void pci_vc_load_port_arb_table(struct pci_dev *dev, int pos, int res)
> > +{
> > +	int ctrl_pos, status_pos;
> > +	u32 ctrl;
> > +
> > +	ctrl_pos = pos + PCI_VC_RES_CTRL + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +	status_pos = pos + PCI_VC_RES_STATUS + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +
> > +	pci_read_config_dword(dev, ctrl_pos, &ctrl);
> > +	pci_write_config_dword(dev, ctrl_pos, ctrl | (1 << 16));
> > +
> > +	if (pci_wait_for_pending(dev, status_pos, 1))
> 
> #defines, please, to help grep users later.

Yep

> > +		return;
> > +
> > +	dev_err(&dev->dev, "VC%d port arbitration table failed to load\n", res);
> > +}
> > +
> > +/**
> > + * pci_vc_enable - Enable virtual channel
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + * @res: VC res number, ie. VCn (0-7)
> > + *
> > + * A VC is enabled by setting the enable bit in matching resource control
> > + * registers on both sides of a link.  We therefore need to find the opposite
> > + * end of the link.  To keep this simple we enable from the downstream device.
> > + * RC devices do not have an upstream device, nor does it seem that VC9 do
> > + * (spec is unclear).  Once we find the upstream device, match the VC ID to
> > + * get the correct resource, disable and enable on both ends.
> > + */
> > +static void pci_vc_enable(struct pci_dev *dev, int pos, int res)
> > +{
> > +	int ctrl_pos, status_pos, id, pos2, evcc, i, ctrl_pos2, status_pos2;
> > +	u32 ctrl, header, reg1, ctrl2;
> > +	struct pci_dev *link = NULL;
> > +
> > +	/* Enable VCs from the downstream device */
> > +	if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> > +	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM)
> > +		return;
> > +
> > +	ctrl_pos = pos + PCI_VC_RES_CTRL + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +	status_pos = pos + PCI_VC_RES_STATUS + (res * PCI_CAP_VC_PER_VC_SIZEOF);
> > +
> > +	pci_read_config_dword(dev, ctrl_pos, &ctrl);
> > +	id = (ctrl >> 24) & 0x7;
> 
> #define PCI_VC_RES_CTRL_ID 0x07000000.
> 
> It gets a little cumbersome to have #defines for *both* the mask and the
> shift; the compromise I like is to have a #define for the mask (which shows
> the position in the register) and use bare numbers when we need to shift.
> Then it's easy to find uses of the field with grep or cscope, and the mask
> definition helps manual decoding.  There are several other opportunities
> for mask #defines below.

Ok, figuring out the define for mask vs shift did play a part in doing
neither.  I'll follow your standard.

> In this case, you only compare ID values, so there's actually no need to
> shift it.

Good point.

> > +
> > +	pci_read_config_dword(dev, pos, &header);
> > +
> > +	/* If there is no opposite end of the link, skip to enable */
> > +	if (PCI_EXT_CAP_ID(header) == PCI_EXT_CAP_ID_VC9 ||
> > +	    pci_is_root_bus(dev->bus))
> > +		goto enable;
> > +
> > +	pos2 = pci_find_ext_capability(dev->bus->self, PCI_EXT_CAP_ID_VC);
> > +	if (pos2 <= 0)
> > +		goto enable;
> 
> I don't think < 0 is possible (several other occurrences below, too).

Copied from elsewhere, but no need to propagate poor form.

> > +
> > +	pci_read_config_dword(dev->bus->self, pos2 + PCI_VC_PORT_REG1, &reg1);
> > +	evcc = reg1 & PCI_VC_REG1_EVCC;
> > +
> > +	/* VC0 is hardwired enabled, so we can start with 1 */
> > +	for (i = 1; i < evcc + 1; i++) {
> > +		ctrl_pos2 = pos2 + PCI_VC_RES_CTRL +
> > +				(i * PCI_CAP_VC_PER_VC_SIZEOF);
> > +		status_pos2 = pos2 + PCI_VC_RES_STATUS +
> > +				(i * PCI_CAP_VC_PER_VC_SIZEOF);
> > +		pci_read_config_dword(dev->bus->self, ctrl_pos2, &ctrl2);
> > +		if (((ctrl2 >> 24) & 0x7) == id) {
> > +			link = dev->bus->self;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!link)
> > +		goto enable;
> > +
> > +	/* Disable if enabled */
> > +	if (ctrl2 & (1U << 31)) {
> > +		ctrl2 &= ~(1U << 31);
> > +		pci_write_config_dword(link, ctrl_pos2, ctrl2);
> > +	}
> > +
> > +	/* Enable on both ends */
> > +	ctrl2 |= 1U << 31;
> > +	pci_write_config_dword(link, ctrl_pos2, ctrl2);
> > +enable:
> > +	ctrl |= 1U << 31;
> > +	pci_write_config_dword(dev, ctrl_pos, ctrl);
> > +
> > +	if (!pci_wait_for_pending(dev, status_pos, 0x2))
> > +		dev_err(&dev->dev, "VC%d negotiation stuck pending\n", id);
> > +
> > +	if (link && !pci_wait_for_pending(link, status_pos2, 0x2))
> > +		dev_err(&link->dev, "VC%d negotiation stuck pending\n", id);
> > +}
> > +
> > +/**
> > + * pci_vc_do_save_buffer - Size, save, or restore VC state
> > + * @dev: device
> > + * @pos: starting position of VC capability (VC/VC9/MFVC)
> > + * @save_state: buffer for save/restore
> > + * @name: for error message
> > + * @save: if provided a buffer, this indicates what to do with it
> > + *
> > + * Walking Virtual Channel config space to size, save, or restore it
> > + * is complicated, so we do it all from one function to reduce code and
> > + * guarantee ordering matches in the buffer.  When called with NULL
> > + * @save_state, return the size of the necessary save buffer.  When called
> > + * with a non-NULL @save_state, @save determines whether we save to the
> > + * buffer or restore from it.
> > + */
> > +static int pci_vc_do_save_buffer(struct pci_dev *dev, int pos,
> > +				 struct pci_cap_saved_state *save_state,
> > +				 bool save)
> > +{
> > +	u32 reg1;
> > +	char evcc, lpevcc, parb_size;
> > +	int i, len = 0;
> > +	u32 *buf = save_state ? save_state->cap.data : NULL;
> > +
> > +	/* Sanity check buffer size for save/restore */
> > +	if (buf && save_state->cap.size !=
> > +	    pci_vc_do_save_buffer(dev, pos, NULL, save)) {
> > +		dev_err(&dev->dev,
> > +			"VC save buffer size does not match @0x%x\n", pos);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	pci_read_config_dword(dev, pos + PCI_VC_PORT_REG1, &reg1);
> > +	/* Extended VC Count (not counting VC0) */
> > +	evcc = reg1 & PCI_VC_REG1_EVCC;
> > +	/* Low Priority Extended VC Count (not counting VC0) */
> > +	lpevcc = (reg1 >> 4) & PCI_VC_REG1_EVCC;
> 
> Please make a new #define, since LPEVCC is a separate field.

Ok

Thanks for the review,

Alex

> +	/* Port Arbitration Table Entry Size (bits) */
> > +	parb_size = 1 << ((reg1 >> 10) & 0x3);
> > +
> > +	/*
> > +	 * Port VC Control Register contains VC Arbitration Select, which
> > +	 * cannot be modified when more than one LPVC is in operation.  We
> > +	 * therefore save/restore it first, as only VC0 should be enabled
> > +	 * after device reset.
> > +	 */
> > +	if (buf) {
> > +		pci_vc_save_restore_dwords(dev, pos + PCI_VC_PORT_CTRL,
> > +					   buf, 1, save);
> > +		buf++;
> > +	}
> > +	len += 4;
> > +
> > +	/*
> > +	 * If we have any Low Priority VCs and a VC Arbitration Table Offset
> > +	 * in Port VC Capability Register 2 then save/restore it next.
> > +	 */
> > +	if (lpevcc) {
> > +		u32 reg2;
> > +		int vcarb_offset;
> > +
> > +		pci_read_config_dword(dev, pos + PCI_VC_PORT_REG2, &reg2);
> > +		vcarb_offset = (reg2 >> 24) * 16;
> > +
> > +		if (vcarb_offset) {
> > +			int size, vcarb_phases = 0;
> > +
> > +			if (reg2 & PCI_VC_REG2_128_PHASE)
> > +				vcarb_phases = 128;
> > +			else if (reg2 & PCI_VC_REG2_64_PHASE)
> > +				vcarb_phases = 64;
> > +			else if (reg2 & PCI_VC_REG2_32_PHASE)
> > +				vcarb_phases = 32;
> > +
> > +			/* Fixed 4 bits per phase per lpevcc (plus VC0) */
> > +			size = ((lpevcc + 1) * vcarb_phases * 4) / 8;
> > +
> > +			if (size && buf) {
> > +				pci_vc_save_restore_dwords(dev,
> > +							   pos + vcarb_offset,
> > +							   buf, size / 4, save);
> > +				/*
> > +				 * On restore, we need to signal hardware to
> > +				 * re-load the VC Arbitration Table.
> > +				 */
> > +				if (!save)
> > +					pci_vc_load_arb_table(dev, pos);
> > +
> > +				buf += size / 4;
> > +			}
> > +			len += size;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * In addition to each VC Resource Control Register, we may have a
> > +	 * Port Arbitration Table attached to each VC.  The Port Arbitration
> > +	 * Table Offset in each VC Resource Capability Register tells us if
> > +	 * it exists.  The entry size is global from the Port VC Capability
> > +	 * Register1 above.  The number of phases is determined per VC.
> > +	 */
> > +	for (i = 0; i < evcc + 1; i++) {
> > +		u32 cap;
> > +		int parb_offset;
> > +
> > +		pci_read_config_dword(dev, pos + PCI_VC_RES_CAP +
> > +				      (i * PCI_CAP_VC_PER_VC_SIZEOF), &cap);
> > +		parb_offset = (cap >> 24) * 16;
> > +		if (parb_offset) {
> > +			int size, parb_phases = 0;
> > +
> > +			if (cap & 0x20)
> > +				parb_phases = 256;
> > +			else if (cap & 0x18)
> > +				parb_phases = 128;
> > +			else if (cap & 0x4)
> > +				parb_phases = 64;
> > +			else if (cap & 0x2)
> > +				parb_phases = 32;
> > +
> > +			size = (parb_size * parb_phases) / 8;
> > +
> > +			if (size && buf) {
> > +				pci_vc_save_restore_dwords(dev,
> > +							   pos + parb_offset,
> > +							   buf, size / 4, save);
> > +				buf += size / 4;
> > +			}
> > +			len += size;
> > +		}
> > +
> > +		/* VC Resource Control Register */
> > +		if (buf) {
> > +			int ctrl_pos = pos + PCI_VC_RES_CTRL +
> > +						(i * PCI_CAP_VC_PER_VC_SIZEOF);
> > +			if (save)
> > +				pci_read_config_dword(dev, ctrl_pos, buf);
> > +			else {
> > +				u32 tmp, ctrl = *(u32 *)buf;
> > +				/*
> > +				 * For an FLR case, the VC config may remain.
> > +				 * Preserve enable bit, restore the rest.
> > +				 */
> > +				pci_read_config_dword(dev, ctrl_pos, &tmp);
> > +				tmp &= 1U << 31;
> > +				tmp |= ctrl & ~(1U << 31);
> > +				pci_write_config_dword(dev, ctrl_pos, tmp);
> > +				/* Load port arbitration table if used */
> > +				if (ctrl & (7 << 17))
> > +					pci_vc_load_port_arb_table(dev, pos, i);
> > +				/* Re-enable if needed */
> > +				if ((ctrl ^ tmp) & (1U << 31))
> > +					pci_vc_enable(dev, pos, i);
> > +			}
> > +			buf++;
> > +		}
> > +		len += 4;
> > +	}
> > +
> > +	return buf ? 0 : len;
> > +}
> > +
> > +static struct {
> > +	u16 id;
> > +	const char *name;
> > +} vc_caps[] = { { PCI_EXT_CAP_ID_MFVC, "MFVC" },
> > +		{ PCI_EXT_CAP_ID_VC, "VC" },
> > +		{ PCI_EXT_CAP_ID_VC9, "VC9" } };
> > +
> > +static int pci_save_vc_state(struct pci_dev *dev)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
> > +		int pos, ret;
> > +		struct pci_cap_saved_state *save_state;
> > +
> > +		pos = pci_find_ext_capability(dev, vc_caps[i].id);
> > +		if (pos <= 0)
> > +			continue;
> > +
> > +		save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
> > +		if (!save_state) {
> > +			dev_err(&dev->dev, "%s buffer not found in %s\n",
> > +				vc_caps[i].name, __func__);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		ret = pci_vc_do_save_buffer(dev, pos, save_state, true);
> > +		if (ret) {
> > +			dev_err(&dev->dev, "%s save unsuccessful %s\n",
> > +				vc_caps[i].name, __func__);
> > +			return ret;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void pci_restore_vc_state(struct pci_dev *dev)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
> > +		int pos;
> > +		struct pci_cap_saved_state *save_state;
> > +
> > +		pos = pci_find_ext_capability(dev, vc_caps[i].id);
> > +		save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
> > +		if (!save_state || pos <= 0)
> > +			continue;
> > +
> > +		pci_vc_do_save_buffer(dev, pos, save_state, false);
> > +	}
> > +}
> > +
> >  
> >  /**
> >   * pci_save_state - save the PCI configuration space of a device before suspending
> > @@ -986,6 +1347,8 @@ pci_save_state(struct pci_dev *dev)
> >  		return i;
> >  	if ((i = pci_save_pcix_state(dev)) != 0)
> >  		return i;
> > +	if ((i = pci_save_vc_state(dev)) != 0)
> > +		return i;
> >  	return 0;
> >  }
> >  
> > @@ -1048,6 +1411,7 @@ void pci_restore_state(struct pci_dev *dev)
> >  	/* PCI Express register must be restored first */
> >  	pci_restore_pcie_state(dev);
> >  	pci_restore_ats_state(dev);
> > +	pci_restore_vc_state(dev);
> >  
> >  	pci_restore_config_space(dev);
> >  
> > @@ -2104,6 +2468,27 @@ static int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
> >  	return _pci_add_cap_save_buffer(dev, cap, true, size);
> >  }
> >  
> > +static void pci_allocate_vc_save_buffers(struct pci_dev *dev)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
> > +		int error, pos = pci_find_ext_capability(dev, vc_caps[i].id);
> > +		int len;
> > +
> > +		if (pos <= 0)
> > +			continue;
> > +
> > +		len = pci_vc_do_save_buffer(dev, pos, NULL, false);
> > +		error = pci_add_ext_cap_save_buffer(dev,
> > +						    vc_caps[i].id, len);
> > +		if (error)
> > +			dev_err(&dev->dev,
> > +				"unable to preallocate %s save buffer\n",
> > +				vc_caps[i].name);
> > +	}
> > +}
> > +
> >  /**
> >   * pci_allocate_cap_save_buffers - allocate buffers for saving capabilities
> >   * @dev: the PCI device
> > @@ -2122,6 +2507,8 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >  	if (error)
> >  		dev_err(&dev->dev,
> >  			"unable to preallocate PCI-X save buffer\n");
> > +
> > +	pci_allocate_vc_save_buffers(dev);
> >  }
> >  
> >  void pci_free_cap_save_buffers(struct pci_dev *dev)
> > 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ