lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 5 May 2023 08:10:33 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     "Chatre, Reinette" <reinette.chatre@...el.com>,
        "jgg@...dia.com" <jgg@...dia.com>,
        "yishaih@...dia.com" <yishaih@...dia.com>,
        "shameerali.kolothum.thodi@...wei.com" 
        <shameerali.kolothum.thodi@...wei.com>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>
CC:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "darwi@...utronix.de" <darwi@...utronix.de>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "tom.zanussi@...ux.intel.com" <tom.zanussi@...ux.intel.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH V4 10/11] vfio/pci: Support dynamic MSI-X

> From: Chatre, Reinette <reinette.chatre@...el.com>
> Sent: Saturday, April 29, 2023 2:35 AM
> 
> Hi Kevin,
> 
> On 4/27/2023 11:50 PM, Tian, Kevin wrote:
> >> From: Chatre, Reinette <reinette.chatre@...el.com>
> >> Sent: Friday, April 28, 2023 1:36 AM
> >>
> >> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be
> >> allocated after MSI-X enabling.
> >>
> >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt
> >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at
> >> the time a valid eventfd is assigned. This is different behavior from
> >> a range provided during MSI-X enabling where interrupts are allocated
> >> for the entire range whether a valid eventfd is provided for each
> >> interrupt or not.
> >>
> >> The PCI-MSIX API requires that some number of irqs are allocated for
> >> an initial set of vectors when enabling MSI-X on the device. When
> >> dynamic MSIX allocation is not supported, the vector table, and thus
> >> the allocated irq set can only be resized by disabling and re-enabling
> >> MSI-X with a different range. In that case the irq allocation is
> >> essentially a cache for configuring vectors within the previously
> >> allocated vector range. When dynamic MSI-X allocation is supported,
> >> the API still requires some initial set of irqs to be allocated, but
> >> also supports allocating and freeing specific irq vectors both
> >> within and beyond the initially allocated range.
> >>
> >> For consistency between modes, as well as to reduce latency and improve
> >> reliability of allocations, and also simplicity, this implementation
> >> only releases irqs via pci_free_irq_vectors() when either the interrupt
> >> mode changes or the device is released.
> >
> > It improves the reliability of allocations from the calling device p.o.v.
> >
> > But system-wide this is not efficient use of irqs and not releasing them
> > timely may affect the reliability of allocations for other devices.
> 
> Could you please elaborate how other devices may be impacted?

the more this devices reserves the less remains for others, e.g. irte entries.

> 
> > Should this behavior be something configurable?
> 
> This is not clear to me and I look to you for guidance here. From practical
> side it looks like configuration via module parameters is supported but
> whether it should be done is not clear to me.
> 
> When considering this we need to think about what the user may expect
> when
> turning on/off the configuration. For example, MSI-X continues to allocate a
> range of interrupts during enabling. These have always been treated as a
> "cache" (interrupts remain allocated, whether they have an associated
> trigger
> or not). If there is new configurable behavior, do you expect that the
> driver needs to distinguish between the original "cache" that the user is
> used to and the new dynamic allocations? That is, should a dynamic MSI-X
> capable device always free interrupts when user space removes an eventfd
> or should only interrupts that were allocated dynamically be freed
> dynamically?

That looks tricky. Probably that is why Alex suggested doing this simple
scheme and it is on par with the old logic anyway. So I'll withdraw this
comment.

> 
> >> +/*
> >> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector.
> >> + * If a Linux IRQ number is not available then a new interrupt will be
> >> + * allocated if dynamic MSI-X is supported.
> >> + */
> >> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev,
> >> +			      unsigned int vector, bool msix)
> >> +{
> >> +	struct pci_dev *pdev = vdev->pdev;
> >> +	struct msi_map map;
> >> +	int irq;
> >> +	u16 cmd;
> >> +
> >> +	irq = pci_irq_vector(pdev, vector);
> >> +	if (irq > 0 || !msix || !vdev->has_dyn_msix)
> >> +		return irq;
> >
> > if (irq >= 0 || ...)
> >
> 
> I am not sure about this request because pci_irq_vector() cannot return 0.
> The Linux interrupt number will be > 0 on success. 0 means "not found"
> (see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector().
> 

There is a subtle difference between the description and the code of
pci_irq_vector().

/**
 * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
 * @dev: the PCI device to operate on
 * @nr:  device-relative interrupt vector index (0-based); has different
 *       meanings, depending on interrupt mode:
 *
 *         * MSI-X     the index in the MSI-X vector table
 *         * MSI       the index of the enabled MSI vectors
 *         * INTx      must be 0
 *
 * Return: the Linux IRQ number, or -EINVAL if @nr is out of range
 */

From above '0' is a valid irq number.

then in following code:

	irq = msi_get_virq(&dev->dev, nr);
	return irq ? irq : -EINVAL;

'0' is obviously invalid for msi.

I didn't realize the msi part when reading the patch. It left me in
confusion that '0' is unhandled as here we only check ">0" while in
other places "-EINVAL" is checked.

Not big matter but it sounds slightly clearer to me to follow the
description of pci_irq_vector() instead of its internal detail. 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ