lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6da0b4eb-717b-4810-951c-b59edf32e422@intel.com>
Date: Wed, 8 May 2024 11:34:23 +0200
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Shay Drory <shayd@...dia.com>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>, <davem@...emloft.net>, <kuba@...nel.org>,
	<edumazet@...gle.com>, <gregkh@...uxfoundation.org>,
	<david.m.ertman@...el.com>
CC: <rafael@...nel.org>, <ira.weiny@...el.com>, <linux-rdma@...r.kernel.org>,
	<leon@...nel.org>, <tariqt@...dia.com>, Parav Pandit <parav@...dia.com>
Subject: Re: [PATCH net-next v3 1/2] driver core: auxiliary bus: show
 auxiliary device IRQs

On 5/8/24 06:04, Shay Drory wrote:
> PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
> and virtual functions are anchored on the PCI bus;  the irq information
> of each such function is visible to users via sysfs directory "msi_irqs"
> containing file for each irq entry. However, for PCI SFs such information
> is unavailable. Due to this users have no visibility on IRQs used by the
> SFs.
> Secondly, an SF is a multi function device supporting rdma, netdevice
> and more. Without irq information at the bus level, the user is unable
> to view or use the affinity of the SF IRQs.
> 
> Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
> for supporting auxiliary devices, containing file for each irq entry.
> 
> Additionally, the PCI SFs sometimes share the IRQs with peer SFs. This
> information is also not available to the users. To overcome this
> limitation, each irq sysfs entry shows if irq is exclusive or shared.
> 
> For example:
> $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
> 50  51  52  53  54  55  56  57  58
> $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/52
> exclusive
> 
> Reviewed-by: Parav Pandit <parav@...dia.com>
> Signed-off-by: Shay Drory <shayd@...dia.com>
> 
> ---
> v2->v3:
> - fix function declaration in case SYSFS isn't defined (Parav)
> - convert auxdev->groups array with auxiliary_irqs_groups (Przemek)
> v1->v2:
> - move #ifdefs from drivers/base/auxiliary.c to
>    include/linux/auxiliary_bus.h (Greg)
> - use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL (Greg)
> - Fix kzalloc(ref) to kzalloc(*ref) (Simon)
> - Add return description in auxiliary_device_sysfs_irq_add() kdoc (Simon)
> - Fix auxiliary_irq_mode_show doc (kernel test boot)
> ---
>   Documentation/ABI/testing/sysfs-bus-auxiliary |  14 ++
>   drivers/base/auxiliary.c                      | 171 +++++++++++++++++-
>   include/linux/auxiliary_bus.h                 |  24 ++-
>   3 files changed, 206 insertions(+), 3 deletions(-)
>   create mode 100644 Documentation/ABI/testing/sysfs-bus-auxiliary
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary b/Documentation/ABI/testing/sysfs-bus-auxiliary
> new file mode 100644
> index 000000000000..3b8299d49d9e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-auxiliary
> @@ -0,0 +1,14 @@
> +What:		/sys/bus/auxiliary/devices/.../irqs/
> +Date:		April, 2024
> +Contact:	Shay Drory <shayd@...dia.com>
> +Description:
> +		The /sys/devices/.../irqs directory contains a variable set of
> +		files, with each file is named as irq number similar to PCI PF
> +		or VF's irq number located in msi_irqs directory.
> +
> +What:		/sys/bus/auxiliary/devices/.../irqs/<N>
> +Date:		April, 2024
> +Contact:	Shay Drory <shayd@...dia.com>
> +Description:
> +		auxiliary devices can share IRQs. This attribute indicates if
> +		the irq is shared with other SFs or exclusively used by the SF.
> diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c
> index d3a2c40c2f12..6293c6707e1e 100644
> --- a/drivers/base/auxiliary.c
> +++ b/drivers/base/auxiliary.c
> @@ -158,6 +158,169 @@
>    *	};
>    */
>   
> +#ifdef CONFIG_SYSFS
> +/* Xarray of irqs to determine if irq is exclusive or shared. */
> +static DEFINE_XARRAY(irqs);
> +/* Protects insertions into the irtqs xarray. */
> +static DEFINE_MUTEX(irqs_lock);

sorry for not catching it earlier, you don't need a separate lock,
xarray provides one, please see below [1], [2]

> +
> +struct auxiliary_irq_info {
> +	struct device_attribute sysfs_attr;
> +	int irq;
> +};
> +
> +static struct attribute *auxiliary_irq_attrs[] = {
> +	NULL
> +};
> +
> +static const struct attribute_group auxiliary_irqs_group = {
> +	.name = "irqs",
> +	.attrs = auxiliary_irq_attrs,
> +};
> +
> +static const struct attribute_group *auxiliary_irqs_groups[2] = {
> +	&auxiliary_irqs_group,
> +	NULL
> +};
> +
> +/* Auxiliary devices can share IRQs. Expose to user whether the provided IRQ is
> + * shared or exclusive.
> + */
> +static ssize_t auxiliary_irq_mode_show(struct device *dev,
> +				       struct device_attribute *attr, char *buf)
> +{
> +	struct auxiliary_irq_info *info =
> +		container_of(attr, struct auxiliary_irq_info, sysfs_attr);
> +
> +	if (refcount_read(xa_load(&irqs, info->irq)) > 1)

I didn't checked if it is possible with current implementation, but
please imagine a scenario where user open()'s sysfs file, then triggers
operation to remove irq (to call auxiliary_irq_destroy()), and only then
read()'s sysfs contents, what results in nullptr dereference (xa_load()
returning NULL). Splitting the code into two if statements would resolve
this issue.

> +		return sysfs_emit(buf, "%s\n", "shared");
> +	else
> +		return sysfs_emit(buf, "%s\n", "exclusive");
> +}
> +
> +static void auxiliary_irq_destroy(int irq)
> +{
> +	refcount_t *ref;
> +
> +	xa_lock(&irqs);
> +	ref = xa_load(&irqs, irq);
> +	if (refcount_dec_and_test(ref)) {
> +		__xa_erase(&irqs, irq);
> +		kfree(ref);
> +	}
> +	xa_unlock(&irqs);
> +}
> +
> +static int auxiliary_irq_create(int irq)
> +{
> +	refcount_t *ref;
> +	int ret = 0;
> +
> +	mutex_lock(&irqs_lock);

[1] xa_lock() instead ...

> +	ref = xa_load(&irqs, irq);
> +	if (ref && refcount_inc_not_zero(ref))
> +		goto out;

`&& refcount_inc_not_zero()` here means: leak memory and wreak havoc on
saturation, instead the logic should be:
	if (ref) {
		refcount_inc(ref);
		goto out;
	}

anyway allocating under a lock taken is not the best idea in general,
although xarray API somehow encourages this - alternative is to
preallocate and free when not used, or some lock dance that will be easy
to get wrong - and that's the raison d'etre of xa_reserve() :)

> +
> +	ref = kzalloc(sizeof(*ref), GFP_KERNEL);
> +	if (!ref) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	refcount_set(ref, 1);
> +	ret = xa_insert(&irqs, irq, ref, GFP_KERNEL);

[2] ... then __xa_insert() here

> +	if (ret)
> +		kfree(ref);
> +
> +out:
> +	mutex_unlock(&irqs_lock);
> +	return ret;
> +}
> +
> +/**
> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
> + * @auxdev: auxiliary bus device to add the sysfs entry.
> + * @irq: The associated Linux interrupt number.
> + *
> + * This function should be called after auxiliary device have successfully
> + * received the irq.
> + *
> + * Return: zero on success or an error code on failure.
> + */
> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
> +{
> +	struct device *dev = &auxdev->dev;
> +	struct auxiliary_irq_info *info;
> +	int ret;
> +
> +	ret = auxiliary_irq_create(irq);
> +	if (ret)
> +		return ret;
> +
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info) {
> +		ret = -ENOMEM;
> +		goto info_err;
> +	}
> +
> +	sysfs_attr_init(&info->sysfs_attr.attr);
> +	info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
> +	if (!info->sysfs_attr.attr.name) {
> +		ret = -ENOMEM;
> +		goto name_err;
> +	}
> +	info->irq = irq;
> +	info->sysfs_attr.attr.mode = 0444;
> +	info->sysfs_attr.show = auxiliary_irq_mode_show;
> +
> +	ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
> +	if (ret)
> +		goto auxdev_xa_err;
> +
> +	ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
> +				      auxiliary_irqs_group.name);
> +	if (ret)
> +		goto sysfs_add_err;
> +
> +	return 0;
> +
> +sysfs_add_err:
> +	xa_erase(&auxdev->irqs, irq);
> +auxdev_xa_err:
> +	kfree(info->sysfs_attr.attr.name);
> +name_err:
> +	kfree(info);
> +info_err:
> +	auxiliary_irq_destroy(irq);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
> +
> +/**
> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the given IRQ
> + * @auxdev: auxiliary bus device to add the sysfs entry.
> + * @irq: the IRQ to remove.
> + *
> + * This function should be called to remove an IRQ sysfs entry.
> + */
> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq)
> +{
> +	struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
> +	struct device *dev = &auxdev->dev;
> +
> +	if (WARN_ON(!info))
> +		return;
> +
> +	sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
> +				     auxiliary_irqs_group.name);
> +	xa_erase(&auxdev->irqs, irq);
> +	kfree(info->sysfs_attr.attr.name);
> +	kfree(info);
> +	auxiliary_irq_destroy(irq);
> +}
> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
> +#endif
> +
>   static const struct auxiliary_device_id *auxiliary_match_id(const struct auxiliary_device_id *id,
>   							    const struct auxiliary_device *auxdev)
>   {
> @@ -295,6 +458,7 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
>    * __auxiliary_device_add - add an auxiliary bus device
>    * @auxdev: auxiliary bus device to add to the bus
>    * @modname: name of the parent device's driver module
> + * @irqs_sysfs_enable: whether to enable IRQs sysfs
>    *
>    * This is the third step in the three-step process to register an
>    * auxiliary_device.
> @@ -310,7 +474,8 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
>    * parameter.  Only if a user requires a custom name would this version be
>    * called directly.
>    */
> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname)
> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname,
> +			   bool irqs_sysfs_enable)
>   {
>   	struct device *dev = &auxdev->dev;
>   	int ret;
> @@ -325,6 +490,10 @@ int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname)
>   		dev_err(dev, "auxiliary device dev_set_name failed: %d\n", ret);
>   		return ret;
>   	}
> +	if (irqs_sysfs_enable) {
> +		dev->groups = auxiliary_irqs_groups;
> +		xa_init(&auxdev->irqs);
> +	}
>   
>   	ret = device_add(dev);
>   	if (ret)
> diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
> index de21d9d24a95..760fadb26620 100644
> --- a/include/linux/auxiliary_bus.h
> +++ b/include/linux/auxiliary_bus.h
> @@ -58,6 +58,7 @@
>    *       in
>    * @name: Match name found by the auxiliary device driver,
>    * @id: unique identitier if multiple devices of the same name are exported,
> + * @irqs: irqs xarray contains irq indices which are used by the device,
>    *
>    * An auxiliary_device represents a part of its parent device's functionality.
>    * It is given a name that, combined with the registering drivers
> @@ -138,6 +139,7 @@
>   struct auxiliary_device {
>   	struct device dev;
>   	const char *name;
> +	struct xarray irqs;
>   	u32 id;
>   };
>   
> @@ -209,8 +211,26 @@ static inline struct auxiliary_driver *to_auxiliary_drv(struct device_driver *dr
>   }
>   
>   int auxiliary_device_init(struct auxiliary_device *auxdev);
> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname);
> -#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME)
> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const char *modname,
> +			   bool irqs_sysfs_enable);
> +#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, KBUILD_MODNAME, false)
> +#define auxiliary_device_add_with_irqs(auxdev) \
> +	__auxiliary_device_add(auxdev, KBUILD_MODNAME, true)
> +
> +#ifdef CONFIG_SYSFS
> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq);
> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev,
> +				       int irq);
> +#else /* CONFIG_SYSFS */
> +static inline int
> +auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
> +{
> +	return 0;
> +}
> +
> +static inline void
> +auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, int irq) {}
> +#endif
>   
>   static inline void auxiliary_device_uninit(struct auxiliary_device *auxdev)
>   {


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ