lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fa46ce9f-75f8-49af-8fb7-37b1e698f349@nvidia.com>
Date: Wed, 8 May 2024 14:33:05 +0300
From: Shay Drori <shayd@...dia.com>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>, <netdev@...r.kernel.org>,
	<pabeni@...hat.com>, <davem@...emloft.net>, <kuba@...nel.org>,
	<edumazet@...gle.com>, <gregkh@...uxfoundation.org>,
	<david.m.ertman@...el.com>
CC: <rafael@...nel.org>, <ira.weiny@...el.com>, <linux-rdma@...r.kernel.org>,
	<leon@...nel.org>, <tariqt@...dia.com>, Parav Pandit <parav@...dia.com>
Subject: Re: [PATCH net-next v3 1/2] driver core: auxiliary bus: show
 auxiliary device IRQs



On 08/05/2024 12:34, Przemek Kitszel wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 5/8/24 06:04, Shay Drory wrote:
>> PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
>> and virtual functions are anchored on the PCI bus;  the irq information
>> of each such function is visible to users via sysfs directory "msi_irqs"
>> containing file for each irq entry. However, for PCI SFs such information
>> is unavailable. Due to this users have no visibility on IRQs used by the
>> SFs.
>> Secondly, an SF is a multi function device supporting rdma, netdevice
>> and more. Without irq information at the bus level, the user is unable
>> to view or use the affinity of the SF IRQs.
>>
>> Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
>> for supporting auxiliary devices, containing file for each irq entry.
>>
>> Additionally, the PCI SFs sometimes share the IRQs with peer SFs. This
>> information is also not available to the users. To overcome this
>> limitation, each irq sysfs entry shows if irq is exclusive or shared.
>>
>> For example:
>> $ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
>> 50  51  52  53  54  55  56  57  58
>> $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/52
>> exclusive
>>
>> Reviewed-by: Parav Pandit <parav@...dia.com>
>> Signed-off-by: Shay Drory <shayd@...dia.com>
>>
>> ---
>> v2->v3:
>> - fix function declaration in case SYSFS isn't defined (Parav)
>> - convert auxdev->groups array with auxiliary_irqs_groups (Przemek)
>> v1->v2:
>> - move #ifdefs from drivers/base/auxiliary.c to
>>    include/linux/auxiliary_bus.h (Greg)
>> - use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL (Greg)
>> - Fix kzalloc(ref) to kzalloc(*ref) (Simon)
>> - Add return description in auxiliary_device_sysfs_irq_add() kdoc (Simon)
>> - Fix auxiliary_irq_mode_show doc (kernel test boot)
>> ---
>>   Documentation/ABI/testing/sysfs-bus-auxiliary |  14 ++
>>   drivers/base/auxiliary.c                      | 171 +++++++++++++++++-
>>   include/linux/auxiliary_bus.h                 |  24 ++-
>>   3 files changed, 206 insertions(+), 3 deletions(-)
>>   create mode 100644 Documentation/ABI/testing/sysfs-bus-auxiliary
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-auxiliary 
>> b/Documentation/ABI/testing/sysfs-bus-auxiliary
>> new file mode 100644
>> index 000000000000..3b8299d49d9e
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-bus-auxiliary
>> @@ -0,0 +1,14 @@
>> +What:                /sys/bus/auxiliary/devices/.../irqs/
>> +Date:                April, 2024
>> +Contact:     Shay Drory <shayd@...dia.com>
>> +Description:
>> +             The /sys/devices/.../irqs directory contains a variable 
>> set of
>> +             files, with each file is named as irq number similar to 
>> PCI PF
>> +             or VF's irq number located in msi_irqs directory.
>> +
>> +What:                /sys/bus/auxiliary/devices/.../irqs/<N>
>> +Date:                April, 2024
>> +Contact:     Shay Drory <shayd@...dia.com>
>> +Description:
>> +             auxiliary devices can share IRQs. This attribute 
>> indicates if
>> +             the irq is shared with other SFs or exclusively used by 
>> the SF.
>> diff --git a/drivers/base/auxiliary.c b/drivers/base/auxiliary.c
>> index d3a2c40c2f12..6293c6707e1e 100644
>> --- a/drivers/base/auxiliary.c
>> +++ b/drivers/base/auxiliary.c
>> @@ -158,6 +158,169 @@
>>    *  };
>>    */
>>
>> +#ifdef CONFIG_SYSFS
>> +/* Xarray of irqs to determine if irq is exclusive or shared. */
>> +static DEFINE_XARRAY(irqs);
>> +/* Protects insertions into the irtqs xarray. */
>> +static DEFINE_MUTEX(irqs_lock);
> 
> sorry for not catching it earlier, you don't need a separate lock,
> xarray provides one, please see below [1], [2]
> 
>> +
>> +struct auxiliary_irq_info {
>> +     struct device_attribute sysfs_attr;
>> +     int irq;
>> +};
>> +
>> +static struct attribute *auxiliary_irq_attrs[] = {
>> +     NULL
>> +};
>> +
>> +static const struct attribute_group auxiliary_irqs_group = {
>> +     .name = "irqs",
>> +     .attrs = auxiliary_irq_attrs,
>> +};
>> +
>> +static const struct attribute_group *auxiliary_irqs_groups[2] = {
>> +     &auxiliary_irqs_group,
>> +     NULL
>> +};
>> +
>> +/* Auxiliary devices can share IRQs. Expose to user whether the 
>> provided IRQ is
>> + * shared or exclusive.
>> + */
>> +static ssize_t auxiliary_irq_mode_show(struct device *dev,
>> +                                    struct device_attribute *attr, 
>> char *buf)
>> +{
>> +     struct auxiliary_irq_info *info =
>> +             container_of(attr, struct auxiliary_irq_info, sysfs_attr);
>> +
>> +     if (refcount_read(xa_load(&irqs, info->irq)) > 1)
> 
> I didn't checked if it is possible with current implementation, but
> please imagine a scenario where user open()'s sysfs file, then triggers
> operation to remove irq (to call auxiliary_irq_destroy()), and only then
> read()'s sysfs contents, what results in nullptr dereference (xa_load()
> returning NULL). Splitting the code into two if statements would resolve
> this issue.

the first function in auxiliary_irq_destroy() is removing the sysfs.
I don't see how after that user can read() the sysfs...

> 
>> +             return sysfs_emit(buf, "%s\n", "shared");
>> +     else
>> +             return sysfs_emit(buf, "%s\n", "exclusive");
>> +}
>> +
>> +static void auxiliary_irq_destroy(int irq)
>> +{
>> +     refcount_t *ref;
>> +
>> +     xa_lock(&irqs);
>> +     ref = xa_load(&irqs, irq);
>> +     if (refcount_dec_and_test(ref)) {
>> +             __xa_erase(&irqs, irq);
>> +             kfree(ref);
>> +     }
>> +     xa_unlock(&irqs);
>> +}
>> +
>> +static int auxiliary_irq_create(int irq)
>> +{
>> +     refcount_t *ref;
>> +     int ret = 0;
>> +
>> +     mutex_lock(&irqs_lock);
> 
> [1] xa_lock() instead ...
> 
>> +     ref = xa_load(&irqs, irq);
>> +     if (ref && refcount_inc_not_zero(ref))
>> +             goto out;
> 
> `&& refcount_inc_not_zero()` here means: leak memory and wreak havoc on
> saturation, instead the logic should be:
>         if (ref) {
>                 refcount_inc(ref);
>                 goto out;
>         }
> 
> anyway allocating under a lock taken is not the best idea in general,
> although xarray API somehow encourages this - 

> alternative is to
> preallocate and free when not used, or some lock dance that will be easy
> to get wrong - and that's the raison d'etre of xa_reserve() :)

I don't understand what you picture here?
xa_reserve() can drop the lock while allocating the xa_entry, so how it 
will help?

> 
>> +
>> +     ref = kzalloc(sizeof(*ref), GFP_KERNEL);
>> +     if (!ref) {
>> +             ret = -ENOMEM;
>> +             goto out;
>> +     }
>> +
>> +     refcount_set(ref, 1);
>> +     ret = xa_insert(&irqs, irq, ref, GFP_KERNEL);
> 
> [2] ... then __xa_insert() here

__xa_insert() can drop the lock as well...

> 
>> +     if (ret)
>> +             kfree(ref);
>> +
>> +out:
>> +     mutex_unlock(&irqs_lock);
>> +     return ret;
>> +}
>> +
>> +/**
>> + * auxiliary_device_sysfs_irq_add - add a sysfs entry for the given IRQ
>> + * @auxdev: auxiliary bus device to add the sysfs entry.
>> + * @irq: The associated Linux interrupt number.
>> + *
>> + * This function should be called after auxiliary device have 
>> successfully
>> + * received the irq.
>> + *
>> + * Return: zero on success or an error code on failure.
>> + */
>> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, 
>> int irq)
>> +{
>> +     struct device *dev = &auxdev->dev;
>> +     struct auxiliary_irq_info *info;
>> +     int ret;
>> +
>> +     ret = auxiliary_irq_create(irq);
>> +     if (ret)
>> +             return ret;
>> +
>> +     info = kzalloc(sizeof(*info), GFP_KERNEL);
>> +     if (!info) {
>> +             ret = -ENOMEM;
>> +             goto info_err;
>> +     }
>> +
>> +     sysfs_attr_init(&info->sysfs_attr.attr);
>> +     info->sysfs_attr.attr.name = kasprintf(GFP_KERNEL, "%d", irq);
>> +     if (!info->sysfs_attr.attr.name) {
>> +             ret = -ENOMEM;
>> +             goto name_err;
>> +     }
>> +     info->irq = irq;
>> +     info->sysfs_attr.attr.mode = 0444;
>> +     info->sysfs_attr.show = auxiliary_irq_mode_show;
>> +
>> +     ret = xa_insert(&auxdev->irqs, irq, info, GFP_KERNEL);
>> +     if (ret)
>> +             goto auxdev_xa_err;
>> +
>> +     ret = sysfs_add_file_to_group(&dev->kobj, &info->sysfs_attr.attr,
>> +                                   auxiliary_irqs_group.name);
>> +     if (ret)
>> +             goto sysfs_add_err;
>> +
>> +     return 0;
>> +
>> +sysfs_add_err:
>> +     xa_erase(&auxdev->irqs, irq);
>> +auxdev_xa_err:
>> +     kfree(info->sysfs_attr.attr.name);
>> +name_err:
>> +     kfree(info);
>> +info_err:
>> +     auxiliary_irq_destroy(irq);
>> +     return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_add);
>> +
>> +/**
>> + * auxiliary_device_sysfs_irq_remove - remove a sysfs entry for the 
>> given IRQ
>> + * @auxdev: auxiliary bus device to add the sysfs entry.
>> + * @irq: the IRQ to remove.
>> + *
>> + * This function should be called to remove an IRQ sysfs entry.
>> + */
>> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device 
>> *auxdev, int irq)
>> +{
>> +     struct auxiliary_irq_info *info = xa_load(&auxdev->irqs, irq);
>> +     struct device *dev = &auxdev->dev;
>> +
>> +     if (WARN_ON(!info))
>> +             return;
>> +
>> +     sysfs_remove_file_from_group(&dev->kobj, &info->sysfs_attr.attr,
>> +                                  auxiliary_irqs_group.name);
>> +     xa_erase(&auxdev->irqs, irq);
>> +     kfree(info->sysfs_attr.attr.name);
>> +     kfree(info);
>> +     auxiliary_irq_destroy(irq);
>> +}
>> +EXPORT_SYMBOL_GPL(auxiliary_device_sysfs_irq_remove);
>> +#endif
>> +
>>   static const struct auxiliary_device_id *auxiliary_match_id(const 
>> struct auxiliary_device_id *id,
>>                                                           const struct 
>> auxiliary_device *auxdev)
>>   {
>> @@ -295,6 +458,7 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
>>    * __auxiliary_device_add - add an auxiliary bus device
>>    * @auxdev: auxiliary bus device to add to the bus
>>    * @modname: name of the parent device's driver module
>> + * @irqs_sysfs_enable: whether to enable IRQs sysfs
>>    *
>>    * This is the third step in the three-step process to register an
>>    * auxiliary_device.
>> @@ -310,7 +474,8 @@ EXPORT_SYMBOL_GPL(auxiliary_device_init);
>>    * parameter.  Only if a user requires a custom name would this 
>> version be
>>    * called directly.
>>    */
>> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const 
>> char *modname)
>> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const 
>> char *modname,
>> +                        bool irqs_sysfs_enable)
>>   {
>>       struct device *dev = &auxdev->dev;
>>       int ret;
>> @@ -325,6 +490,10 @@ int __auxiliary_device_add(struct 
>> auxiliary_device *auxdev, const char *modname)
>>               dev_err(dev, "auxiliary device dev_set_name failed: 
>> %d\n", ret);
>>               return ret;
>>       }
>> +     if (irqs_sysfs_enable) {
>> +             dev->groups = auxiliary_irqs_groups;
>> +             xa_init(&auxdev->irqs);
>> +     }
>>
>>       ret = device_add(dev);
>>       if (ret)
>> diff --git a/include/linux/auxiliary_bus.h 
>> b/include/linux/auxiliary_bus.h
>> index de21d9d24a95..760fadb26620 100644
>> --- a/include/linux/auxiliary_bus.h
>> +++ b/include/linux/auxiliary_bus.h
>> @@ -58,6 +58,7 @@
>>    *       in
>>    * @name: Match name found by the auxiliary device driver,
>>    * @id: unique identitier if multiple devices of the same name are 
>> exported,
>> + * @irqs: irqs xarray contains irq indices which are used by the device,
>>    *
>>    * An auxiliary_device represents a part of its parent device's 
>> functionality.
>>    * It is given a name that, combined with the registering drivers
>> @@ -138,6 +139,7 @@
>>   struct auxiliary_device {
>>       struct device dev;
>>       const char *name;
>> +     struct xarray irqs;
>>       u32 id;
>>   };
>>
>> @@ -209,8 +211,26 @@ static inline struct auxiliary_driver 
>> *to_auxiliary_drv(struct device_driver *dr
>>   }
>>
>>   int auxiliary_device_init(struct auxiliary_device *auxdev);
>> -int __auxiliary_device_add(struct auxiliary_device *auxdev, const 
>> char *modname);
>> -#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, 
>> KBUILD_MODNAME)
>> +int __auxiliary_device_add(struct auxiliary_device *auxdev, const 
>> char *modname,
>> +                        bool irqs_sysfs_enable);
>> +#define auxiliary_device_add(auxdev) __auxiliary_device_add(auxdev, 
>> KBUILD_MODNAME, false)
>> +#define auxiliary_device_add_with_irqs(auxdev) \
>> +     __auxiliary_device_add(auxdev, KBUILD_MODNAME, true)
>> +
>> +#ifdef CONFIG_SYSFS
>> +int auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, 
>> int irq);
>> +void auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev,
>> +                                    int irq);
>> +#else /* CONFIG_SYSFS */
>> +static inline int
>> +auxiliary_device_sysfs_irq_add(struct auxiliary_device *auxdev, int irq)
>> +{
>> +     return 0;
>> +}
>> +
>> +static inline void
>> +auxiliary_device_sysfs_irq_remove(struct auxiliary_device *auxdev, 
>> int irq) {}
>> +#endif
>>
>>   static inline void auxiliary_device_uninit(struct auxiliary_device 
>> *auxdev)
>>   {
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ