lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <emj5f7xfdcnkemdairmpyiqmq5aoj2uqr7bxhfiezqm4zeuchu@wuplknrtviud>
Date: Wed, 19 Mar 2025 12:56:04 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>
Cc: "Keller, Jacob E" <jacob.e.keller@...el.com>, 
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>, "davem@...emloft.net" <davem@...emloft.net>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Dumazet, Eric" <edumazet@...gle.com>, 
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>, 
	"saeedm@...dia.com" <saeedm@...dia.com>, "leon@...nel.org" <leon@...nel.org>, 
	"tariqt@...dia.com" <tariqt@...dia.com>, "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>, 
	"dakr@...nel.org" <dakr@...nel.org>, "rafael@...nel.org" <rafael@...nel.org>, 
	"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, "cratiu@...dia.com" <cratiu@...dia.com>, 
	"Knitter, Konrad" <konrad.knitter@...el.com>, "cjubran@...dia.com" <cjubran@...dia.com>
Subject: Re: [PATCH net-next RFC 2/3] net/mlx5: Introduce shared devlink
 instance for PFs on same chip

Wed, Mar 19, 2025 at 09:21:52AM +0100, przemyslaw.kitszel@...el.com wrote:
>On 3/18/25 23:05, Keller, Jacob E wrote:
>> 
>> 
>> > -----Original Message-----
>> > From: Jiri Pirko <jiri@...nulli.us>
>
>> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
>> > @@ -0,0 +1,150 @@
>> > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>> > +/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
>> > +
>> > +#include <linux/device/faux.h>
>> > +#include <linux/mlx5/driver.h>
>> > +#include <linux/mlx5/vport.h>
>> > +
>> > +#include "sh_devlink.h"
>> > +
>> > +static LIST_HEAD(shd_list);
>> > +static DEFINE_MUTEX(shd_mutex); /* Protects shd_list and shd->list */
>
>I essentially agree that faux_device could be used as-is, without any
>devlink changes, works for me.
>That does not remove the need to invent the name at some point ;)

What name? Name of faux device? Sure. In case of ice it's probably PCI DSN

>
>we have resolved this in similar manner, that's fine, given my
>understanding that you cannot let faux to dispatch for you, like:
>faux_get_instance(serial_number_equivalent)

Not sure what you are asking TBH.


>
>> > +
>> > +/* This structure represents a shared devlink instance,
>> > + * there is one created for PF group of the same chip.
>> > + */
>> > +struct mlx5_shd {
>> > +	/* Node in shd list */
>> > +	struct list_head list;
>> > +	/* Serial number of the chip */
>> > +	const char *sn;
>> > +	/* List of per-PF dev instances. */
>> > +	struct list_head dev_list;
>> > +	/* Related faux device */
>> > +	struct faux_device *faux_dev;
>> > +};
>> > +
>> 
>> For ice, the equivalent of this would essentially replace ice_adapter I imagine.
>
>or "ice_adapter will be the ice equivalent"

Oh yes, that makes sense.


>
>> 
>> > +static const struct devlink_ops mlx5_shd_ops = {
>
>please double check if there is no crash for:
>$ devlink dev info the/faux/thing

Will do, but why do you think so?


>
>> > +};
>> > +
>> > +static int mlx5_shd_faux_probe(struct faux_device *faux_dev)
>> > +{
>> > +	struct devlink *devlink;
>> > +	struct mlx5_shd *shd;
>> > +
>> > +	devlink = devlink_alloc(&mlx5_shd_ops, sizeof(struct mlx5_shd),
>
>sizeof(*shd)

Sure.


>
>I like that you reuse devlink_alloc(), with allocation of priv data,
>that suits also our needs
>
>> > &faux_dev->dev);
>> > +	if (!devlink)
>> > +		return -ENOMEM;
>> > +	shd = devlink_priv(devlink);
>> > +	faux_device_set_drvdata(faux_dev, shd);
>> > +
>> > +	devl_lock(devlink);
>> > +	devl_register(devlink);
>> > +	devl_unlock(devlink);
>> > +	return 0;
>> > +}
>
>[...]
>
>> > +int mlx5_shd_init(struct mlx5_core_dev *dev)
>> > +{
>> > +	u8 *vpd_data __free(kfree) = NULL;
>
>so bad that netdev mainainers discourage __free() :(
>perhaps I should propose higher abstraction wrapper for it
>on April 1st
>
>> > +	struct pci_dev *pdev = dev->pdev;
>> > +	unsigned int vpd_size, kw_len;
>> > +	struct mlx5_shd *shd;
>> > +	const char *sn;
>
>I would extract name retrieval, perhaps mlx5_shd_get_name()?

I had that, did not really make the code nicer :)


>
>> > +	char *end;
>> > +	int start;
>> > +	int err;
>> > +
>> > +	if (!mlx5_core_is_pf(dev))
>> > +		return 0;
>> > +
>> > +	vpd_data = pci_vpd_alloc(pdev, &vpd_size);
>> > +	if (IS_ERR(vpd_data)) {
>> > +		err = PTR_ERR(vpd_data);
>> > +		return err == -ENODEV ? 0 : err;
>
>what? that means the shared devlink instance is something you will
>work properly without?

Not sure. This is something that should not happen for any existing
device.


>
>> > +	}
>> > +	start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size, "V3",
>> > &kw_len);
>> > +	if (start < 0) {
>> > +		/* Fall-back to SN for older devices. */
>> > +		start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size,
>> > +
>> > PCI_VPD_RO_KEYWORD_SERIALNO, &kw_len);
>> > +		if (start < 0)
>> > +			return -ENOENT;
>> > +	}
>> > +	sn = kstrndup(vpd_data + start, kw_len, GFP_KERNEL);
>> > +	if (!sn)
>> > +		return -ENOMEM;
>> > +	end = strchrnul(sn, ' ');
>> > +	*end = '\0';
>> > +
>> > +	guard(mutex)(&shd_mutex);
>
>guard()() is a no-no too, per "discouraged by netdev maintainers",
>and here I'm on board with discouraging ;)

That's a fight with windmills. It will happen sooner than later anyway.
It is just too conventient. I don't understand why netdev has to have
special treat comparing to the rest of the kernel all the time...


>
>> > +	list_for_each_entry(shd, &shd_list, list) {
>> > +		if (!strcmp(shd->sn, sn)) {
>> > +			kfree(sn);
>> > +			goto found;
>> > +		}
>> > +	}
>> > +	shd = mlx5_shd_create(sn);
>> > +	if (!shd) {
>> > +		kfree(sn);
>> > +		return -ENOMEM;
>> > +	}
>> 
>> How is the faux device kept in memory? I guess its reference counted
>> somewhere?
>
>get_device()/put_device() with faxu_dev->dev as argument
>
>But I don't see that reference being incremented in the list_for_each.
>
>Jiri keeps "the counter" as the implicit observation of shd list size :)
>which is protected by mutex

Yep. Why isn't that enough? I need the list anyway. Plus, I'm using the
list to reference count shd, not the fauxdev. Fauxdev is explicitly
create/destroyed by calling appropriate function. I belive that is the
correct way (maybe the only way?) to instantiate/deinstantiate faux.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ