[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <23f21648-d788-40fd-a148-49766b078a4c@intel.com>
Date: Wed, 19 Mar 2025 14:20:56 +0100
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Jiri Pirko <jiri@...nulli.us>
CC: "Keller, Jacob E" <jacob.e.keller@...el.com>, "gregkh@...uxfoundation.org"
<gregkh@...uxfoundation.org>, "davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Dumazet, Eric"
<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>, "saeedm@...dia.com"
<saeedm@...dia.com>, "leon@...nel.org" <leon@...nel.org>, "tariqt@...dia.com"
<tariqt@...dia.com>, "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"dakr@...nel.org" <dakr@...nel.org>, "rafael@...nel.org" <rafael@...nel.org>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>, "cratiu@...dia.com"
<cratiu@...dia.com>, "Knitter, Konrad" <konrad.knitter@...el.com>,
"cjubran@...dia.com" <cjubran@...dia.com>
Subject: Re: [PATCH net-next RFC 2/3] net/mlx5: Introduce shared devlink
instance for PFs on same chip
On 3/19/25 12:56, Jiri Pirko wrote:
> Wed, Mar 19, 2025 at 09:21:52AM +0100, przemyslaw.kitszel@...el.com wrote:
>> On 3/18/25 23:05, Keller, Jacob E wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jiri Pirko <jiri@...nulli.us>
>>
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
>>>> @@ -0,0 +1,150 @@
>>>> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
>>>> +/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
>>>> +
>>>> +#include <linux/device/faux.h>
>>>> +#include <linux/mlx5/driver.h>
>>>> +#include <linux/mlx5/vport.h>
>>>> +
>>>> +#include "sh_devlink.h"
>>>> +
>>>> +static LIST_HEAD(shd_list);
>>>> +static DEFINE_MUTEX(shd_mutex); /* Protects shd_list and shd->list */
>>
>> I essentially agree that faux_device could be used as-is, without any
>> devlink changes, works for me.
>> That does not remove the need to invent the name at some point ;)
>
> What name? Name of faux device? Sure. In case of ice it's probably PCI DSN
yes x2
>
>>
>> we have resolved this in similar manner, that's fine, given my
>> understanding that you cannot let faux to dispatch for you, like:
>> faux_get_instance(serial_number_equivalent)
>
> Not sure what you are asking TBH.
not asking, just noticing, with the hope that Greg will notice
inconvenience for drivers and perhaps think of something even better
anyway faux dev let's us resolve problems in rather elegant way,
compared to what we have to do before (like this series is not even
touching devlink/)
>
>
>>
>>>> +
>>>> +/* This structure represents a shared devlink instance,
>>>> + * there is one created for PF group of the same chip.
>>>> + */
>>>> +struct mlx5_shd {
>>>> + /* Node in shd list */
>>>> + struct list_head list;
>>>> + /* Serial number of the chip */
>>>> + const char *sn;
>>>> + /* List of per-PF dev instances. */
>>>> + struct list_head dev_list;
>>>> + /* Related faux device */
>>>> + struct faux_device *faux_dev;
>>>> +};
>>>> +
>>>
>>> For ice, the equivalent of this would essentially replace ice_adapter I imagine.
>>
>> or "ice_adapter will be the ice equivalent"
>
> Oh yes, that makes sense.
>
>
>>
>>>
>>>> +static const struct devlink_ops mlx5_shd_ops = {
>>
>> please double check if there is no crash for:
>> $ devlink dev info the/faux/thing
>
> Will do, but why do you think so?
My local version of the ice equivalent crashes here, due to null ops
pointers, despite all the checks in devlink core trying to prevent that.
[...]
>>>> + char *end;
>>>> + int start;
>>>> + int err;
>>>> +
>>>> + if (!mlx5_core_is_pf(dev))
>>>> + return 0;
>>>> +
>>>> + vpd_data = pci_vpd_alloc(pdev, &vpd_size);
>>>> + if (IS_ERR(vpd_data)) {
>>>> + err = PTR_ERR(vpd_data);
>>>> + return err == -ENODEV ? 0 : err;
>>
>> what? that means the shared devlink instance is something you will
>> work properly without?
>
> Not sure. This is something that should not happen for any existing
> device.
then failing is a best option to notice that :)
>
>
>>
>>>> + }
>>>> + start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size, "V3",
>>>> &kw_len);
>>>> + if (start < 0) {
>>>> + /* Fall-back to SN for older devices. */
>>>> + start = pci_vpd_find_ro_info_keyword(vpd_data, vpd_size,
>>>> +
>>>> PCI_VPD_RO_KEYWORD_SERIALNO, &kw_len);
>>>> + if (start < 0)
>>>> + return -ENOENT;
>>>> + }
>>>> + sn = kstrndup(vpd_data + start, kw_len, GFP_KERNEL);
>>>> + if (!sn)
>>>> + return -ENOMEM;
>>>> + end = strchrnul(sn, ' ');
>>>> + *end = '\0';
>>>> +
>>>> + guard(mutex)(&shd_mutex);
>>
>> guard()() is a no-no too, per "discouraged by netdev maintainers",
>> and here I'm on board with discouraging ;)
>
> That's a fight with windmills. It will happen sooner than later anyway.
> It is just too conventient. I don't understand why netdev has to have
> special treat comparing to the rest of the kernel all the time...
>
>
>>
>>>> + list_for_each_entry(shd, &shd_list, list) {
>>>> + if (!strcmp(shd->sn, sn)) {
>>>> + kfree(sn);
>>>> + goto found;
>>>> + }
>>>> + }
>>>> + shd = mlx5_shd_create(sn);
>>>> + if (!shd) {
>>>> + kfree(sn);
>>>> + return -ENOMEM;
>>>> + }
>>>
>>> How is the faux device kept in memory? I guess its reference counted
>>> somewhere?
>>
>> get_device()/put_device() with faxu_dev->dev as argument
>>
>> But I don't see that reference being incremented in the list_for_each.
>>
>> Jiri keeps "the counter" as the implicit observation of shd list size :)
>> which is protected by mutex
>
> Yep. Why isn't that enough? I need the list anyway. Plus, I'm using the
> list to reference count shd, not the fauxdev. Fauxdev is explicitly
> create/destroyed by calling appropriate function. I belive that is the
> correct way (maybe the only way?) to instantiate/deinstantiate faux.
I've just explained my reasoning why your code is correct :)
The other correct design would be to have it in the faux/ to
dispatch/de-duplicate given a string or numeric ID
Powered by blists - more mailing lists