Message-ID: <e21c9996-8ec7-42e9-b202-f9f502efa02d@intel.com>
Date: Tue, 25 Feb 2025 10:16:37 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>, Jiri Pirko
<jiri@...nulli.us>
CC: <intel-wired-lan@...ts.osuosl.org>, Tony Nguyen
<anthony.l.nguyen@...el.com>, Jakub Kicinski <kuba@...nel.org>, Cosmin Ratiu
<cratiu@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
<netdev@...r.kernel.org>, Konrad Knitter <konrad.knitter@...el.com>,
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni
<pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
<linux-kernel@...r.kernel.org>, ITP Upstream
<nxne.cnse.osdt.itp.upstreaming@...el.com>, Carolina Jubran
<cjubran@...dia.com>
Subject: Re: [RFC net-next v2 1/2] devlink: add whole device devlink instance
On 2/25/2025 7:40 AM, Przemek Kitszel wrote:
> On 2/25/25 15:35, Jiri Pirko wrote:
>> Tue, Feb 25, 2025 at 12:30:49PM +0100, przemyslaw.kitszel@...el.com wrote:
>>>
>>>>> Thanks to Wojciech Drewek for the very nice naming of the devlink instance:
>>>>> PF0: pci/0000:00:18.0
>>>>> whole-dev: pci/0000:00:18
>>>>> But I made this a param for now (the driver is free to pass just "whole-dev").
>>>>>
>>>>> $ devlink dev # (Interesting part of output only)
>>>>> pci/0000:af:00:
>>>>>   nested_devlink:
>>>>>     pci/0000:af:00.0
>>>>>     pci/0000:af:00.1
>>>>>     pci/0000:af:00.2
>>>>>     pci/0000:af:00.3
>>>>>     pci/0000:af:00.4
>>>>>     pci/0000:af:00.5
>>>>>     pci/0000:af:00.6
>>>>>     pci/0000:af:00.7
>>>>
>>>>
>>>> In general, I like this approach. In fact, I have a quite similar
>>>> patch/set in my sandbox git.
>>>>
>>>> The problem I didn't figure out how to handle was a backing entity
>>>> for the parent devlink.
>>>>
>>>> You use part of the PCI BDF, which is obviously wrong:
>>>> 1) the user expects bus_name/dev_name to be the backing device's bus and
>>>> its address on that bus (pci/usb/i2c). By using part of the BDF, you break
>>>> this assumption.
>>>> 2) two PFs can have totally different BDFs (in a VM, for example). Then
>>>> your approach is broken.
>>>
>>> To make the hard part of it easy, I would like the name to be provided
>>> by whatever the PF/driver has available (whichever of the given device's
>>> PFs comes first); as of now we resolve this issue (and provide roughly
>>> what your devlink_shared does) via ice_adapter.
>>
>> I don't understand. Can you provide some examples please?
>
> Right now we have one object of struct ice_adapter per device/card;
> it is refcounted and freed after the last PF put()s its copy.
> In the struct one could have a mutex or spinlock to guard shared stuff;
> an existing example is ptp_gltsyn_time_lock in the ice driver.
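
(For readers not familiar with the ice code: a rough sketch of what such a
shared, refcounted, per-card object looks like. The names below are purely
illustrative, not the actual struct ice_adapter layout.)

#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct card_shared {
	refcount_t refcount;	/* one reference held by each PF of the card */
	struct mutex lock;	/* guards the shared per-card state below */
	/* shared per-card state, e.g. PTP time register access, RSS LUT pool */
};

/* The first PF to probe allocates the object; later PFs only take a
 * reference, and the last PF to put() its reference frees it.
 */
static void card_shared_put(struct card_shared *cs)
{
	if (refcount_dec_and_test(&cs->refcount))
		kfree(cs);
}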
>
>>
>>
>>>
>>> Making it a devlink instance gives the user an easy way to see the whole
>>> picture of all resources handled as "shared per device"; my current
>
> This part is what is missing in the current devlink implementation and
> would likely still be missing after your series. I would still like to
> have it :) (And the rest is sugar coating for me)
>
>>> output, for all PFs and VFs on a given device:
>>>
>>> pci/0000:af:00:
>>>   name rss size 8 unit entry size_min 0 size_max 24 size_gran 1
>>>   resources:
>>>     name lut_512 size 0 unit entry size_min 0 size_max 16 size_gran 1
>>>     name lut_2048 size 8 unit entry size_min 0 size_max 8 size_gran 1
>>>
>>> What contributes to the difficulty is that this is not just one instance
>>> for all ice PFs, but one per device, which we distinguish via the PCI BDF.
>>
>> How?
>
> The code is in ice_adapter_index().
> Now I get what DSN is; it looks like it could be used equally well instead
> of the PCI BDF.
>
> Still, we need multiple instances; each card has its own PTP clock, its
> own "global RSS LUT" pool, etc.
>
>>
>>
>>>
>>>>
>>>> I was thinking about having an auxiliary device created for the parent,
>>>> but auxiliary assumes it is a child. That is upside-down.
>>>>
>>>> I was thinking about having some sort of made-up per-driver bus, like
>>>> "ice" or "mlx5", with something like the DSN acting as the "dev_name".
>>>> I have a patch that introduces:
>>>>
>>>> struct devlink_shared_inst;
>>>>
>>>> struct devlink *devlink_shared_alloc(const struct devlink_ops *ops,
>>>>                                      size_t priv_size, struct net *net,
>>>>                                      struct module *module,
>>>>                                      u64 per_module_id, void *inst_priv,
>>>>                                      struct devlink_shared_inst **p_inst);
>>>> void devlink_shared_free(struct devlink *devlink,
>>>>                          struct devlink_shared_inst *inst);
>>>>
>>>> I took a stab at it here:
>>>> https://github.com/jpirko/linux_mlxsw/commits/wip_dl_pfs_parent/
>>>> The work is not finished.
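
(Just to make the quoted prototype concrete for other readers: driver-side
usage would presumably look roughly like the below. This is purely
illustrative and based only on the signature above; using the DSN as
per_module_id, and the ops/priv names, are my assumptions.)

	struct devlink_shared_inst *inst;
	struct devlink *dl;

	dl = devlink_shared_alloc(&my_parent_devlink_ops,	/* assumed ops */
				  sizeof(struct my_parent_priv),
				  &init_net, THIS_MODULE,
				  pci_get_dsn(pdev),	/* per_module_id */
				  adapter,		/* inst_priv */
				  &inst);
	...
	devlink_shared_free(dl, inst);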
>>>>
>>>>
>>>> Also, I was thinking about having some made-up bus, like "pci_ids",
>>>> where instead of BDFs as addresses, there would be DSN for example.
>>>>
>>>> None of these 3 is nice.
>>>
>>> How would one invent/infer/allocate the DSN?
>>
>> The driver knows the DSN; it can obtain it from the PCI layer.
>
> Aaach, I got the abbreviation wrong; pci_get_dsn() does the thing, thank
> you. BTW, again, by Jake :D
>
I agree DSN is a good choice, but I will point out one potential issue,
at least for early development: a lot of pre-production cards I've
worked with in the past fail to have a unique DSN. At least for Intel
cards it is typically stored in the NVM flash memory. A normal flash
update process will keep the same DSN, but if you have to do something
like dediprog to recover the card, the DSN can be erased. This can make
using it as a unique identifier problematic if your test system has
multiple pre-production cards.
Not a deal breaker, but just a warning if you run into that while
testing/working on an implementation.
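
If someone does run into it, one option (purely a sketch on my side,
untested) would be to fall back to a BDF-derived value whenever the DSN
reads back as zero:

	/* Prefer the DSN; pre-production parts with an erased NVM can report
	 * 0, so fall back to a BDF-derived value (function number masked off
	 * so all PFs of the card still map to the same instance).
	 */
	u64 id = pci_get_dsn(pdev);

	if (!id)
		id = ((u64)pci_domain_nr(pdev->bus) << 16) |
		     PCI_DEVID(pdev->bus->number,
			       PCI_DEVFN(PCI_SLOT(pdev->devfn), 0));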