[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ybrtz77i3hbxdwau4k55xn5brsnrtyomg6u65eyqm4fh7nsnob@arqyloer2l5z>
Date: Mon, 24 Feb 2025 17:14:24 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>
Cc: intel-wired-lan@...ts.osuosl.org,
Tony Nguyen <anthony.l.nguyen@...el.com>, Jakub Kicinski <kuba@...nel.org>,
Cosmin Ratiu <cratiu@...dia.com>, Tariq Toukan <tariqt@...dia.com>, netdev@...r.kernel.org,
Konrad Knitter <konrad.knitter@...el.com>, Jacob Keller <jacob.e.keller@...el.com>, davem@...emloft.net,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Andrew Lunn <andrew@...n.ch>,
linux-kernel@...r.kernel.org, ITP Upstream <nxne.cnse.osdt.itp.upstreaming@...el.com>,
Carolina Jubran <cjubran@...dia.com>
Subject: Re: [RFC net-next v2 1/2] devlink: add whole device devlink instance
Wed, Feb 19, 2025 at 05:32:54PM +0100, przemyslaw.kitszel@...el.com wrote:
>Add a support for whole device devlink instance. Intented as a entity
>over all PF devices on given physical device.
>
>In case of ice driver we have multiple PF devices (with their devlink
>dev representation), that have separate drivers loaded. However those
>still do share lots of resources due to being the on same HW. Examples
>include PTP clock and RSS LUT. Historically such stuff was assigned to
>PF0, but that was both not clear and not working well. Now such stuff
>is moved to be covered into struct ice_adapter, there is just one instance
>of such per HW.
>
>This patch adds a devlink instance that corresponds to that ice_adapter,
>to allow arbitrage over resources (as RSS LUT) via it (further in the
>series (RFC NOTE: stripped out so far)).
>
>Thanks to Wojciech Drewek for very nice naming of the devlink instance:
>PF0: pci/0000:00:18.0
>whole-dev: pci/0000:00:18
>But I made this a param for now (driver is free to pass just "whole-dev").
>
>$ devlink dev # (Interesting part of output only)
>pci/0000:af:00:
> nested_devlink:
> pci/0000:af:00.0
> pci/0000:af:00.1
> pci/0000:af:00.2
> pci/0000:af:00.3
> pci/0000:af:00.4
> pci/0000:af:00.5
> pci/0000:af:00.6
> pci/0000:af:00.7
In general, I like this approach. In fact, I have quite similar
patch/set in my sandbox git.
The problem I didn't figure out how to handle, was a backing entity
for the parent devlink.
You use part of PCI BDF, which is obviously wrong:
1) bus_name/dev_name the user expects to be the backing device bus and
address on it (pci/usb/i2c). With using part of BDF, you break this
assumption.
2) 2 PFs can have totally different BDF (in VM for example). Then your
approach is broken.
I was thinking about having an auxiliary device created for the parent,
but auxiliary assumes it is child. The is upside-down.
I was thinking about having some sort of made-up per-driver bus, like
"ice" of "mlx5" with some thing like DSN that would act as a "dev_name".
I have a patch that introduces:
struct devlink_shared_inst;
struct devlink *devlink_shared_alloc(const struct devlink_ops *ops,
size_t priv_size, struct net *net,
struct module *module, u64 per_module_id,
void *inst_priv,
struct devlink_shared_inst **p_inst);
void devlink_shared_free(struct devlink *devlink,
struct devlink_shared_inst *inst);
I took a stab at it here:
https://github.com/jpirko/linux_mlxsw/commits/wip_dl_pfs_parent/
The work is not finished.
Also, I was thinking about having some made-up bus, like "pci_ids",
where instead of BDFs as addresses, there would be DSN for example.
None of these 3 is nice.
The shared parent entity for PFs (and other Fs) is always reference
counted, first creates, last removes. I feel like this is something
missing in PCI spec. If such beast would exist, very easy to implement
this in devlink. We have all we need in place already.
Powered by blists - more mailing lists