netdev - Re: multi-function devices with global/cross-function interdependence (Was Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: Extend auxbus device naming)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <cfe84890-f999-4b97-a012-6f9fd16da8e3@intel.com>
Date: Mon, 29 Apr 2024 15:28:24 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jiri Pirko <jiri@...nulli.us>, Jakub Kicinski <kuba@...nel.org>, "Russell
 King" <rmk+kernel@....linux.org.uk>, Sergey Temerkhanov
	<sergey.temerkhanov@...el.com>, Greg Kroah-Hartman
	<gregkh@...uxfoundation.org>, Michal Schmidt <mschmidt@...hat.com>, "Thomas
 Gleixner" <tglx@...utronix.de>
CC: Przemek Kitszel <przemyslaw.kitszel@...el.com>, "Temerkhanov, Sergey"
	<sergey.temerkhanov@...el.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "intel-wired-lan@...ts.osuosl.org"
	<intel-wired-lan@...ts.osuosl.org>, "Knitter, Konrad"
	<konrad.knitter@...el.com>, <kuba@...nel.org>, <gregkh@...uxfoundation.org>
Subject: Re: multi-function devices with global/cross-function interdependence
 (Was Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: Extend auxbus device
 naming)

Hi,

I'm attempting to widen the audience a bit for a discussion about the
ice driver and its current (ab)use and future of using auxiliary bus to
manage cross-function inter-dependencies.

I've included the latest discussion with Jiri, but the full context can
be read from lore.kernel.org:

https://lore.kernel.org/intel-wired-lan/20240423091459.72216-1-sergey.temerkhanov@intel.com/

There is also related work from Karol for the 2xNAC case discussed below:

https://lore.kernel.org/intel-wired-lan/20240424133542.113933-26-karol.kolacinski@intel.com/

As a short summary, ice is currently (as of at least commit d938a8cca88a
("ice: Auxbus devices & driver for E822 TS") merged in v6.7) using
auxiliary bus to handle some challenges we have when dealing with the
multi-function hardware that has global functionality that cannot be
independently controlled by each function. The specific context is the
IEEE 1588 PTP Hardware Clock functionality, but there are other areas of
the device which have similar challenges.

Other auxiliary driver and bus implementations are intended for
abstracting some device functionality into a separate driver. A single
module creates a bus, and then devices connect to it, each device then
talks to that driver.

In the ice model, each PF that owns a clock creates its own bus, and
then each port (including the PF that owns the clock) creates a device
on the matching bus for that physical PCI device.

As part of developing a new product, we were refactoring this auxiliary
bus implementation when Jiri pointed out that the entire use of
auxiliary bus seems incorrect. We are generating the bus name so that it
includes information about the PCI device, in order to ensure that ports
connecting to the bus connect to the driver "driver". In addition we're
basically *only* using this to get a reference to each driver without
really providing a coherent logical separation of functionality.

The future 2xNAC product complicates things even further, as we have in
that case multiple chips on the board, called NACs, and each of these is
a PCI device with its own functions. The hardware clock on NAC 0 is
connected to NAC 1, so that functions on NAC 1 ports share the same
clock domain as the functions on NAC 0. So in this case not only do we
need to tie functions on the same multi-function device together, but we
also need to in some boards to tie two sets of functions across two
devices together.

Jiri's arguments in the above thread have convinced me that we should
not be using auxiliary bus to solve this problem. It was never intended
to solve this kind of problem.

Fundamentally the issue is that we have a PCI device with multiple
functions, but these functions are not fully independent. Some parts of
the functionality cannot be correctly managed by all functions at once,
and functions need to be aware of what the other functions are doing.
For PTP, this means one function controlling and exposing the PTP
hardware clock which is used for timestamping across multiple functions.
There are also various global registers which modifying affects the
entire device. This is not managed very well in the existing drivers,
and breaks the simple PCI model of functions being independent devices.

Recently, Michal added struct ice_adapter to the driver. This is a
reference counted structure which each function acquires as it loads,
based on the PCI information so that all functions on a device get the
same ice_adapter. Sergey is currently investigating refactoring the rest
of the PTP logic to use ice_adapter instead of auxiliary bus.

Jiri also pointed out the component logic in drivers/base/component.c
which seems to be a subsystem extension to manage a related problem of
multiple devices that work together to provide functionality of an
aggregate device.

The component code doesn't seem to quite match what we want for ice, as
it requires the aggregate device to register. In the ice case, we might
have 2, 4, or 8 functions each with a pci_dev and then no struct device
to represent the combined adapter.

I also have considered something like an extension of the PCI driver
model to allow adding a combined instance that manages the device so
we'd have a sort of like "adapter driver" and a "function driver" or
similar. Jiri pointed out that the problem feels a bit more generic and
sort of "above" the PCI layer though.

Essentially, we want something like a device subsystem improvement where
we can have each function register to connect to a combined manager
device which can control the global functionality of the device. In the
2xNAC case, this would need to cross both NAC devices.

It is likely we can extend ice_adapter to do this for the ice driver,
but this would then be a bespoke device specific solution. It also
doesn't provide the struct device. We could re-use the struct device
from function 0, but then we aren't really representing topology of the
connected devices as neatly.

While the focus is currently on the PTP case, there are also some other
bits that could be improved such as exposing only a single devlink
instance for the whole device instead of one devlink instance per function.

I'm hoping we can get some direction on what method we should pursue,
and possibly pointers to folks who might have an interest in this, and
could help us figure out the path towards a better solution.

For now, Sergey is going to continue prototyping and experimenting with
the ice_adapter implementation.

Here follows the most recent part of the discussion:

On 4/29/2024 4:55 AM, Jiri Pirko wrote:
> Sat, Apr 27, 2024 at 12:25:44AM CEST, jacob.e.keller@...el.com wrote:
>> On 4/26/2024 6:43 AM, Jiri Pirko wrote:
>>> Fri, Apr 26, 2024 at 02:49:40PM CEST, przemyslaw.kitszel@...el.com wrote:
>>>> On 4/26/24 13:19, Jiri Pirko wrote:
>>>>> Wed, Apr 24, 2024 at 06:56:37PM CEST, jacob.e.keller@...el.com wrote:
>>>>>> On 4/24/2024 8:07 AM, Jiri Pirko wrote:
>>>>>>> Wed, Apr 24, 2024 at 12:03:25AM CEST, jacob.e.keller@...el.com wrote:
>>>>>>>> On 4/23/2024 6:14 AM, Jiri Pirko wrote:
>>>>>>>>> Tue, Apr 23, 2024 at 01:56:55PM CEST, sergey.temerkhanov@...el.com wrote:
>>>>
>>>> offtop:
>>>> It will be a good hook to put resources that are shared across ports
>>>> under it in devlink resources, so making this "merged device" an entity
>>>> would enable higher layer over what we have with devlink xxx $pf.
>>>
>>> Yes. That would be great. I think we need a "device" in a sense of
>>> struct device instance. Then you can properly model the relationships in
>>> sysfs, you can have devlink for that, etc.
>>>
>>> drivers/base/component.c does merging of multiple devices, but it does
>>> not create a "merged device", this is missing there. So we have 2
>>> options:
>>>
>>> 1) extend drivers/base/component.c to allow to create a merged device
>>>    entity
>>> 2) implement merged device infrastructure separatelly.
>>>
>>> IDK. I believe we need to rope more people in.
>>>
>>>
>>
>> drivers/base/component.c looks pretty close to what we want. Each PF
>> would register as a component, and then a driver would register as the
>> component master. It does lack a struct device, so might be challenging
>> to use with devlink unless we just opted to pick a device from the
>> components as the main device?
> 
> If I read the code correctly, the master component has to be a device as
> well. This is the missing piece I believe.
> 
> 
>>
>> extending components to have a device could be interesting, though
>> perhaps its not exactly the best place. It seems like components are
>> about combining a lot of small devices that ultimately combine into one
>> functionality at a logical level.
>>
>> That is pretty close to what we want here: one entity to control global
>> portions of an otherwise multi-function card.
>>
>> Extending it to include a struct device could work but I'm not 100% sure
>> how that fits into the component system.
> 
> Who knows? we need to rope them into this discussion...
> 



> 
>>
>> We could try extending PCI subsystem to do something similar to
>> components which is vaguely what I described a couple replies ago.
> 
> Well, feels to me this is more generic problem than PCI. One level
> above.
> 
> 
>>
>> ice_adapter is basically doing this but more bespoke and custom, and
>> still lacks the extra struct device.
> 
> Correct.
>