Message-ID: <698dd861-951b-44d9-91b0-5a39a953857c@intel.com>
Date: Fri, 26 Apr 2024 15:25:44 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jiri Pirko <jiri@...nulli.us>, Przemek Kitszel
<przemyslaw.kitszel@...el.com>
CC: "Temerkhanov, Sergey" <sergey.temerkhanov@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Knitter, Konrad" <konrad.knitter@...el.com>, <kuba@...nel.org>,
<gregkh@...uxfoundation.org>
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: Extend auxbus device
naming
On 4/26/2024 6:43 AM, Jiri Pirko wrote:
> Fri, Apr 26, 2024 at 02:49:40PM CEST, przemyslaw.kitszel@...el.com wrote:
>> On 4/26/24 13:19, Jiri Pirko wrote:
>>> Wed, Apr 24, 2024 at 06:56:37PM CEST, jacob.e.keller@...el.com wrote:
>>>>
>>>>
>>>> On 4/24/2024 8:07 AM, Jiri Pirko wrote:
>>>>> Wed, Apr 24, 2024 at 12:03:25AM CEST, jacob.e.keller@...el.com wrote:
>>>>>>
>>>>>>
>>>>>> On 4/23/2024 6:14 AM, Jiri Pirko wrote:
>>>>>>> Tue, Apr 23, 2024 at 01:56:55PM CEST, sergey.temerkhanov@...el.com wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Jiri Pirko <jiri@...nulli.us>
>>>>>>>>> Sent: Tuesday, April 23, 2024 1:36 PM
>>>>>>>>> To: Temerkhanov, Sergey <sergey.temerkhanov@...el.com>
>>>>>>>>> Cc: intel-wired-lan@...ts.osuosl.org; netdev@...r.kernel.org; Kitszel,
>>>>>>>>> Przemyslaw <przemyslaw.kitszel@...el.com>
>>>>>>>>> Subject: Re: [PATCH iwl-next v2] ice: Extend auxbus device naming
>>>>>>>>>
>>>>>>>>> Tue, Apr 23, 2024 at 11:14:59AM CEST, sergey.temerkhanov@...el.com
>>>>>>>>> wrote:
>>>>>>>>>> Include segment/domain number in the device name to distinguish between
>>>>>>>>>> PCI devices located on different root complexes in multi-segment
>>>>>>>>>> configurations. Naming is changed from ptp_<bus>_<slot>_clk<clock> to
>>>>>>>>>> ptp_<domain>_<bus>_<slot>_clk<clock>
>>>>>>>>>
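(For context: the proposed name is built from the parent PCI device's
address. A rough sketch with a hypothetical helper; the patch's actual
format string may differ:)

#include <linux/kernel.h>
#include <linux/pci.h>

static void ptp_aux_dev_name(struct pci_dev *pdev, u8 clk, char *buf,
			     size_t len)
{
	/* ptp_<domain>_<bus>_<slot>_clk<clock> */
	snprintf(buf, len, "ptp_%d_%u_%u_clk%u",
		 pci_domain_nr(pdev->bus), pdev->bus->number,
		 PCI_SLOT(pdev->devfn), clk);
}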
>>>>>>>>> I don't understand why you need to encode PCI properties of a parent device
>>>>>>>>> into the auxiliary bus name. Could you please explain the motivation? Why do
>>>>>>>>> you need a bus instance per PF?
>>>>>>>>>
>>>>>>>>> The rest of the auxbus registrants don't do this. Could you please align? Just
>>>>>>>>> have one bus for the ice driver and that's it.
>>>>>>>>
>>>>>>>> This patch adds support for multi-segment PCIe configurations.
>>>>>>>> An auxdev is created for each adapter in the system that has a clock. There can be
>>>>>>>
>>>>>>> You are trying to change auxiliary bus name.
>>>>>>>
>>>>>>>
>>>>>>>> more than one adapter present, so there exists a possibility of a device naming conflict.
>>>>>>>> To avoid it, auxdevs are named according to the PCI geographical addresses of the adapters.
>>>>>>>
>>>>>>> Why? It's an auxdev; the name should not contain anything related to
>>>>>>> PCI, there is no reason for it. I asked for motivation, and you didn't provide any.
>>>>>>>
>>>>>>
>>>>>> We aren't creating one auxbus per PF. We're creating one auxbus per
>>>>>> *clock*. The device has multiple clocks in some configurations.
>>>>>
>>>>> Does not matter. Why do you need a separate bus per instance? Why
>>>>> can't you just have one ice-ptp bus and put the devices on it?
>>>>>
>>>>>
>>>>
>>>> Because we only want ports on card A to be connected to the card owner
>>>> on card A. We were using auxiliary bus to manage this. If we use a
>>>
>>> You do that by naming the auxiliary bus according to the PCI device
>>> that created it, which feels obviously wrong.
>>>
>>>
>>>> single bus for ice-ptp, then we still have to implement separation
>>>> between each set of devices on a single card. That doesn't solve the
>>>> problems we had, and with the current code the auxiliary bus doesn't
>>>> buy us much if it doesn't solve that problem.
>>>
>>> I don't think that auxiliary bus is the answer for your problem. Please
>>> don't abuse it.
>>>
>>>>
>>>>>>
>>>>>> We need to connect each PF to the same clock controller, because there
>>>>>> is only one clock owner for the device in a multi-port card.
>>>>>
>>>>> Connect how? By putting a PF device on a per-clock bus? That sounds
>>>>> quite odd. How did you come up with this usage of the auxiliary bus?
>>>>>
>>>>>
>>>>
>>>> Yeah, it's a multi-function board, but the functions aren't fully
>>>> independent. Each port has its own PF, but multiple ports can be tied to
>>>> the same clock. We have similar problems with a variety of HW aspects.
>>>> I've been told that the design is simpler for other operating systems
>>>> (something about the way their subsystems work, so that they expect ports
>>>> to be tied to functions). But it's definitely frustrating from a Linux
>>>> perspective, where a single driver instance for the device would be a lot
>>>> easier to manage.
>>>
>>> You can always do it by internal accounting in ice: merge multiple PF
>>> PCI devices into one internal instance. Or check out
>>> drivers/base/component.c; perhaps it could be extended for your use case.
>>>
>>>
>>>>
>>>>>>
>>>>>>> Again, could you please avoid creating an auxiliary bus per PF and just
>>>>>>> have one auxiliary bus per ice driver?
>>>>>>>
>>>>>>
>>>>>> We can't have one bus per ice driver, because we don't want to connect
>>>>>> ports from a different device to the same clock. If you have two ice devices
>>>>>> plugged in, the ports on each device are separate from each other.
>>>>>>
>>>>>> The goal here is to connect the clock ports to the clock owner.
>>>>>>
>>>>>> Worse, as described here, we do have some devices which combine multiple
>>>>>> adapters together and tie their clocks together via HW signaling. In
>>>>>> those cases the clocks *do* need to communicate across the device, but
>>>>>> only to other ports on the same physical device, not to ports on a
>>>>>> different device.
>>>>>>
>>>>>> Perhaps auxbus is the wrong approach here? But how else can we connect
>>>>>
>>>>> Yeah, feels quite wrong.
>>>>>
>>>>>
>>>>>> these ports without over-connecting to other ports? We could write
>>>>>> bespoke code which finds these devices, but that felt like it was risky
>>>>>> and convoluted.
>>>>>
>>>>> Sounds like you need something similar to what you have for the DPLL
>>>>> subsystem. It feels to me that your HW design is quite disconnected
>>>>> from the Linux device model :/
>>>>>
>>>>
>>>> I'm not 100% sure how DPLL handles this. I'll have to investigate.
>>>
>>> DPLL leaves the merging to the DPLL level. The ice driver just registers
>>> entities multiple times. It is specifically designed to fit ice's needs.
>>>
>>>
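If I'm reading that right, the merging works because every PF asks for
the dpll by a shared clock id and gets back the same refcounted
instance. Roughly, as a sketch (hypothetical helper name, assuming the
clock id comes from the PCI DSN):

/* Sketch of DPLL-level merging as I understand it: all PFs on a card
 * pass the same clock id, so dpll_device_get() hands each of them the
 * same refcounted dpll_device. */
#include <linux/dpll.h>
#include <linux/module.h>
#include <linux/pci.h>

static struct dpll_device *pf_get_shared_dpll(struct pci_dev *pdev)
{
	u64 clock_id = pci_get_dsn(pdev); /* identical for all PFs on a card */

	return dpll_device_get(clock_id, 0, THIS_MODULE);
}

That is per-subsystem merging though, not a generic "merged device".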
>>>>
>>>> One thing I've considered a lot in the past (such as back when I was
>>>> working on devlink flash update) was to somehow have a sort of extra
>>>> layer where we could register some sort of "whole device" driver with
>>>> the PCI subsystem. It would get registered first and could connect to the
>>>> rest of the function driver instances as they load. But this seems like
>>>> it would need a lot of work in the PCI layer, and apparently hasn't been
>>>> an issue for other devices? Though ice is far from unique, at least among
>>>> Intel NICs. It's just that the devices got significantly more complex and
>>>> the functions more interdependent with this generation, and the issues
>>>> with global bits were solved in other ways: often by hiding them behind
>>>> firmware >:(
>>>
>>> I think that others could benefit from such a "merged device" as well. I
>>> think it is about time to introduce it.
>>
>> So far I see that we want to merge based on different bits, but let's
>> see what comes out of the exploratory work that Sergey is doing right now.
>>
>> And anything that is not a device/driver feels much more lightweight to
>> operate with (think &ice_adapter, but extended with more fields).
>> Could you elaborate on what you have in mind? (Or on what you could
>> imagine reusing?)
>
> Nothing concrete really. See below.
>
>>
>> Off topic:
>> it would be a good place to hang resources that are shared across ports
>> as devlink resources, so making this "merged device" an entity would
>> enable a higher layer over what we have with devlink xxx $pf.
>
> Yes. That would be great. I think we need a "device" in the sense of a
> struct device instance. Then you can properly model the relationships in
> sysfs, you can have devlink for it, etc.
>
> drivers/base/component.c does merging of multiple devices, but it does
> not create a "merged device"; that part is missing there. So we have 2
> options:
>
> 1) extend drivers/base/component.c to allow creating a merged device
> entity
> 2) implement the merged device infrastructure separately.
>
> IDK. I believe we need to rope more people in.
>
>
drivers/base/component.c looks pretty close to what we want. Each PF
would register as a component, and then a driver would register as the
component master. It does lack a struct device, though, so it might be
challenging to use with devlink unless we just opted to pick one of the
component devices as the main device?
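Roughly the shape I'm imagining, sketched against the existing
component API; all the ice_* names and the DSN-based matching below are
made up for illustration:

#include <linux/component.h>
#include <linux/pci.h>

static int ice_pf_bind(struct device *dev, struct device *master,
		       void *data)
{
	/* hook this PF up to the clock owner's shared state */
	return 0;
}

static void ice_pf_unbind(struct device *dev, struct device *master,
			  void *data)
{
}

static const struct component_ops ice_pf_component_ops = {
	.bind = ice_pf_bind,
	.unbind = ice_pf_unbind,
};

static int ice_pf_compare(struct device *dev, void *data)
{
	/* match the other PFs on the same physical card, e.g. by DSN */
	return dev_is_pci(dev) &&
	       pci_get_dsn(to_pci_dev(dev)) == *(u64 *)data;
}

static int ice_clock_bind(struct device *master)
{
	/* binds every matched PF component to this aggregate */
	return component_bind_all(master, NULL);
}

static void ice_clock_unbind(struct device *master)
{
	component_unbind_all(master, NULL);
}

static const struct component_master_ops ice_clock_master_ops = {
	.bind = ice_clock_bind,
	.unbind = ice_clock_unbind,
};

Then each PF would call component_add(&pdev->dev,
&ice_pf_component_ops), and the clock owner would build a match with
component_match_add(dev, &match, ice_pf_compare, &dsn) before calling
component_master_add_with_match(dev, &ice_clock_master_ops, match).
Note the "master" device here is still just one of the PFs, not a
separate struct device.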
Extending components to have a device could be interesting, though
perhaps it's not exactly the best place for this. It seems like
components are about combining a lot of small devices that ultimately
form one piece of functionality at a logical level.
That is pretty close to what we want here: one entity to control global
portions of an otherwise multi-function card.
Extending it to include a struct device could work, but I'm not 100% sure
how that fits into the component system.
We could try extending the PCI subsystem to do something similar to
components, which is vaguely what I described a couple of replies ago.
ice_adapter is basically doing this, just in a more bespoke and custom
way, and it still lacks the extra struct device.
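For reference, that bespoke accounting is roughly this shape (a sketch
only; the names, the key derivation, and the locking are simplified,
not the actual driver code):

#include <linux/pci.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/xarray.h>

struct shared_adapter {
	refcount_t refcount;
	/* shared clock state, locks, etc. would live here */
};

static DEFINE_XARRAY(shared_adapters);

/* Look up (or create) the per-card state shared by all PFs; locking
 * against concurrent probe is omitted for brevity. */
static struct shared_adapter *shared_adapter_get(struct pci_dev *pdev)
{
	unsigned long index = ((unsigned long)pci_domain_nr(pdev->bus) << 8) |
			      pdev->bus->number; /* illustrative key */
	struct shared_adapter *adapter;

	adapter = xa_load(&shared_adapters, index);
	if (adapter) {
		refcount_inc(&adapter->refcount);
		return adapter;
	}

	adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);
	if (!adapter)
		return NULL;

	refcount_set(&adapter->refcount, 1);
	if (xa_err(xa_store(&shared_adapters, index, adapter, GFP_KERNEL))) {
		kfree(adapter);
		return NULL;
	}

	return adapter;
}

It works, but every driver with this problem ends up writing its own
copy of it, with no struct device to hang devlink or sysfs off of.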