[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zi-KwL3WJrJd3zdR@nanopsycho>
Date: Mon, 29 Apr 2024 13:55:44 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@...el.com>,
"Temerkhanov, Sergey" <sergey.temerkhanov@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"Knitter, Konrad" <konrad.knitter@...el.com>, kuba@...nel.org,
gregkh@...uxfoundation.org
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: Extend auxbus device
naming
Sat, Apr 27, 2024 at 12:25:44AM CEST, jacob.e.keller@...el.com wrote:
>
>
>On 4/26/2024 6:43 AM, Jiri Pirko wrote:
>> Fri, Apr 26, 2024 at 02:49:40PM CEST, przemyslaw.kitszel@...el.com wrote:
>>> On 4/26/24 13:19, Jiri Pirko wrote:
>>>> Wed, Apr 24, 2024 at 06:56:37PM CEST, jacob.e.keller@...el.com wrote:
>>>>>
>>>>>
>>>>> On 4/24/2024 8:07 AM, Jiri Pirko wrote:
>>>>>> Wed, Apr 24, 2024 at 12:03:25AM CEST, jacob.e.keller@...el.com wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 4/23/2024 6:14 AM, Jiri Pirko wrote:
>>>>>>>> Tue, Apr 23, 2024 at 01:56:55PM CEST, sergey.temerkhanov@...el.com wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Jiri Pirko <jiri@...nulli.us>
>>>>>>>>>> Sent: Tuesday, April 23, 2024 1:36 PM
>>>>>>>>>> To: Temerkhanov, Sergey <sergey.temerkhanov@...el.com>
>>>>>>>>>> Cc: intel-wired-lan@...ts.osuosl.org; netdev@...r.kernel.org; Kitszel,
>>>>>>>>>> Przemyslaw <przemyslaw.kitszel@...el.com>
>>>>>>>>>> Subject: Re: [PATCH iwl-next v2] ice: Extend auxbus device naming
>>>>>>>>>>
>>>>>>>>>> Tue, Apr 23, 2024 at 11:14:59AM CEST, sergey.temerkhanov@...el.com
>>>>>>>>>> wrote:
>>>>>>>>>>> Include segment/domain number in the device name to distinguish
>>>>>>>>>> between
>>>>>>>>>>> PCI devices located on different root complexes in multi-segment
>>>>>>>>>>> configurations. Naming is changed from ptp_<bus>_<slot>_clk<clock> to
>>>>>>>>>>> ptp_<domain>_<bus>_<slot>_clk<clock>
>>>>>>>>>>
>>>>>>>>>> I don't understand why you need to encode pci properties of a parent device
>>>>>>>>>> into the auxiliary bus name. Could you please explain the motivation? Why
>>>>>>>>>> you need a bus instance per PF?
>>>>>>>>>>
>>>>>>>>>> The rest of the auxbus registrators don't do this. Could you please align? Just
>>>>>>>>>> have one bus for ice driver and that's it.
>>>>>>>>>
>>>>>>>>> This patch adds support for multi-segment PCIe configurations.
>>>>>>>>> An auxdev is created for each adapter, which has a clock, in the system. There can be
>>>>>>>>
>>>>>>>> You are trying to change auxiliary bus name.
>>>>>>>>
>>>>>>>>
>>>>>>>>> more than one adapter present, so there exists a possibility of device naming conflict.
>>>>>>>>> To avoid it, auxdevs are named according to the PCI geographical addresses of the adapters.
>>>>>>>>
>>>>>>>> Why? It's the auxdev, the name should not contain anything related to
>>>>>>>> PCI, no reason for it. I asked for motivation, you didn't provide any.
>>>>>>>>
>>>>>>>
>>>>>>> We aren't creating one auxbus per PF. We're creating one auxbus per
>>>>>>> *clock*. The device has multiple clocks in some configurations.
>>>>>>
>>>>>> Does not matter. Why you need separate bus for whatever-instance? Why
>>>>>> can't you just have one ice-ptp bus and put devices on it?
>>>>>>
>>>>>>
>>>>>
>>>>> Because we only want ports on card A to be connected to the card owner
>>>>> on card A. We were using auxiliary bus to manage this. If we use a
>>>>
>>>> You do that by naming auxiliary bus according to the PCI device
>>>> created it, which feels obviously wrong.
>>>>
>>>>
>>>>> single bus for ice-ptp, then we still have to implement separation
>>>>> between each set of devices on a single card, which doesn't solve the
>>>>> problems we had, and at least with the current code using auxiliary bus
>>>>> doesn't buy us much if it doesn't solve that problem.
>>>>
>>>> I don't think that auxiliary bus is the answer for your problem. Please
>>>> don't abuse it.
>>>>
>>>>>
>>>>>>>
>>>>>>> We need to connect each PF to the same clock controller, because there
>>>>>>> is only one clock owner for the device in a multi-port card.
>>>>>>
>>>>>> Connect how? But putting a PF-device on a per-clock bus? That sounds
>>>>>> quite odd. How did you figure out this usage of auxiliary bus?
>>>>>>
>>>>>>
>>>>>
>>>>> Yea, its a multi-function board but the functions aren't fully
>>>>> independent. Each port has its own PF, but multiple ports can be tied to
>>>>> the same clock. We have similar problems with a variety of HW aspects.
>>>>> I've been told that the design is simpler for other operating systems,
>>>>> (something about the way the subsystems work so that they expect ports
>>>>> to be tied to functions). But its definitely frustrating from Linux
>>>>> perspective where a single driver instance for the device would be a lot
>>>>> easier to manage.
>>>>
>>>> You can always do it by internal accounting in ice, merge multiple PF
>>>> pci devices into one internal instance. Or checkout
>>>> drivers/base/component.c, perhaps it could be extended for your usecase.
>>>>
>>>>
>>>>>
>>>>>>>
>>>>>>>> Again, could you please avoid creating auxiliary bus per-PF and just
>>>>>>>> have one auxiliary but per-ice-driver?
>>>>>>>>
>>>>>>>
>>>>>>> We can't have one per-ice driver, because we don't want to connect ports
>>>>>> >from a different device to the same clock. If you have two ice devices
>>>>>>> plugged in, the ports on each device are separate from each other.
>>>>>>>
>>>>>>> The goal here is to connect the clock ports to the clock owner.
>>>>>>>
>>>>>>> Worse, as described here, we do have some devices which combine multiple
>>>>>>> adapters together and tie their clocks together via HW signaling. In
>>>>>>> those cases the clocks *do* need to communicate across the device, but
>>>>>>> only to other ports on the same physical device, not to ports on a
>>>>>>> different device.
>>>>>>>
>>>>>>> Perhaps auxbus is the wrong approach here? but how else can we connect
>>>>>>
>>>>>> Yeah, feels quite wrong.
>>>>>>
>>>>>>
>>>>>>> these ports without over-connecting to other ports? We could write
>>>>>>> bespoke code which finds these devices, but that felt like it was risky
>>>>>>> and convoluted.
>>>>>>
>>>>>> Sounds you need something you have for DPLL subsystem. Feels to me that
>>>>>> your hw design is quite disconnected from the Linux device model :/
>>>>>>
>>>>>
>>>>> I'm not 100% sure how DPLL handles this. I'll have to investigate.
>>>>
>>>> DPLL leaves the merging on DPLL level. The ice driver just register
>>>> entities multiple times. It is specifically designed to fit ice needs.
>>>>
>>>>
>>>>>
>>>>> One thing I've considered a lot in the past (such as back when I was
>>>>> working on devlink flash update) was to somehow have a sort of extra
>>>>> layer where we could register with PCI subsystem some sort of "whole
>>>>> device" driver, that would get registered first and could connect to the
>>>>> rest of the function driver instances as they load. But this seems like
>>>>> it would need a lot of work in the PCI layer, and apparently hasn't been
>>>>> an issue for other devices? though ice is far from unique at least for
>>>>> Intel NICs. Its just that the devices got significantly more complex and
>>>>> functions more interdependent with this generation, and the issues with
>>>>> global bits were solved in other ways: often via hiding them with
>>>>> firmware >:(
>>>>
>>>> I think that others could benefit from such "merged device" as well. I
>>>> think it is about the time to introduce it.
>>>
>>> so far I see that we want to merge based on different bits, but let's
>>> see what will come from exploratory work that Sergey is doing right now.
>>>
>>> and anything that is not a device/driver feels much more lightweight to
>>> operate with (think &ice_adapter, but extended with more fields).
>>> Could you elaborate more on what you have in mind? (Or what you could
>>> imagine reusing).
>>
>> Nothing concrete really. See below.
>>
>>>
>>> offtop:
>>> It will be a good hook to put resources that are shared across ports
>>> under it in devlink resources, so making this "merged device" an entity
>>> would enable higher layer over what we have with devlink xxx $pf.
>>
>> Yes. That would be great. I think we need a "device" in a sense of
>> struct device instance. Then you can properly model the relationships in
>> sysfs, you can have devlink for that, etc.
>>
>> drivers/base/component.c does merging of multiple devices, but it does
>> not create a "merged device", this is missing there. So we have 2
>> options:
>>
>> 1) extend drivers/base/component.c to allow to create a merged device
>> entity
>> 2) implement merged device infrastructure separatelly.
>>
>> IDK. I believe we need to rope more people in.
>>
>>
>
>drivers/base/component.c looks pretty close to what we want. Each PF
>would register as a component, and then a driver would register as the
>component master. It does lack a struct device, so might be challenging
>to use with devlink unless we just opted to pick a device from the
>components as the main device?
If I read the code correctly, the master component has to be a device as
well. This is the missing piece I believe.
>
>extending components to have a device could be interesting, though
>perhaps its not exactly the best place. It seems like components are
>about combining a lot of small devices that ultimately combine into one
>functionality at a logical level.
>
>That is pretty close to what we want here: one entity to control global
>portions of an otherwise multi-function card.
>
>Extending it to include a struct device could work but I'm not 100% sure
>how that fits into the component system.
Who knows? we need to rope them into this discussion...
>
>We could try extending PCI subsystem to do something similar to
>components which is vaguely what I described a couple replies ago.
Well, feels to me this is more generic problem than PCI. One level
above.
>
>ice_adapter is basically doing this but more bespoke and custom, and
>still lacks the extra struct device.
Correct.
Powered by blists - more mailing lists