netdev - Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: Extend auxbus device naming

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 26 Apr 2024 15:43:42 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Przemek Kitszel <przemyslaw.kitszel@...el.com>
Cc: Jacob Keller <jacob.e.keller@...el.com>,
	"Temerkhanov, Sergey" <sergey.temerkhanov@...el.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
	"Knitter, Konrad" <konrad.knitter@...el.com>, kuba@...nel.org,
	gregkh@...uxfoundation.org
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v2] ice: Extend auxbus device
 naming

Fri, Apr 26, 2024 at 02:49:40PM CEST, przemyslaw.kitszel@...el.com wrote:
>On 4/26/24 13:19, Jiri Pirko wrote:
>> Wed, Apr 24, 2024 at 06:56:37PM CEST, jacob.e.keller@...el.com wrote:
>> > 
>> > 
>> > On 4/24/2024 8:07 AM, Jiri Pirko wrote:
>> > > Wed, Apr 24, 2024 at 12:03:25AM CEST, jacob.e.keller@...el.com wrote:
>> > > > 
>> > > > 
>> > > > On 4/23/2024 6:14 AM, Jiri Pirko wrote:
>> > > > > Tue, Apr 23, 2024 at 01:56:55PM CEST, sergey.temerkhanov@...el.com wrote:
>> > > > > > 
>> > > > > > 
>> > > > > > > -----Original Message-----
>> > > > > > > From: Jiri Pirko <jiri@...nulli.us>
>> > > > > > > Sent: Tuesday, April 23, 2024 1:36 PM
>> > > > > > > To: Temerkhanov, Sergey <sergey.temerkhanov@...el.com>
>> > > > > > > Cc: intel-wired-lan@...ts.osuosl.org; netdev@...r.kernel.org; Kitszel,
>> > > > > > > Przemyslaw <przemyslaw.kitszel@...el.com>
>> > > > > > > Subject: Re: [PATCH iwl-next v2] ice: Extend auxbus device naming
>> > > > > > > 
>> > > > > > > Tue, Apr 23, 2024 at 11:14:59AM CEST, sergey.temerkhanov@...el.com
>> > > > > > > wrote:
>> > > > > > > > Include segment/domain number in the device name to distinguish
>> > > > > > > between
>> > > > > > > > PCI devices located on different root complexes in multi-segment
>> > > > > > > > configurations. Naming is changed from ptp_<bus>_<slot>_clk<clock>  to
>> > > > > > > > ptp_<domain>_<bus>_<slot>_clk<clock>
>> > > > > > > 
>> > > > > > > I don't understand why you need to encode pci properties of a parent device
>> > > > > > > into the auxiliary bus name. Could you please explain the motivation? Why
>> > > > > > > you need a bus instance per PF?
>> > > > > > > 
>> > > > > > > The rest of the auxbus registrators don't do this. Could you please align? Just
>> > > > > > > have one bus for ice driver and that's it.
>> > > > > > 
>> > > > > > This patch adds support for multi-segment PCIe configurations.
>> > > > > > An auxdev is created for each adapter, which has a clock, in the system. There can be
>> > > > > 
>> > > > > You are trying to change auxiliary bus name.
>> > > > > 
>> > > > > 
>> > > > > > more than one adapter present, so there exists a possibility of device naming conflict.
>> > > > > > To avoid it, auxdevs are named according to the PCI geographical addresses of the adapters.
>> > > > > 
>> > > > > Why? It's the auxdev, the name should not contain anything related to
>> > > > > PCI, no reason for it. I asked for motivation, you didn't provide any.
>> > > > > 
>> > > > 
>> > > > We aren't creating one auxbus per PF. We're creating one auxbus per
>> > > > *clock*. The device has multiple clocks in some configurations.
>> > > 
>> > > Does not matter. Why you need separate bus for whatever-instance? Why
>> > > can't you just have one ice-ptp bus and put devices on it?
>> > > 
>> > > 
>> > 
>> > Because we only want ports on card A to be connected to the card owner
>> > on card A. We were using auxiliary bus to manage this. If we use a
>> 
>> You do that by naming auxiliary bus according to the PCI device
>> created it, which feels obviously wrong.
>> 
>> 
>> > single bus for ice-ptp, then we still have to implement separation
>> > between each set of devices on a single card, which doesn't solve the
>> > problems we had, and at least with the current code using auxiliary bus
>> > doesn't buy us much if it doesn't solve that problem.
>> 
>> I don't think that auxiliary bus is the answer for your problem. Please
>> don't abuse it.
>> 
>> > 
>> > > > 
>> > > > We need to connect each PF to the same clock controller, because there
>> > > > is only one clock owner for the device in a multi-port card.
>> > > 
>> > > Connect how? But putting a PF-device on a per-clock bus? That sounds
>> > > quite odd. How did you figure out this usage of auxiliary bus?
>> > > 
>> > > 
>> > 
>> > Yea, its a multi-function board but the functions aren't fully
>> > independent. Each port has its own PF, but multiple ports can be tied to
>> > the same clock. We have similar problems with a variety of HW aspects.
>> > I've been told that the design is simpler for other operating systems,
>> > (something about the way the subsystems work so that they expect ports
>> > to be tied to functions). But its definitely frustrating from Linux
>> > perspective where a single driver instance for the device would be a lot
>> > easier to manage.
>> 
>> You can always do it by internal accounting in ice, merge multiple PF
>> pci devices into one internal instance. Or checkout
>> drivers/base/component.c, perhaps it could be extended for your usecase.
>> 
>> 
>> > 
>> > > > 
>> > > > > Again, could you please avoid creating auxiliary bus per-PF and just
>> > > > > have one auxiliary but per-ice-driver?
>> > > > > 
>> > > > 
>> > > > We can't have one per-ice driver, because we don't want to connect ports
>> > > >from a different device to the same clock. If you have two ice devices
>> > > > plugged in, the ports on each device are separate from each other.
>> > > > 
>> > > > The goal here is to connect the clock ports to the clock owner.
>> > > > 
>> > > > Worse, as described here, we do have some devices which combine multiple
>> > > > adapters together and tie their clocks together via HW signaling. In
>> > > > those cases the clocks *do* need to communicate across the device, but
>> > > > only to other ports on the same physical device, not to ports on a
>> > > > different device.
>> > > > 
>> > > > Perhaps auxbus is the wrong approach here? but how else can we connect
>> > > 
>> > > Yeah, feels quite wrong.
>> > > 
>> > > 
>> > > > these ports without over-connecting to other ports? We could write
>> > > > bespoke code which finds these devices, but that felt like it was risky
>> > > > and convoluted.
>> > > 
>> > > Sounds you need something you have for DPLL subsystem. Feels to me that
>> > > your hw design is quite disconnected from the Linux device model :/
>> > > 
>> > 
>> > I'm not 100% sure how DPLL handles this. I'll have to investigate.
>> 
>> DPLL leaves the merging on DPLL level. The ice driver just register
>> entities multiple times. It is specifically designed to fit ice needs.
>> 
>> 
>> > 
>> > One thing I've considered a lot in the past (such as back when I was
>> > working on devlink flash update) was to somehow have a sort of extra
>> > layer where we could register with PCI subsystem some sort of "whole
>> > device" driver, that would get registered first and could connect to the
>> > rest of the function driver instances as they load. But this seems like
>> > it would need a lot of work in the PCI layer, and apparently hasn't been
>> > an issue for other devices? though ice is far from unique at least for
>> > Intel NICs. Its just that the devices got significantly more complex and
>> > functions more interdependent with this generation, and the issues with
>> > global bits were solved in other ways: often via hiding them with
>> > firmware >:(
>> 
>> I think that others could benefit from such "merged device" as well. I
>> think it is about the time to introduce it.
>
>so far I see that we want to merge based on different bits, but let's
>see what will come from exploratory work that Sergey is doing right now.
>
>and anything that is not a device/driver feels much more lightweight to
>operate with (think &ice_adapter, but extended with more fields).
>Could you elaborate more on what you have in mind? (Or what you could
>imagine reusing).

Nothing concrete really. See below.

>
>offtop:
>It will be a good hook to put resources that are shared across ports
>under it in devlink resources, so making this "merged device" an entity
>would enable higher layer over what we have with devlink xxx $pf.

Yes. That would be great. I think we need a "device" in a sense of
struct device instance. Then you can properly model the relationships in
sysfs, you can have devlink for that, etc.

drivers/base/component.c does merging of multiple devices, but it does
not create a "merged device", this is missing there. So we have 2
options:

1) extend drivers/base/component.c to allow to create a merged device
   entity
2) implement merged device infrastructure separatelly.

IDK. I believe we need to rope more people in.


>
>> 
>> 
>> > 
>> > 
>> > I've tried explaining the issues with this to HW designers here, but so
>> > far haven't gotten a lot of traction.
>> > 
>> > > > We could make use of ice_adapter, though we'd need some logic to manage
>> > > > devices which have multiple clocks, as well as devices like the one
>> > > > Sergey is working on which tie multiple adapters together.
>> > > 
>> > > Perhaps that is an answer. Maybe we can make DPLL much more simple after
>> > > that :)
>> > 
>> > The only major issue with ice_adapter is the case where we have one
>> > clock connected to multiple adapters, but hopefully Sergey has some
>> > ideas for how to solve that.
>> > 
>> > I think we can mostly make the rest of the logic for the usual case work
>> > via ice_adapter:
>> > 
>> > 1) we already get an ice_adapter reference during early probe
>> > 2) each PF knows whether its an owner or not from the NVM/firmware interface
>> > 3) we can move the list of ports from the auxbus thing into ice_adapter,
>> > possibly keeping one list per clock number, so that NVMs with multiple
>> > clocks enabled don't have conflicts or put all ports onto the same clock.
>> > 
>> > I'm not sure how best to solve the weird case when we have multiple
>> > adapters tied together, but perhaps something with extending how the
>> > ice_adapter lookup is done could work? Sergey, I think we can continue
>> > this design discussion off list and come up with a proposal.
>> > 
>> > We also have to be careful that whatever new solution also handles
>> > things which we got with auxiliary bus:
>> > 
>> > 1) prevent issues with ordering if a clock port loads before the clock
>> > owner PF. If the clock owner loads first, then we need to ensure the
>> > port still gets initialized. If the clock owner loads second, we need to
>> > avoid issues with things not being setup yet, i.e. ensure all the
>> > structures were already initialized (for example by initializing lists
>> > and such when we create the ice_adapter, not when the clock owner loads).
>> > 
>> > 2) prevent issues with teardown ordering that were previously serialized
>> > by the auxiliary bus unregister bits, where the driver unregister
>> > function would wait for all ports to shutdown.
>> > 
>> > I think this can be done correctly with ice_adapter, but I wanted to
>> > point them out because we get them implicitly with the current design
>> > with auxiliary bus.
>