netdev - Re: [patch net-next 0/4] net/mlx5: expose peer SF devlink instance

Message-ID: <ZORXVr4bcTlbstj8@nanopsycho>
Date: Tue, 22 Aug 2023 08:36:06 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, pabeni@...hat.com, davem@...emloft.net,
	edumazet@...gle.com, moshe@...dia.com, saeedm@...dia.com,
	shayd@...dia.com, leon@...nel.org
Subject: Re: [patch net-next 0/4] net/mlx5: expose peer SF devlink instance 

Mon, Aug 21, 2023 at 10:19:37PM CEST, kuba@...nel.org wrote:
>On Mon, 21 Aug 2023 12:49:54 +0200 Jiri Pirko wrote:
>> Fri, Aug 18, 2023 at 11:20:07PM CEST, kuba@...nel.org wrote:
>> >On Fri, 18 Aug 2023 09:30:17 +0200 Jiri Pirko wrote:  
>> >> SF devlink instance is created in init_ns and can move to another one.
>> >> So no.
>> >> 
>> >> I was thinking about this, as with the devlink handles we are kind of in
>> >> between sysfs and network. We have concept of network namespace in
>> >> devlink, but mainly because of the related netdevices.
>> >> 
>> >> There is no possibility of collision of devlink handles in between
>> >> separate namespaces, the handle is ns-unaware. Therefore the linkage to
>> >> instance in different ns is okay, I believe. Even more, It is handy as
>> >> the user knows that there exists such linkage.
>> >> 
>> >> What do you think?  
>> 
>> First of all, I'm having difficulties to understand exactly what you
>> say. I'll try my best with the reply :)
>> 
>> >The way I was thinking about it is that the placement of the dl
>> >instance should correspond to the entity which will be configuring it.
>> >
>> >Assume a typical container setup where app has net admin in its
>> >netns and there is an orchestration daemon with root in init_net 
>> >which sets the containers up.
>> >
>> >Will we ever want the app inside the netns to configure the interface
>> >via the dl instance? Given that the SF is like giving the container
>> >full access to the HW it seems to me that we should also delegate   
>> 
>> Nope. SF has limitations that could be set by devlink port function
>> caps. So no full HW access.
>> 
>> 
>> >the devlink control to the app, i.e. move it to the netns?
>> >
>> >Same thing for devlink instances of VFs.  
>> 
>> Like VFs, SFs are getting probed by mlx5 driver. Both create the devlink
>> instances in init_ns. For both the user can reload them to a different
>> netns. It's a consistent approach.
>> 
>> I see a possibility to provide user another ATTR to pass during SF
>> activation that would indicate the netns new instance is going to be
>> created in (of course only if it is local). That would provide
>> the flexibility to solve the case you are looking for I believe.
>> ***
>>
>> >The orchestration daemon has access to the "PF" / main dl instance of
>> >the device, and to the ports / port fns so it has other ways to control
>> >the HW. While the app would otherwise have no devlink access.
>> >
>> >So my intuition is that the devlink instance should follow the SF
>> >netdev into a namespace.  
>> 
>> It works the other way around. The only way to change devlink netns is
>> to reload the instance to a different netns. The related
>> netdevice/netdevices are reinstantiated to that netns. If later on the
>> user decides to move a netdev to a different netns, he can do it.
>> 
>> This behaviour is consistent for all devlink instances, devlink port and
>> related netdevice/netdevices, no matter if there is only one netdevice
>> or more. What you suggest, I can't see how that could work when an
>> instance has multiple netdevices.
>
>Netdevs can move to netns without their devlink following (leaving
>representors aside). We can't change that because uAPI.
>But can we make it impossible to move SFs by themselves and require
>devlink reload to move them?

That is how we currently implement SFs in mlx5. Example:
$ sudo devlink dev eswitch set pci/0000:08:00.0 mode switchdev
$ sudo devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 106
pci/0000:08:00.0/32768: type eth netdev eth4 flavour pcisf controller 0 pfnum 0 sfnum 106 splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached roce enable
$ sudo devlink port function set pci/0000:08:00.0/32768 state active
$ devlink dev
pci/0000:08:00.0
pci/0000:08:00.1
auxiliary/mlx5_core.sf.2
$ sudo ip netns add ns1
$ sudo devlink dev reload auxiliary/mlx5_core.sf.2 netns ns1
$ devlink dev
pci/0000:08:00.0
pci/0000:08:00.1
$ sudo ip netns exec ns1 devlink dev
auxiliary/mlx5_core.sf.2


>
>> >And then the next question is - once the devlink instances are in
>> >different namespaces - do we still show the "nested_devlink" attribute?
>> >Probably yes but we need to add netns id / link as well?  
>> 
>> Not sure what is the usecase. Currently, once VFs/SFs/ could be probed
>> and devlink instance created in init_ns, the orchestrator does not need
>> this info.
>> 
>> In future, if the extension I suggested above (***) would be
>> implemented, the orchestrator still knows the netns he asked the
>> instance to be created in.
>> 
>> So I would say it is not needed for anything. Plus it would make code
>> more complex making sure the notifications are coming in case of SF
>> devlink instance netns changes.
>> 
>> So do you see the usecase? If not, I would like to go with what I have
>> in this patchset version.
>
>I'm thinking about containers. Since the SF configuration is currently
>completely vendor ad-hoc I'm trying to establish who's supposed to be
>in control of the devlink instance of an SF - orchestrator or the
>workload. We should pick one and force everyone to fall in line.

I think that both are valid. In the VF case, the workload (VM) owns the
devlink instance and netdev. In the SF case:
1) It could be the same. You can reload SF into netns, then
   the container has them both. That would provide the container
   more means (e.g. configuration of rdma,netdev,vdev etc).
2) Or, your can only put netdev into netns.
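
For illustration, the two options could look roughly like the sketch
below. The SF devlink handle and the netns name are taken from the example
above; the SF netdev name (sf_netdev0) is only a placeholder, as is the
run() helper, which prints each command unless DRY_RUN=0 is set, so the
sketch can be read without the hardware at hand.

```shell
#!/bin/sh
# Sketch only. SF_DEV and NETNS match the example earlier in this mail;
# SF_NETDEV is a hypothetical placeholder for the SF's netdevice name.
SF_DEV="auxiliary/mlx5_core.sf.2"
SF_NETDEV="${SF_NETDEV:-sf_netdev0}"
NETNS="${NETNS:-ns1}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Option 1: reload the SF devlink instance into the netns, so the
# container gets both the devlink instance and the netdev.
option1() {
    run devlink dev reload "$SF_DEV" netns "$NETNS"
}

# Option 2: leave the devlink instance where it is and move only the
# netdev into the netns.
option2() {
    run ip link set dev "$SF_NETDEV" netns "$NETNS"
}
```

Calling option1 or option2 with DRY_RUN unset just echoes the command that
would be run; with DRY_RUN=0 it needs root and an existing netns.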

Both usecases are valid. But back to my question regarding this
patchset: do you see the need to expose the netns for the nested port
function devlink instance? Even now, I still don't.