Date:   Mon, 18 May 2020 08:52:07 +0200
From:   Jiri Pirko <jiri@...nulli.us>
To:     Jacob Keller <jacob.e.keller@...el.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
        parav@...lanox.com, yuvalav@...lanox.com, jgg@...pe.ca,
        saeedm@...lanox.com, leon@...nel.org,
        andrew.gospodarek@...adcom.com, michael.chan@...adcom.com,
        moshe@...lanox.com, ayal@...lanox.com, eranbe@...lanox.com,
        vladbu@...lanox.com, kliteyn@...lanox.com, dchickles@...vell.com,
        sburla@...vell.com, fmanlunas@...vell.com, tariqt@...lanox.com,
        oss-drivers@...ronome.com, snelson@...sando.io,
        drivers@...sando.io, aelior@...vell.com,
        GR-everest-linux-l2@...vell.com, grygorii.strashko@...com,
        mlxsw@...lanox.com, idosch@...lanox.com, markz@...lanox.com,
        valex@...lanox.com, linyunsheng@...wei.com, lihong.yang@...el.com,
        vikas.gupta@...adcom.com, sridhar.samudrala@...el.com
Subject: Re: [RFC v2] current devlink extension plan for NICs

Fri, May 15, 2020 at 11:36:19PM CEST, jacob.e.keller@...el.com wrote:
>
>
>On 5/15/2020 2:30 AM, Jiri Pirko wrote:
>> Fri, May 15, 2020 at 01:52:54AM CEST, jacob.e.keller@...el.com wrote:
>>>> $ devlink port add pci/0000:06:00.0/100 flavour pcisf pfnum 1 sfnum 10
>>>>
>>>
>>> Can you clarify what sfnum means here? And why is it different from the
>>> index? I get that the index is a unique number that identifies the port
>>> regardless of type, so sfnum must be some sort of hardware internal
>>> identifier?
>> 
>> Basically, pfnum, sfnum and vfnum values can overlap. The index is unique
>> across all groups together.
>> 
>
>Right. Index is just an identifier for which port this is.
>
>> 
>>>
>>> When looking at this with colleagues, there was a lot of confusion about
>>> the difference between the index and the sfnum.
>> 
>> No confusion about index and pfnum/vfnum? They behave the same.
>> Index is just a port handle.
>> 
>
>I'm less confused about the difference between the index and these "nums",
>and more about what pfnum/vfnum/sfnum represent. Are they
>similar to the vf ID that we have in the legacy SRIOV functions? I.e. a
>hardware index?
>
>I don't think in general users necessarily care which "index" they get
>upfront. They obviously very much care about the index once it's
>selected. I do believe the interfaces should start with the capability
>for the index to be selected automatically at creation (with the
>optional capability to select a specific index if desired, as shown here).
>
>I do not think most users want to care about what to pick for this
>number. (Just as they would not want to pick a number for the port index
>either).

I see your point. However, I don't think that is always the right
scenario. The "nums" are used for naming the netdevices, both the
eswitch port representor and the actual SF (in case of SF).

I think that in a lot of use cases it is more convenient for the user to
select the "num" on the cmdline.
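A minimal sketch of why a user-chosen, deterministic "num" makes names predictable. It is modelled only on the names that appear in this thread (enp6s0f0 for PCI 0000:06:00.0, enp6s0f0s10 for the SF with sfnum 10, enp6s0f0pf2 for the pfnum 2 representor). The function names and suffix rules are hypothetical illustrations, not the kernel's or udev's actual naming code.

```python
# Hypothetical sketch: derive netdev names from a PCI address plus a
# per-flavour "num", mirroring the example names used in this thread.
# The suffix rules are assumptions for illustration only.


def pci_base_name(pci_addr: str) -> str:
    """Map 'DDDD:BB:SS.F' to an 'enp<bus>s<slot>f<func>' style prefix.

    Bus, slot and function are parsed as hex and printed in decimal.
    Only PCI domain 0 is handled in this sketch.
    """
    domain, bus, rest = pci_addr.split(":")
    slot, func = rest.split(".")
    if int(domain, 16) != 0:
        raise ValueError("non-zero PCI domain not handled in this sketch")
    return f"enp{int(bus, 16)}s{int(slot, 16)}f{int(func, 16)}"


def sf_netdev_name(pci_addr: str, sfnum: int) -> str:
    # Assumed rule: the SF netdev appends "s<sfnum>" (e.g. enp6s0f0s10).
    return f"{pci_base_name(pci_addr)}s{sfnum}"


def pf_representor_name(pci_addr: str, pfnum: int) -> str:
    # Assumed rule: a pcipf representor appends "pf<pfnum>" (e.g. enp6s0f0pf2).
    return f"{pci_base_name(pci_addr)}pf{pfnum}"


if __name__ == "__main__":
    print(sf_netdev_name("0000:06:00.0", 10))
    print(pf_representor_name("0000:06:00.0", 2))
```

Because the name is a pure function of the PCI address and the num, a user who picks the num on the cmdline knows the resulting netdev name before the port even exists.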



>
>> 
>>>
>>>> The devlink kernel code calls down to the device driver (devlink op) and
>>>> asks it to create an SF port with particular attributes. The driver then
>>>> instantiates the SF port in the same way it is done for VF.
>>>>
>>>
>>> What do you mean by attributes here? What sort of attributes can be
>>> requested?
>> 
>> In the original slice proposal, it was possible to pass the MAC address
>> too. However, with the new approach (port func subobject), that is not
>> possible. I'll remove this leftover.
>> 
>
>Ok.
>
>> 
>>>
>>>>
>>>> Note that it may be possible to avoid passing the port index and let the
>>>> kernel assign an index for you:
>>>> $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 1 sfnum 10
>>>>
>>>> This would work in a similar way to the devlink region ID assignment
>>>> that is being pushed now.
>>>>
>>>
>>> Sure, this makes sense to me after seeing Jakub's recent patch for
>>> regions. I like this approach. Letting the user not have to pick an ID
>>> ahead of time is useful.
>>>
>>> Is it possible to skip providing an sfnum, and let the kernel or driver
>>> pick one? Or does that not make sense?
>> 
>> It does not. The sfnum is something that should be deterministic. The sfnum
>> is then visible on the other side, on the virtbus device:
>> /sys/bus/virtbus/devices/mlx5_sf.1/sfnum
>> and its name is generated accordingly: enp6s0f0s10
>> 
>
>Why not have the option to say "create me an sfnum and then report it to
>me" in the same way we do with region numbers now and plan to with port
>indexes?

Sure, why not.
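A minimal model of the semantics agreed on so far, assuming a simple in-memory table: the port index is a handle unique across all flavours, pfnum/vfnum/sfnum values may overlap between flavours, and both the index and (optionally) the sfnum can be left for the kernel to pick and report back, similar to region ID assignment. PortTable, add() and auto_num are hypothetical names; this is not devlink's implementation.

```python
# Hypothetical model of devlink port index vs. per-flavour "nums".
# - index: a port handle, unique across all flavours
# - (flavour, pfnum, num): unique only within a flavour, so nums may overlap
# - index and num may be auto-assigned and reported back to the user
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Port:
    index: int
    flavour: str          # e.g. "physical", "pcipf", "pcivf", "pcisf"
    pfnum: int
    num: Optional[int]    # vfnum/sfnum; None for physical/pcipf


class PortTable:
    def __init__(self) -> None:
        self.ports: Dict[int, Port] = {}
        self.nums: Set[Tuple[str, int, Optional[int]]] = set()

    @staticmethod
    def _lowest_free(used: Iterable) -> int:
        used = set(used)
        n = 0
        while n in used:
            n += 1
        return n

    def add(self, flavour: str, pfnum: int, num: Optional[int] = None,
            index: Optional[int] = None, auto_num: bool = False) -> Port:
        if index is None:
            index = self._lowest_free(self.ports)   # kernel-picked handle
        elif index in self.ports:
            raise ValueError(f"port index {index} already in use")
        if num is None and auto_num:
            # "create me an sfnum and report it to me"
            used = (n for (f, p, n) in self.nums if f == flavour and p == pfnum)
            num = self._lowest_free(used)
        key = (flavour, pfnum, num)
        if key in self.nums:
            raise ValueError(f"{flavour} pfnum {pfnum} num {num} exists")
        port = Port(index, flavour, pfnum, num)
        self.ports[index] = port
        self.nums.add(key)
        return port
```

For example, `add("pcisf", 1, 10, index=100)` mirrors the explicit command earlier in the thread, while a later `add("pcivf", 1, 10)` succeeds with an auto-assigned index because vfnum 10 and sfnum 10 live in different groups.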


>
>Basically: why do I as a user of the front end care what this number
>actually is? What does it represent?

See my answer above.


>
>> 
>> 
>>>
>>>> ==================================================================
>>>> ||                                                              ||
>>>> ||   VF manual creation and activation user cmdline API draft   ||
>>>> ||                                                              ||
>>>> ==================================================================
>>>>
>>>> To enter manual mode, the user has to turn off VF dummies creation:
>>>> $ devlink dev set pci/0000:06:00.0 vf_dummies disabled
>>>> $ devlink dev show
>>>> pci/0000:06:00.0: vf_dummies disabled
>>>>
>>>> It is "enabled" by default in order not to break existing users.
>>>>
>>>> By setting the "vf_dummies" attribute to "disabled", the driver
>>>> removes all dummy VFs. Only physical ports are present:
>>>>
>>>> $ devlink port show
>>>> pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
>>>> pci/0000:06:00.0/1: flavour physical pfnum 1 type eth netdev enp6s0f0np2
>>>>
>>>> Then the user is able to create them in a similar way to SFs:
>>>>
>>>> $ devlink port add pci/0000:06:00.0/99 flavour pcivf pfnum 1 vfnum 8
>>>>
>>>
>>> So in this case, you have to specify the VF index to create? So this
>>> vfnum is very similar to the sfnum (and pfnum?) above?
>> 
>> Yes.
>> 
>> 
>>>
>>> What about the ability to just say "please give me a VF, but I don't
>>> care which one"?
>> 
>> Well, that could eventually be done too, with Jakub's extension.
>> 
>
>Sure. I think that's what I was asking above as well. Ok.
>
>>>>
>>>>    $ devlink port show
>>>>    pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
>>>>
>>>>    If there is another parent PF, say "0000:06:00.1", that shares the
>>>>    same embedded switch, the aliasing is established for devlink handles.
>>>>
>>>>    The user can use devlink handles:
>>>>    pci/0000:06:00.0
>>>>    pci/0000:06:00.1
>>>>    as equivalents, pointing to the same devlink instance.
>>>>
>>>>    Parent PFs are the ones that may be in control of managing the
>>>>    embedded switch, on any hierarchy level.
>>>>
>>>> 2) Child PF. This is a leg of a PF attached to the parent PF. It is
>>>>    represented by a port with a netdevice and func:
>>>>
>>>>    $ devlink port show
>>>>    pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
>>>>    pci/0000:06:00.0/1: flavour pcipf pfnum 2 type eth netdev enp6s0f0pf2
>>>>        func: hw_addr aa:bb:cc:aa:bb:87 state active
>>>>
>>>>    This is a typical smartnic scenario. You would see this list on
>>>>    the smartnic CPU. The port pci/0000:06:00.0/1 is a leg to
>>>>    one of the hosts. If you send packets to enp6s0f0pf2, they will
>>>>    go to the child PF.
>>>>
>>>>    Note that inside the host, the PF is represented again as a "Parent PF"
>>>>    and may be used to configure a nested embedded switch.
>>>>
>>>>
>>>
>>> I'm not sure I understand this section. Child PF? Is this like a PF in
>>> another host? Or representing the other side of the virtual link?
>> 
>> It's both actually, at the same time.
>> 
>> 
>
>Ok. I still don't think I fully grasp this yet.
>
>
>>> Obviously this is a TODO, but how does this differ from the current
>>> port_split and port_unsplit?
>> 
>> It does not have anything to do with port splitting. This is about creating
>> a "child PF" from the section above.
>> 
>
>Hmm. Ok so this is about internal connections in the switch, then?

Yes. Take the smartnic as an example. The eswitch management is done on
the smartnic CPU. There is a devlink instance with all eswitch ports
visible as devlink ports: one PF-type devlink port per host. Those are
the "child PFs".

Now, from the perspective of the host, there are 2 scenarios:
1) The host has a "simple dumb" PF, which just exposes 1 netdev for the
   host to run traffic over. The smartnic CPU manages the VFs/SFs and
   sees the devlink ports for them. This is a 1-level switch: a merged
   switch.

2) The PF manages a sub-switch/nested switch. The devlink instance and
   devlink ports are created on the host, and the devlink ports for
   SFs/VFs are created there. This is a multi-level eswitch. Each
   "child PF" on a parent manages a nested switch, and could in theory
   have another child PF with another nested switch.
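A minimal sketch of the two scenarios as a tree, assuming each eswitch simply lists its child PF ports and each child PF either has no nested switch (scenario 1, merged switch) or manages one (scenario 2). Eswitch, ChildPF and levels() are hypothetical names chosen for illustration, not devlink objects.

```python
# Hypothetical model of merged vs. multi-level eswitch hierarchies.
# Each child PF port on an eswitch leads to one host; the host PF either
# has no nested switch (merged, 1-level) or manages a nested eswitch,
# which may itself have child PFs with further nested switches.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChildPF:
    pfnum: int
    nested: Optional["Eswitch"] = None  # None: "simple dumb" host PF


@dataclass
class Eswitch:
    name: str
    child_pfs: List[ChildPF] = field(default_factory=list)


def levels(sw: Eswitch) -> int:
    """Count eswitch levels rooted at sw (1 means a merged switch)."""
    deepest = 0
    for pf in sw.child_pfs:
        if pf.nested is not None:
            deepest = max(deepest, levels(pf.nested))
    return 1 + deepest


# Scenario 1: the smartnic CPU manages everything; hosts see plain PFs.
merged = Eswitch("smartnic", [ChildPF(1), ChildPF(2)])

# Scenario 2: the host behind pfnum 2 manages its own nested switch.
multi = Eswitch("smartnic", [ChildPF(1), ChildPF(2, Eswitch("host2"))])
```

The recursion reflects the "in theory" case at the end of scenario 2: a nested switch can again hold a child PF with its own nested switch, adding one more level.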
