netdev - Re: [RFC v2] current devlink extension plan for NICs

Message-ID: <b0f75e76-e6cb-a069-b863-d09f77bc67f6@intel.com>
Date:   Thu, 14 May 2020 16:52:54 -0700
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org
Cc:     davem@...emloft.net, kuba@...nel.org, parav@...lanox.com,
        yuvalav@...lanox.com, jgg@...pe.ca, saeedm@...lanox.com,
        leon@...nel.org, andrew.gospodarek@...adcom.com,
        michael.chan@...adcom.com, moshe@...lanox.com, ayal@...lanox.com,
        eranbe@...lanox.com, vladbu@...lanox.com, kliteyn@...lanox.com,
        dchickles@...vell.com, sburla@...vell.com, fmanlunas@...vell.com,
        tariqt@...lanox.com, oss-drivers@...ronome.com,
        snelson@...sando.io, drivers@...sando.io, aelior@...vell.com,
        GR-everest-linux-l2@...vell.com, grygorii.strashko@...com,
        mlxsw@...lanox.com, idosch@...lanox.com, markz@...lanox.com,
        valex@...lanox.com, linyunsheng@...wei.com, lihong.yang@...el.com,
        vikas.gupta@...adcom.com, sridhar.samudrala@...el.com
Subject: Re: [RFC v2] current devlink extension plan for NICs



On 5/1/2020 2:14 AM, Jiri Pirko wrote:
> ==================================================================
> ||                                                              ||
> ||          SF (subfunction) user cmdline API draft             ||
> ||                                                              ||
> ==================================================================
> 
> Note that some of the "devlink port" attributes may be forgotten,
> misordered or omitted on purpose.
> 
> $ devlink port show
> pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
> pci/0000:06:00.0/1: flavour physical pfnum 1 type eth netdev enp6s0f0np2
> pci/0000:06:00.0/2: flavour pcivf pfnum 0 vfnum 0 type eth netdev enp6s0pf0vf0
>                     func: hw_addr 10:22:33:44:55:66 state active
> 
> There is one VF on the NIC.
> 
> Now create subfunction of SF0 on PF1, index of the port is going to be 100:
> 

Here, you say "SF0 on PF1", but you then specify sfnum as 10 below. Is
there some naming scheme or terminology here?

> $ devlink port add pci/0000.06.00.0/100 flavour pcisf pfnum 1 sfnum 10
> 

Can you clarify what sfnum means here, and why it is different from the
index? I get that the index is a unique number that identifies the port
regardless of type, so sfnum must be some sort of hardware internal
identifier?

When looking at this with colleagues, there was a lot of confusion about
the difference between the index and the sfnum.
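
To make the question concrete, here is a minimal Python sketch of one
possible reading of the two numbers. Everything in it is hypothetical: the
class and method names are made up for illustration, and the rule that an
sfnum only has to be unique within its PF is an assumption, not something
the draft states.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Port:
    index: int                   # devlink port index: unique within one devlink instance
    flavour: str                 # "physical", "pcivf", "pcisf", ...
    pfnum: int
    sfnum: Optional[int] = None  # only meaningful for flavour "pcisf"


class DevlinkInstance:
    """Hypothetical model of one devlink instance, e.g. pci/0000:06:00.0."""

    def __init__(self) -> None:
        self.ports: Dict[int, Port] = {}
        # (pfnum, sfnum) -> port index that owns it
        self.sf_owner: Dict[Tuple[int, int], int] = {}

    def port_add(self, index: int, flavour: str, pfnum: int,
                 sfnum: Optional[int] = None) -> Port:
        # The index identifies the port regardless of flavour.
        if index in self.ports:
            raise ValueError(f"port index {index} already in use")
        if flavour == "pcisf":
            if sfnum is None:
                raise ValueError("pcisf needs an sfnum in this model")
            # ASSUMPTION: sfnum is scoped to the PF, so the same sfnum
            # may appear under two different PFs.
            if (pfnum, sfnum) in self.sf_owner:
                raise ValueError(f"sfnum {sfnum} already used on PF {pfnum}")
            self.sf_owner[(pfnum, sfnum)] = index
        port = Port(index, flavour, pfnum, sfnum)
        self.ports[index] = port
        return port


dev = DevlinkInstance()
dev.port_add(100, "pcisf", pfnum=1, sfnum=10)  # mirrors the quoted command
dev.port_add(101, "pcisf", pfnum=0, sfnum=10)  # same sfnum, different PF
```

If this reading is wrong, for example if sfnum has to be unique across the
whole device, that would be worth spelling out in the draft.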

> The devlink kernel code calls down to device driver (devlink op) and asks
> it to create a SF port with particular attributes. Driver then instantiates
> the SF port in the same way it is done for VF.
> 

What do you mean by attributes here? What sort of attributes can be
requested?

> 
> Note that it may be possible to avoid passing port index and let the
> kernel assign index for you:
> $ devlink port add pci/0000.06.00.0 flavour pcisf pfnum 1 sfnum 10
> 
> This would work in a similar way as devlink region id assignment that
> is being pushed now.
> 

Sure, this makes sense to me after seeing Jakub's recent patch for
regions. I like this approach. Letting the user not have to pick an ID
ahead of time is useful.

Is it possible to skip providing an sfnum, and let the kernel or driver
pick one? Or does that not make sense?
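
As a rough illustration of what "let the kernel or driver pick" could look
like, here is a small Python sketch. The function name is hypothetical, and
the lowest-free-ID policy is only an assumption; neither the draft nor the
region patch is being quoted here.

```python
from typing import Optional, Set


def lowest_free_id(used: Set[int], start: int = 0) -> int:
    """Return the smallest integer >= start that is not in `used`."""
    candidate = start
    while candidate in used:
        candidate += 1
    return candidate


def resolve_id(requested: Optional[int], used: Set[int]) -> int:
    """Honour an explicit ID if it is free, otherwise pick one automatically."""
    if requested is None:
        # ASSUMPTION: the allocator hands out the lowest free ID.
        return lowest_free_id(used)
    if requested in used:
        raise ValueError(f"ID {requested} already in use")
    return requested


used_indexes = {0, 1, 2}                     # the three ports listed earlier
auto_index = resolve_id(None, used_indexes)  # user omitted the index
explicit_index = resolve_id(100, used_indexes)
```

The same helper could apply to sfnum if omitting it turns out to make sense.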

> ==================================================================
> ||                                                              ||
> ||   VF manual creation and activation user cmdline API draft   ||
> ||                                                              ||
> ==================================================================
> 
> To enter manual mode, the user has to turn off VF dummies creation:
> $ devlink dev set pci/0000:06:00.0 vf_dummies disabled
> $ devlink dev show
> pci/0000:06:00.0: vf_dummies disabled
> 
> It is "enabled" by default in order not to break existing users.
> 
> By setting the "vf_dummies" attribute to "disabled", the driver
> removes all dummy VFs. Only physical ports are present:
> 
> $ devlink port show
> pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
> pci/0000:06:00.0/1: flavour physical pfnum 1 type eth netdev enp6s0f0np2
> 
> Then the user is able to create them in a similar way as SFs:
> 
> $ devlink port add pci/0000:06:00.0/99 flavour pcivf pfnum 1 vfnum 8
> 

So in this case, you have to specify the VF index to create? So this
vfnum is very similar to the sfnum (and pfnum?) above?

What about the ability to just say "please give me a VF, but I don't
care which one"?

> The devlink kernel code calls down to device driver (devlink op) and asks
> it to create a VF port with particular attributes. Driver then instantiates
> the VF port with func.
> 

> 
> ==================================================================
> ||                                                              ||
> ||                             PFs                              ||
> ||                                                              ||
> ==================================================================
> 
> There are 2 flavours of PFs:
> 1) Parent PF. That is coupled with uplink port. The flavour is:
>     a) "physical" - in case the uplink port is actual port in the NIC.
>     b) "virtual" - in case this Parent PF is actually a leg to
>        upstream embedded switch.

So "physical" is for the physical NIC port. Ok. And "virtual" is one
side of an internal embedded switch. This makes sense.

> 
>    $ devlink port show
>    pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
> 
>    If there is another parent PF, say "0000:06:00.1", that share the
>    same embedded switch, the aliasing is established for devlink handles.
> 
>    The user can use devlink handles:
>    pci/0000:06:00.0
>    pci/0000:06:00.1
>    as equivalents, pointing to the same devlink instance.
> 
>    Parent PFs are the ones that may be in control of managing
>    embedded switch, on any hierarchy level.
> 
> 2) Child PF. This is a leg of a PF put to the parent PF. It is
>    represented by a port with a netdevice and func:
> 
>    $ devlink port show
>    pci/0000:06:00.0/0: flavour physical pfnum 0 type eth netdev enp6s0f0np1
>    pci/0000:06:00.0/1: flavour pcipf pfnum 2 type eth netdev enp6s0f0pf2
>        func: hw_addr aa:bb:cc:aa:bb:87 state active
> 
>    This is a typical smartnic scenario. You would see this list on
>    the smartnic CPU. The port pci/0000:06:00.0/1 is a leg to
>    one of the hosts. If you send packets to enp6s0f0pf2, they will
>    go to the child PF.
> 
>    Note that inside the host, the PF is represented again as "Parent PF"
>    and may be used to configure nested embedded switch.
> 
> 

I'm not sure I understand this section. Child PF? Is this like a PF in
another host, or does it represent the other side of the virtual link?

> 
> ==================================================================
> ||                                                              ||
> ||            Dynamic PFs user cmdline API draft                ||
> ||                                                              ||
> ==================================================================
> 
> User might want to create another PF, similar as VF.
> TODO
> 

Obviously this is a TODO, but how does this differ from the current
port_split and port_unsplit?