Message-ID: <20190318122105.GH2270@nanopsycho>
Date: Mon, 18 Mar 2019 13:21:05 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Parav Pandit <parav@...lanox.com>
Cc: "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"oss-drivers@...ronome.com" <oss-drivers@...ronome.com>
Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
ports
Fri, Mar 15, 2019 at 10:59:33PM CET, parav@...lanox.com wrote:
>
>
>> -----Original Message-----
>> From: Jiri Pirko <jiri@...nulli.us>
>> Sent: Friday, March 15, 2019 3:08 PM
>> To: Parav Pandit <parav@...lanox.com>
>> Cc: Samudrala, Sridhar <sridhar.samudrala@...el.com>; Jakub Kicinski
>> <jakub.kicinski@...ronome.com>; davem@...emloft.net;
>> netdev@...r.kernel.org; oss-drivers@...ronome.com
>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
>> ports
>>
>> Fri, Mar 15, 2019 at 04:32:24PM CET, parav@...lanox.com wrote:
>> >
>> >
>> >> -----Original Message-----
>> >> From: Samudrala, Sridhar <sridhar.samudrala@...el.com>
>> >> Sent: Friday, March 15, 2019 12:58 AM
>> >> To: Parav Pandit <parav@...lanox.com>; Jakub Kicinski
>> >> <jakub.kicinski@...ronome.com>
>> >> Cc: Jiri Pirko <jiri@...nulli.us>; davem@...emloft.net;
>> >> netdev@...r.kernel.org; oss-drivers@...ronome.com
>> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> devlink PCI ports
>> >>
>> >>
>> >> On 3/14/2019 7:40 PM, Parav Pandit wrote:
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Samudrala, Sridhar <sridhar.samudrala@...el.com>
>> >> >> Sent: Thursday, March 14, 2019 9:16 PM
>> >> >> To: Parav Pandit <parav@...lanox.com>; Jakub Kicinski
>> >> >> <jakub.kicinski@...ronome.com>
>> >> >> Cc: Jiri Pirko <jiri@...nulli.us>; davem@...emloft.net;
>> >> >> netdev@...r.kernel.org; oss-drivers@...ronome.com
>> >> >> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >> devlink PCI ports
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 3/14/2019 6:28 PM, Parav Pandit wrote:
>> >> >>>
>> >> >>>
>> >> >>>> -----Original Message-----
>> >> >>>> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
>> >> >>>> Sent: Thursday, March 14, 2019 6:39 PM
>> >> >>>> To: Parav Pandit <parav@...lanox.com>
>> >> >>>> Cc: Jiri Pirko <jiri@...nulli.us>; davem@...emloft.net;
>> >> >>>> netdev@...r.kernel.org; oss-drivers@...ronome.com
>> >> >>>> Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on
>> >> >>>> devlink PCI ports
>> >> >>>>
>> >> >>>> On Thu, 14 Mar 2019 22:35:36 +0000, Parav Pandit wrote:
>> >> >>>>>>> Then instances of flavour pci_vf are going to appear in the
>> >> >>>>>>> same devlink instance. Those are the switch ports:
>> >> >>>>>>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>> flavour pci_vf pf 0 vf 0
>> >> >>>>>>> switch_id 00154d130d2f peer
>> >> >>>>>>> pci/0000:05:10.1/0
>> >> >>>>>>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>> >> >>>>>>> flavour pci_vf pf 0 vf 0 subport 1
>> >> >>>>>>> switch_id 00154d130d2f peer
>> >> >>>>>>> pci/0000:05:10.1/1
>> >> >>>>>>>
>> >> >>>>>>> With that, peers are going to appear too, and those are the
>> >> >>>>>>> actual VF/VF
>> >> >>>>>>> subport:
>> >> >>>>>>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> peer pci/0000:05:00.0/10002
>> >> >>>>>>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> peer pci/0000:05:00.0/10003
>> >> >>>>>>>
>> >> >>>>>>> Later you can push this VF along with all subports to VM. So
>> >> >>>>>>> in VM, you are going to see the VF like this:
>> >> >>>>>>> $ devlink dev
>> >> >>>>>>> pci/0000:00:08.0
>> >> >>>>>>> $ devlink port
>> >> >>>>>>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
>> >> >>>>>>>
>> >> >>>>>>> And back to your question of how are they connected in eswitch.
>> >> >>>>>>> That is totally up to the original user John who did the creation.
>> >> >>>>>>> He is in charge of the eswitch on baremetal, he would
>> >> >>>>>>> configure the forwarding however he likes.
>> >> >>>>>>
>> >> >>>>>> Ack, so I think you're saying VM has to communicate to the
>> >> >>>>>> cloud environment to have this provisioned using some service
>> >> >>>>>> API, not a kernel API. That's what I wanted to confirm.
>> >> >>>>>>
>> >> >>>>>> I don't see any benefit to having the "host ports" under
>> >> >>>>>> devlink, as such I think it's a matter of preference.
>> >> >>>>>
>> >> >>>>> We need 'host ports' to configure parameters of this host port
>> >> >>>>> which is not exposed by the rep-netdev.
>> >> >>>>> Such as mac address.
>> >> >>>>
>> >> >>>> Please look at the quote of what Jiri wrote above - the host
>> >> >>>> port gets passed to the VM, you can't use it as a handle to set the
>> >> >>>> MAC.
>> >> >>>>
>> >> >>>> The way to set the MAC remains:
>> >> >>>>
>> >> >>>> # devlink port set pci/0000:05:00.0/10002 peer mac_addr
>> >> >>>> 00:11:22:33:44:55
>> >> >>>>
>> >> >>> Even though it can be done, I think this is wrong model to
>> >> >>> program hostport mac address using eswitch port.
>> >> >>> All devlink objects are control objects, so what is passed to VM
>> >> >>> is what is represented by devlink.
>> >> >>> VF in the VM will anyway create its devlink object.
>> >> >>> What is wrong in programming hostport?
>> >> >>> It gives a very clear view to users of topology and objects.
>> >> >>
>> >> >> The VF or any subport MAC address should be configured by the
>> >> >> orchestration layer that is running on the hypervisor and when a
>> >> >> VF is assigned to a VM, the host port is not visible to the hypervisor.
>> >> > What prevents creation of hostport due to which is not visible?
>> >> > Hostport is control port to program host side of parameters.
>> >> > It should be created when user wants to program the parameters.
>> >> >
>> >> > Model is really straightforward.
>> >> > Program host port params using hostport object.
>> >> > Program switchport params using rep-netdev.
>> >>
>> >> IIUC, Jiri/Jakub are proposing creation of 2 devlink objects for each
>> >> port - host facing ports and switch facing ports. This is in addition
>> >> to the netdevs that are created today.
>> >>
>> >I am not proposing any different.
>> >I am proposing only two changes.
>> >1. control hostport params via referring hostport (not via indirect
>> >peer)
>>
>> Not really possible. If you passthrough VF into VM, the hostport goes along
>> with it.
>>
>No.
>I am sorry in showing the enumeration which is the source of confusion.
>
>Below is the right enumeration.
>
>When VF is enumerated initially in the host, where eswitch devlink instance is located.
>Below enumeration is seen.
>
>First two entries show the link between hostport and switchport.
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
Hostport should not have switch_id.
>
>pci/0000:05:10.1/0 eth netdev flavour hostport
>This entry won't be seen if VF auto probing is disabled, because then the VF is not enumerated.
>
>As a user, I will be programming the mac address of hostport for a VF.
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
Hmm, so you are going to have 2 hostports for VF:
1) pci/0000:05:10.1/0
   The real one, which is going to go to the VM - with a separate PCI
   address and devlink instance.
2) pci/0000:05:00.0/1
   A dummy one, which is not really a hostport, as there is no netdev
   created for it. It only models the other side of the cable, which is
   away in the VM.
>
>
>>
>> >2. flavour should not be vf/pf, flavour should be hostport, switchport.
>> >Because switch is flat and agnostic of pf/vf/mdev.
>>
>> Not sure. It's good to have this kind of visibility.
>>
>port can have label/attribute indicating that this belong to VF-1 or mdev as long as you are agreeing to have mdev attribute on host port.
>(and not ask for abstracting it, because mdev is well defined kernel object).
Why can't mdev be another flavour?
>
>>
>> >
>> >> Are you suggesting that all the devlink objects should be visible
>> >> only at the hypervisor layer?
>> >>
>> >Of course not.
>> >
>> >Ports and params controlled by hypervisor should be exposed at
>> >hypervisor/eswitch wherever its parent devlink instance exists.
>> >Ports which should be visible inside a VM should be exposed inside a VM.
>> >So for a given VF,
>> >
>> >If eswitch is at hypervisor level,
>> >$ devlink port show
>> >pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id
>> >00154d130d2f peer pci/0000:05:10.1/0
>> >pci/0000:05:10.1/0 eth netdev flavour hostport switch_id 00154d130d2f
>> >peer pci/0000:05:00.0/10002
>> >
>> >where VF is enumerated,
>> >$ devlink port show
>> >pci/0000:05:10.1/0 eth netdev flavour hostport
>>
>> So this is how it looks like in VM, right?
>>
>Yep.
>Once VF is mapped to VM only two entries are seen and hostport can be still controlled.
>
>$ devlink port show
>pci/0000:05:00.0/10002 eth netdev flavour switchport switch_id 00154d130d2f peer pci/0000:05:00.0/1
>
>pci/0000:05:00.0/1 eth netdev flavour hostport switch_id 00154d130d2f peer pci/0000:05:00.0/10002
>
>This addresses the case for Infiniband where there is no eswitch, but hostports exists and should be managed.
>We shouldn't be inventing new devlink APIs or create a fake sw eswitch object which doesn't exist in hw.
>
>>
>> >This is because unprivileged VF doesn't have visibility to eswitch and its
>> >links.
>> >
>> >> I think the terminology need to be defined clearly so that we are all
>> >> on the same page.
>> >>
>> >> >
>> >> >> Currently we have ndo_set_vf_mac_addr api that works with PF
>> >> >> netdev, but i think we are trying to move away from that API and
>> >> >> do all the configuration via the port representor netdevs.
>> >> > This is fine rep-netdev represents eswitch port.
>> >> > You normally don't go to switch to program host port params.
>> >> >
>> >> >> As the mac address cannot be configured using this netdev, i think
>> >> >> Jakub is suggesting creating a devlink object for each port
>> >> >> representor and use that interface to set peer mac address.
>> >> >
>> >> > I understand, but it is a convoluted interface.
>> >> > When you program host NIC mac address you talk to iLo or BIOS.
>> >> > When you program switch side mac address, you go to the
>> >> > switch/router/modem.
>> >> >
>> >> > Also programming host params on host side, also doesn't make
>> >> > assumption that its connected to eswitch.
>> >> > It also doesn't assume that same connectivity for its life.
>> >> >
>> >> > If you model around how physical devices are configured, it will
>> >> > almost never go wrong and still provides same level of flexibility.
>> >> >
>> >> >> We should be able use this to configure port vlan too.
>> >> >>
>> >> >> Also, instead of subport, can we call vport and support different
>> >> >> types of vports - sr-iov, siov, vmdq etc.
>> >> >>
>> >> > At switch level there are just ports.
>> >> > sriov, siov, mdev, vmdq are their counterpart (peer) where it is
>> >> > connected.
>> >> >
>> >> >>>
>> >> >>> Also eswitch is flat. There is no need of pf/vf flavour for port.
>> >> >>> It doesn't make sense to define 'mdev' flavour which we are
>> >> >>> already working.
>> >> >>> At eswitch level it is just a port, it happen to be connected to
>> >> >>> vf or pf or other objects, it doesn't matter.
>> >> >>> Port should be flavoured as 'hostport' or 'switchport'.
>> >> >>>
>> >> >>>
>> >> >>>> (using the port ids from above)