Message-ID: <20190315070049.GD3034@nanopsycho>
Date: Fri, 15 Mar 2019 08:00:49 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: davem@...emloft.net, netdev@...r.kernel.org,
oss-drivers@...ronome.com
Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
ports
Thu, Mar 14, 2019 at 11:09:45PM CET, jakub.kicinski@...ronome.com wrote:
>On Thu, 14 Mar 2019 08:38:40 +0100, Jiri Pirko wrote:
>> Wed, Mar 13, 2019 at 05:55:55PM CET, jakub.kicinski@...ronome.com wrote:
>> >On Wed, 13 Mar 2019 17:22:43 +0100, Jiri Pirko wrote:
>> >> Wed, Mar 13, 2019 at 05:17:31PM CET, jakub.kicinski@...ronome.com wrote:
>> >> >On Wed, 13 Mar 2019 07:07:01 +0100, Jiri Pirko wrote:
>> >> >> Tue, Mar 12, 2019 at 09:56:28PM CET, jakub.kicinski@...ronome.com wrote:
>> >> >> >On Tue, 12 Mar 2019 15:02:39 +0100, Jiri Pirko wrote:
>> >> >> >> Tue, Mar 12, 2019 at 03:10:54AM CET, wrote:
>> >> >> >> >On Mon, 11 Mar 2019 09:52:04 +0100, Jiri Pirko wrote:
>> >> >> >> >> Fri, Mar 08, 2019 at 08:09:43PM CET, wrote:
>> >> >> >> >> >If the switchport is in the hypervisor then only the hypervisor can
>> >> >> >> >> >control switching/forwarding, correct?
>> >> >> >> >>
>> >> >> >> >> Correct.
>> >> >> >> >>
>> >> >> >> >> >The primary use case for partitioning within a VM (of a VF) would be
>> >> >> >> >> >containers (and DPDK)?
>> >> >> >> >>
>> >> >> >> >> Makes sense.
>> >> >> >> >>
>> >> >> >> >> >SR-IOV makes things harder. Splitting a PF is reasonably easy to grasp.
>> >> >> >> >> >I'm trying to get a sense of is how would we control an SR-IOV
>> >> >> >> >> >environment as a whole.
>> >> >> >> >>
>> >> >> >> >> You mean orchestration?
>> >> >> >> >
>> >> >> >> >Right, orchestration.
>> >> >> >> >
>> >> >> >> >To be clear on where I'm going with this - if we want to allow VFs
>> >> >> >> >to partition themselves then they have to control what is effectively
>> >> >> >> >a "nested" switch. A per-VF set of rules which would then get
>> >> >> >>
>> >> >> >> Wait. If you allow making VF subports (I believe that is what you meant
>> >> >> >> by VFs partitioning themselves), that does not mean they will have a
>> >> >> >> separate nested switch. They would still belong under the same one.
>> >> >> >
>> >> >> >But that existing switch is administered by the hypervisor, how would
>> >> >> >the VF owners install forwarding rules in a switch they don't control?
>> >> >>
>> >> >> They won't.
>> >> >
>> >> >Argh. So how is forwarding configured if there are no rules? Are you
>> >> >going to assume it's switching on MACs? We're supposed to offload
>> >> >software constructs. If it's a software port it needs to be explicitly
>> >> >switched. If it's not explicitly switched - we already have macvlan
>> >> >offload.
>> >>
>> >> Wait a second. You configure the switch. And for that, you have the
>> >> switchports (representors). What we are talking about are VF (VF
>> >> subport) host legs. Am I missing something?
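For illustration, a hypothetical sketch of what configuring that switch
through the representors could look like on the hypervisor side. The
representor netdev names (rep_vf0, rep_vf1) and the MAC address are made
up, and it assumes a driver that can offload tc flower rules on
representor netdevs:

$ tc qdisc add dev rep_vf0 ingress
$ tc filter add dev rep_vf0 ingress protocol all flower \
    dst_mac 52:54:00:aa:bb:02 skip_sw \
    action mirred egress redirect dev rep_vf1

Traffic sent by the first VF shows up on its representor's ingress; the
rule forwards anything matching that MAC out through the second VF's
representor, entirely under the control of the hypervisor admin.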
>> >
>> >Hm :) So when a VM gets a new port, how is it connected? Are we
>> >assuming all ports of a VM are plugged into one big L2 switch?
>> >The use case for those subports is a little murky, sorry about
>> >the endless confusion :)
>>
>> Np. When user John (on baremetal, or wherever the devlink instance
>> with the switch ports is) creates a VF or a VF subport by:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0
>> or by:
>> $ devlink dev port add pci/0000:05:00.0 flavour pci_vf pf 0 vf 0
>>
>> Then instances of flavour pci_vf are going to appear in the same devlink
>> instance. Those are the switch ports:
>> pci/0000:05:00.0/10002: type eth netdev enp5s0npf0pf0s0
>> flavour pci_vf pf 0 vf 0
>> switch_id 00154d130d2f peer pci/0000:05:10.1/0
>> pci/0000:05:00.0/10003: type eth netdev enp5s0npf0pf0s0
>> flavour pci_vf pf 0 vf 0 subport 1
>> switch_id 00154d130d2f peer pci/0000:05:10.1/1
>>
>> With that, peers are going to appear too, and those are the actual VF/VF
>> subports:
>> pci/0000:05:10.1/0: type eth netdev ??? flavour pci_vf_host
>> peer pci/0000:05:00.0/10002
>> pci/0000:05:10.1/1: type eth netdev ??? flavour pci_vf_host
>> peer pci/0000:05:00.0/10003
>>
>> Later you can push this VF along with all its subports to a VM. So in
>> the VM, you are going to see the VF like this:
>> $ devlink dev
>> pci/0000:00:08.0
>> $ devlink port
>> pci/0000:00:08.0/0: type eth netdev ??? flavour pci_vf_host
>> pci/0000:00:08.0/1: type eth netdev ??? flavour pci_vf_host
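As a hypothetical illustration of the "push to VM" step (the exact
commands are outside this proposal), assuming the VF at 0000:05:10.1
from the listing above is handed to QEMU via vfio-pci, the whole device,
and with it all of its subports, moves into the VM:

$ echo 0000:05:10.1 > /sys/bus/pci/devices/0000:05:10.1/driver/unbind
$ echo vfio-pci > /sys/bus/pci/devices/0000:05:10.1/driver_override
$ echo 0000:05:10.1 > /sys/bus/pci/drivers_probe
$ qemu-system-x86_64 ... -device vfio-pci,host=0000:05:10.1 ...

The switch-port (representor) side of the VF and its subport stays
behind in the pci/0000:05:00.0 devlink instance on baremetal.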
>>
>> And back to your question of how they are connected in the eswitch.
>> That is totally up to the original user John who did the creation.
>> He is in charge of the eswitch on baremetal; he would configure
>> the forwarding however he likes.
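To make that concrete, a minimal hypothetical example of the simple L2
case asked about above: put the two switch ports from the listing into
an ordinary Linux bridge (the second netdev name here is a guess, and
whether forwarding is then offloaded or done in software is up to the
driver):

$ ip link add name br0 type bridge
$ ip link set dev br0 up
$ ip link set dev enp5s0npf0pf0s0 master br0
$ ip link set dev enp5s0npf0pf0s1 master br0

Or, instead of relying on MAC learning, John could install explicit
forwarding rules with tc flower on those same switch ports, as sketched
earlier in the thread.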
>
>Ack, so I think you're saying the VM has to communicate with the cloud
>environment to have this provisioned using some service API, not
>a kernel API. That's what I wanted to confirm.
Okay.
>
>I don't see any benefit to having the "host ports" under devlink,
>as such I think it's a matter of preference. I'll try to describe
>the two options to Netronome's FAEs and see which one they find more
>intuitive.
Yeah, the "host ports" are probably not a must. I just like to have them
for visibility purposes. No big deal to implement them.
>
>Makes sense?
Okay. Thanks!