netdev - Re: [RFC PATCH net-next] docs: net: add an explanation of VF (and other) Representors

Message-ID: <572c50b0-2f10-50d5-76fc-dfa409350dbe@gmail.com>
Date:   Wed, 10 Aug 2022 17:02:33 +0100
From:   Edward Cree <ecree.xilinx@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     ecree@...inx.com, netdev@...r.kernel.org, davem@...emloft.net,
        pabeni@...hat.com, edumazet@...gle.com, corbet@....net,
        linux-doc@...r.kernel.org, linux-net-drivers@....com,
        Jacob Keller <jacob.e.keller@...el.com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Michael Chan <michael.chan@...adcom.com>,
        Andy Gospodarek <andy@...yhouse.net>,
        Saeed Mahameed <saeed@...nel.org>,
        Jiri Pirko <jiri@...nulli.us>,
        Shannon Nelson <snelson@...sando.io>,
        Simon Horman <simon.horman@...igine.com>,
        Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [RFC PATCH net-next] docs: net: add an explanation of VF (and
 other) Representors

On 09/08/2022 04:41, Jakub Kicinski wrote:
>>> AFAIK there's no "management PF" in the Linux model.  
>>
>> Maybe a bad word choice.  I'm referring to whichever PF (which likely
>>  also has an ordinary netdevice) has administrative rights over the NIC /
>>  internal switch at a firmware level.  Other names I've seen tossed
>>  around include "primary PF", "admin PF".
> 
> I believe someone (mellanox?) used the term eswitch manager.
> I'd use "host PF", somehow that makes most sense to me.

Not sure about that, I've seen "host" used as antonym of "SoC", so
 if the device is configured with the SoC as the admin this could
 confuse people.
I think whatever term we settle on, this document might need to
 have a 'Definitions' section to make it clear :S

>>> What is "the PCIe controller" here? I presume you've seen the
>>> devlink-port doc.  
>>
>> Yes, that's where I got this terminology from.
>> "the" PCIe controller here is the one on which the mgmt PF lives.  For
>>  instance you might have a NIC where you run OVS on a SoC inside the
>>  chip, that has its own PCIe controller including a PF it uses to drive
>>  the hardware v-switch (so it can offload OVS rules), in addition to
>>  the PCIe controller that exposes PFs & VFs to the host you plug it
>>  into through the physical PCIe socket / edge connector.
>> In that case this bullet would refer to any additional PFs the SoC has
>>  besides the management one...
> 
> IMO the model where there's an overall controller for the entire device
> is also a mellanox limitation, due to lack of support for nested
> switches

Instead of "the PCIe controller" I should probably say "the local PCIe
 controller", since that's the wording the devlink-port doc uses.

> Say I pay for a bare metal instance in my favorite public cloud.
> Why would the forwarding between VFs I spawn be controlled by the cloud
> provider and not me?!
> 
> But perhaps Netronome was the only vendor capable of nested switching.

Quite possibly.  Current EF100 NICs can't do nested switching either.

>>>> + - PFs and VFs with other personalities, including network block devices (such
>>>> +   as a vDPA virtio-blk PF backed by remote/distributed storage).  
>>>
>>> IDK how you can configure block forwarding (which is DMAs of command
>>> + data blocks, not packets AFAIU) with the networking concepts..
>>> I've not used the storage functions tho, so I could be wrong.  
>>
>> Maybe I'm way off the beam here, but my understanding is that this
>>  sort of thing involves a block interface between the host and the
>>  NIC, but then something internal to the NIC converts those
>>  operations into network operations (e.g. RDMA traffic or Ceph TCP
>>  packets), which then go out on the network to access the actual
>>  data.  In that case the back-end has to have network connectivity,
>>  and the obvious™ way to do that is give it a v-port on the v-switch
>>  just like anyone else.
> 
> I see. I don't think this covers all implementations. 

Right, I should probably make it more clear that this isn't the only
 way it could be done.
I'm merely trying to make clear that things that don't look like
 netdevices might still have a v-port and hence need a repr.

> "TX queue attached to" made me think of a netdev Tx queue with a qdisc
> rather than just a HW queue. No better ideas tho.

Would adding the word "hardware" before "TX queue" help?  Have to
 admit the netdev-queue interpretation hadn't occurred to me.

>> (And it looks like the core uses `c<N>` for my `if<N>` that you were
>>  so horrified by.  Devlink-port documentation doesn't make it super
>>  clear whether controller 0 is "the controller that's in charge" or
>>  "the controller from which we're viewing things", though I think in
>>  practice it comes to the same thing.)
> 
> I think we had a bit. Perhaps @external? The controller which doesn't
> have @external == true should be the local one IIRC. And by extension
> presumably in charge.

Yes, and that should work fine per se.  It's just not reflected in the
 phys_port_name string in any way, so legacy userland that relies on
 that won't have that piece of info (but it never did) and probably
 assumes that c0 is local.
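
To illustrate, here is a rough sketch of what such legacy userland
 parsing might look like.  The name grammar is my assumption (an
 optional c<N> prefix, then pf<N>, then an optional vf<N> or sf<N>),
 and parse_phys_port_name() / PortName are made-up names, not any
 real tool's API.  The point is only that the parsed result has a
 controller number but nothing saying whether it is the local one.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar (hypothetical, for illustration only): an optional
# "c<N>" controller prefix, then "pf<N>", then optional "vf<N>" or "sf<N>".
_PORT_NAME = re.compile(
    r"^(?:c(?P<ctrl>\d+))?pf(?P<pf>\d+)(?:(?P<kind>vf|sf)(?P<idx>\d+))?$"
)


class PortName(NamedTuple):
    controller: Optional[int]  # None when the name carries no c<N> prefix
    pf: int
    kind: str                  # "pf", "vf" or "sf"
    index: Optional[int]       # VF/SF number, None for a bare PF


def parse_phys_port_name(name: str) -> Optional[PortName]:
    """Split a representor-style phys_port_name into its numeric parts.

    Returns None for names that do not match the assumed grammar.
    Note there is no field for "local vs external": the string does
    not encode it, so a caller can only guess from the controller number.
    """
    m = _PORT_NAME.match(name)
    if m is None:
        return None
    ctrl = m.group("ctrl")
    idx = m.group("idx")
    return PortName(
        controller=int(ctrl) if ctrl is not None else None,
        pf=int(m.group("pf")),
        kind=m.group("kind") or "pf",
        index=int(idx) if idx is not None else None,
    )
```

A caller that treats controller None or 0 as "local" is making
 exactly the assumption described above; the string alone can't
 confirm it.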

-ed