Date:   Mon, 4 Mar 2019 17:03:20 -0800
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Jason Gunthorpe <jgg@...lanox.com>
Cc:     Jiri Pirko <jiri@...nulli.us>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "oss-drivers@...ronome.com" <oss-drivers@...ronome.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Parav Pandit <parav@...lanox.com>
Subject: Re: [PATCH net-next 4/8] devlink: allow subports on devlink PCI
 ports

On Mon, 4 Mar 2019 16:15:14 +0000, Jason Gunthorpe wrote:
> On Wed, Feb 27, 2019 at 10:30:00AM -0800, Jakub Kicinski wrote:
> > On Wed, 27 Feb 2019 13:37:53 +0100, Jiri Pirko wrote:  
> > > Tue, Feb 26, 2019 at 07:24:32PM CET, jakub.kicinski@...ronome.com wrote:  
> > > >A PCI endpoint corresponds to a PCI device, but such a device
> > > >can have one or more logical device ports associated with it.
> > > >We need a way to distinguish those. Add a PCI subport in the
> > > >dumps and print the info in phys_port_name appropriately.
> > > >
> > > >This is not equivalent to port splitting; there is no split
> > > >group. It's just a way of representing multiple netdevs on
> > > >a single PCI function.
> > > >
> > > >Note that the quality of being multiport pertains only to
> > > >the PCI function itself. A PF having multiple netdevs does
> > > >not mean that its VFs will also have multiple netdevs, or
> > > >that VFs are associated with any particular port of a
> > > >multiport PF.
> > > 
> > > We've been discussing the problem of subports (we call them
> > > "subfunctions" or "SFs") for some time internally. It turned out that
> > > this is probably a harder task to model than it might seem. Please
> > > prove me wrong.
> > > 
> > > The nature of a VF makes it a logically separate entity. It has a
> > > separate PCI address, so it should also have a separate devlink
> > > instance. If you pass it through to a VM, the same devlink instance
> > > should be created inside the VM and disappear from the host.
> > 
> > It depends on what a devlink instance represents :/  On the one hand
> > you may want to create an instance for a VF to allow it to spawn soft
> > ports; on the other, you may want to group multiple functions together.
> > 
> > IOW, if a devlink instance is for an ASIC, there should be one per
> > device per host.
> 
> Don't we already have devlink instances for every mlx5 physical port
> and VF as they are unique PCI functions?

That's a very NIC-centric view of the world, though: it equates devlink
instances to ports, and further to PCI devices.  That's fundamentally
different from what switches and some NICs do, where all ports are under
a single devlink instance.
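
To make the contrast concrete (the bus addresses, netdev names and the
exact fields below are made up, purely for illustration), a switch-style
layout is one devlink instance with many ports hanging off it:

  $ devlink port show
  pci/0000:03:00.0/0: type eth netdev swp0 flavour physical
  pci/0000:03:00.0/1: type eth netdev swp1 flavour physical
  pci/0000:03:00.0/2: type eth netdev swp2 flavour physical

whereas the per-function view ends up with a separate devlink instance
for each PF and VF, each holding only its own port(s).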

> > You guys come from the RDMA side of the world, with which I'm less
> > familiar, and the soft bus + spawned devices approach seems to be a
> > popular design there.  Could you describe the advantages of that model
> > for the sake of the netdev-only folks? :)
> 
> I don't think we do this in RDMA at all yet; or maybe I'm just not
> sure what you are thinking of?

Mm.. I caught an Intel patch set recently which was talking about buses
and spawning devices.  It must have been a different kettle of fish.

> The forward-looking discussion is mainly about creating something like
> macvlan that can be offloaded, so we have something like an 'rdma
> offload HW context' for each 'rdma macvlan'
> 
> .. and that 'rdma offload HW context' has all the same knobs as an
> offload context for a VF that would normally be tied to a unique PCI
> BDF (via devlink). But in this case there is no unique BDF.
> 
> From another angle this is broadly similar to the scalable IOV stuff,
> but without placing requirements on the host IOMMU to implement it.

Understood, thanks for clarifying.  The question becomes how we square
this SR-IOV world with the world of switches.  Is the hypervisor
supposed to know that the VM has partitioned its VF?
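
To put that in concrete terms (every address, name and flavour below is
hypothetical, just to illustrate the two viewpoints): on the host side
the eswitch may only ever see a single representor for the VF,

  [host] $ devlink port show
  pci/0000:03:00.0/1: type eth netdev eth1_rep flavour pcivf

while inside the VM the same function could expose several subports,

  [VM] $ devlink port show
  pci/0000:00:04.0/0: type eth netdev eth0
  pci/0000:00:04.0/1: type eth netdev eth1

and it's not obvious whether, or how, the hypervisor is supposed to
learn about that partitioning.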
