Message-ID: <AM0PR05MB486640F7A27F6D894450F1C2D1660@AM0PR05MB4866.eurprd05.prod.outlook.com>
Date:   Mon, 28 Oct 2019 20:02:06 +0000
From:   Parav Pandit <parav@...lanox.com>
To:     Andy Gospodarek <andrew.gospodarek@...adcom.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>
CC:     Yuval Avnery <yuvalav@...lanox.com>, Jiri Pirko <jiri@...nulli.us>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Jiri Pirko <jiri@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "leon@...nel.org" <leon@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "shuah@...nel.org" <shuah@...nel.org>,
        Daniel Jurgens <danielj@...lanox.com>,
        Michael Chan <michael.chan@...adcom.com>
Subject: RE: [PATCH net-next 0/9] devlink vdev

Hi Andy, Jakub,

> -----Original Message-----
> From: netdev-owner@...r.kernel.org <netdev-owner@...r.kernel.org> On
> Behalf Of Andy Gospodarek
> Sent: Friday, October 25, 2019 9:58 AM
> To: Jakub Kicinski <jakub.kicinski@...ronome.com>
> Cc: Yuval Avnery <yuvalav@...lanox.com>; Jiri Pirko <jiri@...nulli.us>;
> netdev@...r.kernel.org; Jiri Pirko <jiri@...lanox.com>; Saeed Mahameed
> <saeedm@...lanox.com>; leon@...nel.org; davem@...emloft.net;
> shuah@...nel.org; Daniel Jurgens <danielj@...lanox.com>;
> andrew.gospodarek@...adcom.com; Michael Chan
> <michael.chan@...adcom.com>
> Subject: Re: [PATCH net-next 0/9] devlink vdev
> 
> On Wed, Oct 23, 2019 at 07:51:41PM -0700, Jakub Kicinski wrote:
> > On Thu, 24 Oct 2019 00:11:48 +0000, Yuval Avnery wrote:
> > > >>> We need some proper ontology and decisions what goes where. We
> > > >>> have half of port attributes duplicated here, and hw_addr which
> > > >>> honestly makes more sense in a port (since port is more of a
> > > >>> networking construct, why would ep storage have a hw_addr?).
> > > >>> Then you say you're going to dump more PCI stuff in here :(
> > > >> Well basically what this "vdev" is is the "port peer" we
> > > >> discussed couple of months ago. It provides possibility for the
> > > > >> user on bare metal to configure things for the VF - for example.
> > > >>
> > > > >> Regarding hw_addr vs. port - it is not correct to make that a
> > > > >> devlink port attribute. It is not port's hw_addr, but the port's
> > > > >> peer hw_addr.
> > > > Yeah, I remember us arguing with others that "the other side of
> > > > the wire" should not be a port.
> > > >
> > > >>> "vdev" sounds entirely meaningless, and has a high chance of
> > > >>> becoming a dumping ground for attributes.
> > > >> Sure, it is a madeup name. If you have a better name, please share.
> > > > IDK. I think I started the "peer" stuff, so it made sense to me.
> > > > Now it sounds like you'd like to kill a lot of problems with this
> > > > one stone. For PCIe "vdev" is def wrong because some of the config
> > > > will be for PF (which is not virtual). Also for PCIe the config
> > > > has to be done with permanence in mind from day 1, PCI often
> > > > requires HW reset to reconfig.
> > >
> > > The PF is "virtual" from the SmartNic embedded CPU point of view.
> >
> > We also want to configure PCIe on local host thru this in non-SmartNIC
> > case, having the virtual in the name would be confusing there.
> >
> > > Maybe gdev is better? (generic)
> >
How about naming it 'subdev', short for sub device?
The objects being managed are sub devices of the parent device, of one class or of several different classes.

> > Let's focus on the scope and semantics of the object we are modelling
> > first. Can we talk goals, requirements, user scenarios etc.?
> >
> > IMHO the hw_addr use case is kind of weak, clouds usually do
> > tunnelling so nobody cares which MAC customer has assigned in the overlay.
> >
> > CCing Andy and Michael from Broadcom for their perspective and
> > requirements.
> 
> Thanks, Jakub, I'm happy to chime in based on our deployment experience.
> We definitely understand the desire to be able to configure properties of
> devices on the SmartNIC (the kind with general purpose cores not the kind with
> only flow offload) from the server side.
> 
> In addition to addressing NVMe devices, I'd also like to be able to create
> virtual or real serial ports as well as there is an interest in
> *sometimes* being able to gain direct access to the SmartNIC console not just
> a shell via ssh.  So my point is that there are multiple use-cases.
> 
Yes, we also see that use case: people sometimes want direct access to the console.
We believe the current direction of vdev is a good starting point, together with your example below.
I would like to call it 'subdev' for the rest of the discussion below and in the updated v1 series.

s/vdev/subdev

> Arm are also _extremely_ interested in developing a method to enable some
> form of SmartNIC discovery method and while lots of ideas have been thrown
> around, discovery via devlink is a reasonable option.  
Great. 
> So while doing all this will
> be much more work than simply handling this case where we set the peer or
> local MAC for a vdev, I think it will be worth it to make this more usable for
> all^W more types of devices.  I also agree that not everything on the other side
> of the wire should be a port.
> 
Right. Having a generic object named 'subdev' with different flavours (similar to port flavours) is very useful and extensible.
The subdev flavours could be:
(a) PCI PF,
(b) PCI VF,
(c) mdev,
(d) serial/console device
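
To make the flavour idea a bit more concrete, here is a rough Python model of it. This is only a sketch: SubdevFlavour, Subdev, by_flavour and the flavour strings are hypothetical names made up for illustration; none of them come from the patch series or from the devlink uAPI.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubdevFlavour(Enum):
    """Hypothetical flavours, one per item in the list above."""
    PCI_PF = "pcipf"
    PCI_VF = "pcivf"
    MDEV = "mdev"
    CONSOLE = "console"


@dataclass
class Subdev:
    """A generic sub device: common fields plus flavour-specific attributes."""
    flavour: SubdevFlavour
    index: int
    attrs: dict = field(default_factory=dict)


def by_flavour(subdevs, flavour):
    """Return only the sub devices of the given flavour."""
    return [sd for sd in subdevs if sd.flavour is flavour]


# Example inventory; the attribute names are illustrative assumptions.
subdevs = [
    Subdev(SubdevFlavour.CONSOLE, 0, {"speed": 115200}),
    Subdev(SubdevFlavour.MDEV, 0, {"hw_addr": "02:00:00:00:00:00"}),
    Subdev(SubdevFlavour.MDEV, 1, {"hw_addr": "02:00:00:00:00:01"}),
]

for sd in by_flavour(subdevs, SubdevFlavour.MDEV):
    print(sd.flavour.value, sd.index, sd.attrs)
```

The point of the sketch is that the object itself stays generic, and only the attribute set varies per flavour.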

> So if we agree that addressing this device as a PCIe device then it feels like we
> would be better served to query device capabilities and depending on what
> capabilities exist we would be able to configure properties for those.  In an
> ideal world, I could query a device using devlink ('devlink info'?) and it would
> show me different devices that are available for configuration on the SmartNIC
> and would also give me a way to address them.  So while I like the idea of being
> able to address and set parameters as shown in patch 05 of this series, I would
> like to see a bit more flexibility to define what type of device is available and
> how it might be configured.
> 
> So if we took the devlink info command as an example (whether its the proper
> place for this or not), it could look _like_ this:
> 
> $ devlink dev info pci/0000:03:00.0
$ devlink subdev info pci/0000:03:00.0
Would this show all subdevices of every class, with the attributes shown depending on each subdevice's class?

I also like your idea of resources (more than capabilities).
Using 'resource' gives visibility into which resources exist.
The one that becomes useful immediately is total_msix_vectors of the device, and then how many of these vectors (the resource) to provision to a VF.
So each subdev will show how much of the resource is assigned to it.
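
The accounting I have in mind can be sketched roughly as below. It is a toy Python model, not devlink code: MsixPool, provision() and the pcivf handles are hypothetical names, and the only thing taken from this thread is the idea of a device-wide total_msix_vectors that gets divided among subdevs.

```python
class MsixPool:
    """Hypothetical device-wide pool of MSI-X vectors shared by subdevs."""

    def __init__(self, total_msix_vectors):
        self.total = total_msix_vectors
        self.assigned = {}  # subdev handle -> number of vectors

    def free(self):
        """Vectors not yet assigned to any subdev."""
        return self.total - sum(self.assigned.values())

    def provision(self, handle, vectors):
        """Set the vector count of one subdev, rejecting over-commit."""
        if vectors < 0:
            raise ValueError("vector count must be non-negative")
        # A subdev may reuse what it already holds when it is resized.
        available = self.free() + self.assigned.get(handle, 0)
        if vectors > available:
            raise ValueError(
                f"{handle}: requested {vectors}, only {available} available"
            )
        self.assigned[handle] = vectors


pool = MsixPool(total_msix_vectors=64)
pool.provision("pci/0000:03:00.0/pcivf/0", 16)
pool.provision("pci/0000:03:00.0/pcivf/1", 32)
print(pool.free())  # 64 - 16 - 32 = 16
```

Showing the per-subdev assignment next to the device total is what would let a user see how much is left before provisioning another VF.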

Devlink already has a notion of 'resource', as described in [1]. However, it is a bit complex to parse and use, though I think mlxsw uses it.

Jiri,
What is your opinion on 'devlink resource'?

[1] http://man7.org/linux/man-pages/man8/devlink-resource.8.html

> pci/0000:03:00.0:
>   driver foo
>   serial_number 8675309
>   versions:
> [...]
>   capabilities:
>       storage 0
>       console 1
>       mdev 1024
>       [something else] [limit]
> 
> (Additionally rather than putting this as part of 'info' the device capabilities and
> limits could be part of the 'resource' section and frankly may make more sense
> if this is part of that.)
> 
> and then those capabilities would be something that could be set using the
> 'vdev' or whatever-it-is-named interface:
> 
> # devlink vdev show pci/0000:03:00.0
> pci/0000:03:00.0/console/0: speed 115200 device /dev/ttySNIC0
> pci/0000:03:00.0/mdev/0: hw_addr 02:00:00:00:00:00 [...]
> pci/0000:03:00.0/mdev/1023: hw_addr 02:00:00:00:03:ff
> 
> # devlink vdev set pci/0000:03:00.0/mdev/0 hw_addr 00:22:33:44:55:00
> 
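
The addressing scheme in the example above (parent device, then subdev type, then index) is easy to handle mechanically. A minimal Python sketch follows, assuming the handle always has exactly the four slash-separated fields shown in the quoted output; SubdevHandle and parse_handle are hypothetical names, not part of any existing tool.

```python
from typing import NamedTuple


class SubdevHandle(NamedTuple):
    bus: str     # e.g. "pci"
    device: str  # e.g. "0000:03:00.0"
    kind: str    # e.g. "mdev" or "console"
    index: int   # e.g. 1023


def parse_handle(handle):
    """Split 'bus/device/kind/index' into its four fields."""
    parts = handle.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected bus/device/kind/index, got {handle!r}")
    bus, device, kind, index = parts
    if not index.isdigit():
        raise ValueError(f"index must be a non-negative integer, got {index!r}")
    return SubdevHandle(bus, device, kind, int(index))


h = parse_handle("pci/0000:03:00.0/mdev/1023")
print(h.kind, h.index)
```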

> Since these Arm/RISC-V based SmartNICs are going to be used in a variety of
> different ways and will have a variety of different personalities (not just
> different SKUs that vendors will offer but different ways in which these will be
> deployed), I think it's critical that we consider more than just the
> mdev/representer case from the start.
> 
I completely agree with you.
Yuval's patches show VF/mdev as the starting example, but the design is not limited to it.

> > > > >> Basically it is something that represents VF/mdev - the other
> > > > >> side of devlink port. But in some cases, like NVMe, there is no
> > > > >> associated devlink port - that is why "devlink port peer" would
> > > > >> not work here.
> > > > > What are the NVMe parameters we'd configure here? Queues etc. or
> > > > > some IDs? Presumably there will be a NVMe-specific way to
> > > > > configure things?
> > > > Something has to point the NVMe VF to a backend, right?
> > > >
> > > > (I haven't looked much into NVMe myself in case that's not obvious
> > > > ;))
It doesn't matter whether a given PCI VF's class is nvme, network, gpu or something else.
Since the devlink framework handles the generic PCI device, it is desirable to have a well-defined PCI VF object and to handle generic PCI VF properties in a common way.

So at minimum, I see the following changes to be made to this series.

1. Update the cover letter for the items below.
(a) State crisply the limitation of '$ ip link set vf mac' and how it is overcome here
(b) Describe future updates that use resources for device resource configuration (one example is irq vectors)
(c) Rename vdev to 'subdev', or is there any suggestion for a better name?
(d) Explicitly mention the current purpose and the use cases for future extension

2. Update the patches from vdev to subdev.
