Message-ID: <20191216124441.634ea8ea@cakuba.netronome.com>
Date: Mon, 16 Dec 2019 12:44:41 -0800
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Yuval Avnery <yuvalav@...lanox.com>
Cc: Jiri Pirko <jiri@...lanox.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andy Gospodarek <andy@...yhouse.net>,
Daniel Jurgens <danielj@...lanox.com>
Subject: Re: [PATCH net-next] netdevsim: Add max_vfs to bus_dev
On Fri, 13 Dec 2019 20:05:00 +0000, Yuval Avnery wrote:
> > On Fri, 13 Dec 2019 03:21:02 +0000, Yuval Avnery wrote:
> > > > I see, is this a more fine grained capability or all or nothing for SR-IOV control?
> > > > I'd think that if the SmartNIC's eswitch just encapsulates all the
> > > > frames into a
> > > > L4 tunnel it shouldn't care about L2 addresses.
> > >
> > > People keep saying that, but there are customers who want this
> > > capability :)
> >
> > Right, but we should have a plan for both, right? Some form of a switch
> > between L4/no checking/ip link changes are okay vs strict checking/L2/
> > SmartNIC provisions MAC addrs?
>
> I am not sure I understand
> The L2 checks will be on NIC, not on the switch.
> Packet decapsulated and forwarded to the NIC, Where the MAC matters..
If there is tunnelling involved where the customer's L2 is not visible to
the provider underlay, why does the host ip-link not have permission
to change the MAC address?
The NIC CPU can just learn about the customer MAC change and configure
the overlay forwarding appropriately.
> > > > > > What happens if the SR-IOV host changes the MAC? Is it used by
> > > > > > HW or is the MAC provisioned by the control CPU used for things
> > > > > > like spoof check?
> > > > >
> > > > > Host shouldn't have privileges to do it.
> > > > > If it does, then it's under the host ownership (like in non-smartnic mode).
> > > >
> > > > I see so the MAC is fixed from bare metal host's PoV? And it has to
> > > > be set
> > >
> > > Yes
> > >
> > > > through some high level cloud API (for live migration etc)?
> > > > Do existing software stacks like libvirt handle not being able to
> > > > set the MAC happily?
> > >
> > > I am not sure what you mean.
> > > What we are talking about here is the E-switch manager setting a MAC to another VF.
> > > When the VF driver loads it will query this MAC from the NIC. This is
> > > the way It works today with "ip link set _vf_ mac"
> > >
> > > Or in other words we are replacing "ip link set _vf_ mac" and not "ip link set address"
> > > So that it can work from the SmartNic embedded system.
> > > There is nothing really new here, ip link will not work from a
> > > SmartNic, this is why we need devlink subdev.
> >
> > Ack, but are we targeting the bare metal cloud scenario here or something
> > more limited? In a bare metal cloud AFAIU the customers can use SR-IOV on
> > the host, but the MACs need to be communicated/requested from the
> > cloud management system.
>
> Yes, so the cloud management system communicates with the Control CPU, not the host,
> Not whatever customer decides to run on the hypervisor. The host PF is powerless here (almost like VF).
>
> >
> > IOW the ip link and the devlink APIs are in different domains of control.
> > Customer has access to ip link and provider has access to devlink.
>
> For host VF - Customer has access to ip link exactly like in non-smartnic mode.
> For host PF - "ip link set vf" will return error. Everything running on the host is not-trusted.
>
> >
> > So my question is does libvirt run by the customer handle the fact that it can't
> > poke at ip link gracefully, and if live migration is involved how is the customer
> > supposed to ask the provider to move an address?
>
> I don't understand the question because I don't understand why is it different
> from non-smartnic where the host hypervisor is in-charge.
The ip-link API will suddenly start returning errors which user space
may not expect. So the question is what user space are you expecting
to run/testing with? _Some_ user space should prove this design out
before we merge it.
The alternative design is to "forward" the host's ip-link requests to the
NIC CPU and let software running there talk to the cloud back end.
Rather than going
customer -> cloud API -> NIC,
go
customer -> NIC -> cloud API
That obviously is more complex, but has the big advantage of nothing
on the host CPU having to change.
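To make the two orderings concrete, here is a rough sketch of both
flows as a toy model. It is not kernel or devlink code: CloudAPI,
NicCpu, and the host_set_vf_mac_* helpers are hypothetical names made
up for illustration, and the "approval" policy is an assumption about
what the cloud back end might do.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised when a VF MAC change requested by the host is refused."""


@dataclass
class CloudAPI:
    """Hypothetical provider back end deciding which MACs a VF may use."""
    allowed: dict = field(default_factory=dict)  # vf index -> set of MACs

    def approve(self, vf: int, mac: str) -> bool:
        return mac in self.allowed.get(vf, set())


@dataclass
class NicCpu:
    """Hypothetical control software on the SmartNIC's embedded CPU."""
    cloud: CloudAPI
    vf_macs: dict = field(default_factory=dict)  # vf index -> current MAC

    def provision(self, vf: int, mac: str) -> None:
        # customer -> cloud API -> NIC: the provider pushes the MAC down.
        self.vf_macs[vf] = mac

    def handle_forwarded_request(self, vf: int, mac: str) -> None:
        # customer -> NIC -> cloud API: the NIC CPU asks the back end
        # before applying a request that came from the host.
        if not self.cloud.approve(vf, mac):
            raise PermissionDenied(f"cloud refused {mac} for VF {vf}")
        self.vf_macs[vf] = mac


def host_set_vf_mac_strict(nic: NicCpu, vf: int, mac: str) -> None:
    # First flow: the host PF is untrusted, so its request simply fails
    # and user space on the host has to cope with the error.
    raise PermissionDenied("host PF may not set VF MACs")


def host_set_vf_mac_forwarded(nic: NicCpu, vf: int, mac: str) -> None:
    # Second flow: the same host-side request is forwarded to the NIC CPU.
    nic.handle_forwarded_request(vf, mac)
```

In the second flow the host-side call keeps its existing shape and
only fails when the back end refuses, which is the "nothing on the
host CPU having to change" property, at the cost of extra logic on the
NIC CPU.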