Message-ID: <DM8PR12MB5480BE54D27770DEB39EA009DC369@DM8PR12MB5480.namprd12.prod.outlook.com>
Date: Wed, 9 Jun 2021 09:24:03 +0000
From: Parav Pandit <parav@...dia.com>
To: Yunsheng Lin <linyunsheng@...wei.com>,
"dsahern@...il.com" <dsahern@...il.com>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
CC: Jiri Pirko <jiri@...dia.com>,
"moyufeng@...wei.com" <moyufeng@...wei.com>,
"linuxarm@...neuler.org" <linuxarm@...neuler.org>
Subject: RE: Re: [PATCH RESEND iproute2-next] devlink: Add optional controller user input

> From: Yunsheng Lin <linyunsheng@...wei.com>
> Sent: Tuesday, June 8, 2021 3:02 PM
>
> On 2021/6/8 16:47, Parav Pandit wrote:
> >> From: Yunsheng Lin <linyunsheng@...wei.com>
> >> Sent: Tuesday, June 8, 2021 1:06 PM
> >>
> >> On 2021/6/8 13:26, Parav Pandit wrote:
> >>>> From: Yunsheng Lin <linyunsheng@...wei.com>
> >>>> Sent: Tuesday, June 8, 2021 8:58 AM
> >>>>
> >>>> On 2021/6/7 19:12, Parav Pandit wrote:
> >>>>>> From: Yunsheng Lin <linyunsheng@...wei.com>
> >>>>>> Sent: Monday, June 7, 2021 4:27 PM
> >>>>>>
> >>>>
> >>>> [..]
> >>>>
> >>>>>>>
> >>>>>>>> 2. each PF's devlink instance has three types of port, which are
> >>>>>>>> FLAVOUR_PHYSICAL, FLAVOUR_PCI_PF and FLAVOUR_PCI_VF (supposing I
> >>>>>>>> understand port flavour correctly).
> >>>>>>>>
> >>>>>>> FLAVOUR_PCI_{PF,VF,SF} belong to the eswitch (representor) side on a
> >>>>>>> switchdev device.
> >>>>>>
> >>>>>> If the devlink instance or eswitch is in DEVLINK_ESWITCH_MODE_LEGACY
> >>>>>> mode, the FLAVOUR_PCI_{PF,VF,SF} port instances do not need to be
> >>>>>> created?
> >>>>> No. In eswitch legacy mode, there are no representor netdevices or
> >>>>> devlink ports.
> >>>>
> >>>> It seems each devlink port instance corresponds to a netdevice.
> >>>> More specifically, the devlink instance is created in the struct
> >>>> pci_driver's probe function of a PCI function, and a devlink port
> >>>> instance is created and registered to that devlink instance when a
> >>>> netdev of that PCI function is created?
> >>>>
> >>> Yes.
> >>>
> >>>> As in diagram [1], the devlink port instance (flavour
> >>>> FLAVOUR_PHYSICAL) for ctrl-0-pf0 is created when the netdev of
> >>>> ctrl-0-pf0 is created in the host of the smartNIC, and the devlink
> >>>> port instance (flavour FLAVOUR_VIRTUAL) for ctrl-0-pf0vfN is created
> >>>> when the netdev of ctrl-0-pf0vfN is created in the host of the
> >>>> smartNIC, right?
> >>>>
> >>> Ctrl-0-pf0vfN and ctrl-0-pf0 ports are eswitch ports. They are created
> >>> where there is an eswitch, usually in the smartNIC where the eswitch is
> >>> located.
> >>
> >> Does the diagram in [1] correspond to the multi-host (two-host) setup
> >> mentioned previously?
> >> H1.pf0.physical_port = p0.
> >> H1.pf1.physical_port = p1.
> >> H2.pf0.physical_port = p0.
> >> H2.pf1.physical_port = p1.
> >>
> > Yes.
> >
> >> Let's say H1 = server and H2 = smartNIC as the pci rc connected to below:
> >>              ---------------------------------------------------------
> >>              |                                                       |
> >>              |           --------- ---------         ------- ------- |
> >> -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
> >> | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
> >> | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
> >> | connect |  | -------                       -------                 |
> >> -----------  |     | controller_num=1 (no eswitch)                   |
> >>              ------|--------------------------------------------------
> >>              (internal wire)
> >>                    |
> >>              ---------------------------------------------------------
> >>              | devlink eswitch ports and reps                        |
> >>              | ----------------------------------------------------- |
> >>              | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
> >>              | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
> >>              | ----------------------------------------------------- |
> >>              | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
> >>              | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
> >>              | ----------------------------------------------------- |
> >>              |                                                       |
> >>              |                                                       |
> >> -----------  |           --------- ---------         ------- ------- |
> >> | smartNIC|  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
> >> | pci rc  |==| -------   ----/---- ---/----- ------- ---/--- ---/--- |
> >> | connect |  | | pf0 |______/________/       | pf1 |___/_______/     |
> >> -----------  | -------                       -------                 |
> >>              |                                                       |
> >>              |  local controller_num=0 (eswitch)                     |
> >>              ---------------------------------------------------------
> >>
> >> A vanilla kernel can run on the smartNIC host, right?
> > Right.
> >
> >> What the smartNIC host sees is two PFs, corresponding to ctrl-0-pf0 and
> >> ctrl-0-pf1, when the kernel first boots and the mlx driver is not
> >> loaded yet, right?
> >>
> >> I am not sure it is ok to leave out the VF and SF, but let's leave
> >> them out for simplicity now.
> >> When the mlx driver is loaded, two devlink instances are created, which
> >> correspond to ctrl-0-pf0 and ctrl-0-pf1, and two devlink port
> >> instances (flavour FLAVOUR_PHYSICAL) are created and registered to the
> >> corresponding devlink instances just created, right?
> >>
> >> As the eswitch mode is based on the devlink instance, let's only set the
> >> mode of ctrl-0-pf0's devlink instance to
> >> DEVLINK_ESWITCH_MODE_SWITCHDEV. Then the representor netdev of
> >> ctrl-1-pf0 is created, and the devlink port instance of that
> >> representor netdev is created and registered to the devlink instance
> >> corresponding to ctrl-0-pf0?
> >>
> >> I think I am missing something here; the above does not seem right,
> >> because:
> >> 1. For the single-host case: the PF is not passed through to the VM, and
> >>    the devlink port instance of a VF's representor netdev can be
> >>    registered to the devlink instance corresponding to its PF, right?
> > Yes, if I understand your question right.
> >
> >> 2. But for the two-host case as above, do we need to create a devlink
> >>    instance for the PF corresponding to ctrl-1-pf0 in the smartNIC host?
> > You can choose not to create a devlink instance in the external controller
> > PF. It may not even be a Linux OS running there.
> >
> > I read the questions a few more times, but I find it hard to understand
> > what you really want to ask.
> > Not sure I understood you.
> >
> > Trying again,
> >
> > The model is really very straightforward, as visible in the diagram.
> >
> > There is one PF that has the eswitch. Eswitch contains representor ports.
>
> I thought the representor ports of a PF's eswitch are decided by the functions
> under a specific PF (for example, the PF itself and the VFs under this PF)?
The eswitch is not per PF in the context of smartnic/multi-host.
A PF _has_ the eswitch, which contains the representor ports for PFs, VFs and SFs.
>
> > Each representor port represents either a PF, VF or SF.
> > This PF, VF or SF can belong to the local controller residing on the eswitch
> > device, or it can belong to an external controller(s).
> > Here external controller = 1.
>
> If I understood the above correctly:
> The fw/hw decides which PF has the eswitch, and how many
> devlink/representor ports this eswitch has?
The number of ports is dynamic. When new SFs/VFs are created, ports get added to the eswitch.
> Suppose PF0 of controller_num=0 has the eswitch; the eswitch may then
> have devlink/representor ports representing other PFs, like PF1 in
> controller_num=0, and even PF0/PF1 in controller_num=1?
Yes. Correct.
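
To make the model above concrete, here is a rough Python sketch of it. This is only an illustration, not kernel or iproute2 code: the RepPort and Eswitch classes are hypothetical names, and the "ctrl-N-pfM..." labels simply follow the diagram rather than the port names a driver actually reports. It assumes that PF0 of controller 0 holds the eswitch, that the eswitch carries representor ports for functions of both the local and the external controller, and that ports are added dynamically as VFs/SFs are created.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class RepPort:
    """Hypothetical representor port: which function it represents."""
    controller: int              # 0 = local controller, 1 = external controller
    pf: int                      # PF number within that controller
    kind: str = "pf"             # "pf", "vf" or "sf"
    index: Optional[int] = None  # VF/SF number, unused for a PF

    def name(self) -> str:
        # Label in the style of the diagram, e.g. "ctrl-1-pf0vf3".
        suffix = "" if self.kind == "pf" else f"{self.kind}{self.index}"
        return f"ctrl-{self.controller}-pf{self.pf}{suffix}"


@dataclass
class Eswitch:
    """Hypothetical eswitch owned by one PF, holding representor ports."""
    owner: RepPort
    ports: Set[RepPort] = field(default_factory=set)

    def add_port(self, port: RepPort) -> None:
        # Ports are dynamic: one is added whenever a function appears.
        self.ports.add(port)

    def names(self):
        return sorted(p.name() for p in self.ports)


# PF0 of the local controller (controller_num=0) has the eswitch.
esw = Eswitch(owner=RepPort(controller=0, pf=0))

# Representors for another local PF and for the external controller's PFs.
esw.add_port(RepPort(controller=0, pf=1))
esw.add_port(RepPort(controller=1, pf=0))
esw.add_port(RepPort(controller=1, pf=1))

# A VF created later on the external controller's PF0 adds one more port.
esw.add_port(RepPort(controller=1, pf=0, kind="vf", index=0))

print(esw.names())
```

The point of the sketch is that the set of ports belongs to the eswitch, not to each PF, and that the controller number is what distinguishes, for example, ctrl-0-pf1 from ctrl-1-pf1.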