Message-ID: <20240228090604.66c17088@kernel.org>
Date: Wed, 28 Feb 2024 09:06:04 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Jiri Pirko <jiri@...nulli.us>
Cc: "Samudrala, Sridhar" <sridhar.samudrala@...el.com>, Greg Kroah-Hartman
<gregkh@...uxfoundation.org>, Tariq Toukan <ttoukan.linux@...il.com>, Saeed
Mahameed <saeed@...nel.org>, "David S. Miller" <davem@...emloft.net>, Paolo
Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>, Saeed
Mahameed <saeedm@...dia.com>, netdev@...r.kernel.org, Tariq Toukan
<tariqt@...dia.com>, Gal Pressman <gal@...dia.com>, Leon Romanovsky
<leonro@...dia.com>, jay.vosburgh@...onical.com
Subject: Re: [net-next V3 15/15] Documentation: networking: Add description
for multi-pf netdev
On Wed, 28 Feb 2024 09:13:57 +0100 Jiri Pirko wrote:
> >> 2) it is basically a matter of device layout/provisioning that this
> >> feature should be enabled, not user configuration.
> >
> >We can still auto-instantiate it, not a deal breaker.
>
> "Auto-instantiate" in the sense of a userspace orchestration daemon,
> not the kernel, is that what you mean?
Either kernel, or pass some hints to a user space agent, like networkd
and have it handle the creation. We have precedent for "kernel side
bonding" with the VF<>virtio bonding thing.
> >I'm not sure you're right in that assumption, tho. At Meta, we support
> >container sizes ranging from a few CPUs to multiple NUMA nodes. Each NUMA
> >node may have its own NIC, and the orchestration needs to stitch and
> >un-stitch NICs depending on whether the cores were allocated to small
> >containers or a huge one.
>
> Yeah, but still, there is one physical port per NIC-NUMA node pair.
Well, today there is.
> Correct? Does the orchestration set up a bond on top of them or some other
> master device or let the container use them independently?
Just multi-nexthop routing and binding sockets to the netdev (with
some BPF magic, I think).
> >So it would be _easier_ to deal with multiple netdevs. Orchestration
> >layer already understands netdev <> NUMA mapping, it does not understand
> >multi-NUMA netdevs, and how to match up queues to nodes.
> >
> >> 3) other subsystems like RDMA would benefit from the same feature, so this
> >> is not netdev specific in general.
> >
> >Yes, looks RDMA-centric. RDMA being infamously bonding-challenged.
>
> Not really. We just need to consider all use cases, not only netdev.
All use cases or lowest common denominator, depends on priorities.