Message-ID: <Zd7rRTSSLO9-DM2t@nanopsycho>
Date: Wed, 28 Feb 2024 09:13:57 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Tariq Toukan <ttoukan.linux@...il.com>,
	Saeed Mahameed <saeed@...nel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
	Saeed Mahameed <saeedm@...dia.com>, netdev@...r.kernel.org,
	Tariq Toukan <tariqt@...dia.com>, Gal Pressman <gal@...dia.com>,
	Leon Romanovsky <leonro@...dia.com>, jay.vosburgh@...onical.com
Subject: Re: [net-next V3 15/15] Documentation: networking: Add description
 for multi-pf netdev

Wed, Feb 28, 2024 at 03:06:19AM CET, kuba@...nel.org wrote:
>On Fri, 23 Feb 2024 10:36:25 +0100 Jiri Pirko wrote:
>> >> It's really a special type of bonding of two netdevs. Like you'd bond
>> >> two ports to get twice the bandwidth. With the twist that the balancing
>> >> is done on NUMA proximity, rather than traffic hash.
>> >> 
>> >> Well, plus, the major twist that it's all done magically "for you"
>> >> in the vendor driver, and the two "lower" devices are not visible.
>> >> You only see the resulting bond.
>> >> 
>> >> I personally think that the magic hides as many problems as it
>> >> introduces and we'd be better off creating two separate netdevs.
>> >> And then a new type of "device bond" on top. Small win that
>> >> the "new device bond on top" can be shared code across vendors.  
>> >
>> >Yes. We have been exploring a small extension to the bonding driver to
>> >enable a single NUMA-aware multi-threaded application to efficiently
>> >utilize multiple NICs across NUMA nodes.  
>> 
>> Bonding was my immediate response when we discussed this internally for
>> the first time. But eventually I had to admit it is probably not that
>> suitable in this case; here's why:
>> 1) there are not two physical ports, only one.
>
>Right, sorry, the number of PFs matches the number of ports for each bus.
>But it's not necessarily a deal breaker - it's similar to a multi-host
>device. We also have multiple netdevs and PCIe links; they just go to
>different hosts rather than to different NUMA nodes on one host.

That is a different scenario. You have multiple hosts and a switch
between them and the physical port. Yeah, it might be an invisible switch,
but there still is one. On a DPU/smartnic, it is visible and configurable.
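
(For reference, a rough sketch of what "visible and configurable" means in
practice, assuming the DPU exposes its embedded switch via devlink; the PCI
address below is a made-up placeholder and the Python wrapper just shells
out to the stock devlink tool:)

#!/usr/bin/env python3
"""Illustrative sketch: inspect and change the embedded switch mode of a
DPU/smartnic through devlink. The PCI address is hypothetical."""
import subprocess

PCI_DEV = "pci/0000:03:00.0"  # hypothetical DPU PCI address

# Show the current embedded-switch (eswitch) mode: legacy or switchdev.
subprocess.run(["devlink", "dev", "eswitch", "show", PCI_DEV], check=True)

# Put the embedded switch into switchdev mode so its port representors
# become visible and configurable from the host.
subprocess.run(["devlink", "dev", "eswitch", "set", PCI_DEV, "mode",
                "switchdev"], check=True)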


>
>> 2) it is basically a matter of device layout/provisioning that this
>>    feature should be enabled, not user configuration.
>
>We can still auto-instantiate it, not a deal breaker.

"Auto-instantiate" meaning by a userspace orchestration daemon,
not the kernel, is that what you mean?


>
>I'm not sure you're right in that assumption, though. At Meta, we support
>container sizes ranging from a few CPUs to multiple NUMA nodes. Each NUMA
>node may have its own NIC, and the orchestration needs to stitch and
>un-stitch NICs depending on whether the cores were allocated to small
>containers or a huge one.

Yeah, but still, there is one physical port per NIC-NUMA-node pair.
Correct? Does the orchestration set up a bond (or some other master
device) on top of them, or let the container use them independently?
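
(For concreteness, a rough sketch of what such an orchestration step could
look like with plain iproute2 bonding; interface and bond names are
hypothetical, and stock bonding balances by hash or round-robin rather than
NUMA proximity, which is exactly where the extension mentioned above would
come in:)

#!/usr/bin/env python3
"""Sketch only: find the netdevs backed by the NUMA nodes a container was
given and enslave them to an ordinary bond. Names are hypothetical."""
import os
import subprocess

def netdev_numa_node(ifname):
    """NUMA node of the netdev's underlying PCI device, or -1 if unknown."""
    try:
        with open(f"/sys/class/net/{ifname}/device/numa_node") as f:
            return int(f.read().strip())
    except OSError:
        return -1

def bond_netdevs_for_nodes(nodes, bond="bond0"):
    """Create a bond and enslave every netdev sitting on one of the nodes."""
    subprocess.run(["ip", "link", "add", bond, "type", "bond",
                    "mode", "balance-xor"], check=True)
    for ifname in os.listdir("/sys/class/net"):
        if netdev_numa_node(ifname) in nodes:
            # Slaves must be down before they can be enslaved.
            subprocess.run(["ip", "link", "set", ifname, "down"], check=True)
            subprocess.run(["ip", "link", "set", ifname, "master", bond],
                           check=True)
    subprocess.run(["ip", "link", "set", bond, "up"], check=True)

if __name__ == "__main__":
    bond_netdevs_for_nodes({0, 1})  # e.g. a container spanning nodes 0 and 1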


>
>So it would be _easier_ to deal with multiple netdevs. The orchestration
>layer already understands the netdev <> NUMA mapping; it does not understand
>multi-NUMA netdevs, nor how to match up queues to nodes.
>
>> 3) other subsystems like RDMA would benefit from the same feature, so this
>>    is not netdev-specific in general.
>
>Yes, looks RDMA-centric. RDMA being infamously bonding-challenged.

Not really. It's just that all use cases need to be considered, not only netdev.


>
>Anyway, back to the initial question - from Greg's reply I'm guessing
>there's no precedent for doing such things in the device model either.
>So we're on our own.
