lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <de852162-faad-40fa-9a73-c7cf2e710105@intel.com>
Date: Thu, 22 Feb 2024 19:23:32 -0600
From: "Samudrala, Sridhar" <sridhar.samudrala@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, Greg Kroah-Hartman
	<gregkh@...uxfoundation.org>
CC: Tariq Toukan <ttoukan.linux@...il.com>, Saeed Mahameed <saeed@...nel.org>,
	"David S. Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>, Saeed Mahameed <saeedm@...dia.com>,
	<netdev@...r.kernel.org>, Tariq Toukan <tariqt@...dia.com>, Gal Pressman
	<gal@...dia.com>, Leon Romanovsky <leonro@...dia.com>,
	<jay.vosburgh@...onical.com>
Subject: Re: [net-next V3 15/15] Documentation: networking: Add description
 for multi-pf netdev



On 2/22/2024 5:00 PM, Jakub Kicinski wrote:
> On Thu, 22 Feb 2024 08:51:36 +0100 Greg Kroah-Hartman wrote:
>> On Tue, Feb 20, 2024 at 05:33:09PM -0800, Jakub Kicinski wrote:
>>> Greg, we have a feature here where a single device of class net has
>>> multiple "bus parents". We used to have one attr under class net
>>> (device) which is a link to the bus parent. Now we either need to add
>>> more or not bother with the linking of the whole device. Is there any
>>> precedent / preference for solving this from the device model
>>> perspective?
>>
>> How, logically, can a netdevice be controlled properly from 2 parent
>> devices on two different busses?  How is that even possible from a
>> physical point-of-view?  What exact bus types are involved here?
> 
> Two PCIe buses, two endpoints, two networking ports. It's one piece

Isn't it only 1 networking port with multiple PFs?

> of silicon, tho, so the "slices" can talk to each other internally.
> The NVRAM configuration tells both endpoints that the user wants
> them "bonded", when the PCI drivers probe they "find each other"
> using some cookie or DSN or whatnot. And once they did, they spawn
> a single netdev.
> 
>> This "shouldn't" be possible as in the end, it's usually a PCI device
>> handling this all, right?
> 
> It's really a special type of bonding of two netdevs. Like you'd bond
> two ports to get twice the bandwidth. With the twist that the balancing
> is done on NUMA proximity, rather than traffic hash.
> 
> Well, plus, the major twist that it's all done magically "for you"
> in the vendor driver, and the two "lower" devices are not visible.
> You only see the resulting bond.
> 
> I personally think that the magic hides as many problems as it
> introduces and we'd be better off creating two separate netdevs.
> And then a new type of "device bond" on top. Small win that
> the "new device bond on top" can be shared code across vendors.

Yes. We have been exploring a small extension to bonding driver to 
enable a single numa-aware multi-threaded application to efficiently 
utilize multiple NICs across numa nodes.

Here is an early version of a patch we have been trying and seems to be 
working well.

=========================================================================
bonding: select tx device based on rx device of a flow

If napi_id is cached in the sk associated with skb, use the
device associated with napi_id as the transmit device.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@...el.com>

diff --git a/drivers/net/bonding/bond_main.c 
b/drivers/net/bonding/bond_main.c
index 7a7d584f378a..77e3bf6c4502 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5146,6 +5146,30 @@ static struct slave 
*bond_xmit_3ad_xor_slave_get(struct bonding *bond,
         unsigned int count;
         u32 hash;

+       if (skb->sk) {
+               int napi_id = skb->sk->sk_napi_id;
+               struct net_device *dev;
+               int idx;
+
+               rcu_read_lock();
+               dev = dev_get_by_napi_id(napi_id);
+               rcu_read_unlock();
+
+               if (!dev)
+                       goto hash;
+
+               count = slaves ? READ_ONCE(slaves->count) : 0;
+               if (unlikely(!count))
+                       return NULL;
+
+               for (idx = 0; idx < count; idx++) {
+                       slave = slaves->arr[idx];
+                       if (slave->dev->ifindex == dev->ifindex)
+                               return slave;
+               }
+       }
+
+hash:
         hash = bond_xmit_hash(bond, skb);
         count = slaves ? READ_ONCE(slaves->count) : 0;
         if (unlikely(!count))
=========================================================================

If we make this as a configurable bonding option, would this be an 
acceptable solution to accelerate numa-aware apps?

> 
> But there's only so many hours in the day to argue with vendors.
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ