Message-ID: <CAKgT0UdtEJ0Xe5icMOSj0dg-unEgTR8AwDrtdAWTKEH4D-0www@mail.gmail.com>
Date:   Fri, 18 Dec 2020 08:01:28 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Parav Pandit <parav@...dia.com>
Cc:     David Ahern <dsahern@...il.com>, Jason Gunthorpe <jgg@...dia.com>,
        Saeed Mahameed <saeed@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Leon Romanovsky <leonro@...dia.com>,
        Netdev <netdev@...r.kernel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        David Ahern <dsahern@...nel.org>,
        Jacob Keller <jacob.e.keller@...el.com>,
        Sridhar Samudrala <sridhar.samudrala@...el.com>,
        "Ertman, David M" <david.m.ertman@...el.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Kiran Patil <kiran.patil@...el.com>,
        Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support

On Thu, Dec 17, 2020 at 9:20 PM Parav Pandit <parav@...dia.com> wrote:
>
>
> > From: Alexander Duyck <alexander.duyck@...il.com>
> > Sent: Friday, December 18, 2020 8:41 AM
> >
> > On Thu, Dec 17, 2020 at 5:30 PM David Ahern <dsahern@...il.com> wrote:
> > >
> > > On 12/16/20 3:53 PM, Alexander Duyck wrote:
> > The problem is PCIe DMA wasn't designed to function as a network switch
> > fabric and when we start talking about a 400Gb NIC trying to handle over 256
> > subfunctions it will quickly reduce the receive/transmit throughput to gigabit
> > or less speeds when encountering hardware multicast/broadcast replication.
> > With 256 subfunctions a simple 60B ARP could consume more than 19KB of
> > PCIe bandwidth due to the packet having to be duplicated so many times. In
> > my mind it should be simpler to clone a single skb 256 times, forward
> > that to the switchdev ports, and have them perform a bypass (if available) to
> > deliver it to the subfunctions. That's why I was thinking it might be a good
> > time to look at addressing it.
> The Linux tc framework is rich enough to address this and has been used by openvswitch for years now.
> Today ARP broadcasts are not offloaded. They go through the software path and are replicated in the L2 domain.
> It has been a solved problem for many years now.

When you say they are replicated in the L2 domain I assume you are
talking about the software switch connected to the switchdev ports. My
question is what are you doing with them after you have replicated
them? I'm assuming they are being sent to the other switchdev ports
which will require a DMA to transmit them, and another to receive them
on the VF/SF, or are you saying something else is going on here?

My argument is that this cuts into both the transmit and receive DMA
bandwidth of the NIC, and could easily be avoided in the case where the
SF exists in the same kernel as the switchdev port, by checking whether
the multicast bit is set and, if so, simply bypassing the device.
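
As a rough illustration of the two points above, here is a minimal
sketch in plain C. It is not kernel or mlx5 code. The function names
dst_is_multicast() and replication_dma_bytes() are hypothetical and
exist only for this example. The first checks the Ethernet group bit
(the least-significant bit of the first destination MAC octet); the
second estimates how many payload bytes cross PCIe when one frame is
replicated, counting payload only and ignoring descriptor and PCIe
packet overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the destination MAC has the Ethernet
 * group bit set, i.e. the frame is multicast or broadcast. The group
 * bit is the least-significant bit of the first address octet. */
static bool dst_is_multicast(const uint8_t *dst_mac)
{
	assert(dst_mac != NULL);
	return (dst_mac[0] & 0x01) != 0;
}

/* Hypothetical helper: payload bytes moved over PCIe when one frame of
 * frame_len bytes is replicated to `copies` receivers, with
 * `dmas_per_copy` DMA transfers per copy (for example 2 if each copy is
 * DMA'd out on a switchdev port and DMA'd back in on the SF).
 * Descriptor and PCIe packet overhead are deliberately not modeled, so
 * the result is a lower bound. */
static uint64_t replication_dma_bytes(uint32_t frame_len, uint32_t copies,
				      uint32_t dmas_per_copy)
{
	return (uint64_t)frame_len * copies * dmas_per_copy;
}
```

With a 60B frame and 256 receivers this gives 15360 payload bytes for
one DMA per copy and 30720 for a transmit plus a receive DMA per copy.
Any per-packet overhead comes on top of that. A software bypass would
call something like dst_is_multicast() on the destination address and,
when it returns true, deliver clones of the skb directly instead of
handing each copy to the device.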
