Date:   Thu, 17 Dec 2020 19:11:19 -0800
From:   Alexander Duyck <>
To:     David Ahern <>
Cc:     Jason Gunthorpe <>,
        Saeed Mahameed <>,
        "David S. Miller" <>,
        Jakub Kicinski <>,
        Leon Romanovsky <>,
        Netdev <>,
        David Ahern <>,
        Jacob Keller <>,
        Sridhar Samudrala <>,
        "Ertman, David M" <>,
        Dan Williams <>,
        Kiran Patil <>,
        Greg KH <>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support

On Thu, Dec 17, 2020 at 5:30 PM David Ahern <> wrote:
> On 12/16/20 3:53 PM, Alexander Duyck wrote:
> > The problem in my case was based on a past experience where east-west
> > traffic became a problem and it was easily shown that bypassing the
> > NIC for traffic was significantly faster.
> If a deployment expects a lot of east-west traffic *within a host* why
> is it using hardware based isolation like a VF. That is a side effect of
> a design choice that is remedied by other options.

I am mostly speaking from past experience, as I saw a few instances
at Intel where this became an issue. Sales and marketing people aren't
exactly happy when you tell them "don't sell that" in response to them
trying to sell a feature into an area where it doesn't belong.
Generally they want a solution. The macvlan offload addressed these
issues, as the replication and local switching can be handled in
software.

The problem is that PCIe DMA wasn't designed to function as a network
switch fabric. When we start talking about a 400Gb NIC trying to
handle over 256 subfunctions, hardware multicast/broadcast replication
will quickly reduce the receive/transmit throughput to gigabit speeds
or less. With 256 subfunctions a simple 60B ARP could consume more
than 19KB of PCIe bandwidth because the packet has to be duplicated so
many times. In my mind it should be simpler to clone a single skb 256
times, forward the clones to the switchdev ports, and have them
perform a bypass (if available) to deliver them to the subfunctions.
That's why I was thinking it might be a good time to look at
addressing it.
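
As a rough back-of-the-envelope check of the replication numbers above,
here is a small sketch. The 16-byte per-copy overhead and the idea that
the PCIe link has roughly 400Gb/s to spend are assumptions made purely
for illustration, and the function names are hypothetical. Real
TLP/descriptor overhead depends on the device and the link.

```python
def replication_bytes(frame_bytes, copies, per_copy_overhead=0):
    """Bytes moved over PCIe when one frame is DMA'd once per copy."""
    return copies * (frame_bytes + per_copy_overhead)


def replicated_rate_gbps(link_gbps, copies):
    """Usable rate if every frame must cross the link `copies` times."""
    return link_gbps / copies


SUBFUNCTIONS = 256
ARP_FRAME = 60  # bytes, as in the example above

# Payload only: 256 copies of a 60B frame.
payload_only = replication_bytes(ARP_FRAME, SUBFUNCTIONS)

# Assumed 16B of per-copy overhead (illustrative, not a measured value).
with_overhead = replication_bytes(ARP_FRAME, SUBFUNCTIONS, per_copy_overhead=16)

# Assumed ~400Gb/s of PCIe bandwidth, all spent on replicated traffic.
rate = replicated_rate_gbps(400, SUBFUNCTIONS)

print(payload_only, "bytes of payload")         # 15360
print(with_overhead, "bytes with overhead")     # 19456
print(rate, "Gb/s of unique broadcast traffic")  # 1.5625
```

Under these assumptions the payload alone is 15KB, and a modest
per-copy overhead is enough to push it past 19KB, while the unique
broadcast rate drops to the low gigabit range.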
