netdev - Re: Offloading DSA taggers to hardware

Message-ID: <CA+h21hrmP_nDZK9edm6_SGm6orzySj7=SGoui1QzmPV2BgFdBA@mail.gmail.com>
Date:   Thu, 14 Nov 2019 18:40:53 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>
Cc:     netdev <netdev@...r.kernel.org>, Andrew Lunn <andrew@...n.ch>,
        Vivien Didelot <vivien.didelot@...il.com>
Subject: Re: Offloading DSA taggers to hardware

Hi Florian,

On Wed, 13 Nov 2019 at 21:40, Florian Fainelli <f.fainelli@...il.com> wrote:
>
> On 11/13/19 4:40 AM, Vladimir Oltean wrote:
> > DSA is all about pairing any tagging-capable (or at least VLAN-capable) switch
> > to any NIC, and the software stack creates N "virtual" net devices, each
> > representing a switch port, with I/O capabilities based on the metadata present
> > in the frame. It all looks like an hourglass:
> >
> >   switch           switch           switch           switch           switch
> > net_device       net_device       net_device       net_device       net_device
> >      |                |                |                |                |
> >      |                |                |                |                |
> >      |                |                |                |                |
> >      +----------------+----------------+----------------+----------------+
> >                                        |
> >                                        |
> >                                   DSA master
> >                                   net_device
> >                                        |
> >                                        |
> >                                   DSA master
> >                                       NIC
> >                                        |
> >                                     switch
> >                                    CPU port
> >                                        |
> >                                        |
> >      +----------------+----------------+----------------+----------------+
> >      |                |                |                |                |
> >      |                |                |                |                |
> >      |                |                |                |                |
> >   switch           switch           switch           switch           switch
> >    port             port             port             port             port
> >
> >
> > But the process by which the stack:
> > - Parses the frame on receive, decodes the DSA tag and redirects the frame from
> >   the DSA master net_device to a switch net_device based on the source port,
> >   then removes the DSA tag from the frame and recalculates checksums as
> >   appropriate
> > - Adds the DSA tag on xmit, then redirects the frame from the "virtual" switch
> >   net_device to the real DSA master net_device
> >
> > can be optimized, if the DSA master NIC supports this. Let's say there is a
> > fictional NIC that has a programmable hardware parser and the ability to
> > perform frame manipulation (insert, extract a tag). Such a NIC could be
> > programmed to do a better job adding/removing the DSA tag, as well as
> > masquerading skb->dev based on the parser metadata. In addition, there would
> > be a net benefit for QoS, which, as a consequence of the DSA model, cannot
> > really be end-to-end: a frame classified to a high-priority traffic class by the
> > switch may be treated as best-effort by the DSA master, because the master
> > doesn't really parse the DSA tag (the traffic class, in this case).
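For illustration, the software receive and transmit steps listed above can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel's tagger code: the 4-byte tag placed right after the two MAC addresses, the 5-bit source-port field in its last byte, the port count, and the names toy_tag_rcv() and toy_tag_xmit() are all hypothetical. Checksum recalculation and the skb->dev redirect are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN       6     /* length of one MAC address */
#define TOY_TAG_LEN    4     /* hypothetical tag size */
#define TOY_PORT_MASK  0x1f  /* hypothetical: port lives in the tag's last byte */
#define TOY_NUM_PORTS  5     /* hypothetical number of switch ports */

/* Assumed tagged layout: DA(6) SA(6) TAG(4) EtherType(2) payload... */

/*
 * Receive: read the source port from the tag, then slide the two MAC
 * addresses forward over the tag. Returns the start of the untagged
 * frame, or NULL if the frame is too short or the port is unknown.
 */
static uint8_t *toy_tag_rcv(uint8_t *frame, size_t *len, int *port)
{
    if (*len < 2 * ETH_ALEN + TOY_TAG_LEN)
        return NULL;

    int p = frame[2 * ETH_ALEN + TOY_TAG_LEN - 1] & TOY_PORT_MASK;
    if (p >= TOY_NUM_PORTS)
        return NULL;

    /* Only 12 bytes move, whatever the frame length is. */
    memmove(frame + TOY_TAG_LEN, frame, 2 * ETH_ALEN);
    *len -= TOY_TAG_LEN;
    *port = p;
    return frame + TOY_TAG_LEN;
}

/*
 * Transmit: the caller guarantees TOY_TAG_LEN bytes of headroom in
 * front of the frame. The MAC addresses slide back into the headroom
 * and the tag is written into the hole that opens up.
 */
static uint8_t *toy_tag_xmit(uint8_t *frame, size_t *len, int port)
{
    if (port < 0 || port >= TOY_NUM_PORTS || *len < 2 * ETH_ALEN)
        return NULL;

    uint8_t *tagged = frame - TOY_TAG_LEN;
    memmove(tagged, frame, 2 * ETH_ALEN);
    memset(tagged + 2 * ETH_ALEN, 0, TOY_TAG_LEN);
    tagged[2 * ETH_ALEN + TOY_TAG_LEN - 1] = (uint8_t)(port & TOY_PORT_MASK);
    *len += TOY_TAG_LEN;
    return tagged;
}
```

In this sketch the port returned by toy_tag_rcv() stands in for the choice of switch net_device; a NIC that offloads the tag would hand the same information to the stack out of band instead.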
>
> The QoS part can be guaranteed for an integrated design, not so much if
> you have discrete/separate NIC and switch vendors and there is no agreed
> upon mechanism to "not lose information" between the two.
>
> >
> > I think the DSA hotpath would still need to be involved, but instead of calling
> > the tagger's xmit/rcv it would need to call a newly introduced ndo that
> > offloads this operation.
> >
> > Is there any hardware out there that can do this? Is it desirable to see
> > something like this in DSA?
>
> BCM7445 and BCM7278 (and other DSL and Cable Modem chips, just not
> supported upstream) use drivers/net/dsa/bcm_sf2.c along with
> drivers/net/ethernet/broadcom/bcmsysport.c. It is possible to offload
> the creation and extraction of the Broadcom tag:
>
> http://linux-kernel.2935.n7.nabble.com/PATCH-net-next-0-3-net-Switch-tag-HW-extraction-insertion-td1162606.html
>
> (This was reverted shortly afterwards because napi_gro_receive() occupies the
> full 48 bytes of skb->cb[] space on 64-bit hosts. I now have a better idea
> of how to solve this, though; see below.)
>
> In my experience, though, the data is already hot in the cache in either
> direction, so a memmove() is not that costly. It was not possible to see
> sizable throughput improvements at 1Gbps or 2Gbps speeds because the CPU
> is more than capable of managing the tag extraction in software, and that
> is the most compatible way of doing it.
>
> To give you some more details, the SYSTEMPORT MAC will prepend an 8-byte
> Receive Status Block: word 0 contains status/length/error and word 1 can
> contain the full 4-byte Broadcom tag as extracted. Then there is a
> (configurable) 2-byte gap to align the IP header, and then the Ethernet
> header can be found. This is quite similar to the
> NET_DSA_TAG_BRCM_PREPEND case, except for this 2-byte gap, which is why I am
> wondering whether I should introduce an additional tagging protocol,
> NET_DSA_TAG_BRCM_PREPEND_WITH_2B, or whatever side-band information I can
> provide in the skb to permit the removal of these extraneous 2 bytes.
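The receive layout described above (8-byte status block, optional alignment gap, then the Ethernet header) can be sketched in userspace C as follows. This is only a sketch under stated assumptions: struct rx_view and rx_split() are hypothetical names, and the two status words are copied as raw bytes because the text gives neither their byte order nor the bit layout of word 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSB_LEN       8  /* Receive Status Block size, per the text */
#define RSB_WORD_LEN  4  /* each of the two status words */
#define ALIGN_GAP     2  /* the (configurable) gap before the Ethernet header */

/* Hypothetical view of one received buffer. */
struct rx_view {
    uint8_t status[RSB_WORD_LEN]; /* word 0: status/length/error, undecoded */
    uint8_t tag[RSB_WORD_LEN];    /* word 1: the tag as extracted by hardware */
    const uint8_t *eth;           /* start of the Ethernet header */
    size_t eth_len;               /* bytes from eth to the end of the buffer */
};

/*
 * Split a buffer laid out as: RSB(8) | gap | Ethernet frame.
 * The gap is a parameter because the text calls it configurable.
 * Returns 0 on success, -1 if the buffer cannot hold RSB plus gap.
 */
static int rx_split(const uint8_t *buf, size_t len, size_t gap,
                    struct rx_view *out)
{
    if (len < RSB_LEN + gap)
        return -1;

    memcpy(out->status, buf, RSB_WORD_LEN);
    memcpy(out->tag, buf + RSB_WORD_LEN, RSB_WORD_LEN);
    out->eth = buf + RSB_LEN + gap;
    out->eth_len = len - RSB_LEN - gap;
    return 0;
}
```

Because the tag sits in the status block rather than inside the frame, nothing has to be moved here; the consumer only skips RSB_LEN plus the gap.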
>
> On transmit, we also have an 8-byte transmit status block which can be
> constructed to contain information for the HW to insert a 4-byte Broadcom
> tag, along with a VLAN tag, and with the same length/checksum insertion
> information. The TX path would be equivalent to not doing any tagging, so
> similarly, it may be desirable to have a separate
> NET_DSA_TAG_BRCM_PREPEN value that indicates that nothing needs to be
> done except queue the frame for transmission on the master netdev.
>
> Now from a practical angle, offloading DSA tagging only makes sense if
> you happen to have a lot of host initiated/received traffic, which would
> be the case for either a streaming device (BCM7445/BCM7278) with their
> ports either completely separate (DSA default), or bridged. Does that
> apply in your case?

Not at all, I would say. In fact, I was trying to understand what the
chances are of interpreting information from the master's frame
descriptor as the de facto DSA tag in mainline Linux. Your story with
Starfighter 2 chips seems to indicate that it isn't such a good idea.

> --
> Florian

Thanks,
-Vladimir