Message-ID: <6fbc4127-ab67-3898-8eaa-409c3209a2e2@gmail.com>
Date: Wed, 13 Nov 2019 11:40:48 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Vladimir Oltean <olteanv@...il.com>,
netdev <netdev@...r.kernel.org>, Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>
Subject: Re: Offloading DSA taggers to hardware
On 11/13/19 4:40 AM, Vladimir Oltean wrote:
> DSA is all about pairing any tagging-capable (or at least VLAN-capable) switch
> to any NIC, and the software stack creates N "virtual" net devices, each
> representing a switch port, with I/O capabilities based on the metadata present
> in the frame. It all looks like an hourglass:
>
>   switch           switch           switch           switch           switch
> net_device       net_device       net_device       net_device       net_device
>      |                |                |                |                |
>      |                |                |                |                |
>      |                |                |                |                |
>      +----------------+----------------+----------------+----------------+
>                                        |
>                                        |
>                                   DSA master
>                                   net_device
>                                        |
>                                        |
>                                   DSA master
>                                       NIC
>                                        |
>                                     switch
>                                    CPU port
>                                        |
>                                        |
>      +----------------+----------------+----------------+----------------+
>      |                |                |                |                |
>      |                |                |                |                |
>      |                |                |                |                |
>   switch           switch           switch           switch           switch
>    port             port             port             port             port
>
>
> But the process by which the stack:
> - Parses the frame on receive, decodes the DSA tag and redirects the frame from
> the DSA master net_device to a switch net_device based on the source port,
> then removes the DSA tag from the frame and recalculates checksums as
> appropriate
> - Adds the DSA tag on xmit, then redirects the frame from the "virtual" switch
> net_device to the real DSA master net_device
>
> can be optimized, if the DSA master NIC supports this. Let's say there is a
> fictional NIC that has a programmable hardware parser and the ability to
> perform frame manipulation (insert, extract a tag). Such a NIC could be
> programmed to do a better job adding/removing the DSA tag, as well as
> masquerading skb->dev based on the parser meta-data. In addition, there would
> be a net benefit for QoS, which as a consequence of the DSA model, cannot be
> really end-to-end: a frame classified to a high-priority traffic class by the
> switch may be treated as best-effort by the DSA master, due to the fact that it
> doesn't really parse the DSA tag (the traffic class, in this case).
The QoS part can be guaranteed for an integrated design, but not so much
if you have discrete/separate NIC and switch vendors and there is no
agreed-upon mechanism to "not lose information" between the two.
>
> I think the DSA hotpath would still need to be involved, but instead of calling
> the tagger's xmit/rcv it would need to call a newly introduced ndo that
> offloads this operation.
>
> Is there any hardware out there that can do this? Is it desirable to see
> something like this in DSA?
BCM7445 and BCM7278 (and other DSL and Cable Modem chips, just not
supported upstream) use drivers/net/dsa/bcm_sf2.c along with
drivers/net/ethernet/broadcom/bcmsysport.c. It is possible to offload
the creation and extraction of the Broadcom tag:
http://linux-kernel.2935.n7.nabble.com/PATCH-net-next-0-3-net-Switch-tag-HW-extraction-insertion-td1162606.html
(this was reverted shortly after because napi_gro_receive() occupies the
full 48 bytes of skb->cb[] space on 64-bit hosts; I now have a better
idea of how to solve this though, see below).
In my experience though, the data is already hot in the cache in either
direction, so a memmove() is not that costly. It was not possible to see
sizable throughput improvements at 1Gbps or 2Gbps speeds because the CPU
is more than capable of managing the tag extraction in software, and
that is the most compatible way of doing it.
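
To illustrate why the software path is cheap, here is a minimal
user-space sketch of stripping an in-line tag. It assumes a 4-byte tag
sitting between the source MAC address and the EtherType; the function
name strip_inline_tag() and the constants are made up for this example
and are not the kernel tagger code. Only the two MAC addresses (12
bytes) have to move:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* length of one MAC address */
#define TAG_LEN  4 /* assumed tag size, matching the 4-byte Broadcom tag */

/*
 * Strip a TAG_LEN-byte tag located between the source MAC address and the
 * EtherType. The tag is copied to tag_out first. Returns a pointer to the
 * new start of the frame (TAG_LEN bytes further into the buffer), or NULL
 * if the frame is too short to hold DA + SA + tag + EtherType.
 */
static uint8_t *strip_inline_tag(uint8_t *frame, size_t len,
                                 uint8_t tag_out[TAG_LEN])
{
    if (len < 2 * ETH_ALEN + TAG_LEN + 2)
        return NULL;

    /* Save the tag before the MAC addresses overwrite it. */
    memcpy(tag_out, frame + 2 * ETH_ALEN, TAG_LEN);

    /* Slide DA + SA forward over the tag: only 12 bytes are moved. */
    memmove(frame + TAG_LEN, frame, 2 * ETH_ALEN);

    return frame + TAG_LEN;
}
```

The payload itself is never touched, which is consistent with the
memmove() cost being small once the header is already in the cache.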
To give you some more details, the SYSTEMPORT MAC will prepend an 8-byte
Receive Status Block: word 0 contains status/length/error and word 1 can
contain the full 4-byte Broadcom tag as extracted. Then there is a
(configurable) 2-byte gap to align the IP header, and then the Ethernet
header can be found. This is quite similar to the
NET_DSA_TAG_BRCM_PREPEND case, except for this 2-byte gap, which is why
I am wondering whether I should introduce an additional tagging protocol
NET_DSA_TAG_BRCM_PREPEND_WITH_2B, or some side-band information in the
skb, to permit the removal of these extraneous 2 bytes.
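
The receive layout described above can be sketched as follows. This is
only an illustration of the offsets: struct rx_view, rx_locate() and the
macro names are hypothetical, the gap is assumed to be configured to 2
bytes, and the contents of word 0 and of the tag are deliberately not
decoded.

```c
#include <stddef.h>
#include <stdint.h>

#define RSB_LEN      8 /* Receive Status Block: two 32-bit words */
#define RSB_TAG_OFF  4 /* word 1 may carry the extracted 4-byte tag */
#define BRCM_TAG_LEN 4
#define ALIGN_GAP    2 /* configurable in hardware; 2 bytes assumed here */

/* Hypothetical view of one received buffer. */
struct rx_view {
    const uint8_t *tag; /* BRCM_TAG_LEN bytes inside the RSB (word 1) */
    const uint8_t *eth; /* first byte of the Ethernet header */
    size_t eth_len;     /* bytes from eth to the end of the buffer */
};

/* Locate the tag and the Ethernet header; returns 0, or -1 if too short. */
static int rx_locate(const uint8_t *buf, size_t len, struct rx_view *v)
{
    const size_t hdr = RSB_LEN + ALIGN_GAP;

    if (len < hdr)
        return -1;

    v->tag = buf + RSB_TAG_OFF;
    v->eth = buf + hdr;
    v->eth_len = len - hdr;
    return 0;
}
```

With an untagged 14-byte Ethernet header, the IP header then starts at
8 + 2 + 14 = 24 bytes into the buffer, which is 4-byte aligned; without
the gap it would start at offset 22.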
On transmit, we also have an 8-byte transmit status block which can be
constructed to contain information for the HW to insert a 4-byte
Broadcom tag, along with a VLAN tag, and with the same length/checksum
insertion information. The TX path would be equivalent to not doing any
tagging, so similarly, it may be desirable to have a separate
NET_DSA_TAG_BRCM_PREPEN value that indicates that nothing needs to be
done except queue the frame for transmission on the master netdev.
Now from a practical angle, offloading DSA tagging only makes sense if
you happen to have a lot of host-initiated/received traffic, which would
be the case for a streaming device (BCM7445/BCM7278) with its ports
either completely separate (DSA default) or bridged. Does that apply in
your case?
--
Florian