netdev - Re: DSA: some questions regarding TX forwarding offload

Message-ID: <cd717680-dbac-4329-75af-32d0c677d622@bang-olufsen.dk>
Date:   Wed, 6 Oct 2021 16:16:34 +0000
From:   Alvin Šipraga <ALSI@...g-olufsen.dk>
To:     Vladimir Oltean <vladimir.oltean@....com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Subject: Re: DSA: some questions regarding TX forwarding offload

On 10/5/21 5:25 PM, Vladimir Oltean wrote:
> 
> So let me rephrase the facts which you've presented to make sure I get this right.
> 
> (a) The switch processes each frame in an internal 4-bit FID.
> 
> (b) Each VLAN (not {port, VLAN} pair) can be configured for SVL or IVL.
>      When a packet is received, it is first classified to a VLAN, then
>      the VLAN table is looked up, and the switch determines whether that
>      VLAN is configured for SVL or IVL.
> 
> (c) If configured for SVL, the 4-bit internal FID is derived exclusively
>      from the VLAN table entry.
> 
> (d) If configured for IVL, the ingress port's EFID is read, and the
>      4-bit internal FID is derived from the {12-bit VID, 3-bit port EFID}
>      squashed into a 4-bit number.
> 
> (e) The sum of internal FIDs in use does not exceed 16, regardless of
>      whether SVL or IVL is used for a VID. Otherwise said, the FDB cannot
>      be partitioned in more than 16 groups.
> 
> (f) The FDB is always looked up by {internal FID, MAC}.
> 

Hi Vladimir,

The idea that the chip maps everything to an (internal) 4-bit FID - even 
in IVL mode - was just conjecture based on what I read in the datasheet 
of the chip. I think you can see that I'm still a bit confused by this 
hardware. I'm sorry if you feel like you wasted your time, but hopefully 
this mail clarifies some things for you.

First, allow me to reproduce the relevant part of the datasheet here:

| == Search and Learning
|
| = Search
|
| When a packet is received, the RTL8365MB-VC uses the destination MAC
| address, Filtering Identifier (FID) and Enhanced Filtering Identifier
| (EFID) to search the 2K-entry look-up table. The 48-bit MAC address,
| 4-bit FID and 3-bit EFID use a hash algorithm, to calculate an 11-bit
| index value. The RTL8365MB-VC uses the index to compare the packet MAC
| address with the entries (MAC addresses) in the look-up table. This is
| the ‘Address Search’. If the destination MAC address is not found, the
| switch will broadcast the packet according to VLAN configuration.
|
| = Learning
|
| The RTL8365MB-VC uses the source MAC address, FID, and EFID of the
| incoming packet to hash into a 9-bit index. It then compares the source
| MAC address with the data (MAC addresses) in this index. If there is a
| match with one of the entries, the RTL8365MB-VC will update the entry
| with new information. If there is no match and the 2K entries are not
| all occupied by other MAC addresses, the RTL8365MB-VC will record the
| source MAC address and ingress port number into an empty entry. This
| process is called ‘Learning’.
|
| Address aging is used to keep the contents of the address table correct
| in a dynamic network topology. The look-up engine will update the time
| stamp information of an entry whenever the corresponding source MAC
| address appears. An entry will be invalid (aged out) if its time stamp
| information is not refreshed by the address learning process during the
| aging time period. The aging time of the RTL8365MB-VC is between 200 and
| 400 seconds (typical is 300 seconds).
|
| == SVL and IVL/SVL
|
| The RTL8365MB-VC supports a 16-group Filtering Identifier (FID) for L2
| search and learning. In default operation, all VLAN entries belong to
| the same FID. This is called Shared VLAN Learning (SVL). If VLAN entries
| are configured to different FIDs, then the same source MAC address with
| multiple FIDs can be learned into different look-up table entries. This
| is called Independent VLAN Learning and Shared VLAN Learning (IVL/SVL).

This "IVL/SVL" mode would appear to correspond to a field in the vendor 
driver sources called ivl_svl (ivl_svl=1 is what I have referred to as 
"IVL" all this time), which is part of each VLAN configuration in the 
VLAN table. But that field also comes with a /* IVL */ or /* IVL_EN */ 
comment next to it in some places. So I am unsure whether there is a 
third, "genuine" IVL mode which does not use the FID at all. At least, 
the description in the datasheet doesn't seem to correlate with the 
behaviour of this ivl_svl switch. But I could be parsing it wrong.

Now, rather than speculate further on the semantics, I went ahead and 
tested out the behaviour by:

- adding 32 VLANs 100..131 on a port, all with IVL (i.e. ivl_svl=1)
- cycling through the 8 possible port EFIDs (0..7) on that port
- for each EFID, sending one 802.1Q-tagged frame with VID=n for 
n=100..131 to the port from the network

Some notes:

- the chip supports up to 32 concurrent VLANs (globally); this is a 
general limitation of the hardware.
- in this scenario the MAC SA is the same for each frame I transmit from 
the network into the port.

By dumping the hardware FDB after the fact, I can see 32 * 8 FDB entries 
for the given MAC SA of my frames:

	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	0004    00:00:aa:aa:aa:aa       104     2       0       0
	0005    00:00:aa:aa:aa:aa       112     2       0       3
	0006    00:00:aa:aa:aa:aa       120     2       0       2
	0008    00:00:aa:aa:aa:aa       128     2       0       5
	0036    00:00:aa:aa:aa:aa       105     2       0       0
	0037    00:00:aa:aa:aa:aa       113     2       0       3
	0038    00:00:aa:aa:aa:aa       121     2       0       2
	0040    00:00:aa:aa:aa:aa       129     2       0       5
	0068    00:00:aa:aa:aa:aa       106     2       0       0
	... (table continues with an entry for each VID/EFID combo)

Legend:
	addr: look-up-table index
	mac: MAC address
	vid_fid: VID of the frame for both IVL and SVL
	spa: source port address, i.e. the port that learned
	fid: FID (of the VLAN)
	efid: EFID (of the port)

I also tried sending untagged frames from the network and cycling 
through one of the VLANs as PVID, in which case the port would learn and 
make an entry with vid_fid corresponding to the PVID.

This suggests to me that the IVL field of the VLAN configuration really 
does achieve Independent VLAN learning, and that there are not many 
constraints here besides the size of the look-up-table.
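
To make the counting explicit, here is a minimal Python sketch of the
experiment above. It only enumerates the {VID, EFID} combinations used
in the test (VIDs 100..131, EFIDs 0..7); it does not model the chip,
and the variable names are made up for illustration.

```python
from itertools import product

VIDS = range(100, 132)   # the 32 VLANs added in the test
EFIDS = range(8)         # the 8 possible 3-bit port EFIDs

# One learned entry per {VID, EFID} pair for a single MAC SA,
# matching the 32 * 8 entries seen in the dump.
partitions = set(product(VIDS, EFIDS))

print(len(partitions))        # 256
print(len(partitions) > 16)   # True
```

With one MAC SA the table ends up holding 256 distinct entries, which
is well beyond what a 16-group partitioning would allow.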
	

Now to address your questions...

> How do you know that point (e) is true?

Evidently it is not true, since I can partition the FDB into more than 
16 groups.

> If you add more than 16 VLANs using IVL, is there any error?

I added 32 and things seem to work OK.

> If the user can map a SVL VID to a FID directly through the VLAN table,
> does that mean that the hardware continuously remaps IVL {VID, EFID}
> VLANs to different FIDs, as FID values keep getting used up by SVL?

This would be quite some gymnastics on the part of the ASIC. Let's take 
a step back.

Could it be that the ivl_svl switch simply controls how this 
look-up-table index is computed? That is to say:

SVL: {FID, EFID, MAC} -> index
IVL: {VID, EFID, MAC} -> index
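
A minimal Python sketch of that hypothesis, for illustration only. The
Vlan type and lut_key() are hypothetical names, and the real hash from
key to index is unknown, so the sketch stops at the key itself:

```python
from typing import NamedTuple

class Vlan(NamedTuple):
    vid: int    # 12-bit VLAN ID
    fid: int    # 4-bit FID from the VLAN table entry
    ivl: bool   # the ivl_svl field: True = IVL, False = SVL

def lut_key(vlan, efid, mac):
    """Return the tuple that, by hypothesis, is hashed into the index."""
    first = vlan.vid if vlan.ivl else vlan.fid
    return (first, efid, mac)

MAC = "00:00:aa:aa:aa:aa"
ivl_a = lut_key(Vlan(100, 0, True), 0, MAC)
ivl_b = lut_key(Vlan(101, 0, True), 0, MAC)
svl_a = lut_key(Vlan(100, 0, False), 0, MAC)
svl_b = lut_key(Vlan(101, 0, False), 0, MAC)

print(ivl_a != ivl_b)   # True: IVL keys differ per VID
print(svl_a == svl_b)   # True: SVL keys collapse onto the shared FID
```

If this model is right, two IVL VLANs sharing a FID should still learn
into separate entries, while two SVL VLANs sharing a FID should not.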

I tried the following scenario:

	# add VLAN 100/101
	bridge vlan add vid 100 dev swp2
	bridge vlan add vid 101 dev swp2

	# send VID 100 frame from another host on the network
	mausezahn eth2 -Q 100 -c 1 -a '00:00:aa:aa:aa:aa' -t tcp

	# dump HW FDB
	cat /sys/kernel/debug/rtl8365mb/lut_dump

	# send VID 101 frame this time
	mausezahn eth2 -Q 101 -c 1 -a '00:00:aa:aa:aa:aa' -t tcp

	# dump HW FDB
	cat /sys/kernel/debug/rtl8365mb/lut_dump

I then tested this out with:

- IVL, FID=0

	##### send frame on VLAN 100
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	0388    00:00:aa:aa:aa:aa       100     2       0       0
	##### send frame on VLAN 101
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	0388    00:00:aa:aa:aa:aa       100     2       0       0
	0420    00:00:aa:aa:aa:aa       101     2       0       0

- IVL, FID=9

	##### send frame on VLAN 100
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	0388    00:00:aa:aa:aa:aa       100     2       9       0
	##### send frame on VLAN 101
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	0388    00:00:aa:aa:aa:aa       100     2       9       0
	0420    00:00:aa:aa:aa:aa       101     2       9       0

- SVL, FID=0

	##### send frame on VLAN 100
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	1280    00:00:aa:aa:aa:aa       100     2       0       0
	##### send frame on VLAN 101
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	1280    00:00:aa:aa:aa:aa       101     2       0       0

- SVL, FID=9

	##### send frame on VLAN 100
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	1056    00:00:aa:aa:aa:aa       100     2       9       0
	##### send frame on VLAN 101
	cat /sys/kernel/debug/rtl8365mb/lut_dump
	addr    mac                     vid_fid spa     fid     efid
	1056    00:00:aa:aa:aa:aa       101     2       9       0

Some observations:

- with IVL, index is the same for FID=0,9
- with SVL, index is different for FID=0,9
- with IVL, index is different for VID=100,101
- with SVL, index is the same for VID=100,101

In particular, with IVL, the FID is stored in the table but it does not 
seem to affect the index.
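
The four observations can be checked mechanically against the dumps.
The sketch below is plain Python; the indices are copied from the
lut_dump output above, and depends_on() is just a hypothetical helper
for this check, not anything from the driver:

```python
# (mode, fid, vid) -> look-up-table index, copied from the dumps
observed = {
    ("IVL", 0, 100): 388,  ("IVL", 0, 101): 420,
    ("IVL", 9, 100): 388,  ("IVL", 9, 101): 420,
    ("SVL", 0, 100): 1280, ("SVL", 0, 101): 1280,
    ("SVL", 9, 100): 1056, ("SVL", 9, 101): 1056,
}

def depends_on(mode, field):
    """True if the index changes when only `field` changes in `mode`."""
    pos = {"fid": 1, "vid": 2}[field]
    rows = [(k, idx) for k, idx in observed.items() if k[0] == mode]
    for ka, ia in rows:
        for kb, ib in rows:
            # keys must differ in `field` and agree everywhere else
            only_field_differs = all(
                (ka[i] == kb[i]) != (i == pos) for i in range(3))
            if only_field_differs and ia != ib:
                return True
    return False

print(depends_on("IVL", "fid"))   # False
print(depends_on("IVL", "vid"))   # True
print(depends_on("SVL", "fid"))   # True
print(depends_on("SVL", "vid"))   # False
```

This only covers the eight data points above (EFID=0, one MAC), so it
is consistent with the hypothesis rather than proof of it.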

> Can you make an IVL VID reuse the
> same internal FID as an SVL VID? Can you make two IVL VIDs use the same
> internal FID?
> 
> Anyway, this complicates things by quite a bit. The Linux bridge doesn't
> really have an SVL/IVL knob. It assumes IVL. Where things will get
> challenging is when you offload FDB entries with a given {VID, MAC DA},
> what to do if you access the FDB by FID, but in fact there isn't a
> bijective mapping between the VID and the FID?

I _think_ I can look up the FDB by VID, not just by FID. I still have to 
confirm that, but I believe it depends on whether the particular VLAN is 
in IVL or SVL mode.

But either way, there are bound to be collisions given the way the 
look-up-table works. If the driver is asked to offload two FDB entries 
which map to the same look-up-table entry (i.e. same index), can't it 
just error out on the second request? Something like "I see this entry 
is already occupied by a static (offloaded) FDB entry, so I can't 
satisfy this request".
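
As a rough sketch of that policy (Python, purely illustrative): ToyLut,
LutSlotBusy and the index function are all hypothetical, and the toy
index below is NOT the chip's hash. It also assumes one entry per
index, which may not match the real table organisation.

```python
class LutSlotBusy(Exception):
    """Raised when the target slot already holds a static entry."""

class ToyLut:
    def __init__(self, index_of):
        self.index_of = index_of   # hypothetical: maps a key to an index
        self.slots = {}            # index -> (key, is_static)

    def add_static(self, key):
        idx = self.index_of(key)
        held = self.slots.get(idx)
        # Refuse only if a *different* static entry already lives here.
        if held is not None and held[1] and held[0] != key:
            raise LutSlotBusy(f"index {idx} holds static entry {held[0]}")
        self.slots[idx] = (key, True)   # may replace a dynamic entry
        return idx

MAC = "00:00:aa:aa:aa:aa"
lut = ToyLut(lambda key: key[0] % 4)   # toy index for the example only

first_idx = lut.add_static((100, MAC))   # lands on index 0
try:
    lut.add_static((104, MAC))           # also index 0 -> rejected
    rejected = False
except LutSlotBusy:
    rejected = True

print(first_idx, rejected)   # 0 True
```

In a real driver the rejection would presumably surface as an error
return from the FDB add operation rather than an exception.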

> You keep reference counts
> per FDB entry, such that when the user deletes a MAC DA from VID A, but
> you also have that MAC DA in VID B, both of which map to the same FID,
> you still keep the entry?

> And most importantly, do you see the FID bits
> in the tagger in the receive path as well?

No, I don't, which is kind of strange. But is it a problem?

> Can you dump them for packets
> classified to a FID in different ways, using IVL, SVL?
> 
>> It could be that my conclusions about "lookup by VID" as opposed to
>> "lookup by FID" are wrong, but if it comes to that, I will just have to
>> manually implement VID<->FID mapping in the driver.
> 
> And this is the second complication. Whatever VID<->FID mapping you make,
> if it's not static, you'll need a lookup table in the tagging protocol
> driver to translate the VID from the skb to a FID. Odd. Or maybe I'm wrong.

OK, I think these last questions of yours are based on the premise of 
some kind of VID<->FID mapping. But I hope this email demonstrates to 
you that the switch behaves somewhat differently.

> 
>>> Practically are you saying that the switch loses the EFID information
>>> between the ingress and the egress stage, since the destination port
>>> mask is selected based on a key constructed with "don't care" in the EFID bits?
>>> Strange.
>>
>> Strange indeed - and wrong! I just checked this again. The switch
>> actually _does_ preserve the EFID for the second lookup when selecting
>> the destination port mask, and this behaves as you would expect. My
>> observation to the contrary was specifically for the case where there is
>> no hit for the destination address, in which case the switch will
>> _flood_ according to the VLAN and MAC DA, without regard for the EFID.
>> This kind of makes sense, since the EFID is just a searching/learning
>> look-up-table concept and is not related to flooding. OTOH there are
>> flooding port mask registers where one can set for
>> {uni,multi,broad}cast, but this configuration is independent of VLAN.
> 
> So flooding is indeed the miss action from the FDB, but I'm just
> wondering, aren't the flood control registers replicated per FID in fact?

No, they seem to be global. Here's what the register definitions look 
like in the vendor driver:

#define    RTL8367C_REG_UNDA_FLOODING_PMSK    0x0890
#define    RTL8367C_UNDA_FLOODING_PMSK_OFFSET    0
#define    RTL8367C_UNDA_FLOODING_PMSK_MASK    0x7FF

#define    RTL8367C_REG_UNMCAST_FLOADING_PMSK    0x0891
#define    RTL8367C_UNMCAST_FLOADING_PMSK_OFFSET    0
#define    RTL8367C_UNMCAST_FLOADING_PMSK_MASK    0x7FF

#define    RTL8367C_REG_BCAST_FLOADING_PMSK    0x0892
#define    RTL8367C_BCAST_FLOADING_PMSK_OFFSET    0
#define    RTL8367C_BCAST_FLOADING_PMSK_MASK    0x7FF
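
For illustration, here is how such an OFFSET/MASK pair would typically
be used to build a flooding port mask, sketched in Python. The offset
and mask values are taken from the defines above; ports_to_pmsk() is a
hypothetical helper, and the assumption that bit n corresponds to
port n is mine, not something stated in the vendor driver excerpt.

```python
UNDA_FLOODING_PMSK_OFFSET = 0
UNDA_FLOODING_PMSK_MASK = 0x7FF   # 11 bits wide

def ports_to_pmsk(ports, offset=UNDA_FLOODING_PMSK_OFFSET,
                  mask=UNDA_FLOODING_PMSK_MASK):
    """Build the register field value for a set of port numbers.

    Assumes bit n of the field selects port n.
    """
    pmsk = 0
    for port in ports:
        pmsk |= 1 << port
    field = pmsk << offset
    if field & ~mask:
        raise ValueError("port outside the 11-bit port mask")
    return field

print(hex(ports_to_pmsk([0, 2])))   # 0x5
```

Since there is only one such register per traffic class, whatever mask
is written applies to flooding in every VLAN and FID.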

Thanks for your help.

	Alvin