netdev - Re: DSA: some questions regarding TX forwarding offload

Message-ID: <0726ca75-e615-0872-7222-abdb7a28ce8a@bang-olufsen.dk>
Date:   Thu, 7 Oct 2021 11:22:32 +0000
From:   Alvin Šipraga <ALSI@...g-olufsen.dk>
To:     Vladimir Oltean <vladimir.oltean@....com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>
Subject: Re: DSA: some questions regarding TX forwarding offload

On 10/7/21 11:47 AM, Vladimir Oltean wrote:
> On Wed, Oct 06, 2021 at 04:16:34PM +0000, Alvin Šipraga wrote:
>> First, allow me to reproduce the relevant part of the datasheet here:
>>
>> | == Search and Learning
>> |
>> | = Search
>> |
>> | When a packet is received, the RTL8365MB-VC uses the destination MAC
>> | address, Filtering Identifier (FID) and Enhanced Filtering Identifier
>> | (EFID) to search the 2K-entry look-up table. The 48-bit MAC address,
>> | 4-bit FID and 3-bit EFID use a hash algorithm, to calculate an 11-bit
>> | index value. The RTL8365MB-VC uses the index to compare the packet MAC
>> | address with the entries (MAC addresses) in the look-up table. This is
>> | the ‘Address Search’. If the destination MAC address is not found, the
>> | switch will broadcast the packet according to VLAN configuration.
> 
> Unless the wording is plain incorrect or does not cover all cases except
> for the "default operation", I think this says that the LUT is always
> indexed based on a hash of {48-bit MAC, 4-bit FID, 3-bit EFID}.
> No mention of VID.

Yes, I parsed this paragraph in the same way.
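Here is a minimal sketch of the lookup as that datasheet paragraph describes it. The key is {48-bit MAC, 4-bit FID, 3-bit EFID}, it is reduced to an 11-bit index into the 2K-entry table, and no VID is involved. The datasheet does not specify the hash, so the XOR-fold in the hypothetical lut_index_hypothetical() is an assumption made purely for illustration and will not reproduce the indices in the dumps below.

```c
#include <stdint.h>

#define LUT_INDEX_BITS 11                          /* 2K entries */
#define LUT_INDEX_MASK ((1u << LUT_INDEX_BITS) - 1)

/* Hypothetical hash: the real RTL8365MB-VC algorithm is undocumented.
 * Key layout: MAC in bits 0..47, FID in bits 48..51, EFID in bits 52..54. */
static uint16_t lut_index_hypothetical(const uint8_t mac[6], uint8_t fid,
				       uint8_t efid)
{
	uint64_t key = 0;
	uint16_t idx = 0;

	for (int i = 0; i < 6; i++)
		key = (key << 8) | mac[i];
	key |= (uint64_t)(fid & 0xf) << 48;
	key |= (uint64_t)(efid & 0x7) << 52;

	/* Fold the 55-bit key into 11 bits by XORing 11-bit chunks. */
	while (key) {
		idx ^= (uint16_t)(key & LUT_INDEX_MASK);
		key >>= LUT_INDEX_BITS;
	}
	return idx;
}
```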

> 
>> |
>> | = Learning
>> |
>> | The RTL8365MB-VC uses the source MAC address, FID, and EFID of the
>> | incoming packet to hash into a 9-bit index. It then compares the source
>> | MAC address with the data (MAC addresses) in this index. If there is a
>> | match with one of the entries, the RTL8365MB-VC will update the entry
>> | with new information. If there is no match and the 2K entries are not
>> | all occupied by other MAC addresses, the RTL8365MB-VC will record the
>> | source MAC address and ingress port number into an empty entry. This
>> | process is called ‘Learning’.
>> | Address aging is used to keep the contents of the address table correct
>> | in a dynamic network topology. The look-up engine will update the time
>> | stamp information of an entry whenever the corresponding source MAC
>> | address appears. An entry will be invalid (aged out) if its time stamp
>> | information is not refreshed by the address learning process during the
>> | aging time period. The aging time of the RTL8365MB-VC is between 200 and
>> | 400 seconds (typical is 300 seconds).
>> |
>> | == SVL and IVL/SVL
>> |
>> | The RTL8365MB-VC supports a 16-group Filtering Identifier (FID) for L2
>> | search and learning. In default operation, all VLAN entries belong to
>> | the same FID. This is called Shared VLAN Learning (SVL). If VLAN entries
>> | are configured to different FIDs, then the same source MAC address with
>> | multiple FIDs can be learned into different look-up table entries. This
>> | is called Independent VLAN Learning and Shared VLAN Learning (IVL/SVL).
> 
> I think I understand what they're trying to say, although I don't
> understand what "default operation" means. Typical usage? No idea.

I think it just describes the state of the chip after a reset. Put
another way, after a reset all VLANs have FID=0 and are in SVL mode. At
least, that is the state I find the chip in after resetting it, which is
consistent with this paragraph.
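The learning and aging process quoted from the datasheet can be sketched as follows. This assumes 4 entries per hash bucket: a 9-bit learning index over a 2K-entry table would be consistent with 512 buckets of 4, but that is an inference, not something the datasheet states. It uses the "typical" 300-second aging time. All names are hypothetical. In hardware, aging runs as a background process; here it is done lazily on each learn for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LUT_WAYS 4          /* assumption: entries per hash bucket */
#define AGING_TIME_S 300    /* typical aging time per the datasheet */

struct lut_entry {
	bool valid;
	uint8_t mac[6];
	uint8_t port;          /* ingress port that learned the address */
	uint32_t last_seen;    /* time stamp, in seconds */
};

struct lut_bucket {
	struct lut_entry e[LUT_WAYS];
};

/* Invalidate entries whose time stamp was not refreshed in time. */
static void lut_age(struct lut_bucket *b, uint32_t now)
{
	for (int i = 0; i < LUT_WAYS; i++)
		if (b->e[i].valid && now - b->e[i].last_seen >= AGING_TIME_S)
			b->e[i].valid = false;
}

/* Learn a source MAC: update the entry on a match, otherwise record it in
 * an empty entry. Returns false if the bucket is full (not learned). */
static bool lut_learn(struct lut_bucket *b, const uint8_t sa[6],
		      uint8_t port, uint32_t now)
{
	struct lut_entry *free_slot = NULL;

	lut_age(b, now);
	for (int i = 0; i < LUT_WAYS; i++) {
		struct lut_entry *e = &b->e[i];

		if (e->valid && !memcmp(e->mac, sa, 6)) {
			e->port = port;        /* station may have moved */
			e->last_seen = now;    /* refresh the time stamp */
			return true;
		}
		if (!e->valid && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return false;
	memcpy(free_slot->mac, sa, 6);
	free_slot->port = port;
	free_slot->last_seen = now;
	free_slot->valid = true;
	return true;
}
```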

> 
>> This "IVL/SVL" mode would appear to correspond to a field in the vendor
>> driver sources called ivl_svl (ivl_svl=1 is what I have referred to as
>> "IVL" all this time), which is part of each VLAN configuration in the
>> VLAN table. But that field also comes with a /* IVL */ or /* IVL_EN */
>> comment next to it in some places. So I am unsure whether there is a
>> third, "genuine" IVL mode which does not use the FID at all. At least,
>> the description in the datasheet doesn't seem to correlate with the
>> behaviour of this ivl_svl switch. But I could be parsing it wrong.
> 
> So you've said that a VLAN table entry contains an IVL_EN bit, and a FID.
> It's this structure, right?
> 
> struct rtl8365mb_vlan_4k {
> 	u16 vid;
> 	u16 member;
> 	u16 untag;
> 	u8 fid;
> 	u8 priority;
> 	u8 priority_en : 1;
> 	u8 policing_en : 1;
> 	u8 ivl_en : 1;
> 	u8 meteridx;
> };

Correct.

> 
> What they say is: if you configure some of the VLAN table entries with
> non-identical FIDs, you are operating in mixed IVL/SVL mode. Meaning:
> you still haven't set the IVL bit in any of the VLAN table entries,
> therefore you are still using SVL, where the VLAN table maps a VID to a
> FID (and this is in line with the explanation given above).
> But on the other hand, not all VLANs map to the same FID (as in the pure
> SVL case). So it is a mixed SVL/IVL mode.

This is how I understand it too, yeah. In particular, the description in 
the datasheet only covers scenarios where the IVL_EN bit is 0.
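Under this reading, the datasheet's terminology amounts to a classification of the whole VLAN table. Below is a sketch with hypothetical names. It keeps only the relevant fields of struct rtl8365mb_vlan_4k, and its third case covers the situation the datasheet does not describe at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal subset of the rtl8365mb_vlan_4k fields relevant here. */
struct vlan_cfg {
	uint16_t vid;
	uint8_t fid;      /* 4-bit filtering identifier */
	bool ivl_en;
};

enum learning_mode {
	MODE_SVL,         /* IVL_EN clear, all VLANs share one FID (reset state) */
	MODE_IVL_SVL,     /* IVL_EN clear, but not all FIDs are identical */
	MODE_IVL_ENABLED, /* some IVL_EN bit set: not covered by the datasheet */
};

static enum learning_mode classify(const struct vlan_cfg *v, size_t n)
{
	bool fids_differ = false;

	for (size_t i = 0; i < n; i++) {
		if (v[i].ivl_en)
			return MODE_IVL_ENABLED;
		if (i > 0 && v[i].fid != v[0].fid)
			fids_differ = true;
	}
	return fids_differ ? MODE_IVL_SVL : MODE_SVL;
}
```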

> 
> What I suspect is that if you set the IVL bit in the VLAN table entry,
> the FID is completely ignored. Or maybe, with IVL, the VID _is_ the FID,
> and in that case, the description above would actually be correct in
> stating that the LUT is always looked up by {MAC, EFID, FID}.
> What absolutely bugs me is the fact that they say the FID is 4-bit.
> When using a 4K VLAN ID as FID, you can't use just 4 bits of it...

Yeah, it bugs me too... But I am now of the view that the datasheet is 
simply incomplete in its description of the LUT, and that it only talks 
about the SVL scenario.

> 
>> Now, rather than speculate further on the semantics, I went ahead and
>> tested out the behaviour by:
>>
>> - adding 32 VLANs 100..131 on a port, all with IVL (i.e. ivl_svl=1)
>> - cycling through the 8 possible port EFIDs (0..7) on that port
>> - for each EFID, sending one 802.1Q-tagged frame with VID=n for
>> n=100..131 to the port from the network
>>
>> Some notes:
>>
>> - the chip supports up to 32 concurrent VLANs (globally); this is a
>> general limitation of the hardware.
>> - in this scenario the MAC SA is the same for each frame I transmit from
>> the network into the port.
>>
>> By dumping the hardware FDB after the fact, I can see 32 * 8 FDB entries
>> for the given MAC SA of my frames:
>>
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	0004    00:00:aa:aa:aa:aa       104     2       0       0
>> 	0005    00:00:aa:aa:aa:aa       112     2       0       3
>> 	0006    00:00:aa:aa:aa:aa       120     2       0       2
>> 	0008    00:00:aa:aa:aa:aa       128     2       0       5
>> 	0036    00:00:aa:aa:aa:aa       105     2       0       0
>> 	0037    00:00:aa:aa:aa:aa       113     2       0       3
>> 	0038    00:00:aa:aa:aa:aa       121     2       0       2
>> 	0040    00:00:aa:aa:aa:aa       129     2       0       5
>> 	0068    00:00:aa:aa:aa:aa       106     2       0       0
>> 	... (table continues with an entry for each VID/EFID combo)
>>
>> Legend:
>> 	addr: look-up-table index
>> 	mac: MAC address
>> 	vid_fid: VID of the frame for both IVL and SVL
> 
> Who gave it this "vid_fid" name?

The name is borrowed from the vendor driver's data structure.

> 
>> 	spa: source port address, i.e. the port that learned
>> 	fid: FID (of the VLAN)
>> 	efid: EFID (of the port)
>>
>> I also tried sending untagged frames from the network and cycling
>> through one of the VLANs as PVID, in which case the port would learn and
>> make an entry with vid_fid corresponding to the PVID.
>>
>> This suggests to me that the IVL field of the VLAN configuration really
>> does achieve Independent VLAN learning, and that there are not many
>> constraints here besides the size of the look-up-table.
> 
> Can you repeat the experiment sweeping through EFIDs, but with the VLANs
> configured for SVL and having the same FID? I would expect that the LUT
> indices will be different, but still as many. Just want to confirm my
> theory that the EFID provides port-based isolation regardless of IVL_EN.

I was actually testing this just now.

For VLANs in SVL mode with the same FID and the same EFID, the same MAC
is learned into the same index irrespective of VID (no surprise).

However, when cycling through the EFIDs, the same MAC is learned into 8
different indices.

So yes, EFID provides port-based isolation regardless of IVL_EN. This is 
consistent with the description in the datasheet too.
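This result fits a toy model in which, with the MAC SA held constant, the learned key reduces to {FID, EFID} for SVL VLANs. The hypothetical svl_sweep() counts distinct keys rather than real LUT indices, so hash collisions are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Send one frame on each of VLANs 100..131 for each of the first n_efid
 * EFIDs, all VLANs in SVL mode with the given FID; count distinct keys. */
static size_t svl_sweep(uint8_t fid, unsigned int n_efid)
{
	static bool seen[16][8];  /* [FID][EFID] */
	size_t distinct = 0;

	memset(seen, 0, sizeof(seen));
	if (n_efid > 8)
		n_efid = 8;           /* EFID is a 3-bit field */
	for (unsigned int efid = 0; efid < n_efid; efid++) {
		for (uint16_t vid = 100; vid <= 131; vid++) {
			uint8_t key_fid = fid & 0xf; /* SVL: VID is not in the key */

			if (!seen[key_fid][efid]) {
				seen[key_fid][efid] = true;
				distinct++;
			}
		}
	}
	return distinct;
}
```

With a single EFID the model yields 1 key for all 32 VLANs, and with all 8 EFIDs it yields 8, matching the 8 different indices observed.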

> 
> Also, can you please repeat the IVL experiment but with VIDs not having
> consecutive values, but rather N, N+16, N+32, N+48, ... N+2048 etc?
> I would like to get to the bottom of that 4-bit FID thing.

Sure. I ran the test as you suggested with N=100 and the results are the
same: for 32 VLANs and cycling through the 8 EFIDs for each, I end up
with 256 entries in the LUT. If I keep adding VLANs (note the limit is
32, but I can remove an old one and add a new one without losing the LUT
entries of the old one), the LUT just keeps taking on new entries.

Considering this, do you agree with the mapping I suggested in the 
previous email?

| SVL: {FID, EFID, MAC} -> index
| IVL: {VID, EFID, MAC} -> index

There doesn't seem to be any 4-bit resolution to the VID key when doing 
an IVL lookup.
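The same kind of toy model shows why the VID spacing settles the 4-bit FID question. If only the low 4 bits of the VID took part in the key, VIDs N, N+16, N+32, ... would all collapse onto one value, and the sweep would yield 8 keys rather than 256. ivl_sweep() and its vid_mask parameter are hypothetical: 0xfff models the full 12-bit VID and 0xf models the suspected truncation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* IVL sweep over VIDs n, n+16, ..., n+16*31 and EFIDs 0..7, counting
 * distinct {VID key, EFID} pairs. vid_mask selects which VID bits are
 * assumed to take part in the key. */
static size_t ivl_sweep(uint16_t n, uint16_t vid_mask)
{
	static bool seen[4096][8];  /* [VID key][EFID] */
	size_t distinct = 0;

	memset(seen, 0, sizeof(seen));
	for (unsigned int efid = 0; efid < 8; efid++) {
		for (unsigned int i = 0; i < 32; i++) {
			uint16_t key = (uint16_t)((n + 16 * i) & vid_mask & 0xfff);

			if (!seen[key][efid]) {
				seen[key][efid] = true;
				distinct++;
			}
		}
	}
	return distinct;
}
```

Only the full-VID variant reproduces the 256 entries measured with N=100.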

> 
>> Could it be that the ivl_svl switch simply controls how this
>> look-up-table index is computed? That is to say:
>>
>> SVL: {FID, EFID, MAC} -> index
>> IVL: {VID, EFID, MAC} -> index
>>
>> I tried the following scenario:
>>
>> 	# add VLAN 100/101
>> 	bridge vlan add vid 100 dev swp2
>> 	bridge vlan add vid 101 dev swp2
>>
>> 	# send VID 100 frame from another host on the network
>> 	mausezahn eth2 -Q 100 -c 1 -a '00:00:aa:aa:aa:aa' -t tcp
>>
>> 	# dump HW FDB
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>>
>> 	# send VID 101 frame this time
>> 	mausezahn eth2 -Q 101 -c 1 -a '00:00:aa:aa:aa:aa' -t tcp
>>
>> 	# dump HW FDB
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>>
>> I then tested this out with:
>>
>> - IVL, FID=0
>>
>> 	##### send frame on VLAN 100
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	0388    00:00:aa:aa:aa:aa       100     2       0       0
>> 	##### send frame on VLAN 101
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	0388    00:00:aa:aa:aa:aa       100     2       0       0
>> 	0420    00:00:aa:aa:aa:aa       101     2       0       0
>>
>> - IVL, FID=9
>>
>> 	##### send frame on VLAN 100
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	0388    00:00:aa:aa:aa:aa       100     2       9       0
>> 	##### send frame on VLAN 101
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	0388    00:00:aa:aa:aa:aa       100     2       9       0
>> 	0420    00:00:aa:aa:aa:aa       101     2       9       0
>>
>> - SVL, FID=0
>>
>> 	##### send frame on VLAN 100
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	1280    00:00:aa:aa:aa:aa       100     2       0       0
>> 	##### send frame on VLAN 101
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	1280    00:00:aa:aa:aa:aa       101     2       0       0
>>
>> - SVL, FID=9
>>
>> 	##### send frame on VLAN 100
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	1056    00:00:aa:aa:aa:aa       100     2       9       0
>> 	##### send frame on VLAN 101
>> 	cat /sys/kernel/debug/rtl8365mb/lut_dump
>> 	addr    mac                     vid_fid spa     fid     efid
>> 	1056    00:00:aa:aa:aa:aa       101     2       9       0
>>
>> Some observations:
>>
>> - with IVL, index is the same for FID=0,9
>> - with SVL, index is different for FID=0,9
>> - with IVL, index is different for VID=100,101
>> - with SVL, index is the same for VID=100,101
> 
> Yes, good job investigating, this seems to support the theory that when
> a VLAN table entry is configured for IVL, the FID (actually vid_fid in
> your dumps) is the VID, otherwise it's the FID from the VLAN table entry.

Right. This also justifies the name vid_fid in the vendor driver.
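The rule these experiments point to fits in a single selection. The sketch below uses hypothetical names. lut_key_vlan_component() models only the VLAN-related part of the hash key. The vid_fid column in the dumps shows the frame's VID in both modes, so this is the value that appears to feed the index, not necessarily what is stored in the entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of a VLAN table entry (see struct rtl8365mb_vlan_4k). */
struct vlan_entry {
	uint16_t vid;
	uint8_t fid;
	bool ivl_en;
};

/* IVL VLANs contribute their VID to the key (the FID is stored but does
 * not affect the index); SVL VLANs contribute their FID (the VID does not
 * affect the index). */
static uint16_t lut_key_vlan_component(const struct vlan_entry *v)
{
	return v->ivl_en ? v->vid : v->fid;
}
```

The test for this sketch encodes the four observations listed above: same index for FID=0/9 and different for VID=100/101 with IVL, and the reverse with SVL.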

> 
>> In particular, with IVL, the FID is stored in the table but it does not
>> seem to affect the index.
> 
> It's probably there so that you don't need to flush the LUT and
> reinstall everything when you change a VLAN table entry from IVL to SVL/IVL.
> 
>> I _think_ I can look up the FDB by VID, not just FID - I still have to
>> confirm that but I think it depends on whether the particular VLAN is in
>> IVL or SVL mode.
>>
>> But either way, there are bound to be collisions given the way the
>> look-up-table works. If the driver is asked to offload two FDB entries
>> which map to the same look-up-table entry (i.e. same index), can't it
>> just error out on the second request? Something like "I see this entry
>> is already occupied by a static (offloaded) FDB entry, so I can't
>> satisfy this request".
> 
> Yes, in case of hash collisions between unrelated entries on a full row,
> returning -ENOSPC is clearly okay. This case is more interesting because
> the LUT entries are not unrelated. I was commenting under the assumption
> that you will need to give switchdev the impression that you are
> offloading entries via IVL (so you should accept two FDB entries for the
> same MAC DA in different VIDs, as long as they point towards the same
> destination port) because that's how the hardware is going to treat them.
> The only problematic case is when switchdev asks one FDB in one VLAN to
> go one way, and another in another VLAN to go another way.
> 
> [ by the way you can't propagate errors from .port_fdb_add to switchdev,
>    and to the bridge, sorry ]

OK, but I guess returning -ENOSPC in .port_fdb_add is the best a DSA 
driver can do, right?
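Here is a sketch of the collision handling discussed above, under two assumptions: 4 entries per bucket, and a policy (hypothetical, not taken from the vendor driver) that a static entry may displace a learned one but never another static one. bucket_fdb_add() would sit behind a .port_fdb_add implementation once the bucket has been located by hashing. As noted above, the error return cannot be propagated to switchdev and the bridge.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LUT_WAYS 4 /* assumption: entries per hash bucket */

struct fdb_entry {
	bool valid;
	bool is_static;   /* installed via .port_fdb_add rather than learned */
	uint8_t mac[6];
	uint16_t vid;
	uint8_t port;
};

/* Install a static {MAC, VID} -> port entry in its bucket. */
static int bucket_fdb_add(struct fdb_entry bucket[LUT_WAYS],
			  const uint8_t mac[6], uint16_t vid, uint8_t port)
{
	struct fdb_entry *slot = NULL;

	for (int i = 0; i < LUT_WAYS; i++) {
		struct fdb_entry *e = &bucket[i];

		if (e->valid && e->vid == vid && !memcmp(e->mac, mac, 6)) {
			slot = e;         /* same key: overwrite in place */
			break;
		}
		if (!slot && (!e->valid || !e->is_static))
			slot = e;         /* empty, or holds only a learned entry */
	}
	if (!slot)
		return -ENOSPC;       /* every way is taken by a static entry */

	memcpy(slot->mac, mac, 6);
	slot->vid = vid;
	slot->port = port;
	slot->valid = true;
	slot->is_static = true;
	return 0;
}
```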

> 
> Anyway, doesn't matter, it's clearer now that you don't have to care
> about this, I don't think you should use the SVL or SVL/IVL modes for
> anything, just program all VLAN table entries with IVL=true, and set the
> EFID based on dp->bridge_num.

Cool, glad we're on the same page!
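Following that suggestion, here is a sketch of programming every VLAN table entry with IVL enabled and giving each bridge its own EFID. The struct layout is the one quoted earlier in the thread. vlan_4k_init_ivl() and efid_for_port() are hypothetical helpers. The mapping is also an assumption: EFID 0 for standalone ports, and EFIDs 1..7 for up to seven bridges indexed from 0. Check how dp->bridge_num is numbered in the kernel version in use before deriving the EFID from it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* VLAN table entry layout as quoted earlier in the thread. */
struct rtl8365mb_vlan_4k {
	u16 vid;
	u16 member;
	u16 untag;
	u8 fid;
	u8 priority;
	u8 priority_en : 1;
	u8 policing_en : 1;
	u8 ivl_en : 1;
	u8 meteridx;
};

#define EFID_MAX 7 /* EFID is a 3-bit field */

/* Hypothetical mapping: EFID 0 for standalone ports, 1..7 for bridges. */
static int efid_for_port(bool in_bridge, unsigned int bridge_idx)
{
	if (!in_bridge)
		return 0;
	if (bridge_idx >= EFID_MAX)
		return -EOPNOTSUPP;   /* only 7 EFIDs are left for bridges */
	return (int)bridge_idx + 1;
}

/* Fill a VLAN table entry with IVL enabled, so learning is keyed on the VID. */
static void vlan_4k_init_ivl(struct rtl8365mb_vlan_4k *v, u16 vid,
			     u16 member, u16 untag)
{
	memset(v, 0, sizeof(*v));
	v->vid = vid;
	v->member = member;
	v->untag = untag;
	v->ivl_en = 1;
	/* fid stays 0: with IVL it did not affect the index in the tests */
}
```

Given the EFID isolation observed above, reserving one EFID for standalone ports keeps their learned addresses apart from those of any bridge.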

> 
>>> And most importantly, do you see the FID bits in the tagger in the
>>> receive path as well?
>>
>> No, I don't, which is kind of strange. But is it a problem?
> 
> Not really, no.
>