[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87turejclo.fsf@waldekranz.com>
Date: Mon, 18 Jan 2021 19:58:59 +0100
From: Tobias Waldekranz <tobias@...dekranz.com>
To: Vladimir Oltean <olteanv@...il.com>
Cc: davem@...emloft.net, kuba@...nel.org, andrew@...n.ch,
vivien.didelot@...il.com, f.fainelli@...il.com, roopa@...dia.com,
nikolay@...dia.com, netdev@...r.kernel.org, jiri@...nulli.us,
idosch@...sch.org, stephen@...workplumber.org
Subject: Re: [RFC net-next 2/7] net: bridge: switchdev: Include local flag in FDB notifications
On Sun, Jan 17, 2021 at 21:30, Vladimir Oltean <olteanv@...il.com> wrote:
> Hi Tobias,
>
> On Sat, Jan 16, 2021 at 02:25:10AM +0100, Tobias Waldekranz wrote:
>> Some switchdev drivers, notably DSA, ignore all dynamically learned
>> address notifications (!added_by_user) as these are autonomously added
>> by the switch. Previously, such a notification was indistinguishable
>> from a local address notification. Include a local bit in the
>> notification so that the two classes can be discriminated.
>>
>> This allows DSA-like devices to add local addresses to the hardware
>> FDB (with the CPU as the destination), thereby avoiding flows towards
>> the CPU being flooded by the switch as unknown unicast.
>>
>> Signed-off-by: Tobias Waldekranz <tobias@...dekranz.com>
>> ---
>
> In an ideal world, the BR_FDB_LOCAL bit of an FDB entry is what you
> would probably want to use as an indication that the packet must be
> delivered upstream by the hardware, considering that this is what the
> software data path does:
>
> br_handle_frame_finish:
> if (test_bit(BR_FDB_LOCAL, &dst->flags))
> return br_pass_frame_up(skb);
That was my thinking, yes.
> However, we are not in an ideal world, but in a cacophony of nonsensical
> flags that must be passed to the 'bridge fdb add' command. For example,
> I noticed this usage pattern in your patch 6/7:
>
> | br0
> | / \
> | swp0 dummy0
> |
> | $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master
>
> Do you know that this command doesn't do what you think it does
> (assuming that 02:00:de:ad:00:01 is not the MAC address of dummy0)?
>
> The command you wrote will add a _local_ FDB entry on dummy0.
> I tried it on a DSA switch interface (swp0):
>
> $ bridge fdb add 02:00:de:ad:00:01 dev swp0 vlan 1 master
> [ 3162.165561] rtnl_fdb_add: addr 02:00:de:ad:00:01 vid 1 ndm_state NUD_NOARP|NUD_PERMANENT
> [ 3162.172487] fdb_add_entry: fdb 02:00:de:ad:00:01 state NUD_NOARP|NUD_PERMANENT, fdb_to_nud NUD_REACHABLE, flags 0x0
> [ 3162.180515] mscc_felix 0000:00:00.5 swp0: local fdb: 02:00:de:ad:00:01 vid 1
>
> So, after your patches, this command syntax will stop adding a
> front-facing FDB entry on swp0. It will add a CPU-facing FDB entry
> instead.
Ah I see, no I was not aware of that. I just saw that the entry towards
the CPU was added to the ATU, which it would in both cases. I.e. from
the switch's POV, in this setup:
br0
/ \ (A)
swp0 dummy0
(B)
A "local" entry like (A), or a "static" entry like (B) means the same
thing to the switch: "it is somewhere behind my CPU-port".
> You know why the bridge neighbour state is NUD_NOARP|NUD_PERMANENT in
> rtnl_fdb_add? Well, because iproute2 set it that way:
>
> /* Assume permanent */
> if (!(req.ndm.ndm_state&(NUD_PERMANENT|NUD_REACHABLE)))
> req.ndm.ndm_state |= NUD_PERMANENT;
>
> See iproute2 commit 0849e60a10d0 ("bridge: manage VXLAN based forwarding
> tables"). I know so little about VXLAN's use of the bridge command, that
> I cannot tell why it was decided to "assume permanent" (which seems to
> have changed the default behavior for everybody).
>
> Otherwise said, even if not mentioned in the man page, the default FDB
> entry type is NUD_PERMANENT (which, in short, means a "local" entry, see
> a more detailed explanation at the end).
>
> The man page just says:
>
> bridge fdb add - add a new fdb entry
> This command creates a new fdb entry.
>
> LLADDR the Ethernet MAC address.
>
> dev DEV
> the interface to which this address is associated.
>
> local - is a local permanent fdb entry
>
> static - is a static (no arp) fdb entry
>
> which is utterly misleading and useless. It does not say:
> (a) what a local FDB entry is
> (b) that if neither "local" or "static"|"dynamic" is specified,
> "local" is default
>
> This already creates pretty bad premises. You would have needed to
> explicitly add "static" to your command. Not only you, but in fact also
> thousands of other people who already have switchdev deployments using
> the 'bridge fdb' command.
>
> So, in short, if everybody with switchdev hardware used the 'bridge fdb'
> command correctly so far, your series would have been great. But in
> fact, nobody did (me included). So we need to be more creative.
>
> For example, there's that annoying "self" flag.
> As far as I understand, there is zero reason for a DSA driver to use the
> "self" flag, since that means "bypass the bridge FDB and just call the
> .ndo_fdb_add of the device driver, which in the case of DSA is
> dsa_legacy_fdb_add". Instead you would just use the "master" flag, which
> makes the operation be propagated through br_fdb_add and the software
> FDB has a chance to be updated.
>
> Only that there's no one preventing you from using the 'self' and
> 'master' flags together. Which means that the FDB would be offloaded to
> the device twice: once through the SWITCHDEV_FDB_ADD_TO_DEVICE
> notification emitted by br_fdb_add, and once through dsa_legacy_fdb_add.
> Contradict me if I'm wrong, but I'm thinking that you may have missed
> this detail that bridge fdb addresses are implicitly 'local' because you
> also used some 'self master' commands, and the "to-CPU" address
> installed through switchdev was immediately overwritten by the correct
> one installed through dsa_legacy_fdb_add.
>
> So I think there is a very real issue in that the FDB entries with the
> is_local bit was never specified to switchdev thus far, and now suddenly
> is. I'm sorry, but what you're saying in the commit message, that
> "!added_by_user has so far been indistinguishable from is_local" is
> simply false.
Alright, so how do you do it? Here is the struct:
struct switchdev_notifier_fdb_info {
struct switchdev_notifier_info info; /* must be first */
const unsigned char *addr;
u16 vid;
u8 added_by_user:1,
offloaded:1;
};
Which field separates a local address on swp0 from a dynamically learned
address on swp0?
> What I'm saying is that some of the is_local addresses should have been
> rejected by DSA from day one, like this one:
>
> bridge fdb add dev swp0 00:01:02:03:04:05 master
>
> but nonetheless DSA is happy to accept it anyway, because switchdev
> doesn't tell it it's local. Yay.
Yes that is a real problem, for sure.
> It looks like Mellanox have been telling their users to explicitly use
> the "static" keyword when they mean a static FDB entry:
> https://github.com/Mellanox/mlxsw/wiki/Bridge#forwarding-database-configuration
> which, I mean, is great for them, but pretty much sucks for everybody
> else, because Documentation/networking/switchdev.rst just says:
>
> The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
> to support static FDB entries installed to the device. Static bridge FDB
> entries are installed, for example, using iproute2 bridge cmd::
>
> bridge fdb add ADDR dev DEV [vlan VID] [self]
>
> which of course is completely bogus anyway (it indicates 'self', it
> doesn't indicate 'master' at all, it doesn't say anything about 'static').
>
> Look, I'd be more than happy to accept that I'm the only idiot who
> misread the existing documentation on 'bridge fdb', but I fear that this
> is far from true. If we want to make progress with this patch, some user
> space breakage will be necessary - and I think I'm in favour of doing
> that.
Trust me, you are not. There is a running joke about being able to
describe what "master" and "self" really means here at the office :)
Ok, so just to see if I understand this correctly:
The situation today it that `bridge fdb add ADDR dev DEV master` results
in flows towards ADDR being sent to:
1. DEV if DEV belongs to a DSA switch.
2. To the host if DEV was a non-offloaded interface.
With this series applied both would result in (2) which, while
idiosyncratic, is as intended. But this of course runs the risk of
breaking existing scripts which rely on the current behavior.
Correct?
Powered by blists - more mailing lists