Message-ID: <f6d3b34c-49d5-4d20-9e76-6c5158e56acd@blackwall.org>
Date: Mon, 23 Sep 2024 16:03:25 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Thomas Martitz <tmartitz-oss@....de>, Roopa Prabhu <roopa@...dia.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Cc: Johannes Nixdorf <jnixdorf-oss@....de>, bridge@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH net-next] net: bridge: drop packets with a local
source
On 9/23/24 10:26, Thomas Martitz wrote:
> Am 22.09.24 um 20:22 schrieb Nikolay Aleksandrov:
>> On 9/20/24 16:28, Thomas Martitz wrote:
>>> Am 20.09.24 um 08:42 schrieb Nikolay Aleksandrov:
>>>> On 19/09/2024 14:13, Thomas Martitz wrote:
>>>>> Am 19.09.24 um 12:33 schrieb Nikolay Aleksandrov:
>>>>>> On 19/09/2024 11:58, Thomas Martitz wrote:
>>>>>>> Currently, there is only a warning if a packet enters the bridge
>>>>>>> that has the bridge's or one port's MAC address as source.
>>>>>>>
>>>>>>> Clearly this indicates a network loop (or even spoofing) so we
>>>>>>> generally do not want to process the packet. Therefore, move the check
>>>>>>> already done for 802.1x scenarios up and do it unconditionally.
>>>>>>>
>>>>>>> For example, a common scenario we see in the field:
>>>>>>> In an accidental network loop scenario, if an IGMP join
>>>>>>> loops back to us, it would cause mdb entries to stay indefinitely
>>>>>>> even if there's no actual join from the outside. Therefore
>>>>>>> this change can effectively prevent multicast storms, at least
>>>>>>> for simple loops.
>>>>>>>
>>>>>>> Signed-off-by: Thomas Martitz <tmartitz-oss@....de>
>>>>>>> ---
>>>>>>> net/bridge/br_fdb.c | 4 +---
>>>>>>> net/bridge/br_input.c | 17 ++++++++++-------
>>>>>>> 2 files changed, 11 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>
>>>>>> Absolutely not, I'm sorry but we're not all going to take a performance hit
>>>>>> of an additional lookup because you want to filter the src address. You can filter
>>>>>> it in many ways that won't affect others and don't require kernel changes
>>>>>> (ebpf, netfilter, etc.). To a lesser extent, there is also the issue that we might
>>>>>> break some (admittedly weird) setup.
>>>>>>
>>>>>
>>>>> Hello Nikolay,
>>>>>
>>>>> thanks for taking a look at the patch. I expected concerns, therefore the RFC state.
>>>>>
>>>>> So I understand that performance is your main concern. Some users might
>>>>> be willing to pay that cost, however, in exchange for increased
>>>>> system robustness. May I suggest per-bridge or even per-port flags to
>>>>> opt in to this behavior? We'd set this from our userspace. This would
>>>>> also address the concern to not break weird, existing setups.
>>>>>
>>>>
>>>> That is the usual way these things are added, as opt-in. A flag sounds good
>>>> to me; if you're going to make it per-bridge, take a look at the bridge bool
>>>> opts - they were added for such cases.
>>>>
>>>
>>> Alright. I'll approach this. It may take a little while because the LPC
>>> talks are so amazing that I don't want to miss anything.
>>>
>>> I'm currently considering a per-bridge flag because that fits our use
>>> case. A per-port flag would also work, though, and may fit the code
>>> there better because it's already checking for other port flags
>>> (BR_PORT_LOCKED, BR_LEARNING). Do you have a preference?
>>>
>>
>> Hi,
>> Sorry for the delayed response, but I was traveling over the weekend
>> and I got some more time to think about this. There is a more subtle
>> problem with this change - you're introducing packet filtering based on
>> fdb flags in the bridge, but it's not the bridge's job to filter
>> packets. We have filtering subsystems - netfilter, tc or ebpf; if they
>> lack some functionality you need to achieve this, then extend them.
>> Just because it's easy to hard-code this packet filter in the bridge
>> doesn't make it right, use the right subsystem if you want to filter.
>> For example you can extend nft's bridge matching capabilities.
>> More below.
>
> Hi,
>
> Alright, I understand that you basically object to the whole idea of
> filtering in the bridge code directly (based on fdb flags). While that
> makes some sense, I found that basically the same filter already
> exists for mac802.11 use cases:
>
> } else if (READ_ONCE(fdb_src->dst) != p ||
>            test_bit(BR_FDB_LOCAL, &fdb_src->flags)) { <-- drop if local source on ingress
>         /* FDB mismatch. Drop the packet without roaming. */
>         goto drop;
>
> In fact, this very code motivated me because I'm just adding one more
> condition to an existing drop mechanism after all. In this area there
> are also further drops based on fdb flags.
>
I was expecting you'd bring this code up :) but you've also taken it out
of context. It is a part of a larger feature which also creates locked
fdbs, and enforces certain policies which give user-space a chance to
authenticate user macs, i.e. MAC Authentication Bypass. Creating and
managing such fdbs can only be done from the bridge, but it also
requires help from user-space.
> Anyway, dropping isn't actually my main intent, although it's a welcome
> side effect because it immediately stops loops. What I'm after most is
> avoiding local processing, both in the IGMP/MLD snooping code and up in
> the stack. In my opinion, it would be good if the bridge code could be more
> resilient against loops (and spoofers) by not processing its own packets
> as if they came from somebody else. My main issue is: the IGMP/MLD snooping
> code becomes convinced that there are subscribers on the network even if
> there aren't, just by processing IGMP/MLD joins that were sent out a moment
> ago. That said, we could still decide to forward these packets and not
> filter them completely.
>
> And I still think that should also be the default, especially if we block
> only local processing but not forwarding. If you feel this robustness
> is not necessary (or consider the performance impact too high), then I
> accept that and withdraw my proposal. I just thought it would be a useful
> addition to the bridge's out-of-the-box stability.
>
You can decide to filter, I don't mind, but do it within the filtering
subsystems and not in the bridge. This is better for everyone -
interested parties would take the performance hit, but you would also get
more flexibility and matching opportunities; tomorrow you might decide
to use more complex criteria.
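
Just to illustrate, the tc/ebpf variant could be as small as the sketch
below. Take it as a rough, untested illustration only - the local_macs
map name and section layout are made up, and user-space would be
responsible for keeping the map in sync with the bridge/port MAC
addresses on enslave/release:

/* SPDX-License-Identifier: GPL-2.0 */
/* Hypothetical tc ingress program: drop frames whose source MAC is one
 * of our own (bridge or port) addresses. The local_macs map is kept up
 * to date from user-space when ports are added/removed.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 64);
	__uint(key_size, ETH_ALEN);
	__uint(value_size, 1);
} local_macs SEC(".maps");

SEC("tc")
int drop_local_src(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct ethhdr *eth = data;
	unsigned char key[ETH_ALEN];

	if ((void *)(eth + 1) > data_end)
		return TC_ACT_OK;

	/* Copy the source MAC to the stack so it can be used as a map key. */
	__builtin_memcpy(key, eth->h_source, ETH_ALEN);

	/* Our own MAC as source => looped back (or spoofed), drop early. */
	if (bpf_map_lookup_elem(&local_macs, key))
		return TC_ACT_SHOT;

	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";

Attach it on the ports' clsact ingress with tc, and only the setups that
opt in pay for the extra lookup, which is exactly the point.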
> All that said, I'll explore a netfilter solution (see below) to avoid
> maintaining out-of-tree patches.
>
>
>>
>>
>>> Although that would risk breaking existing weird set-ups. So unless you
>>> signal a preference for this, I will not pursue that any further.
>>>
>>>
>>>>> This would be analogous to the check added for MAB in 2022
>>>>> (commit a35ec8e38cdd "bridge: Add MAC Authentication Bypass (MAB) support").
>>>>>
>>>>> While there may be other methods, only in the bridge code can I
>>>>> access the resulting FDB entry to test for the BR_FDB_LOCAL flag.
>>>>> There's typically not only a single MAC address to check for; such
>>>>> local FDB entries are maintained for the enslaved ports' MACs as
>>>>> well. Replicating the check outside of the bridge receive code would
>>>>> be orders of magnitude more complex. For example, you need to update
>>>>> the filter each time a port is added to or removed from the bridge.
>>>>>
>>>>
>>>> That is not entirely true; you can make a solution that dynamically compares
>>>> the mac addresses of net devices with the src mac of incoming frames. You may need
>>>> to keep a list of the ports themselves or use ebpf, though. It isn't complicated
>>>> at all: you just need to keep that list updated when adding/removing ports.
>>>> You can even do it with a simple ip monitor and a bash script as a PoC. There's nothing
>>>> complicated about it, and we won't have to maintain another bridge option forever.
>>>
>>> I'm really trying to be open-minded about other possible ways, but I'm struggling.
>>>
>>> For one, you know we're making firmware for our home routers. We control all the
>>> code, from boot-loader to kernel to user space, so it's often both easier and more
>>> reliable to make small modifications to the kernel than to come up with complex
>>> user space. In other words, we don't have any eBPF tooling in place currently and
>>> that would be a major disruption to our workflow. We don't even use LLVM, just GCC
>>> everywhere. I'd have to justify introducing all the eBPF tooling and processes (in
>>> order to avoid having a small patch to the kernel) to my colleagues and my manager. I
>>> don't think that'd work out well. I'm pretty sure other companies in our field are
>>> in the same situation.
>>
>> That really is your problem, it doesn't change the fact it can be solved
>> using eBPF or netfilter. I'm sorry but this is not an argument for this
>> mailing list or for accepting a patch. It really is a pretty simple
>> solution - take ipmonitor (from iproute2/ip), strip out everything else
>> and just look for NEWLINK, then act on master changes: on a new master,
>> add the port/mac to a table, and vice-versa. What's so complex? You can
>> also do it with netfilter and nftables: just update a matching nft table
>> on master changes. Moreover, such events are usually few. In fact, since you
>> control user-space entirely I'd add it to the enslave/release pieces in
>> whatever network management tools you're using, so when an interface is
>> enslaved its mac is added to that filter and removed when it's released,
>> then you won't need to have a constantly running process to monitor,
>> even simpler.
>>
>> Actually it took me about 15 minutes to get a working solution to this
>> problem just by reusing the ipmonitor and iproute2 netlink code with an
>> nft table hooked on the port's ingress. It is that simple, but I'd prefer to
>> do it in the network manager on port add/del and avoid monitoring
>> altogether.
>
>
> Impressive!
>
>
>>
>>>
>>> Furthermore, from what I understand, an eBPF filter would not perform as well
>>> (performance also matters for us!) because there is no hook at this point. I'd need
>>> to hook earlier, perhaps using XDP (?), and that might have to process many more
>>> packets than those that enter the bridge. On the user space side, I'd need to have
>>> a daemon that updates bpf maps or something like that to keep the list updated. I'm
>>> new to eBPF, so sorry if it seems more complex to me than it is.
>>
>> It will process the same amount of packets that the bridge would.
>
>
> If we add vlan devices, the tagged packets that don't enter the bridge would still be
> processed by eBPF.
>
This processing could come down to a simple conditional statement
depending on how much vlan filtering you need. You wouldn't notice
its impact if implemented properly.
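For example, in a tc program like the sketch earlier in this mail,
skipping the traffic that a vlan upper will handle anyway could be just
a couple of lines (vlan_of_interest() is a made-up placeholder for
whatever vlan policy you actually need):

	/* Hypothetical: let tagged frames that bypass the bridge through
	 * with a single metadata check; 0x0fff masks out the VID bits.
	 */
	if (skb->vlan_present &&
	    !vlan_of_interest(skb->vlan_tci & 0x0fff))
		return TC_ACT_OK;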
> On top, we also have a custom hook that may consume packets for other reasons before
> they enter the bridge. But that's not your problem.
>
>
>>
>>>
>>> For netfilter, I looked into that also, but the NF_BR_LOCAL_IN hook is too late. One
>>> of the biggest problems we try to solve is that looping IGMP packets enter the bridge
>>> and actually refresh MDBs that should normally time out (we send JOINs for the addresses
>>> out, but the MDB should only refresh when JOINs from other systems are received). Then,
>>> even if the filter location were a fit, I'd effectively just re-implement the bridge's
>>> FDB lookup, which rings alarm bells that it's not an effective approach.
>>>
>>> So both alternatives you proposed are not a good fit for the actual problem and may
>>> require vastly more complex user space.
>>
>> Is that all the research? You read 2 minutes of webpages and diagonally
>> scanned some source code; did you see the other bridge netfilter
>> hooks? You can extend netfilter and match in any of them if you insist
>> on having a kernel solution. For example match in NF_BR_PRE_ROUTING.
>> You can extend nft's bridge support and match anything you need.
>
>
> Thank you very much for this! I literally looked for NF_HOOK in br_input.c
> to find suitable entry points and the NF_BR_PRE_ROUTING hook didn't occur
> to me (it's handled differently). I really should have looked more carefully
> over the entire file.
>
> Also, I should have known it anyway; I've been working with the bridge code for
> quite a long time already and considered myself experienced in this area
> (I have to reconsider that now...).
>
> So sorry for my incomplete research; NF_BR_PRE_ROUTING actually seems like a nice
> fit. I'll explore this further, and assuming this works, we can
> drop my proposal altogether. At first glance it should work, although we
> wouldn't be able to block only local processing (we can only drop or
> not drop) if the need arises, and we'd have to re-implement the MAC lookup.
>
> I have to apologize for wasting your time, but at least I have learned a lesson.
>
> Best regards.
No need to apologize, it's good to have these discussions and to make
things clearer.
Cheers,
Nik