Message-ID: <58d3e95f-ee23-06c7-b690-64fe42b9c56b@blackwall.org>
Date: Mon, 29 May 2023 16:17:35 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Ido Schimmel <idosch@...dia.com>, netdev@...r.kernel.org,
bridge@...ts.linux-foundation.org
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
edumazet@...gle.com, taras.chornyi@...ision.eu, saeedm@...dia.com,
leon@...nel.org, petrm@...dia.com, vladimir.oltean@....com,
claudiu.manoil@....com, alexandre.belloni@...tlin.com,
UNGLinuxDriver@...rochip.com, jhs@...atatu.com, xiyou.wangcong@...il.com,
jiri@...nulli.us, roopa@...dia.com, simon.horman@...igine.com
Subject: Re: [PATCH net-next v2 1/8] skbuff: bridge: Add layer 2 miss indication
On 29/05/2023 14:48, Ido Schimmel wrote:
> For EVPN non-DF (Designated Forwarder) filtering we need to be able to
> prevent decapsulated traffic from being flooded to a multi-homed host.
> Filtering of multicast and broadcast traffic can be achieved using the
> following flower filter:
>
> # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
>
> Unlike broadcast and multicast traffic, it is not currently possible to
> filter unknown unicast traffic. The classification into unknown unicast
> is performed by the bridge driver, but is not visible to other layers
> such as tc.
>
> Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear
> the bit whenever a packet enters the bridge (received from a bridge port
> or transmitted via the bridge) and set it if the packet did not match an
> FDB or MDB entry. If there is no skb extension and the bit needs to be
> cleared, then do not allocate one as no extension is equivalent to the
> bit being cleared. The bit is not set for broadcast packets as they
> never perform a lookup and therefore never incur a miss.
>
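The marking rules in the paragraph above can be modelled in userspace C. This is only a sketch under stated assumptions, not the code from the patch: `struct pkt`, `miss_set()`, `miss_get()`, `bridge_rx()` and the one-entry `known` table are hypothetical stand-ins for the skb, the tc skb extension and the FDB/MDB lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct pkt {
	unsigned char dst[6];
	bool has_ext;	/* models "a tc skb extension is attached" */
	bool l2_miss;	/* only meaningful when has_ext is true */
};

/* Set or clear the mark. Clearing never attaches an extension, because
 * a missing extension already means "not a miss". */
static void miss_set(struct pkt *p, bool miss)
{
	if (miss) {
		p->has_ext = true;
		p->l2_miss = true;
	} else if (p->has_ext) {
		p->l2_miss = false;
	}
}

static bool miss_get(const struct pkt *p)
{
	return p->has_ext && p->l2_miss;
}

static bool is_broadcast(const unsigned char *mac)
{
	static const unsigned char bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

	return memcmp(mac, bcast, 6) == 0;
}

/* Toy forwarding table with a single known address. */
static const unsigned char known[6] = {0x02, 0, 0, 0, 0, 0x01};

static bool table_lookup(const unsigned char *mac)
{
	return memcmp(mac, known, 6) == 0;
}

/* Models a packet entering the bridge and being forwarded or flooded. */
static void bridge_rx(struct pkt *p)
{
	miss_set(p, false);		/* clear on entry */
	if (is_broadcast(p->dst))
		return;			/* no lookup, so no miss */
	if (!table_lookup(p->dst))
		miss_set(p, true);	/* lookup failed: mark the packet */
}

/* Exercises the three cases from the commit message. */
static void check_marking(void)
{
	struct pkt hit   = { {0x02, 0, 0, 0, 0, 0x01}, false, false };
	struct pkt miss  = { {0x02, 0, 0, 0, 0, 0x99}, false, false };
	struct pkt bcast = { {0xff, 0xff, 0xff, 0xff, 0xff, 0xff}, false, false };

	bridge_rx(&hit);
	assert(!miss_get(&hit) && !hit.has_ext);	/* nothing attached */
	bridge_rx(&miss);
	assert(miss_get(&miss));
	bridge_rx(&bcast);
	assert(!miss_get(&bcast));

	/* Re-entering with a known destination clears the stale mark. */
	memcpy(miss.dst, known, 6);
	bridge_rx(&miss);
	assert(!miss_get(&miss) && miss.has_ext);
}
```

Calling `check_marking()` from a test harness walks through a hit, a miss and a broadcast. The model folds FDB and MDB into one table, so it cannot show the registered versus unregistered multicast distinction mentioned next.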
> A bit that is set for every flooded packet would also work for the
> current use case, but it does not allow us to differentiate between
> registered and unregistered multicast traffic, which might be useful in
> the future.
>
> To keep the performance impact to a minimum, the marking of packets is
> guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not
> touched and an skb extension is not allocated. Instead, only a
> 5-byte nop is executed, as demonstrated below for the call site in
> br_handle_frame().
>
> Before the patch:
>
> ```
> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
> c37b09: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12)
> c37b10: 00 00
>
> p = br_port_get_rcu(skb->dev);
> c37b12: 49 8b 44 24 10 mov 0x10(%r12),%rax
> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
> c37b17: 49 c7 44 24 30 00 00 movq $0x0,0x30(%r12)
> c37b1e: 00 00
> c37b20: 49 c7 44 24 38 00 00 movq $0x0,0x38(%r12)
> c37b27: 00 00
> ```
>
> After the patch (when static key is disabled):
>
> ```
> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
> c37c29: 49 c7 44 24 28 00 00 movq $0x0,0x28(%r12)
> c37c30: 00 00
> c37c32: 49 8d 44 24 28 lea 0x28(%r12),%rax
> c37c37: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax)
> c37c3e: 00
> c37c3f: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax)
> c37c46: 00
>
> #ifdef CONFIG_HAVE_JUMP_LABEL_HACK
>
> static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
> {
> asm_volatile_goto("1:"
> c37c47: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> br_tc_skb_miss_set(skb, false);
>
> p = br_port_get_rcu(skb->dev);
> c37c4c: 49 8b 44 24 10 mov 0x10(%r12),%rax
> ```
>
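The guard described above can be approximated in userspace C. This is a sketch under stated assumptions, not the kernel implementation: `key_users`, `filter_added()`, `filter_deleted()`, `struct frame` and `guarded_miss_set()` are hypothetical names, and a plain counter stands in for the 'tc_skb_ext_tc' static key. A real static key patches the branch in the instruction stream (the nop in the listing), while this model pays for an ordinary load and compare.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the static key: a use count that is
 * non-zero while at least one filter matching on l2_miss exists. */
static int key_users;

static void filter_added(void)
{
	key_users++;
}

static void filter_deleted(void)
{
	if (key_users > 0)
		key_users--;
}

static bool key_enabled(void)
{
	return key_users > 0;
}

/* Hypothetical stand-in for an skb with an optional tc extension. */
struct frame {
	bool has_ext;
	bool l2_miss;
};

/* Guarded marking: while the key is off, the frame is not touched and
 * no extension is attached. */
static void guarded_miss_set(struct frame *f, bool miss)
{
	if (!key_enabled())
		return;
	if (miss) {
		f->has_ext = true;
		f->l2_miss = true;
	} else if (f->has_ext) {
		f->l2_miss = false;
	}
}

static void check_guard(void)
{
	struct frame f = { false, false };

	guarded_miss_set(&f, true);	/* key off: nothing happens */
	assert(!f.has_ext && !f.l2_miss);

	filter_added();
	guarded_miss_set(&f, true);	/* key on: frame is marked */
	assert(f.has_ext && f.l2_miss);

	filter_deleted();
	guarded_miss_set(&f, false);	/* key off again: frame left as is */
	assert(f.l2_miss);
}
```

The last step in `check_guard()` shows a consequence of this model: once the last filter is deleted, an existing mark is neither set nor cleared, which matches "the skb is not touched" but only matters if something still reads the bit.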
> Subsequent patches will extend the flower classifier to be able to match
> on the new 'l2_miss' bit and enable / disable the static key when
> filters that match on it are added / deleted.
>
> Signed-off-by: Ido Schimmel <idosch@...dia.com>
> ---
>
> Notes:
> v2:
> * Use tc skb extension instead of adding a bit to the skb.
> * Do not mark broadcast packets as they never perform a lookup and
> therefore never incur a miss.
>
> include/linux/skbuff.h | 1 +
> net/bridge/br_device.c | 1 +
> net/bridge/br_forward.c | 3 +++
> net/bridge/br_input.c | 1 +
> net/bridge/br_private.h | 27 +++++++++++++++++++++++++++
> 5 files changed, 33 insertions(+)
>
Nice approach.
Acked-by: Nikolay Aleksandrov <razor@...ckwall.org>