[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c8e6c501-7006-d051-872c-3e86cf627ed3@blackwall.org>
Date: Thu, 13 Jul 2023 12:46:08 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Ido Schimmel <idosch@...dia.com>, netdev@...r.kernel.org,
bridge@...ts.linux-foundation.org
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
edumazet@...gle.com, roopa@...dia.com, dsahern@...il.com, petrm@...dia.com,
taspelund@...dia.com
Subject: Re: [RFC PATCH net-next 3/4] bridge: Add backup nexthop ID support
On 13/07/2023 12:33, Nikolay Aleksandrov wrote:
> On 13/07/2023 10:09, Ido Schimmel wrote:
>> Add a new bridge port attribute that allows attaching a nexthop object
>> ID to an skb that is redirected to a backup bridge port with VLAN
>> tunneling enabled.
>>
>> Specifically, when redirecting a known unicast packet, read the backup
>> nexthop ID from the bridge port that lost its carrier and set it in the
>> bridge control block of the skb before forwarding it via the backup
>> port. Note that reading the ID from the bridge port should not result in
>> a cache miss as the ID is added next to the 'backup_port' field that was
>> already accessed. After this change, the 'state' field still stays on
>> the first cache line, together with other data path related fields such
>> as 'flags and 'vlgrp':
>>
>> struct net_bridge_port {
>> struct net_bridge * br; /* 0 8 */
>> struct net_device * dev; /* 8 8 */
>> netdevice_tracker dev_tracker; /* 16 0 */
>> struct list_head list; /* 16 16 */
>> long unsigned int flags; /* 32 8 */
>> struct net_bridge_vlan_group * vlgrp; /* 40 8 */
>> struct net_bridge_port * backup_port; /* 48 8 */
>> u32 backup_nhid; /* 56 4 */
>> u8 priority; /* 60 1 */
>> u8 state; /* 61 1 */
>> u16 port_no; /* 62 2 */
>> /* --- cacheline 1 boundary (64 bytes) --- */
>> [...]
>> } __attribute__((__aligned__(8)));
>>
>> When forwarding an skb via a bridge port that has VLAN tunneling
>> enabled, check if the backup nexthop ID stored in the bridge control
>> block is valid (i.e., not zero). If so, instead of attaching the
>> pre-allocated metadata (that only has the tunnel key set), allocate a
>> new metadata, set both the tunnel key and the nexthop object ID and
>> attach it to the skb.
>>
>> By default, do not dump the new attribute to user space as a value of
>> zero is an invalid nexthop object ID.
>>
>> The above is useful for EVPN multihoming. When one of the links
>> composing an Ethernet Segment (ES) fails, traffic needs to be redirected
>> towards the host via one of the other ES peers. For example, if a host
>> is multihomed to three different VTEPs, the backup port of each ES link
>> needs to be set to the VXLAN device and the backup nexthop ID needs to
>> point to an FDB nexthop group that includes the IP addresses of the
>> other two VTEPs. The VXLAN driver will extract the ID from the metadata
>> of the redirected skb, calculate its flow hash and forward it towards
>> one of the other VTEPs. If the ID does not exist, or represents an
>> invalid nexthop object, the VXLAN driver will drop the skb. This
>> relieves the bridge driver from the need to validate the ID.
>>
>> Signed-off-by: Ido Schimmel <idosch@...dia.com>
>> ---
>> include/uapi/linux/if_link.h | 1 +
>> net/bridge/br_forward.c | 1 +
>> net/bridge/br_netlink.c | 12 ++++++++++++
>> net/bridge/br_private.h | 3 +++
>> net/bridge/br_vlan_tunnel.c | 15 +++++++++++++++
>> net/core/rtnetlink.c | 2 +-
>> 6 files changed, 33 insertions(+), 1 deletion(-)
>>
[snip]
>> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
>> index 6116eba1bd89..9d7bc8b96b53 100644
>> --- a/net/bridge/br_forward.c
>> +++ b/net/bridge/br_forward.c
>> @@ -154,6 +154,7 @@ void br_forward(const struct net_bridge_port *to,
>> backup_port = rcu_dereference(to->backup_port);
>> if (unlikely(!backup_port))
>> goto out;
>> + BR_INPUT_SKB_CB(skb)->backup_nhid = READ_ONCE(to->backup_nhid);
>> to = backup_port;
>> }
>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index 05c5863d2e20..10f0d33d8ccf 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -211,6 +211,7 @@ static inline size_t br_port_info_size(void)
>> + nla_total_size(sizeof(u8)) /* IFLA_BRPORT_MRP_IN_OPEN */
>> + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT */
>> + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_MCAST_EHT_HOSTS_CNT */
>> + + nla_total_size(sizeof(u32)) /* IFLA_BRPORT_BACKUP_NHID */
>> + 0;
>> }
>> @@ -319,6 +320,10 @@ static int br_port_fill_attrs(struct sk_buff *skb,
>> backup_p->dev->ifindex);
>> rcu_read_unlock();
>> + if (p->backup_nhid &&
>> + nla_put_u32(skb, IFLA_BRPORT_BACKUP_NHID, p->backup_nhid))
>> + return -EMSGSIZE;
>> +
>
> READ_ONCE(), see the comment above backup_port :)
>
Actually the updates are done with port lock as well.
This should be fine. Patch looks good.
Powered by blists - more mailing lists