linux-kernel - Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c1964582-a96a-4b46-afb7-0cdfa8208ef6@blackwall.org>
Date: Wed, 3 Sep 2025 13:08:24 +0300
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Linus Lüssing <linus.luessing@...3.blue>
Cc: Jakub Kicinski <kuba@...nel.org>, bridge@...ts.linux.dev,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 Ido Schimmel <idosch@...dia.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 Simon Horman <horms@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Eric Dumazet <edumazet@...gle.com>, "David S . Miller"
 <davem@...emloft.net>, Kuniyuki Iwashima <kuniyu@...gle.com>,
 Stanislav Fomichev <sdf@...ichev.me>, Xiao Liang <shaw.leon@...il.com>
Subject: Re: [PATCH 0/9] net: bridge: reduce multicast checks in fast path

On 9/2/25 23:16, Linus Lüssing wrote:
> Hi Nik, thanks for the suggestions and review again!
> 
> 
> On Fri, Aug 29, 2025 at 07:23:24PM +0300, Nikolay Aleksandrov wrote:
>>
>> a few notes for v2:
>> - please use READ/WRTE_ONCE() for variables that are used without locking
> 
> Just to understand the issue, otherwise the data path would assume
> an old inactive or active state for a prolonged time after toggling?
> Or what would happen?
>
> 

It's more about marking these as used without locking so KCSAN won't flag them and also
to clearly show people that intent.

>> - please make locking symmetric, I saw that br_multicast_open() expects the lock to be already held, while
>>    __br_multicast_stop() takes it itself
> 
> I think that's what I tried before submitting. Initially wanted
> to have the locking inside, but then it would deadlock on
> br_multicast_toggle()->br_multicast_open()->... as this is the one
> place where br_multicast_open() is called with the multicast spinlock
> already held.
> 
> On the other hand, moving the spinlock out of / around
> __br_multicast_stop() would lead to a sleeping-while-atomic bug
> when calling timer_delete_sync() in there. And if I were to change
> these to a timer_delete() I guess there is no way to do the sync
> part after unlocking? There is no equivalent to something like the
> flush_workqueue() / drain_workqueue() for workqueues, but for
> simple timers instead, right?
> 
> I would also love to go for the first approach, taking the
> spinlock inside of __br_multicast_open(). But not quite sure how
> to best and safely change br_multicast_toggle() then.
> 
> 

Well, this is not an easy one to solve, would require more thought and
changes to get the proper locking, but it certainly shouldn't be left
asymmetric - having one take the lock, and expecting that it's already taken
for the other, that is definitely unacceptable. Please spend more time on this
and think about how it can be changed. Right now I don't have the time to dig
in and make a proper proposal, but I'm happy to review and discuss proposed
solutions.

>> - is the mcast lock really necessary, would atomic ops do for this tracking?
> 
> Hm, not sure. We'd be checking multiple variables before toggling
> the new brmctx->ip{4,6}_active. As the checked variables can
> change from multiple contexts couldn't the following happen then?
> 
> 
> Start: ip*_active = false, snooping_enabled = false,
>         vlan_snooping_enabled = true, vlan{id:42}->snooping_enabled = false
> 
> Thread A)                     Thread B)
> --------------------------------------------------------------------------
>                                br_multicast_toggle(br, 1, ...)
> 			      -> loads vlan{id:42}->snooping_enabled: false
> --------------------------------------------------------------------------
> br_multicast_toggle_one_vlan(vlan{id:42}, true)
> -> vlan->priv_flags: set per-vlan-mc-snooping to true
> -> br_multicast_update_active()
>     checks snooping_enabled: false
>     -> keeping vlan's ip*_active at false
> --------------------------------------------------------------------------
>                                -> sets snooping_enabled: true
>                                -> br_multicast_update_active()
> 			         -> checks vlan{id:42}->snooping_enabled:
> 				    false (from earlier load above)
>                                   -> keeping vlan's ip*_active at false
> 
> Result: vlan's ip*_active is still false even though it should be
> true now?
> 
> .o(Or were netlink and sysfs handlers somehow ensured to never run in
> parallel?)
> 

They are, netlink and sysfs are supposed to take the same locks so they
cannot run in parallel changing options.

> 
> I'm not that versed with atomic's and explicit memory barriers,
> could that prevent something like that even if multiple variables
> are involved? Only used lockless atomic_t's for atomic_inc()/_dec() so far.
> And otherwise used explicit locking.
> 
> 
> 
>> - can you provide the full view somewhere, how would this tracking be used? I fear
>>    there might still be races.
> 
> My original switchdev integration plan so far was roughly still the same
> as in the previous pull-request:
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/20250522195952.29265-5-linus.luessing@c0d3.blue/
> 
> And using it in rtl83xx in OpenWrt like this:
> https://github.com/openwrt/openwrt/pull/18780/commits/708bbc4b73bc90cd13839c613e6634aa5faeeace#diff-b5c9a9cc66e207d77fea5d063dca2cce20cf4ae3a28fc0a5d5eebd47a024d5c3
> 
> But haven't updated these yet onto this PR, will do that later.

Thanks.