Message-ID: <3742218.iIbC2pHGDl@sven-l14>
Date: Sun, 01 Jun 2025 15:10:50 +0200
From: Sven Eckelmann <sven@...fation.org>
To: Marek Lindner <marek.lindner@...lbox.org>,
Simon Wunderlich <sw@...onwunderlich.de>,
Antonio Quartulli <antonio@...delbit.com>,
Matthias Schiffer <mschiffer@...verse-factory.net>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
b.a.t.m.a.n@...ts.open-mesh.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH batadv 4/5] batman-adv: remove global hardif list
On Sunday, 1 June 2025 11:26:25 CEST Matthias Schiffer wrote:
[...]
> > And saying this, the `batadv_hardif_get_by_netdev` call was also used to
> > retrieve additional information about all kinds of interfaces - even when they
> > are not used by batman-adv directly. For example, for figuring out if it is a
> > wifi interface (for the TT wifi flag). With your change here, you are basically
> > breaking this functionality because you now require that the netdev is a lower
> > interface of batman-adv. Therefore, things like:
> >
> >
> >               ┌──────┐
> >   ┌───────────┼br-lan├──────┐
> >   │           └──────┘      │
> >   │                         │
> >   │                         │
> > ┌─▼─┐                    ┌──▼─┐
> > │ap0│                    │bat0│
> > └───┘                    └──┬─┘
> >                             │
> >                             │
> >                          ┌──▼──┐
> >                          │mesh0│
> >                          └─────┘
> >
> >
> > are no longer handled correctly in TT because ap0 is not a lower interface of
> > any batadv mesh interface. And as a result, the ap-isolation feature of TT
> > will break.
> >
> > Kind regards,
> > Sven
>
> Hmm, this is a tricky one. Only having the hardifs around while they're
> used for meshing means we need some other way to determine the wifi flags -
> but doing it on demand for every batadv_tt_local_add() seems like it could
> be used to facilitate a DoS on the RTNL by causing large numbers of TT
> entries to be added, as the lock needs to be held for resolving the iflink.
Uhm, using a mutex in this place is a bad idea. If batadv_tt_local_add is
called from a context other than batadv_interface_tx, then rtnl_lock is already
held - which is not the biggest problem because we can handle this with more
code. But when it is called from the batadv_interface_tx context, then it is
usually in a context which doesn't allow sleeping. Here is an example output when
adding an rtnl_lock/rtnl_unlock in this place:
[ 9.141427][ T43] =============================
[ 9.141835][ T43] WARNING: suspicious RCU usage
[ 9.142213][ T43] 6.15.0+ #1 Tainted: G O
[ 9.142630][ T43] -----------------------------
[ 9.142981][ T43] ./include/linux/rcupdate.h:409 Illegal context switch in RCU read-side critical section!
[ 9.143674][ T43]
[ 9.143674][ T43] other info that might help us debug this:
[ 9.143674][ T43]
[ 9.144334][ T43]
[ 9.144334][ T43] rcu_scheduler_active = 2, debug_locks = 1
[ 9.144904][ T43] 6 locks held by kworker/1:2/43:
[ 9.145255][ T43] #0: ffff888007be2558 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work+0xcee/0x1420
[ 9.145944][ T43] #1: ffff88800792fd38 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work+0x798/0x1420
[ 9.146713][ T43] #2: ffff88800a58e5a8 (&idev->mc_lock){+.+.}-{4:4}, at: mld_ifc_work+0x2a/0x200
[ 9.147319][ T43] #3: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: mld_sendpack+0x17f/0xc00
[ 9.147949][ T43] #4: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: ip6_finish_output2+0x294/0x1650
[ 9.148621][ T43] #5: ffffffff834050c0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x18a/0xff0
[ 9.149286][ T43]
[ 9.149286][ T43] stack backtrace:
[ 9.149743][ T43] CPU: 1 UID: 0 PID: 43 Comm: kworker/1:2 Tainted: G O 6.15.0+ #1 NONE
[ 9.149747][ T43] Tainted: [O]=OOT_MODULE
[ 9.149748][ T43] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[ 9.149751][ T43] Workqueue: mld mld_ifc_work
[ 9.149754][ T43] Call Trace:
[ 9.149756][ T43] <TASK>
[ 9.149759][ T43] dump_stack_lvl+0x6f/0xa0
[ 9.149764][ T43] lockdep_rcu_suspicious.cold+0x4e/0x8b
[ 9.149768][ T43] __might_resched+0x26a/0x380
[ 9.149771][ T43] ? rcu_read_unlock+0x80/0x80
[ 9.149773][ T43] ? batadv_primary_if_get_selected+0x320/0x320 [batman_adv]
[ 9.149786][ T43] ? mark_held_locks+0x40/0x70
[ 9.149791][ T43] __mutex_lock+0x113/0x1be0
[ 9.149795][ T43] ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.149807][ T43] ? batadv_bla_get_backbone_gw+0xad1/0xdf0 [batman_adv]
[ 9.149819][ T43] ? mutex_lock_io_nested+0x18d0/0x18d0
[ 9.149824][ T43] ? batadv_bla_claim_dump_entry.isra.0+0x6d0/0x6d0 [batman_adv]
[ 9.149835][ T43] ? ret_from_fork_asm+0x11/0x20
[ 9.149841][ T43] ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.149852][ T43] ? batadv_bla_rx+0xe00/0xe00 [batman_adv]
[ 9.149862][ T43] batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.149878][ T43] ? batadv_tt_global_hash_count+0x110/0x110 [batman_adv]
[ 9.149892][ T43] batadv_interface_tx+0x4b4/0x1820 [batman_adv]
[ 9.149905][ T43] ? batadv_skb_head_push+0x220/0x220 [batman_adv]
[ 9.149917][ T43] ? skb_csum_hwoffload_help+0x650/0x650
[ 9.149922][ T43] dev_hard_start_xmit+0x15c/0x640
[ 9.149926][ T43] ? validate_xmit_skb.isra.0+0x62/0x4a0
[ 9.149930][ T43] __dev_queue_xmit+0x44d/0xff0
[ 9.149933][ T43] ? netdev_core_pick_tx+0x230/0x230
[ 9.149938][ T43] ip6_finish_output2+0x7f8/0x1650
[ 9.149942][ T43] ? icmp6_dst_alloc+0x30a/0x480
[ 9.149946][ T43] mld_sendpack+0x5de/0xc00
[ 9.149951][ T43] ? mld_report_work+0x620/0x620
[ 9.149957][ T43] ? mld_send_cr+0x4ff/0x7f0
[ 9.149961][ T43] mld_ifc_work+0x32/0x200
[ 9.149965][ T43] process_one_work+0x814/0x1420
[ 9.149971][ T43] ? pwq_dec_nr_in_flight+0x540/0x540
[ 9.149977][ T43] ? assign_work+0x168/0x240
[ 9.149980][ T43] worker_thread+0x618/0x1010
[ 9.149985][ T43] ? __kthread_parkme+0xf7/0x260
[ 9.149989][ T43] ? process_one_work+0x1420/0x1420
[ 9.149991][ T43] kthread+0x3bb/0x760
[ 9.149994][ T43] ? kvm_sched_clock_read+0x11/0x20
[ 9.149997][ T43] ? local_clock_noinstr+0x4e/0xe0
[ 9.150000][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.150002][ T43] ? __lock_release+0x154/0x2a0
[ 9.150005][ T43] ? ret_from_fork+0x1b/0x70
[ 9.150010][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.150012][ T43] ret_from_fork+0x31/0x70
[ 9.150015][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.150018][ T43] ret_from_fork_asm+0x11/0x20
[ 9.150025][ T43] </TASK>
[ 9.150026][ T43]
[ 9.171355][ T43] =============================
[ 9.171360][ T43] WARNING: suspicious RCU usage
[ 9.171363][ T43] 6.15.0+ #1 Tainted: G O
[ 9.171366][ T43] -----------------------------
[ 9.171367][ T43] kernel/sched/core.c:8780 Illegal context switch in RCU-bh read-side critical section!
[ 9.171371][ T43]
[ 9.171371][ T43] other info that might help us debug this:
[ 9.171371][ T43]
[ 9.171372][ T43]
[ 9.171372][ T43] rcu_scheduler_active = 2, debug_locks = 1
[ 9.171374][ T43] 6 locks held by kworker/1:2/43:
[ 9.171377][ T43] #0: ffff888007be2558 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work+0xcee/0x1420
[ 9.171392][ T43] #1: ffff88800792fd38 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work+0x798/0x1420
[ 9.171402][ T43] #2: ffff88800a58e5a8 (&idev->mc_lock){+.+.}-{4:4}, at: mld_ifc_work+0x2a/0x200
[ 9.171413][ T43] #3: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: mld_sendpack+0x17f/0xc00
[ 9.171423][ T43] #4: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: ip6_finish_output2+0x294/0x1650
[ 9.171433][ T43] #5: ffffffff834050c0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x18a/0xff0
[ 9.171459][ T43]
[ 9.171459][ T43] stack backtrace:
[ 9.171462][ T43] CPU: 1 UID: 0 PID: 43 Comm: kworker/1:2 Tainted: G O 6.15.0+ #1 NONE
[ 9.171467][ T43] Tainted: [O]=OOT_MODULE
[ 9.171468][ T43] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[ 9.171470][ T43] Workqueue: mld mld_ifc_work
[ 9.171474][ T43] Call Trace:
[ 9.171476][ T43] <TASK>
[ 9.171479][ T43] dump_stack_lvl+0x6f/0xa0
[ 9.171484][ T43] lockdep_rcu_suspicious.cold+0x4e/0x8b
[ 9.171490][ T43] __might_resched+0x336/0x380
[ 9.171493][ T43] ? rcu_read_unlock+0x80/0x80
[ 9.171496][ T43] ? batadv_primary_if_get_selected+0x320/0x320 [batman_adv]
[ 9.171508][ T43] ? mark_held_locks+0x40/0x70
[ 9.171513][ T43] __mutex_lock+0x113/0x1be0
[ 9.171517][ T43] ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.171529][ T43] ? batadv_bla_get_backbone_gw+0xad1/0xdf0 [batman_adv]
[ 9.171541][ T43] ? mutex_lock_io_nested+0x18d0/0x18d0
[ 9.171546][ T43] ? batadv_bla_claim_dump_entry.isra.0+0x6d0/0x6d0 [batman_adv]
[ 9.171557][ T43] ? ret_from_fork_asm+0x11/0x20
[ 9.171564][ T43] ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.171574][ T43] ? batadv_bla_rx+0xe00/0xe00 [batman_adv]
[ 9.171585][ T43] batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.171601][ T43] ? batadv_tt_global_hash_count+0x110/0x110 [batman_adv]
[ 9.171616][ T43] batadv_interface_tx+0x4b4/0x1820 [batman_adv]
[ 9.171629][ T43] ? batadv_skb_head_push+0x220/0x220 [batman_adv]
[ 9.171641][ T43] ? skb_csum_hwoffload_help+0x650/0x650
[ 9.171647][ T43] dev_hard_start_xmit+0x15c/0x640
[ 9.171650][ T43] ? validate_xmit_skb.isra.0+0x62/0x4a0
[ 9.171654][ T43] __dev_queue_xmit+0x44d/0xff0
[ 9.171657][ T43] ? netdev_core_pick_tx+0x230/0x230
[ 9.171663][ T43] ip6_finish_output2+0x7f8/0x1650
[ 9.171667][ T43] ? icmp6_dst_alloc+0x30a/0x480
[ 9.171671][ T43] mld_sendpack+0x5de/0xc00
[ 9.171676][ T43] ? mld_report_work+0x620/0x620
[ 9.171682][ T43] ? mld_send_cr+0x4ff/0x7f0
[ 9.171686][ T43] mld_ifc_work+0x32/0x200
[ 9.171690][ T43] process_one_work+0x814/0x1420
[ 9.171696][ T43] ? pwq_dec_nr_in_flight+0x540/0x540
[ 9.171702][ T43] ? assign_work+0x168/0x240
[ 9.171706][ T43] worker_thread+0x618/0x1010
[ 9.171710][ T43] ? __kthread_parkme+0xf7/0x260
[ 9.171715][ T43] ? process_one_work+0x1420/0x1420
[ 9.171717][ T43] kthread+0x3bb/0x760
[ 9.171720][ T43] ? kvm_sched_clock_read+0x11/0x20
[ 9.171723][ T43] ? local_clock_noinstr+0x4e/0xe0
[ 9.171727][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.171729][ T43] ? __lock_release+0x154/0x2a0
[ 9.171732][ T43] ? ret_from_fork+0x1b/0x70
[ 9.171736][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.171739][ T43] ret_from_fork+0x31/0x70
[ 9.171742][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.171745][ T43] ret_from_fork_asm+0x11/0x20
[ 9.171752][ T43] </TASK>
[ 9.171754][ T43] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
[ 9.171756][ T43] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 43, name: kworker/1:2
[ 9.171759][ T43] preempt_count: 202, expected: 0
[ 9.171761][ T43] 6 locks held by kworker/1:2/43:
[ 9.171763][ T43] #0: ffff888007be2558 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work+0xcee/0x1420
[ 9.171774][ T43] #1: ffff88800792fd38 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work+0x798/0x1420
[ 9.171784][ T43] #2: ffff88800a58e5a8 (&idev->mc_lock){+.+.}-{4:4}, at: mld_ifc_work+0x2a/0x200
[ 9.171794][ T43] #3: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: mld_sendpack+0x17f/0xc00
[ 9.171804][ T43] #4: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: ip6_finish_output2+0x294/0x1650
[ 9.171813][ T43] #5: ffffffff834050c0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x18a/0xff0
[ 9.171823][ T43] CPU: 1 UID: 0 PID: 43 Comm: kworker/1:2 Tainted: G O 6.15.0+ #1 NONE
[ 9.171827][ T43] Tainted: [O]=OOT_MODULE
[ 9.171828][ T43] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[ 9.171829][ T43] Workqueue: mld mld_ifc_work
[ 9.171832][ T43] Call Trace:
[ 9.171833][ T43] <TASK>
[ 9.171834][ T43] dump_stack_lvl+0x6f/0xa0
[ 9.171838][ T43] __might_resched.cold+0x160/0x1bc
[ 9.171842][ T43] ? rcu_read_unlock+0x80/0x80
[ 9.171844][ T43] ? batadv_primary_if_get_selected+0x320/0x320 [batman_adv]
[ 9.171855][ T43] ? mark_held_locks+0x40/0x70
[ 9.171859][ T43] __mutex_lock+0x113/0x1be0
[ 9.171863][ T43] ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.171874][ T43] ? batadv_bla_get_backbone_gw+0xad1/0xdf0 [batman_adv]
[ 9.171886][ T43] ? mutex_lock_io_nested+0x18d0/0x18d0
[ 9.171891][ T43] ? batadv_bla_claim_dump_entry.isra.0+0x6d0/0x6d0 [batman_adv]
[ 9.171902][ T43] ? ret_from_fork_asm+0x11/0x20
[ 9.171908][ T43] ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.171919][ T43] ? batadv_bla_rx+0xe00/0xe00 [batman_adv]
[ 9.171929][ T43] batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[ 9.171945][ T43] ? batadv_tt_global_hash_count+0x110/0x110 [batman_adv]
[ 9.171960][ T43] batadv_interface_tx+0x4b4/0x1820 [batman_adv]
[ 9.171972][ T43] ? batadv_skb_head_push+0x220/0x220 [batman_adv]
[ 9.171984][ T43] ? skb_csum_hwoffload_help+0x650/0x650
[ 9.171990][ T43] dev_hard_start_xmit+0x15c/0x640
[ 9.171993][ T43] ? validate_xmit_skb.isra.0+0x62/0x4a0
[ 9.171997][ T43] __dev_queue_xmit+0x44d/0xff0
[ 9.172000][ T43] ? netdev_core_pick_tx+0x230/0x230
[ 9.172006][ T43] ip6_finish_output2+0x7f8/0x1650
[ 9.172010][ T43] ? icmp6_dst_alloc+0x30a/0x480
[ 9.172013][ T43] mld_sendpack+0x5de/0xc00
[ 9.172018][ T43] ? mld_report_work+0x620/0x620
[ 9.172024][ T43] ? mld_send_cr+0x4ff/0x7f0
[ 9.172029][ T43] mld_ifc_work+0x32/0x200
[ 9.172032][ T43] process_one_work+0x814/0x1420
[ 9.172039][ T43] ? pwq_dec_nr_in_flight+0x540/0x540
[ 9.172044][ T43] ? assign_work+0x168/0x240
[ 9.172048][ T43] worker_thread+0x618/0x1010
[ 9.172053][ T43] ? __kthread_parkme+0xf7/0x260
[ 9.172056][ T43] ? process_one_work+0x1420/0x1420
[ 9.172059][ T43] kthread+0x3bb/0x760
[ 9.172061][ T43] ? kvm_sched_clock_read+0x11/0x20
[ 9.172065][ T43] ? local_clock_noinstr+0x4e/0xe0
[ 9.172069][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.172072][ T43] ? __lock_release+0x154/0x2a0
[ 9.172076][ T43] ? ret_from_fork+0x1b/0x70
[ 9.172080][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.172084][ T43] ret_from_fork+0x31/0x70
[ 9.172089][ T43] ? kthread_is_per_cpu+0xc0/0xc0
[ 9.172092][ T43] ret_from_fork_asm+0x11/0x20
[ 9.172100][ T43] </TASK>
So, even getting the parent (see `ASSERT_RTNL` in
`netdev_master_upper_dev_get`) of the lower interface is a no-go at that
point.
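
To make the two cases above easier to follow, here is a tiny userspace
model of the decision. It is only a sketch: `ctx_model`, `lock_verdict` and
`model_rtnl_lock` are hypothetical names, and nothing here is kernel code.
`atomic_depth` stands in for the nesting of non-sleepable sections such as
the rcu_read_lock_bh taken in __dev_queue_xmit, and `rtnl_held` for a caller
that already owns the (non-recursive) RTNL mutex.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the calling context, not kernel code. */
enum lock_verdict {
	LOCK_OK,              /* taking the mutex would be fine */
	LOCK_WOULD_DEADLOCK,  /* caller already holds the non-recursive mutex */
	LOCK_SLEEP_IN_ATOMIC, /* caller must not sleep (tx path in the traces) */
};

struct ctx_model {
	bool rtnl_held;   /* true on the non-tx paths described above */
	int atomic_depth; /* > 0 inside rcu_read_lock_bh()-like sections */
};

/* Reports what an unconditional rtnl_lock() would do in this context. */
static enum lock_verdict model_rtnl_lock(const struct ctx_model *ctx)
{
	if (ctx->atomic_depth > 0)
		return LOCK_SLEEP_IN_ATOMIC;
	if (ctx->rtnl_held)
		return LOCK_WOULD_DEADLOCK;
	return LOCK_OK;
}
```

With `atomic_depth = 1` (the batadv_interface_tx case) the model reports
LOCK_SLEEP_IN_ATOMIC, which corresponds to the "sleeping function called
from invalid context" splat above. The `rtnl_held` case is the one that can
be handled with more code.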
> One option might be to add a cache for the wifi flag (and possibly other
> information, I'll have to check if there is anything else), but store it in
> the mesh interface, only for interfaces that are bridged with the mesh.
> Cache entries could be created on demand when a local TT entry is added for
> an unknown IIF; when to remove cache entries is something I'll have to
> figure out.
>
> Simpler ideas how to solve this are welcome :)
Having something like a simple (rcu)hash(table) (yes, similar to the global
hardif list), which only stores entries for non-mesh netdevs when they are
(above) a wifi interface, might be enough. It is only for the
"ap-isolation" feature, but I guess that someone will not be happy if we
break it.
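
A rough userspace sketch of what such a table could look like, just to
illustrate the shape of the idea. All names (`wifi_cache`, `wifi_cache_set`,
`wifi_cache_del`, `wifi_cache_lookup`) are made up for this example. It
assumes the key is the ifindex, and it has no RCU or locking at all. In the
kernel the set/del side would presumably run somewhere RTNL is already held
(e.g. a netdev notifier) and the lookup side under RCU, but that part is not
modelled here.

```c
#include <stdint.h>
#include <stdlib.h>

#define WIFI_CACHE_BUCKETS 16u

/* Hypothetical names; plain userspace model without RCU or locking. */
struct wifi_cache_entry {
	int ifindex;                   /* assumed lookup key */
	uint32_t wifi_flags;           /* cached wifi detection result */
	struct wifi_cache_entry *next; /* bucket chain */
};

struct wifi_cache {
	struct wifi_cache_entry *buckets[WIFI_CACHE_BUCKETS];
};

static unsigned int wifi_cache_hash(int ifindex)
{
	return (unsigned int)ifindex % WIFI_CACHE_BUCKETS;
}

/* Slow path: add or update an entry. Returns 0 on success, -1 on OOM. */
static int wifi_cache_set(struct wifi_cache *c, int ifindex, uint32_t flags)
{
	unsigned int b = wifi_cache_hash(ifindex);
	struct wifi_cache_entry *e;

	for (e = c->buckets[b]; e; e = e->next) {
		if (e->ifindex == ifindex) {
			e->wifi_flags = flags;
			return 0;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return -1;

	e->ifindex = ifindex;
	e->wifi_flags = flags;
	e->next = c->buckets[b];
	c->buckets[b] = e;
	return 0;
}

/* Slow path: drop the entry when the interface goes away. */
static void wifi_cache_del(struct wifi_cache *c, int ifindex)
{
	struct wifi_cache_entry **pp = &c->buckets[wifi_cache_hash(ifindex)];

	while (*pp) {
		if ((*pp)->ifindex == ifindex) {
			struct wifi_cache_entry *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;
		}
		pp = &(*pp)->next;
	}
}

/* Fast path: no entry means "not a wifi interface", so return 0. */
static uint32_t wifi_cache_lookup(const struct wifi_cache *c, int ifindex)
{
	const struct wifi_cache_entry *e;

	for (e = c->buckets[wifi_cache_hash(ifindex)]; e; e = e->next)
		if (e->ifindex == ifindex)
			return e->wifi_flags;

	return 0;
}
```

The point of the split is that the lookup never has to resolve the iflink
or take a sleeping lock. Only wifi(-backed) interfaces get an entry, so the
table stays small and a miss simply means "no wifi flag".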
Kind regards,
Sven