Message-ID: <3742218.iIbC2pHGDl@sven-l14>
Date: Sun, 01 Jun 2025 15:10:50 +0200
From: Sven Eckelmann <sven@...fation.org>
To: Marek Lindner <marek.lindner@...lbox.org>,
 Simon Wunderlich <sw@...onwunderlich.de>,
 Antonio Quartulli <antonio@...delbit.com>,
 Matthias Schiffer <mschiffer@...verse-factory.net>
Cc: "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
 b.a.t.m.a.n@...ts.open-mesh.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH batadv 4/5] batman-adv: remove global hardif list

On Sunday, 1 June 2025 11:26:25 CEST Matthias Schiffer wrote:
[...]
> > And saying this, the `batadv_hardif_get_by_netdev` call was also used to
> > retrieve additional information about all kinds of interfaces - even when they
> > are not used by batman-adv directly. For example for figuring out if it is a
> > wifi interface (for the TT wifi flag). With your change here, you are basically
> > breaking this functionality because you now require that the netdev is a lower
> > interface of batman-adv. Therefore, things like:
> > 
> > 
> >                     ┌──────┐
> >         ┌───────────┼br-lan├──────┐
> >         │           └──────┘      │
> >         │                         │
> >         │                         │
> >       ┌─▼─┐                    ┌──▼─┐
> >       │ap0│                    │bat0│
> >       └───┘                    └──┬─┘
> >                                   │
> >                                   │
> >                                ┌──▼──┐
> >                                │mesh0│
> >                                └─────┘
> >                                          
> >                                          
> > is no longer handled correctly in TT because ap0 is not a lower interface of
> > any batadv mesh interface. And as a result, the ap-isolation feature of TT
> > will break.
> > 
> > Kind regards,
> > 	Sven
> 
> Hmm, this is a tricky one. Only having the hardifs around while they're 
> used for meshing means we need some other way to determine the wifi flags - 
> but doing it on demand for every batadv_tt_local_add() seems like it could 
> be used to facilitate a DoS on the RTNL by causing large numbers of TT 
> entries to be added, as the lock needs to be held for resolving the iflink.

Uhm, using a mutex in this place is a bad idea. If batadv_tt_local_add is 
called from outside the batadv_interface_tx context, then rtnl_lock is already 
held - which is not the biggest problem because we can handle this with more 
code. But when it is called from the batadv_interface_tx context, it is 
usually in a context which doesn't allow sleeping. Here is an example output 
when adding an rtnl_lock/rtnl_unlock in this place:

[    9.141427][   T43] =============================
[    9.141835][   T43] WARNING: suspicious RCU usage
[    9.142213][   T43] 6.15.0+ #1 Tainted: G           O       
[    9.142630][   T43] -----------------------------
[    9.142981][   T43] ./include/linux/rcupdate.h:409 Illegal context switch in RCU read-side critical section!
[    9.143674][   T43] 
[    9.143674][   T43] other info that might help us debug this:
[    9.143674][   T43] 
[    9.144334][   T43] 
[    9.144334][   T43] rcu_scheduler_active = 2, debug_locks = 1
[    9.144904][   T43] 6 locks held by kworker/1:2/43:
[    9.145255][   T43]  #0: ffff888007be2558 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work+0xcee/0x1420
[    9.145944][   T43]  #1: ffff88800792fd38 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work+0x798/0x1420
[    9.146713][   T43]  #2: ffff88800a58e5a8 (&idev->mc_lock){+.+.}-{4:4}, at: mld_ifc_work+0x2a/0x200
[    9.147319][   T43]  #3: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: mld_sendpack+0x17f/0xc00
[    9.147949][   T43]  #4: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: ip6_finish_output2+0x294/0x1650
[    9.148621][   T43]  #5: ffffffff834050c0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x18a/0xff0
[    9.149286][   T43] 
[    9.149286][   T43] stack backtrace:
[    9.149743][   T43] CPU: 1 UID: 0 PID: 43 Comm: kworker/1:2 Tainted: G           O        6.15.0+ #1 NONE 
[    9.149747][   T43] Tainted: [O]=OOT_MODULE
[    9.149748][   T43] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[    9.149751][   T43] Workqueue: mld mld_ifc_work
[    9.149754][   T43] Call Trace:
[    9.149756][   T43]  <TASK>
[    9.149759][   T43]  dump_stack_lvl+0x6f/0xa0
[    9.149764][   T43]  lockdep_rcu_suspicious.cold+0x4e/0x8b
[    9.149768][   T43]  __might_resched+0x26a/0x380
[    9.149771][   T43]  ? rcu_read_unlock+0x80/0x80
[    9.149773][   T43]  ? batadv_primary_if_get_selected+0x320/0x320 [batman_adv]
[    9.149786][   T43]  ? mark_held_locks+0x40/0x70
[    9.149791][   T43]  __mutex_lock+0x113/0x1be0
[    9.149795][   T43]  ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.149807][   T43]  ? batadv_bla_get_backbone_gw+0xad1/0xdf0 [batman_adv]
[    9.149819][   T43]  ? mutex_lock_io_nested+0x18d0/0x18d0
[    9.149824][   T43]  ? batadv_bla_claim_dump_entry.isra.0+0x6d0/0x6d0 [batman_adv]
[    9.149835][   T43]  ? ret_from_fork_asm+0x11/0x20
[    9.149841][   T43]  ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.149852][   T43]  ? batadv_bla_rx+0xe00/0xe00 [batman_adv]
[    9.149862][   T43]  batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.149878][   T43]  ? batadv_tt_global_hash_count+0x110/0x110 [batman_adv]
[    9.149892][   T43]  batadv_interface_tx+0x4b4/0x1820 [batman_adv]
[    9.149905][   T43]  ? batadv_skb_head_push+0x220/0x220 [batman_adv]
[    9.149917][   T43]  ? skb_csum_hwoffload_help+0x650/0x650
[    9.149922][   T43]  dev_hard_start_xmit+0x15c/0x640
[    9.149926][   T43]  ? validate_xmit_skb.isra.0+0x62/0x4a0
[    9.149930][   T43]  __dev_queue_xmit+0x44d/0xff0
[    9.149933][   T43]  ? netdev_core_pick_tx+0x230/0x230
[    9.149938][   T43]  ip6_finish_output2+0x7f8/0x1650
[    9.149942][   T43]  ? icmp6_dst_alloc+0x30a/0x480
[    9.149946][   T43]  mld_sendpack+0x5de/0xc00
[    9.149951][   T43]  ? mld_report_work+0x620/0x620
[    9.149957][   T43]  ? mld_send_cr+0x4ff/0x7f0
[    9.149961][   T43]  mld_ifc_work+0x32/0x200
[    9.149965][   T43]  process_one_work+0x814/0x1420
[    9.149971][   T43]  ? pwq_dec_nr_in_flight+0x540/0x540
[    9.149977][   T43]  ? assign_work+0x168/0x240
[    9.149980][   T43]  worker_thread+0x618/0x1010
[    9.149985][   T43]  ? __kthread_parkme+0xf7/0x260
[    9.149989][   T43]  ? process_one_work+0x1420/0x1420
[    9.149991][   T43]  kthread+0x3bb/0x760
[    9.149994][   T43]  ? kvm_sched_clock_read+0x11/0x20
[    9.149997][   T43]  ? local_clock_noinstr+0x4e/0xe0
[    9.150000][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.150002][   T43]  ? __lock_release+0x154/0x2a0
[    9.150005][   T43]  ? ret_from_fork+0x1b/0x70
[    9.150010][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.150012][   T43]  ret_from_fork+0x31/0x70
[    9.150015][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.150018][   T43]  ret_from_fork_asm+0x11/0x20
[    9.150025][   T43]  </TASK>
[    9.150026][   T43] 
[    9.171355][   T43] =============================
[    9.171360][   T43] WARNING: suspicious RCU usage
[    9.171363][   T43] 6.15.0+ #1 Tainted: G           O       
[    9.171366][   T43] -----------------------------
[    9.171367][   T43] kernel/sched/core.c:8780 Illegal context switch in RCU-bh read-side critical section!
[    9.171371][   T43] 
[    9.171371][   T43] other info that might help us debug this:
[    9.171371][   T43] 
[    9.171372][   T43] 
[    9.171372][   T43] rcu_scheduler_active = 2, debug_locks = 1
[    9.171374][   T43] 6 locks held by kworker/1:2/43:
[    9.171377][   T43]  #0: ffff888007be2558 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work+0xcee/0x1420
[    9.171392][   T43]  #1: ffff88800792fd38 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work+0x798/0x1420
[    9.171402][   T43]  #2: ffff88800a58e5a8 (&idev->mc_lock){+.+.}-{4:4}, at: mld_ifc_work+0x2a/0x200
[    9.171413][   T43]  #3: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: mld_sendpack+0x17f/0xc00
[    9.171423][   T43]  #4: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: ip6_finish_output2+0x294/0x1650
[    9.171433][   T43]  #5: ffffffff834050c0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x18a/0xff0
[    9.171459][   T43] 
[    9.171459][   T43] stack backtrace:
[    9.171462][   T43] CPU: 1 UID: 0 PID: 43 Comm: kworker/1:2 Tainted: G           O        6.15.0+ #1 NONE 
[    9.171467][   T43] Tainted: [O]=OOT_MODULE
[    9.171468][   T43] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[    9.171470][   T43] Workqueue: mld mld_ifc_work
[    9.171474][   T43] Call Trace:
[    9.171476][   T43]  <TASK>
[    9.171479][   T43]  dump_stack_lvl+0x6f/0xa0
[    9.171484][   T43]  lockdep_rcu_suspicious.cold+0x4e/0x8b
[    9.171490][   T43]  __might_resched+0x336/0x380
[    9.171493][   T43]  ? rcu_read_unlock+0x80/0x80
[    9.171496][   T43]  ? batadv_primary_if_get_selected+0x320/0x320 [batman_adv]
[    9.171508][   T43]  ? mark_held_locks+0x40/0x70
[    9.171513][   T43]  __mutex_lock+0x113/0x1be0
[    9.171517][   T43]  ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.171529][   T43]  ? batadv_bla_get_backbone_gw+0xad1/0xdf0 [batman_adv]
[    9.171541][   T43]  ? mutex_lock_io_nested+0x18d0/0x18d0
[    9.171546][   T43]  ? batadv_bla_claim_dump_entry.isra.0+0x6d0/0x6d0 [batman_adv]
[    9.171557][   T43]  ? ret_from_fork_asm+0x11/0x20
[    9.171564][   T43]  ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.171574][   T43]  ? batadv_bla_rx+0xe00/0xe00 [batman_adv]
[    9.171585][   T43]  batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.171601][   T43]  ? batadv_tt_global_hash_count+0x110/0x110 [batman_adv]
[    9.171616][   T43]  batadv_interface_tx+0x4b4/0x1820 [batman_adv]
[    9.171629][   T43]  ? batadv_skb_head_push+0x220/0x220 [batman_adv]
[    9.171641][   T43]  ? skb_csum_hwoffload_help+0x650/0x650
[    9.171647][   T43]  dev_hard_start_xmit+0x15c/0x640
[    9.171650][   T43]  ? validate_xmit_skb.isra.0+0x62/0x4a0
[    9.171654][   T43]  __dev_queue_xmit+0x44d/0xff0
[    9.171657][   T43]  ? netdev_core_pick_tx+0x230/0x230
[    9.171663][   T43]  ip6_finish_output2+0x7f8/0x1650
[    9.171667][   T43]  ? icmp6_dst_alloc+0x30a/0x480
[    9.171671][   T43]  mld_sendpack+0x5de/0xc00
[    9.171676][   T43]  ? mld_report_work+0x620/0x620
[    9.171682][   T43]  ? mld_send_cr+0x4ff/0x7f0
[    9.171686][   T43]  mld_ifc_work+0x32/0x200
[    9.171690][   T43]  process_one_work+0x814/0x1420
[    9.171696][   T43]  ? pwq_dec_nr_in_flight+0x540/0x540
[    9.171702][   T43]  ? assign_work+0x168/0x240
[    9.171706][   T43]  worker_thread+0x618/0x1010
[    9.171710][   T43]  ? __kthread_parkme+0xf7/0x260
[    9.171715][   T43]  ? process_one_work+0x1420/0x1420
[    9.171717][   T43]  kthread+0x3bb/0x760
[    9.171720][   T43]  ? kvm_sched_clock_read+0x11/0x20
[    9.171723][   T43]  ? local_clock_noinstr+0x4e/0xe0
[    9.171727][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.171729][   T43]  ? __lock_release+0x154/0x2a0
[    9.171732][   T43]  ? ret_from_fork+0x1b/0x70
[    9.171736][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.171739][   T43]  ret_from_fork+0x31/0x70
[    9.171742][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.171745][   T43]  ret_from_fork_asm+0x11/0x20
[    9.171752][   T43]  </TASK>
[    9.171754][   T43] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
[    9.171756][   T43] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 43, name: kworker/1:2
[    9.171759][   T43] preempt_count: 202, expected: 0
[    9.171761][   T43] 6 locks held by kworker/1:2/43:
[    9.171763][   T43]  #0: ffff888007be2558 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work+0xcee/0x1420
[    9.171774][   T43]  #1: ffff88800792fd38 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work+0x798/0x1420
[    9.171784][   T43]  #2: ffff88800a58e5a8 (&idev->mc_lock){+.+.}-{4:4}, at: mld_ifc_work+0x2a/0x200
[    9.171794][   T43]  #3: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: mld_sendpack+0x17f/0xc00
[    9.171804][   T43]  #4: ffffffff83405120 (rcu_read_lock){....}-{1:3}, at: ip6_finish_output2+0x294/0x1650
[    9.171813][   T43]  #5: ffffffff834050c0 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x18a/0xff0
[    9.171823][   T43] CPU: 1 UID: 0 PID: 43 Comm: kworker/1:2 Tainted: G           O        6.15.0+ #1 NONE 
[    9.171827][   T43] Tainted: [O]=OOT_MODULE
[    9.171828][   T43] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.1 11/11/2019
[    9.171829][   T43] Workqueue: mld mld_ifc_work
[    9.171832][   T43] Call Trace:
[    9.171833][   T43]  <TASK>
[    9.171834][   T43]  dump_stack_lvl+0x6f/0xa0
[    9.171838][   T43]  __might_resched.cold+0x160/0x1bc
[    9.171842][   T43]  ? rcu_read_unlock+0x80/0x80
[    9.171844][   T43]  ? batadv_primary_if_get_selected+0x320/0x320 [batman_adv]
[    9.171855][   T43]  ? mark_held_locks+0x40/0x70
[    9.171859][   T43]  __mutex_lock+0x113/0x1be0
[    9.171863][   T43]  ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.171874][   T43]  ? batadv_bla_get_backbone_gw+0xad1/0xdf0 [batman_adv]
[    9.171886][   T43]  ? mutex_lock_io_nested+0x18d0/0x18d0
[    9.171891][   T43]  ? batadv_bla_claim_dump_entry.isra.0+0x6d0/0x6d0 [batman_adv]
[    9.171902][   T43]  ? ret_from_fork_asm+0x11/0x20
[    9.171908][   T43]  ? batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.171919][   T43]  ? batadv_bla_rx+0xe00/0xe00 [batman_adv]
[    9.171929][   T43]  batadv_tt_local_add+0x3d4/0x1d20 [batman_adv]
[    9.171945][   T43]  ? batadv_tt_global_hash_count+0x110/0x110 [batman_adv]
[    9.171960][   T43]  batadv_interface_tx+0x4b4/0x1820 [batman_adv]
[    9.171972][   T43]  ? batadv_skb_head_push+0x220/0x220 [batman_adv]
[    9.171984][   T43]  ? skb_csum_hwoffload_help+0x650/0x650
[    9.171990][   T43]  dev_hard_start_xmit+0x15c/0x640
[    9.171993][   T43]  ? validate_xmit_skb.isra.0+0x62/0x4a0
[    9.171997][   T43]  __dev_queue_xmit+0x44d/0xff0
[    9.172000][   T43]  ? netdev_core_pick_tx+0x230/0x230
[    9.172006][   T43]  ip6_finish_output2+0x7f8/0x1650
[    9.172010][   T43]  ? icmp6_dst_alloc+0x30a/0x480
[    9.172013][   T43]  mld_sendpack+0x5de/0xc00
[    9.172018][   T43]  ? mld_report_work+0x620/0x620
[    9.172024][   T43]  ? mld_send_cr+0x4ff/0x7f0
[    9.172029][   T43]  mld_ifc_work+0x32/0x200
[    9.172032][   T43]  process_one_work+0x814/0x1420
[    9.172039][   T43]  ? pwq_dec_nr_in_flight+0x540/0x540
[    9.172044][   T43]  ? assign_work+0x168/0x240
[    9.172048][   T43]  worker_thread+0x618/0x1010
[    9.172053][   T43]  ? __kthread_parkme+0xf7/0x260
[    9.172056][   T43]  ? process_one_work+0x1420/0x1420
[    9.172059][   T43]  kthread+0x3bb/0x760
[    9.172061][   T43]  ? kvm_sched_clock_read+0x11/0x20
[    9.172065][   T43]  ? local_clock_noinstr+0x4e/0xe0
[    9.172069][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.172072][   T43]  ? __lock_release+0x154/0x2a0
[    9.172076][   T43]  ? ret_from_fork+0x1b/0x70
[    9.172080][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.172084][   T43]  ret_from_fork+0x31/0x70
[    9.172089][   T43]  ? kthread_is_per_cpu+0xc0/0xc0
[    9.172092][   T43]  ret_from_fork_asm+0x11/0x20
[    9.172100][   T43]  </TASK>

So, even getting the parent (see `ASSERT_RTNL` in 
`netdev_master_upper_dev_get`) of the lower interface is a no-go at that 
point.


> One option might be to add a cache for the wifi flag (and possibly other 
> information, I'll have to check if there is anything else), but store it in 
> the mesh interface, only for interfaces that are bridged with the mesh. 
> Cache entries could be created on demand when a local TT entry is added for 
> an unknown IIF; when to remove cache entries is something I'll have to 
> figure out.
> 
> Simpler ideas how to solve this are welcome :)

Having something like a simple (rcu) hash table (yes, similar to the global 
hardif list), which only stores entries for non-mesh netdevs when they are 
(above) a wifi interface, might be enough. It is only needed for the 
"ap-isolation" feature, but I guess that someone will not be happy if we 
break it.
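
As a rough illustration of the hash table idea above, here is a minimal 
userspace sketch in C. It is not batman-adv code: the names `wifi_cache`, 
`wifi_cache_set`, `wifi_cache_get`, `wifi_cache_del` and the flag values are 
hypothetical, and plain pointers stand in for the RCU-protected lists a kernel 
version would need. The point is only that the (possibly sleeping) work of 
determining the wifi flags happens when an entry is stored, so the lookup from 
the TX path needs no mutex.

```c
#include <stdint.h>
#include <stdlib.h>

#define WIFI_CACHE_BUCKETS 16u

/* Hypothetical flag values, chosen only for this sketch. */
#define WIFI_FLAG_DIRECT   0x1u /* the netdev itself is a wifi interface */
#define WIFI_FLAG_INDIRECT 0x2u /* the netdev sits above a wifi interface */

/* One entry per non-mesh netdev that is (above) a wifi interface. */
struct wifi_cache_entry {
	int ifindex;
	uint32_t wifi_flags;
	struct wifi_cache_entry *next;
};

struct wifi_cache {
	struct wifi_cache_entry *buckets[WIFI_CACHE_BUCKETS];
};

unsigned int wifi_cache_bucket(int ifindex)
{
	return (unsigned int)ifindex % WIFI_CACHE_BUCKETS;
}

/*
 * Slow path: store or update the flags for an interface.
 * Assumption: a kernel version would call this from a context where
 * sleeping is allowed, after the flags have been resolved.
 * Returns 0 on success, -1 if the allocation failed.
 */
int wifi_cache_set(struct wifi_cache *cache, int ifindex, uint32_t wifi_flags)
{
	unsigned int b = wifi_cache_bucket(ifindex);
	struct wifi_cache_entry *e;

	for (e = cache->buckets[b]; e; e = e->next) {
		if (e->ifindex == ifindex) {
			e->wifi_flags = wifi_flags;
			return 0;
		}
	}

	e = malloc(sizeof(*e));
	if (!e)
		return -1;

	e->ifindex = ifindex;
	e->wifi_flags = wifi_flags;
	e->next = cache->buckets[b];
	cache->buckets[b] = e;
	return 0;
}

/* Slow path: drop the entry when the interface goes away. */
void wifi_cache_del(struct wifi_cache *cache, int ifindex)
{
	struct wifi_cache_entry **pp = &cache->buckets[wifi_cache_bucket(ifindex)];

	while (*pp) {
		struct wifi_cache_entry *e = *pp;

		if (e->ifindex == ifindex) {
			*pp = e->next;
			free(e);
			return;
		}
		pp = &e->next;
	}
}

/*
 * Fast path: look up the cached flags without taking any sleeping lock.
 * Returns 0 when the interface is unknown, i.e. it is treated as non-wifi.
 */
uint32_t wifi_cache_get(const struct wifi_cache *cache, int ifindex)
{
	const struct wifi_cache_entry *e;

	for (e = cache->buckets[wifi_cache_bucket(ifindex)]; e; e = e->next) {
		if (e->ifindex == ifindex)
			return e->wifi_flags;
	}
	return 0;
}
```

In a kernel version, the set/del side would presumably be driven by netdev 
notifier events (where RTNL is held anyway), and the get side would run under 
rcu_read_lock() from batadv_tt_local_add.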

Kind regards,
	Sven