lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2754825.KlZ2vcFHjT@sven-desktop>
Date: Sat, 27 Sep 2025 19:21:51 +0200
From: Sven Eckelmann <sven@...fation.org>
To: Marek Lindner <marek.lindner@...lbox.org>,
 Simon Wunderlich <sw@...onwunderlich.de>,
 Antonio Quartulli <antonio@...delbit.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, b.a.t.m.a.n@...ts.open-mesh.org,
 Network Development <netdev@...r.kernel.org>,
 Linus Lüssing <linus.luessing@...3.blue>,
 Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Subject:
 Re: unregister_netdevice: waiting for batadv_slave_0 to become free. Usage
 count = 2

On Tuesday, 23 September 2025 16:45:48 CEST Tetsuo Handa wrote:
> On 2025/09/22 23:09, Tetsuo Handa wrote:
> > I suspect that batadv_hard_if_event_meshif() has something to do upon
> > NETDEV_UNREGISTER event because batadv_hard_if_event_meshif() receives
> > NETDEV_POST_INIT / NETDEV_REGISTER / NETDEV_UNREGISTER / NETDEV_PRE_UNINIT
> > events when this reproducer is executed, but I don't know what to do...
> 
> With a change show bottom, the reproducer no longer triggers this problem.
> But is this change correct?

Thanks for the writeup. Unfortunately, I tried to read this multiple times but 
you don't make it easy to figure out what the problem is.

Please don't take the following paragraphs the wrong way - I was just not able 
to really figure out what you meant. I have therefore just added my thoughts 
on each paragraph.

At the end of this mail, I tried to give a shorter summary about the interface 
states and what I think is the actual problem.

> Commit 9e6b5648bbc4 ("batman-adv: Fix duplicated OGMs on NETDEV_UP")
> replaced batadv_iv_iface_activate() (which is called via iface.activate()
>  from batadv_hardif_activate_interface()) with batadv_iv_iface_enabled()
> (which is called via iface.enabled() from batadv_hardif_enable_interface()).
> But that commit missed that batadv_hardif_activate_interface() is called from
> both batadv_hardif_enable_interface() and batadv_hard_if_event().

Why did it miss it? This is the plan in the mentioned commit.

What is the relevant effect which is creating the problem here?


> Since batadv_iv_ogm_schedule_buff() updates if_status to BATADV_IF_ACTIVE
> only when if_status was BATADV_IF_TO_BE_ACTIVATED, we need to call
> batadv_iv_ogm_schedule_buff() from batadv_iv_ogm_schedule() from
> batadv_iv_iface_enabled() via iface.enabled() with
> if_status == BATADV_IF_TO_BE_ACTIVATED if we want iface.enabled() from
> batadv_hardif_enable_interface() to update if_status to BATADV_IF_ACTIVE.

This basically says: Because of A <- B <- C, we need to have C to B to get A. 
But not really what effect this would have.

> But when IFF_UP is not set upon creation, batadv_hardif_enable_interface()
> does not call batadv_hardif_activate_interface(), which means that
> if_status remains BATADV_IF_INACTIVE despite
> batadv_iv_ogm_schedule_buff() is called via iface.enabled().

It must stay inactive when it is down. But the periodic OGM scheduling is 
still needed to have the OGM queued for active interfaces.


> And when IFF_UP is set after creation, batadv_hard_if_event() calls
> batadv_hardif_activate_interface(). But despite "Interface activated: %s\n"
> message being printed, if_status remains BATADV_IF_TO_BE_ACTIVATED because
> iface.activate == NULL due to above-mentioned commit.

This is not necessarily true. It will simply be switched to BATADV_IF_ACTIVATE 
with the next OGM because batadv_iv_ogm_schedule is rescheduled all the time.

> Since we need to call iface.enabled() instead of iface.activate() so that
> batadv_iv_ogm_schedule_buff() will update if_status to BATADV_IF_ACTIVE,
> move iface.enabled() from batadv_hardif_enable_interface() to
> batadv_hardif_activate_interface().

Why do we need to switch to this state in the first place? I didn't find this 
explanation here.

If I would add your change, we would suddenly have multiple scheduled OGMs 
again, right? Because it basically an oddly written revert of 
commit 9e6b5648bbc4 ("batman-adv: Fix duplicated OGMs on NETDEV_UP").


If this would actually the problem, why would the BATADV_CMD_GET_NEIGHBORS 
request be required? I would have expected that following would show the same 
problem when the switch of the state is the problem:

    rmmod batman-adv || true
    modprobe batman-adv
 
    ip link add batadv_slave_0 type veth peer name veth0_to_batadv
    
    ip link add batadv0 type batadv
    batctl meshif batadv0 it 1000000
    ip link set down master batadv0 dev batadv_slave_0
    ip link set up address 00a:a:aa:aa:aa:aa dev batadv_slave_0
    ip link del dev batadv_slave_0


Let me rephrase what happens with the state of an interface. Just so we are on 
the same page:

* an interface detected by batman-adv is starting with the state 
  BATADV_IF_NOT_IN_USE
* an active interface has the state BATADV_IF_ACTIVE
* when an interface is added to a batman-adv meshif, it is first set to the 
  state BATADV_IF_TO_BE_ACTIVATED
  - this is done by batadv_hardif_activate_interface()
  - there are some dependencies to the UP state of the netdev - see below
* the transition from BATADV_IF_TO_BE_ACTIVATED to BATADV_IF_ACTIVE is 
  algorithm specific
  - IV: scheduling of the OGM buffer
  - V: "directly" after BATADV_IF_TO_BE_ACTIVATED is set


The transition from state BATADV_IF_NOT_IN_USE to BATADV_IF_TO_BE_ACTIVATED 
only happens when the interface is actually UP when .ndo_add_slave is called. 
Otherwise, batman-adv postpones the call to batadv_hardif_activate_interface() 
until the NETDEV_UP event is received.

As mentioned earlier the B.A.T.M.A.N. IV algorithm implementation is (in your 
reproducer) responsible for switching the interface state from 
BATADV_IF_TO_BE_ACTIVATED to BATADV_IF_ACTIVE. And it will only do 
this when the interface was actually in the state BATADV_IF_TO_BE_ACTIVATED. 
There is a comment above the statement which explains the details.

And exactly this "delay" makes it more likely that operations are started 
while an interface is in the transition state of BATADV_IF_TO_BE_ACTIVATED. 
Simply because the periodic OGM scheduling isn't triggered immediately - it 
continues with its normal periodicity.
 

The question would now be: what is the actual problem? For example, with 
following change, the problem also seems to be gone:

diff --git i/net/batman-adv/originator.c w/net/batman-adv/originator.c
index c84420cb..f82dce20 100644
--- i/net/batman-adv/originator.c
+++ w/net/batman-adv/originator.c
@@ -763,7 +763,7 @@ int batadv_hardif_neigh_dump(struct sk_buff *msg, struct netlink_callback *cb)
 	bat_priv = netdev_priv(mesh_iface);
 
 	primary_if = batadv_primary_if_get_selected(bat_priv);
-	if (!primary_if || primary_if->if_status != BATADV_IF_ACTIVE) {
+	if (!primary_if) {
 		ret = -ENOENT;
 		goto out_put_mesh_iface;
 	}


And now we are most likely on the right path... primary_if is holding a 
reference to an hardif and therefore also a reference to the netdev. And the 
error handling is only taking care of releasing the reference to the meshif 
but not the primary_if.

I will later send a fix for this with you and 
syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com as Reported-by. Is this 
okay for you?



As said, earlier - I am really thankful that you worked on this. So please 
don't interpret this as a harsh criticism. I just had really big problems to 
figure out what you wanted to say in these paragraphs. I usually prefer 
something which is easier to consume.

Regards,
	Sven
Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ