[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f999251-8132-414e-9ea1-f83ecc41dfdd@I-love.SAKURA.ne.jp>
Date: Sun, 28 Sep 2025 10:06:05 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Sven Eckelmann <sven@...fation.org>,
Marek Lindner <marek.lindner@...lbox.org>,
Simon Wunderlich <sw@...onwunderlich.de>,
Antonio Quartulli <antonio@...delbit.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
b.a.t.m.a.n@...ts.open-mesh.org,
Network Development <netdev@...r.kernel.org>,
Linus Lüssing <linus.luessing@...3.blue>
Subject: Re: unregister_netdevice: waiting for batadv_slave_0 to become free.
Usage count = 2
Thank you for responding quickly.
On 2025/09/28 2:21, Sven Eckelmann wrote:
> The question would now be: what is the actual problem?
Sorry, my explanation was not clear enough.
What I thought as a problem is the difference between
netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, "batadv0", 0, 0);
//netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, 0, &macaddr, ETH_ALEN);
and
netlink_device_change(&nlmsg, sock, "batadv_slave_0", false, "batadv0", 0, 0);
netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, 0, &macaddr, ETH_ALEN);
. The former makes hard_iface->if_status == BATADV_IF_ACTIVE while the latter makes
hard_iface->if_status == BATADV_IF_TO_BE_ACTIVATED (because batadv_iv_ogm_schedule_buff()
is not called).
This difference makes operations that depend on hard_iface->if_status == BATADV_IF_ACTIVE
impossible to work properly. You can confirm this difference using diff show below.
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -761,6 +761,7 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,
batadv_check_known_mac_addr(hard_iface);
+ pr_info("step 1: %d\n", hard_iface->if_status);
if (batadv_hardif_is_iface_up(hard_iface))
batadv_hardif_activate_interface(hard_iface);
else
@@ -768,10 +769,12 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,
"Not using interface %s (retrying later): interface not active\n",
hard_iface->net_dev->name);
+ pr_info("step 2: %d\n", hard_iface->if_status);
batadv_hardif_recalc_extra_skbroom(mesh_iface);
if (bat_priv->algo_ops->iface.enabled)
bat_priv->algo_ops->iface.enabled(hard_iface);
+ pr_info("step 3: %d\n", hard_iface->if_status);
out:
return 0;
@@ -961,7 +964,9 @@ static int batadv_hard_if_event(struct notifier_block *this,
switch (event) {
case NETDEV_UP:
+ pr_info("step 4: %d\n", hard_iface->if_status);
batadv_hardif_activate_interface(hard_iface);
+ pr_info("step 5: %d\n", hard_iface->if_status);
break;
case NETDEV_GOING_DOWN:
case NETDEV_DOWN:
The former case:
batman_adv: batadv0: Adding interface: batadv_slave_0
batman_adv: batadv0: The MTU of interface batadv_slave_0 is too small (1500) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1560 would solve the problem.
batman_adv: step 1: 2
batman_adv: batadv0: Interface activated: batadv_slave_0
batman_adv: step 2: 4
batman_adv: step 3: 3
batman_adv: batadv0: Interface deactivated: batadv_slave_0
batman_adv: batadv0: Removing interface: batadv_slave_0
The latter case:
batman_adv: step 1: 2
batman_adv: batadv0: Not using interface batadv_slave_0 (retrying later): interface not actve
batman_adv: step 2: 2
batman_adv: step 3: 2
batman_adv: step 4: 2
batman_adv: batadv0: Interface activated: batadv_slave_0
batman_adv: step 5: 4
batman_adv: batadv0: Interface deactivated: batadv_slave_0
batman_adv: batadv0: Removing interface: batadv_slave_0
> --- i/net/batman-adv/originator.c
> +++ w/net/batman-adv/originator.c
> @@ -763,7 +763,7 @@ int batadv_hardif_neigh_dump(struct sk_buff *msg, struct netlink_callback *cb)
> bat_priv = netdev_priv(mesh_iface);
>
> primary_if = batadv_primary_if_get_selected(bat_priv);
> - if (!primary_if || primary_if->if_status != BATADV_IF_ACTIVE) {
> + if (!primary_if) {
> ret = -ENOENT;
> goto out_put_mesh_iface;
> }
>
>
> And now we are most likely on the right path... primary_if is holding a
> reference to an hardif and therefore also a reference to the netdev. And the
> error handling is only taking care of releasing the reference to the meshif
> but not the primary_if.
Ah, indeed. Since batadv_primary_if_get_selected() is calling kref_get_unless_zero(),
primary_if->if_status != BATADV_IF_ACTIVE case needs to call kref_put().
Also, this matches my what I thought as a problem (BATADV_IF_TO_BE_ACTIVATED prevents
operations that depends on BATADV_IF_ACTIVE from working as expected).
> I will later send a fix for this with you and
> syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com as Reported-by. Is this
> okay for you?
Yes, the reproducer no longer shows the problem.
Thank you.
Powered by blists - more mailing lists