lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9f999251-8132-414e-9ea1-f83ecc41dfdd@I-love.SAKURA.ne.jp>
Date: Sun, 28 Sep 2025 10:06:05 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Sven Eckelmann <sven@...fation.org>,
        Marek Lindner <marek.lindner@...lbox.org>,
        Simon Wunderlich <sw@...onwunderlich.de>,
        Antonio Quartulli <antonio@...delbit.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
        b.a.t.m.a.n@...ts.open-mesh.org,
        Network Development <netdev@...r.kernel.org>,
        Linus Lüssing <linus.luessing@...3.blue>
Subject: Re: unregister_netdevice: waiting for batadv_slave_0 to become free.
 Usage count = 2

Thank you for responding quickly.

On 2025/09/28 2:21, Sven Eckelmann wrote:
> The question would now be: what is the actual problem? 

Sorry, my explanation was not clear enough.

What I thought as a problem is the difference between

	netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, "batadv0", 0, 0);
	//netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, 0, &macaddr, ETH_ALEN);

and

	netlink_device_change(&nlmsg, sock, "batadv_slave_0", false, "batadv0", 0, 0);
	netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, 0, &macaddr, ETH_ALEN);

. The former makes hard_iface->if_status == BATADV_IF_ACTIVE while the latter makes
hard_iface->if_status == BATADV_IF_TO_BE_ACTIVATED (because batadv_iv_ogm_schedule_buff()
is not called).

This difference makes operations that depend on hard_iface->if_status == BATADV_IF_ACTIVE
impossible to work properly. You can confirm this difference using diff show below.

--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -761,6 +761,7 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,

        batadv_check_known_mac_addr(hard_iface);

+       pr_info("step 1: %d\n", hard_iface->if_status);
        if (batadv_hardif_is_iface_up(hard_iface))
                batadv_hardif_activate_interface(hard_iface);
        else
@@ -768,10 +769,12 @@ int batadv_hardif_enable_interface(struct batadv_hard_iface *hard_iface,
                           "Not using interface %s (retrying later): interface not active\n",
                           hard_iface->net_dev->name);

+       pr_info("step 2: %d\n", hard_iface->if_status);
        batadv_hardif_recalc_extra_skbroom(mesh_iface);

        if (bat_priv->algo_ops->iface.enabled)
                bat_priv->algo_ops->iface.enabled(hard_iface);
+       pr_info("step 3: %d\n", hard_iface->if_status);

 out:
        return 0;
@@ -961,7 +964,9 @@ static int batadv_hard_if_event(struct notifier_block *this,

        switch (event) {
        case NETDEV_UP:
+               pr_info("step 4: %d\n", hard_iface->if_status);
                batadv_hardif_activate_interface(hard_iface);
+               pr_info("step 5: %d\n", hard_iface->if_status);
                break;
        case NETDEV_GOING_DOWN:
        case NETDEV_DOWN:

The former case:

  batman_adv: batadv0: Adding interface: batadv_slave_0
  batman_adv: batadv0: The MTU of interface batadv_slave_0 is too small (1500) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1560 would solve the problem.
  batman_adv: step 1: 2
  batman_adv: batadv0: Interface activated: batadv_slave_0
  batman_adv: step 2: 4
  batman_adv: step 3: 3
  batman_adv: batadv0: Interface deactivated: batadv_slave_0
  batman_adv: batadv0: Removing interface: batadv_slave_0

The latter case:

  batman_adv: step 1: 2
  batman_adv: batadv0: Not using interface batadv_slave_0 (retrying later): interface not actve
  batman_adv: step 2: 2
  batman_adv: step 3: 2
  batman_adv: step 4: 2
  batman_adv: batadv0: Interface activated: batadv_slave_0
  batman_adv: step 5: 4
  batman_adv: batadv0: Interface deactivated: batadv_slave_0
  batman_adv: batadv0: Removing interface: batadv_slave_0



> --- i/net/batman-adv/originator.c
> +++ w/net/batman-adv/originator.c
> @@ -763,7 +763,7 @@ int batadv_hardif_neigh_dump(struct sk_buff *msg, struct netlink_callback *cb)
>  	bat_priv = netdev_priv(mesh_iface);
>  
>  	primary_if = batadv_primary_if_get_selected(bat_priv);
> -	if (!primary_if || primary_if->if_status != BATADV_IF_ACTIVE) {
> +	if (!primary_if) {
>  		ret = -ENOENT;
>  		goto out_put_mesh_iface;
>  	}
> 
> 
> And now we are most likely on the right path... primary_if is holding a 
> reference to an hardif and therefore also a reference to the netdev. And the 
> error handling is only taking care of releasing the reference to the meshif 
> but not the primary_if.

Ah, indeed. Since batadv_primary_if_get_selected() is calling kref_get_unless_zero(),
primary_if->if_status != BATADV_IF_ACTIVE case needs to call kref_put().
Also, this matches my what I thought as a problem (BATADV_IF_TO_BE_ACTIVATED prevents
operations that depends on BATADV_IF_ACTIVE from working as expected).

> I will later send a fix for this with you and 
> syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com as Reported-by. Is this
> okay for you?

Yes, the reproducer no longer shows the problem.

Thank you.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ