Message-ID: <2450943.NG923GbCHz@sven-desktop>
Date: Sun, 28 Sep 2025 09:50:02 +0200
From: Sven Eckelmann <sven@...fation.org>
To: Marek Lindner <marek.lindner@...lbox.org>,
Simon Wunderlich <sw@...onwunderlich.de>,
Antonio Quartulli <antonio@...delbit.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, b.a.t.m.a.n@...ts.open-mesh.org,
Network Development <netdev@...r.kernel.org>,
Linus Lüssing <linus.luessing@...3.blue>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Subject: Re: unregister_netdevice: waiting for batadv_slave_0 to become free.
 Usage count = 2
On Sunday, 28 September 2025 03:06:05 CEST Tetsuo Handa wrote:
> Thank you for responding quickly.
>
> On 2025/09/28 2:21, Sven Eckelmann wrote:
> > The question would now be: what is the actual problem?
>
> Sorry, my explanation was not clear enough.
It was long and contained a lot of things - but not what the actual problem is.
The reader has to work through a lot of inline call traces with subclauses and
then, by reading between the lines, figure out what you actually wanted to
say.
It is no problem not to know the underlying problem. But all these absolute
statements, accusations and overly detailed statements made me think that I
was just too stupid to get it and that you must be right.
> What I thought as a problem is the difference between
>
> netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, "batadv0", 0, 0);
> //netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, 0, &macaddr, ETH_ALEN);
>
> and
>
> netlink_device_change(&nlmsg, sock, "batadv_slave_0", false, "batadv0", 0, 0);
> netlink_device_change(&nlmsg, sock, "batadv_slave_0", true, 0, &macaddr, ETH_ALEN);
>
> . The former makes hard_iface->if_status == BATADV_IF_ACTIVE while the latter makes
> hard_iface->if_status == BATADV_IF_TO_BE_ACTIVATED (because batadv_iv_ogm_schedule_buff()
> is not called).
>
> This difference makes operations that depend on hard_iface->if_status == BATADV_IF_ACTIVE
> impossible to work properly. You can confirm this difference using diff show below.
This is (in my opinion) again one of these (odd) absolute statements.
"impossible to work properly" sounds as if BATADV_IF_TO_BE_ACTIVATED were a
state which you cannot escape, and as if functions/operations depended on
BATADV_IF_ACTIVE. Neither statement is really true.
BATADV_IF_TO_BE_ACTIVATED is a transient state, and algorithm-specific code
is responsible for automatically moving the interface into the
BATADV_IF_ACTIVE state.
This is somewhat important here because the first time I read your second
mail, I was under the impression that something in the reproducer showed that
the state would be stuck. I searched rather hard in the code but couldn't find
the reason for this. Only much later did I decide to ignore all this and look
at what the reproducer is actually doing, and also to ignore commit
9e6b5648bbc4 ("batman-adv: Fix duplicated OGMs on NETDEV_UP") - because it was
impossible for me to reproduce the problem on this commit.
And regarding the functions/operations which are "impossible to work
properly": called functions must "work properly" independent of the state.
Only the work they actually do can differ depending on the state. But maybe
this is a case of "glass is half full" vs "glass is half empty".
The problem is therefore that some function broke this "promise". Your second
mail (and the patch) was then basically saying that BATADV_IF_TO_BE_ACTIVATED
must not exist and that we must go directly to BATADV_IF_ACTIVE. Even if this
is, in my opinion, not the right statement, it never said why the state must
not exist and what broke because of BATADV_IF_TO_BE_ACTIVATED.
The inline call traces with detailed statements in subclauses make the mail
harder to digest. A few short high-level statements like
"I don't know exactly what the underlying problem is but skipping
BATADV_IF_TO_BE_ACTIVATED in batadv_hardif_activate_interface() seems to work
around the problem. I suspect that some function is not handling
BATADV_IF_TO_BE_ACTIVATED correctly. Maybe some kind of race condition between
switching to BATADV_IF_ACTIVE and executing some specific code. Here are my
detailed notes:"
would have helped me not to get stuck for too long interpreting paragraphs
and, at the same time, would have given a lot of pointers in the right
direction. But maybe I would have been stuck anyway - no idea.
Anyway, I hope we found the problem now and thanks for the help.
Regards,
Sven