[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a0734a8e-5681-4fd1-8cf0-bcb63a43f897@lunn.ch>
Date: Tue, 19 Sep 2023 14:36:58 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Johannes Berg <johannes@...solutions.net>
Cc: netdev@...r.kernel.org, linux-wireless@...r.kernel.org
Subject: Re: netif_carrier_on() race
> All of this makes sense since you need to hold RTNL for all those state
> changes/notifier chains, but it does lead to the first race/consistency
> problem: if you query at just the right time you can see carrier being
> on, however, if the carrier is actually removed again and the linkwatch
> work didn't run yet, there might never be an event for the carrier on,
> iow, you might have:
>
> netif_carrier_on()
> query from userspace and see carrier on
> netif_carrier_off()
> linkwatch work runs and sends only carrier off event
>
> This is at least a bit confusing, but not really my main problem here,
> though it did in fact happen to me as well, in a fashion.
That is interesting. Copper Ethernet PHYs might have the opposite
problem. The status bit about link is latching low. This means that if
the link is lost and then very quickly comes back again, you always
get to see the lost and then restored link. So maybe we have:
netif_carrier_off()
query from userspace and see carrier off
netif_carrier_oon()
linkwatch work runs and sends only carrier on event
???
Maybe we want linkwatch to keep the old and the new state. If there is
a change reported while there is still work queued, which flips a bit
back to its old state, it needs to block until the work is actually
done?
> Possible solution #2:
> ---------------------
> Another - more feasible - option might be to say OK, so the associated
> event (and a few friends) are the problem, so we can queue those events
> in cfg80211, and only release them on NETDEV_CHANGE notifier call.
> That's from netdev_state_change() after dev_activate() in linkwatch
> work. We'd want to pair it with netif_carrier_on() so we actually expect
> the event to come, and maybe give netif_carrier_on() a return value
> indicating that it scheduled - or else check using carrier_up_count
> perhaps?
Probably not an issue with 802.11, but sometimes drivers do odd things
with the carrier because of the BMC. The BMC can have a side channel
into the hosts NIC, which allows it to share the hosts PHY and RJ45
socket. So that the BMC can send/receive frames while the host NIC is
admin down, the carrier might actually be up. And requests to down the
carrier are ignored.
As i said, probably irrelevant to 802.11, but if you try to make a
generic solution, you might need to watch out for this.
Andrew
Powered by blists - more mailing lists