lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <790f9d0914e2baba66c394f4e21ef118e44d9775.camel@sipsolutions.net>
Date: Tue, 26 Sep 2023 10:07:11 +0200
From: Johannes Berg <johannes@...solutions.net>
To: netdev@...r.kernel.org
Cc: linux-wireless@...r.kernel.org
Subject: Re: netif_carrier_on() race

Focusing on this part for a moment, because it affects not just
wireless:

> Then, in netif_carrier_on(), we immediately set the carrier on bit, so
> that you can actually immediately see this from userspace if you ask
> rtnetlink, however, it's not actually immediately _functional_ - it
> still needs to schedule and run the linkwatch work first, to call
> dev_activate() to change the (TX queue) qdisc(s) away from noop.
> 
> Also, even though you can already query the carrier state and see it on,
> the actual rtnetlink _event_ for this only happens from the linkwatch
> work as well, via netdev_state_change().
> 
> All of this makes sense since you need to hold RTNL for all those state
> changes/notifier chains, but it does lead to the first race/consistency
> problem: if you query at just the right time you can see carrier being
> on, however, if the carrier is actually removed again and the linkwatch
> work didn't run yet, there might never be an event for the carrier on,
> iow, you might have:
> 
>  netif_carrier_on()
>  query from userspace and see carrier on
>  netif_carrier_off()
>  linkwatch work runs and sends only carrier off event

and also because, as Andrew mentioned, you can have the exact opposite
problem...

It can actually happen that something _else_ sends an event, so even if
userspace does't query but waits for a carrier on event, you could end
up with:

* netif_carrier_on()

* something else triggers netdev_state_change(),
  even userspace setting link alias

  netdev_state_change()
   -> sends an rtnetlink event saying carrier is on

* userspace transmits but frames are dropped

* linkwatch work runs and enables qdiscs only now



To address this issue, we could introduce a new state, say
__LINK_STATE_CARRIER_COMPLETE or something like that, which is used when
communicating carrier state to userspace, and only set/cleared in
dev_activate()/dev_deactivate(). That way, any events to userspace (or
userspace querying) wouldn't show the carrier state until it's actually
fully reflected in software (qdiscs) too.

This doesn't fully solve _my_ (wifi) problem, but perhaps lets me work
around it in userspace by querying for the carrier state, if it's
reflected correctly ("fully ready to transmit") then we can do that.
Right now, we can't even do that.

But would it break something else? There's also a way to query it via
ethtool, which perhaps should _not_ be converted to _COMPLETE since you
could argue the ethtool link state is about the physical link, and with
this change the carrier becomes about the logical link in a fashion?


johannes

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ