[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2cf4178e970d2737e7ba866ebc83a7ec30ca8ad1.camel@kernel.org>
Date: Wed, 23 Sep 2020 22:49:37 -0700
From: Saeed Mahameed <saeed@...nel.org>
To: David Miller <davem@...emloft.net>
Cc: hkallweit1@...il.com, geert+renesas@...der.be,
f.fainelli@...il.com, andrew@...n.ch, kuba@...nel.org,
gaku.inami.xh@...esas.com, yoshihiro.shimoda.uh@...esas.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Revert "net: linkwatch: add check for netdevice being
present to linkwatch_do_dev"
On Wed, 2020-09-23 at 17:23 -0700, David Miller wrote:
> From: David Miller <davem@...emloft.net>
> Date: Wed, 23 Sep 2020 17:21:25 -0700 (PDT)
>
> > If an async code path tests 'present', gets true, and then the RTNL
> > holding synchronous code path puts the device into D3hot
> immediately
> > afterwards, the async code path will still continue and access the
> > chips registers and fault.
>
> Wait, is the sequence:
>
> ->ndo_stop()
> mark device not present and put into D3hot
> triggers linkwatch event
> ...
> ->ndo_get_stats64()
>
> ???
>
I assume it is, since normally device drivers do carrier_off() on
ndo_stop()
1) One problematic sequence would be
(for drivers doing D3hot on ndo_stop())
__dev_close_many()
->ndo_stop()
netif_device_detach() //Mark !present;
... D3hot
carrier_off()->linkwatch_event()
... // !present && IFF_UP
2) Another problematic scenario which i see is repeated in many
drivers:
shutdown/suspend()
rtnl_lock()
netif_device_detach()//Mark !present;
stop()->carrier_off()->linkwatch_event()
// at this point device is still IFF_UP and !present
// due to the early detach above..
rtnl_unlock();
For scenario 1) we can fix by marking IFF_UP at the beginning, but for
2), i think we need to fix the drivers to detach only after stop :(
> Then yeah we might have to clear IFF_UP at the beginning of taking
> a netdev down.
Powered by blists - more mailing lists