netdev - Re: [PATCH] Revert "net: linkwatch: add check for netdevice being present to linkwatch_do

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <14f41724-ce45-c2c0-a49c-1e379dba0cb5@gmail.com>
Date:   Wed, 23 Sep 2020 21:58:59 +0200
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Saeed Mahameed <saeed@...nel.org>,
        Geert Uytterhoeven <geert+renesas@...der.be>,
        Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>
Cc:     Gaku Inami <gaku.inami.xh@...esas.com>,
        Yoshihiro Shimoda <yoshihiro.shimoda.uh@...esas.com>,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Revert "net: linkwatch: add check for netdevice being
 present to linkwatch_do_dev"

On 23.09.2020 20:35, Saeed Mahameed wrote:
> On Wed, 2020-09-23 at 13:49 +0200, Heiner Kallweit wrote:
>> On 18.09.2020 19:58, Saeed Mahameed wrote:
>>> On Tue, 2020-09-01 at 17:02 +0200, Geert Uytterhoeven wrote:
>>>> This reverts commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c.
>>>>
>>>> Inami-san reported that this commit breaks bridge support in a
>>>> Xen
>>>> environment, and that reverting it fixes this.
>>>>
>>>> During system resume, bridge ports are no longer enabled, as that
>>>> relies
>>>> on the receipt of the NETDEV_CHANGE notification.  This
>>>> notification
>>>> is
>>>> not sent, as netdev_state_change() is no longer called.
>>>>
>>>> Note that the condition this commit intended to fix never existed
>>>> upstream, as the patch triggering it and referenced in the commit
>>>> was
>>>> never applied upstream.  Hence I can confirm s2ram on
>>>> r8a73a4/ape6evm
>>>> and sh73a0/kzm9g works fine before/after this revert.
>>>>
>>>> Reported-by Gaku Inami <gaku.inami.xh@...esas.com>
>>>> Signed-off-by: Geert Uytterhoeven <geert+renesas@...der.be>
>>>> ---
>>>>  net/core/link_watch.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>>>> index 75431ca9300fb9c4..c24574493ecf95e6 100644
>>>> --- a/net/core/link_watch.c
>>>> +++ b/net/core/link_watch.c
>>>> @@ -158,7 +158,7 @@ static void linkwatch_do_dev(struct
>>>> net_device
>>>> *dev)
>>>>  	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
>>>>  
>>>>  	rfc2863_policy(dev);
>>>> -	if (dev->flags & IFF_UP && netif_device_present(dev)) {
>>>> +	if (dev->flags & IFF_UP) {
>>>
>>> So with your issue the devices is both IFF_UP and !present ? how so
>>> ?
>>> I think you should look into that.
>>>
>>> I am ok with removing the "dev present" check from here just
>>> because we
>>> shouldn't  be expecting IFF_UP && !present .. such thing must be a
>>> bug
>>> somewhere else.
>>>
>>>>  		if (netif_carrier_ok(dev))
>>>>  			dev_activate(dev);
>>>>  		else
>>
>> In __dev_close_many() we call ndo_stop() whilst IFF_UP is still set.
>> ndo_stop() may detach the device and bring down the PHY, resulting in
>> an
> 
> Why would a driver detach the device on ndo_stop() ?
> seems like this is the bug you need to be chasing ..
> which driver is doing this ? 
> 
Some drivers set the device to PCI D3hot at the end of ndo_stop()
to save power (using e.g. Runtime PM). Marking the device as detached
makes clear to to the net core that the device isn't accessible any
longer.

>> async link change event that calls dev_get_stats(). The latter call
>> may
>> have a problem if the device is detached. In a first place I'd
>> consider
>> such a case a network driver bug (ndo_get_stats/64 should check for
>> device presence if depending on it).
> 
> Device drivers should avoid presence check as much as possible
> especially in ndo, this check must be performed by the stack.
> 
That's a question I also stumbled across. For the ethtool ops
dev_ethtool() checks whether device is present.
But for ndo that's not always the case, e.g. dev_get_stats()
doesn't check for device presence before calling ndo_get_stats()
or ndo_get_stats64().
To a certain extent I can understand this behavior, because drivers
may just use internal data structures in ndo ops instead of accessing
the device.

>> The additional check in linkwatch_do_dev() was meant to protect from
>> such
>> driver issues.
>