[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DB8PR04MB6795F537BABC94F5893FC271E6F29@DB8PR04MB6795.eurprd04.prod.outlook.com>
Date: Thu, 5 Aug 2021 11:11:31 +0000
From: Joakim Zhang <qiangqing.zhang@....com>
To: Grygorii Strashko <grygorii.strashko@...com>,
Heiner Kallweit <hkallweit1@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
David Miller <davem@...emloft.net>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Linux PM list <linux-pm@...r.kernel.org>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>
Subject: RE: [PATCH net-next 1/4] ethtool: runtime-resume netdev parent before
ethtool ioctl ops
> -----Original Message-----
> From: Grygorii Strashko <grygorii.strashko@...com>
> Sent: 2021年8月5日 16:21
> To: Heiner Kallweit <hkallweit1@...il.com>; Jakub Kicinski
> <kuba@...nel.org>; David Miller <davem@...emloft.net>
> Cc: netdev@...r.kernel.org; Linux PM list <linux-pm@...r.kernel.org>;
> Andrew Lunn <andrew@...n.ch>; Florian Fainelli <f.fainelli@...il.com>
> Subject: Re: [PATCH net-next 1/4] ethtool: runtime-resume netdev parent
> before ethtool ioctl ops
>
>
>
> On 04/08/2021 22:33, Heiner Kallweit wrote:
> > On 04.08.2021 10:43, Grygorii Strashko wrote:
> >>
> >>
> >> On 04/08/2021 00:32, Heiner Kallweit wrote:
> >>> On 03.08.2021 22:41, Grygorii Strashko wrote:
> >>>>
> >>>>
> >>>> On 01/08/2021 13:36, Heiner Kallweit wrote:
> >>>>> If a network device is runtime-suspended then:
> >>>>> - network device may be flagged as detached and all ethtool ops
> >>>>> (even if not
> >>>>> accessing the device) will fail because
> >>>>> netif_device_present() returns
> >>>>> false
> >>>>> - ethtool ops may fail because device is not accessible (e.g.
> >>>>> because being
> >>>>> in D3 in case of a PCI device)
> >>>>>
> >>>>> It may not be desirable that userspace can't use even simple
> >>>>> ethtool ops that not access the device if interface or link is
> >>>>> down. To be more friendly to userspace let's ensure that device is
> >>>>> runtime-resumed when executing the respective ethtool op in kernel.
> >>>>>
> >>>>> Signed-off-by: Heiner Kallweit <hkallweit1@...il.com>
> >>>>> ---
> >>>>> net/ethtool/ioctl.c | 18 +++++++++++++++---
> >>>>> 1 file changed, 15 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index
> >>>>> baa5d1004..b7ff9abe7 100644
> >>>>> --- a/net/ethtool/ioctl.c
> >>>>> +++ b/net/ethtool/ioctl.c
> >>>>> @@ -23,6 +23,7 @@
> >>>>> #include <linux/rtnetlink.h>
> >>>>> #include <linux/sched/signal.h>
> >>>>> #include <linux/net.h>
> >>>>> +#include <linux/pm_runtime.h>
> >>>>> #include <net/devlink.h>
> >>>>> #include <net/xdp_sock_drv.h>
> >>>>> #include <net/flow_offload.h>
> >>>>> @@ -2589,7 +2590,7 @@ int dev_ethtool(struct net *net, struct
> >>>>> ifreq *ifr)
> >>>>> int rc;
> >>>>> netdev_features_t old_features;
> >>>>> - if (!dev || !netif_device_present(dev))
> >>>>> + if (!dev)
> >>>>> return -ENODEV;
> >>>>> if (copy_from_user(ðcmd, useraddr, sizeof(ethcmd)))
> >>>>> @@ -2645,10 +2646,18 @@ int dev_ethtool(struct net *net, struct
> >>>>> ifreq *ifr)
> >>>>> return -EPERM;
> >>>>> }
> >>>>> + if (dev->dev.parent)
> >>>>> + pm_runtime_get_sync(dev->dev.parent);
> >>>>
> >>>> the PM Runtime should allow to wake up parent when child is resumed if
> everything is configured properly.
> >>>>
> >>> Not sure if there's any case yet where the netdev-embedded device is
> power-managed.
> >>> Typically only the parent (e.g. a PCI device) is.
> >>>
> >>>> rpm_resume()
> >>>> ...
> >>>> if (!parent && dev->parent) {
> >>>> --> here
> >>>>
> >>> Currently we don't get that far because we will bail out here already:
> >>>
> >>> else if (dev->power.disable_depth > 0)
> >>> retval = -EACCES;
> >>>
> >>> If netdev-embedded device isn't power-managed then disable_depth is 1.
> >>
> >> Right. But if pm_runtime_enable() is added for ndev->dev then PM
> >> runtime will start working for it and should handle parent properly -
> >> from my experience, every time any code need manipulate with "parent" or
> smth. else to make PM runtime working it means smth. is wrong.
> >>
> >> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index
> >> f6197774048b..33b72b788aa2 100644
> >> --- a/net/core/net-sysfs.c
> >> +++ b/net/core/net-sysfs.c
> >> @@ -1963,6 +1963,7 @@ int netdev_register_kobject(struct net_device
> >> *ndev)
> >> }
> >>
> >> pm_runtime_set_memalloc_noio(dev, true);
> >> + pm_runtime_enable(dev);
> >>
> >> return error;
> >> }
> >>
> >>
> >>>
> >>>> So, hence PM runtime calls are moved to from drivers to net_core
> >>>> wouldn't be more correct approach to enable PM runtime for netdev->dev
> and lets PM runtime do the job?
> >>>>
> >>> Where would netdev->dev be runtime-resumed so that
> netif_device_present() passes?
> >>
> >> That's the biggest issues here. Some driver uses
> >> netif_device_detach() in PM runtime and, this way, introduces custom
> dependency between Core device PM (runtime) sate and Net core, other driver
> does not do.
> >> Does it means every driver with PM runtime now have to be updated to
> indicate it PM state to Net core with netif_device_detach()?
> >
> > No, that's not needed.
> >
> >> Why? Why return value from pm_runtime_get calls is not enough?
> >>
> >> Believe me it's terrible idea to introduce custom PM state dependency
> >> between PM runtime and Net core, for example it took years to sync
> properly System wide suspend and PM runtime which are separate framworks.
> >>
> >> By the way netif_device_detach() during System Wide suspend is looks
> >> perfectly valid, because entering System wide Suspend should prohibit
> >> any access to netdev at some stage. And that's what 99% of network
> >> drivers are doing (actually I can find only ./realtek/r8169_main.c
> >> which abuse netif_device_detach() function and, I assume, it is your
> >> case)
> >>
> > Actually I was inspired by the Intel drivers, see e.g.
> > __igc_shutdown(). They also detach the netdevice on runtime suspend.
> > One reason is that several core functions check for device presence
> > before e.g. calling a ndo callback. Example: dev_set_mtu_ext()
>
> right and also:
> - netlink - which you've hacked already
> - 8021q:
> vlan_dev_ioctl/vlan_dev_neigh_setup/vlan_add_rx_filter_info/vlan_kill_rx_filte
> r_info
Yes, there are many place need to do such check. I always face a problem that where I need to
runtime-resume the device, is there any suggestion? I always add it when an issue came out.
What confuse me it that, is there any document describe that which .ndo callback should be called
with interface up, instead .ndo callback _CAN_ be called with interface down? I think this can help
us decide when we need runtime-resume device.
After leaning all your discussion, from my point of view, it seems not a good choice to add RPM
to net core. It had better handled by driver itself.
>
> > Same applies for __dev_set_rx_mode(). Therefore I wondered whether
> > cpsw_ndo_set_rx_mode()
> > - that does not include runtime-resuming the device - may be called
> > when device is runtime-suspended, e.g. if interface is up, but link is down.
>
> CPSW doesn't manage PM runtime in link status handler, as it has only on/off
> state and off state can cause full context loss restore of which is expensive and
> hard to implement. And for most of netdev drivers no aggressive PM runtime is
> implemented exactly because of that (mac/vlan/fdb/mdb/...). Common
> patterns:
>
> (a)
> .probe
> -get
> .remove
> -put
>
> (b)
> .probe
> -get
> -put
> .open
> -get
> .close
> -put
> .protect places which may be called when netif is down
>
> The CPSW follows (b) and so cpsw_ndo_set_rx_mode() can't be called when
> when device is runtime-suspended.
>
> I assume, some hw like PCI, can have more PM states and in some of them
> keep HW context intact.
>
>
> >
> >>> Wouldn't we then need RPM ops for the parent (e.g. PCI) and for
> netdev->dev?
> >>
> >> No. as I know - netdev->dev can be declared as
> >> pm_runtime_no_callbacks(&adap->dev);
> >> I2C adapter might be a good example to check.
> >>
> >>> E.g. the parent runtime-resume can be triggered by a PCI PME, then
> >>> it would have to resume netdev->dev.
> >>>
> >>>> But, to be honest, I'm not sure adding PM runtime manipulation to
> >>>> the net core is a good idea -
> >>>
> >>> The TI CPSW driver runtime-resumes the device in begin ethtool op
> >>> and suspends it in complete. This pattern is used in more than one
> >>> driver and may be worth being moved to the core.
> >>
> >> I'm not against code refactoring and optimization, but in my opinion
> >> it has to be done right from the beginning or not done at all.
> >>
> >>>
> >>>> at minimum it might be tricky and required very careful approach
> (especially in err path).
> >>>> For example, even in this patch you do not check return value of
> >>>> pm_runtime_get_sync() and in commit bd869245a3dc ("net: core: try to
> runtime-resume detached device in __dev_open") also actualy.
> >>>
> >>> The pm_runtime_get_sync() calls are attempts here. We don't want to
> >>> bail out if a device doesn't support RPM.
> >>
> >> And if 'parent' is not supporting PM runtime - it, as i see, should be handled
> by PM runtime core properly.
> >>
> >> I agree that checking the return code could make sense, but then we
> >> would
> >>> have to be careful which error codes we consider as failed.
> >>
> >> huh. you can't 'try' pm_runtime_get_sync() and then align on
> >> netif_device_present() :(
> >>
> >> might be, some how, it will work for r8169_main, but will not work for
> others.
> >> - no checking pm_runtime_get_sync() err code will cause PM runtime
> >> 'usage_count' leak
> >
> > No. pm_runtime_get_sync() always bumps the usage count, no matter
> whether it fails or not.
>
>
> > This makes it easy to deal with this. The problem you describe exists
> > with pm_runtime_resume_and_get(). That's why I wondered whether we
> > should annotate this function as __must_check. See here:
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Flinux-pm%2FCAJZ5v0gps0C2923VqM8876npvhcETsyN%2BajAk
> BKX5k
> >
> f49J0%2BMg%40mail.gmail.com%2FT%2F%23t&data=04%7C01%7Cqiang
> qing.zh
> >
> ang%40nxp.com%7C392a231908da48658f4108d957e9f47c%7C686ea1d3bc2b
> 4c6fa92
> >
> cd99c5c301635%7C0%7C0%7C637637484629499490%7CUnknown%7CTWFpb
> GZsb3d8eyJ
> >
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7
> C1000
> >
> &sdata=lbHf9xDmV7w7QkolIsc%2FdmN%2FPKcOCkb2quaMa1JWI2g%3D
> &rese
> > rved=0
> >
> >> - no checking pm_runtime_get_sync() err may cause to continue( for TI
> >> CPSW for example) with
> >> device in undefined PM state ("disabled" or "half-enabled") and so crash
> later.
> >>
> > I'd say 95% of rpm callers don't check the return value. I'm not
> > saying this is a good thing, but obviously it doesn't cause relevant harm.
>
> this is completely wrong assumption as PM errors cause silent stuck, undefined
> behavior or dumps (sometimes delayed) which is terribly hard to root cause.
>
> yes. many drivers do not check, but over last few years more and more strict
> policies applied to avoid that and in many case no checking return code - is red
> flag and patch reject.
> Don't like that phrase ;), but "It doesn't mean that incorrect code has to be
> copy-pasted all over the places"
>
> this is correct get pattern for get:
> ret = pm_runtime_get_sync(&pdev->dev);
> if (ret < 0) {
> pm_runtime_put_noidle(&pdev->dev);
> return ret;
> }
Yes, it’s the correct usage for pm_runtime_get_sync.
Best Regards,
Joakim Zhang
> My strong opinion
> - PM runtime return code must be checked.
> - get rid of netif_device_detach() in r8169
>
> by the way, have you tried below test with your driver (not sure how it works
> for you):
>
> .rtl_open
> - pm_runtime_get_sync
> - pm_runtime_put_sync - usage_count == 0 .r8169_phylink_handler
> - pm_request_resume - why async? still usage_count == 0 .some ethtool
> request to go through dev_ethtool()
> - pm_runtime_get_sync
> - pm_runtime_put - async, usage_count == 0
> ^ would not it put r8169 in runtime-suspended state while link is still UP?
>
>
> >
> >>
> >>
> >>>
> >>>>
> >>>>
> >>>> The TI CPSW driver may also be placed in non reachable state when
> >>>> netdev is closed (and even lose context), but we do not use
> >>>> netif_device_detach() (so netdev is accessible through
> >>>> netdev_ops/ethtool_ops), but instead wake up device by runtime PM for
> allowed operations or just save requested configuration which is applied at
> netdev->open() time then.
> >>>> I feel that using netif_device_detach() in PM runtime sounds like a
> >>>> too heavy approach ;)
> >>>>
> >>> That's not a rare pattern when suspending or runtime-suspending to
> >>> prevent different types of access to a not accessible device. But yes, it's
> relatively big hammer ..
> >>
> >> Again, netif_device_detach() seems correct for System wide suspend,
> >> but in my opinion - it's not correct for PM runtime.
> >>
> >> Sry, with all do respect, first corresponding driver has to be fixed and not
> Net core hacked to support it.
> >>
> >> Further decisions is up to maintainers.
> >>
> >>
> >>>
> >>>> huh, see it's merged already, so...
> >>>>
> >>>>> +
> >>>>> + if (!netif_device_present(dev)) {
> >>>>> + rc = -ENODEV;
> >>>>> + goto out;
> >>>>> + }
> >>>>> +
> >>>>> if (dev->ethtool_ops->begin) {
> >>>>> rc = dev->ethtool_ops->begin(dev);
> >>>>> - if (rc < 0)
> >>>>> - return rc;
> >>>>> + if (rc < 0)
> >>>>> + goto out;
> >>>>> }
> >>>>> old_features = dev->features;
> >>>>> @@ -2867,6 +2876,9 @@ int dev_ethtool(struct net *net, struct
> >>>>> ifreq *ifr)
> >>>>> if (old_features != dev->features)
> >>>>> netdev_features_change(dev);
> >>>>> +out:
> >>>>> + if (dev->dev.parent)
> >>>>> + pm_runtime_put(dev->dev.parent);
> >>>>> return rc;
> >>>>> }
> >>>>>
> >>>>
> >>>
> >>
> >
>
> --
> Best regards,
> grygorii
Powered by blists - more mailing lists