[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZZU3OaybyLfrAa/0@linux.intel.com>
Date: Wed, 3 Jan 2024 11:30:17 +0100
From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org,
Heiner Kallweit <hkallweit1@...il.com>,
Johannes Berg <johannes.berg@...el.com>,
Marc MERLIN <marc@...lins.org>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL
On Wed, Dec 06, 2023 at 08:44:48AM -0800, Jakub Kicinski wrote:
> On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote:
> > As reported by Marc MERLIN, at least one driver (igc) wants or
> > needs to acquire the RTNL inside suspend/resume ops, which can
> > be called from here in ethtool if runtime PM is enabled.
> >
> > Allow this by doing runtime PM transitions without the RTNL
> > held. For the ioctl to have the same operations order, this
> > required reworking the code to separately check validity and
> > do the operation. For the netlink code, this now has to do
> > the runtime_pm_put a bit later.
>
> I was really, really hoping that this would serve as a motivation
> for Intel to sort out the igb/igc implementation. The flow AFAICT
> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC
> back down immediately (!?) then it schedules a link check from a work
It's not like that. pm_runtime_put() in igc_open() does not disable device.
It calls runtime_idle callback which check if there is link and if is
not, schedule device suspend in 5 second, otherwise device stays running.
Work watchdog_task runs periodically and also check for link changes.
> queue, which opens it again (!?). It's a source of never ending bugs.
Maybe there are issues there and igc pm runtime implementation needs
improvements, with lockings or otherwise. Some folks are looking at this.
But I think for this particular deadlock problem reverting of below commits
should be considered:
bd869245a3dc net: core: try to runtime-resume detached device in __dev_open
f32a21376573 ethtool: runtime-resume netdev parent before ethtool ioctl ops
First, the deadlock should be addressed also in older kernels and
refactoring is not really backportable fix.
Second, I don't think network stack should do any calls to pm_runtime* .
This should be fully device driver specific, as this depends on device
driver implementation of power saving. IMHO if it's desirable to
resume disabled device on requests from user space, then
netif_device_detach() should not be used in runtime suspend.
Thoughts ?
Regards
Stanislaw
Powered by blists - more mailing lists