lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZZU3OaybyLfrAa/0@linux.intel.com>
Date: Wed, 3 Jan 2024 11:30:17 +0100
From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org,
	Heiner Kallweit <hkallweit1@...il.com>,
	Johannes Berg <johannes.berg@...el.com>,
	Marc MERLIN <marc@...lins.org>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL

On Wed, Dec 06, 2023 at 08:44:48AM -0800, Jakub Kicinski wrote:
> On Wed,  6 Dec 2023 11:39:32 +0100 Johannes Berg wrote:
> > As reported by Marc MERLIN, at least one driver (igc) wants or
> > needs to acquire the RTNL inside suspend/resume ops, which can
> > be called from here in ethtool if runtime PM is enabled.
> > 
> > Allow this by doing runtime PM transitions without the RTNL
> > held. For the ioctl to have the same operations order, this
> > required reworking the code to separately check validity and
> > do the operation. For the netlink code, this now has to do
> > the runtime_pm_put a bit later.
> 
> I was really, really hoping that this would serve as a motivation
> for Intel to sort out the igb/igc implementation. The flow AFAICT
> is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC
> back down immediately (!?) then it schedules a link check from a work

It's not like that. pm_runtime_put() in igc_open() does not disable device.
It calls runtime_idle callback which check if there is link and if is
not, schedule device suspend in 5 second, otherwise device stays running.

Work watchdog_task runs periodically and also check for link changes.

> queue, which opens it again (!?). It's a source of never ending bugs.

Maybe there are issues there and igc pm runtime implementation needs
improvements, with lockings or otherwise. Some folks are looking at this. 
But I think for this particular deadlock problem reverting of below commits
should be considered:

bd869245a3dc net: core: try to runtime-resume detached device in __dev_open
f32a21376573 ethtool: runtime-resume netdev parent before ethtool ioctl ops

First, the deadlock should be addressed also in older kernels and
refactoring is not really backportable fix.

Second, I don't think network stack should do any calls to pm_runtime* .
This should be fully device driver specific, as this depends on device
driver implementation of power saving. IMHO if it's desirable to 
resume disabled device on requests from user space, then
netif_device_detach() should not be used in runtime suspend.

Thoughts ?

Regards
Stanislaw 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ