lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <ZZU3OaybyLfrAa/0@linux.intel.com> Date: Wed, 3 Jan 2024 11:30:17 +0100 From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com> To: Jakub Kicinski <kuba@...nel.org> Cc: Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org, Heiner Kallweit <hkallweit1@...il.com>, Johannes Berg <johannes.berg@...el.com>, Marc MERLIN <marc@...lins.org>, Przemek Kitszel <przemyslaw.kitszel@...el.com> Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL On Wed, Dec 06, 2023 at 08:44:48AM -0800, Jakub Kicinski wrote: > On Wed, 6 Dec 2023 11:39:32 +0100 Johannes Berg wrote: > > As reported by Marc MERLIN, at least one driver (igc) wants or > > needs to acquire the RTNL inside suspend/resume ops, which can > > be called from here in ethtool if runtime PM is enabled. > > > > Allow this by doing runtime PM transitions without the RTNL > > held. For the ioctl to have the same operations order, this > > required reworking the code to separately check validity and > > do the operation. For the netlink code, this now has to do > > the runtime_pm_put a bit later. > > I was really, really hoping that this would serve as a motivation > for Intel to sort out the igb/igc implementation. The flow AFAICT > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC > back down immediately (!?) then it schedules a link check from a work It's not like that. pm_runtime_put() in igc_open() does not disable device. It calls runtime_idle callback which check if there is link and if is not, schedule device suspend in 5 second, otherwise device stays running. Work watchdog_task runs periodically and also check for link changes. > queue, which opens it again (!?). It's a source of never ending bugs. Maybe there are issues there and igc pm runtime implementation needs improvements, with lockings or otherwise. Some folks are looking at this. But I think for this particular deadlock problem reverting of below commits should be considered: bd869245a3dc net: core: try to runtime-resume detached device in __dev_open f32a21376573 ethtool: runtime-resume netdev parent before ethtool ioctl ops First, the deadlock should be addressed also in older kernels and refactoring is not really backportable fix. Second, I don't think network stack should do any calls to pm_runtime* . This should be fully device driver specific, as this depends on device driver implementation of power saving. IMHO if it's desirable to resume disabled device on requests from user space, then netif_device_detach() should not be used in runtime suspend. Thoughts ? Regards Stanislaw
Powered by blists - more mailing lists