Message-ID: <20240105073001.15f2f3cb@kernel.org>
Date: Fri, 5 Jan 2024 07:30:01 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
Cc: Heiner Kallweit <hkallweit1@...il.com>, Johannes Berg
 <johannes@...solutions.net>, netdev@...r.kernel.org, Johannes Berg
 <johannes.berg@...el.com>, Marc MERLIN <marc@...lins.org>, Przemek Kitszel
 <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL

On Fri, 5 Jan 2024 12:53:42 +0100 Stanislaw Gruszka wrote:
> On Thu, Jan 04, 2024 at 08:16:56AM -0800, Jakub Kicinski wrote:
> > __dev_open() tries to resume as well, and is also under rtnl_lock.  
> 
> This one is a plain 100% deadlock for igc (and for igb before ac8c58f5b535).
> I'm opting to remove those rpm calls from __dev_open() and ethtool.

I don't know what gets powered down, exactly, in this device,
so I can't give you a concrete example. But usually there's
at least one ndo / ethtool callback which needs to resume
the device (and already holds rtnl_lock). Taking rtnl_lock
on the resume path is fundamentally broken. Removing the
rpm calls from the core is just going to lead to a whack-a-mole
of bugs in the drivers themselves.

IOW I look at the RPM calls in the core as a canary for people
doing the wrong thing :(
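
To make the failure mode concrete, here is a minimal userspace sketch, not
kernel code. It assumes a plain non-recursive lock standing in for rtnl_lock,
and the names ToyDevice, runtime_resume and ethtool_op are hypothetical,
made up for illustration. A timeout is used so the demo returns instead of
hanging the way a real mutex would.

```python
import threading

# Userspace stand-in for rtnl_lock: a non-recursive mutex.
rtnl = threading.Lock()


class ToyDevice:
    """Hypothetical device model; not a real driver or kernel API."""

    def __init__(self):
        self.suspended = True

    def runtime_resume(self, timeout=0.1):
        # Broken pattern: the resume path takes rtnl itself.
        # A real mutex would block forever here if the caller already
        # holds it; the timeout only exists so this sketch can return.
        if not rtnl.acquire(timeout=timeout):
            return False  # would have deadlocked
        try:
            self.suspended = False
            return True
        finally:
            rtnl.release()


def ethtool_op(dev):
    """Models a core path that resumes the device while holding rtnl."""
    with rtnl:
        return dev.runtime_resume()
```

Calling ethtool_op() on a suspended ToyDevice returns False and leaves it
suspended, because the resume path waits on a lock its own caller holds.
Calling runtime_resume() without rtnl held succeeds.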

> > So that resume call somehow must never happen or users would see
> > -ENODEV? Sorry for the basic questions, the flow is confusing :S  
> 
> If we talk about the situation before rpm calls were added to the net core
> (i.e. < 5.9), open/ethtool returned -ENODEV when igc/igb was runtime
> suspended, because the netif_device_present() check failed.
> 
> That was by design: why open the device and lose
> energy if there is no cable and the device cannot be used anyway?

I think "link" means actual link up here, no? As opposed to no cable
plugged in. If I understand that right - the device would have to train
the link in DOWN state in order for the device to be opened?
That would be quite wasteful in terms of power.

Regardless, returning -ENODEV is really not how netdevs should behave.
That's what carrier reporting is for! :(
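
A rough userspace sketch of the two behaviours, for contrast. It is a toy
model under stated assumptions, not driver code: ToyNetdev and its methods
are hypothetical names, and the carrier flag only stands in for what a
driver would report through netif_carrier_on() / netif_carrier_off().

```python
import errno


class ToyNetdev:
    """Hypothetical netdev model; names are for illustration only."""

    def __init__(self, cable_plugged):
        self.cable_plugged = cable_plugged
        self.up = False
        self.carrier = False

    def open_enodev(self):
        # Pre-5.9 behaviour described in the quoted text: refuse to open
        # when there is no cable and the device is runtime suspended.
        if not self.cable_plugged:
            return -errno.ENODEV
        self.up = True
        self.carrier = True
        return 0

    def open_with_carrier(self):
        # Alternative: open always succeeds, link state goes via carrier.
        self.up = True
        self.carrier = self.cable_plugged
        return 0

    def set_cable(self, plugged):
        # Link change: only an opened device reports carrier.
        self.cable_plugged = plugged
        self.carrier = self.up and plugged
```

With open_with_carrier(), user space sees an interface that is up with no
carrier, and carrier comes on when the cable is plugged in, instead of an
open() that fails.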
