Message-ID: <ZZguXLO3DAX/2Y0/@linux.intel.com>
Date: Fri, 5 Jan 2024 17:29:16 +0100
From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Heiner Kallweit <hkallweit1@...il.com>,
Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org,
Johannes Berg <johannes.berg@...el.com>,
Marc MERLIN <marc@...lins.org>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL
On Fri, Jan 05, 2024 at 07:30:01AM -0800, Jakub Kicinski wrote:
> On Fri, 5 Jan 2024 12:53:42 +0100 Stanislaw Gruszka wrote:
> > On Thu, Jan 04, 2024 at 08:16:56AM -0800, Jakub Kicinski wrote:
> > > __dev_open() tries to resume as well, and is also under rtnl_lock.
> >
> > This one is a plain 100% deadlock for igc (and for igb before ac8c58f5b535).
> > I'm opting to remove those rpm calls from __dev_open() and ethtool.
>
> I don't know what gets powered down, exactly, in this device,
> so I can't give you a concrete example. But usually there's
> at least one ndo / ethtool callback which needs to resume
> the device (and already holds rtnl_lock). Taking rtnl_lock
> on the resume path is fundamentally broken.
I agree with that.
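
A minimal userspace sketch of the failure mode discussed above, not kernel
code. All model_* names are hypothetical. rtnl_lock is modeled as a
non-recursive, single-threaded lock that returns -EDEADLK where the real mutex
would block forever, and the driver's resume callback is assumed to take that
lock itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_lock: non-recursive, single-threaded. */
struct model_lock {
	bool held;
};

static int model_lock_acquire(struct model_lock *l)
{
	if (l->held)
		return -EDEADLK;	/* the real mutex would hang here */
	l->held = true;
	return 0;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);	/* releasing an unheld lock is a model bug */
	l->held = false;
}

/* Models a driver runtime-resume callback that takes rtnl_lock itself. */
static int model_runtime_resume(struct model_lock *rtnl)
{
	int err = model_lock_acquire(rtnl);

	if (err)
		return err;
	/* ... device re-initialization would go here ... */
	model_lock_release(rtnl);
	return 0;
}

/* Models a core path that resumes the device while already holding rtnl. */
static int model_core_op_resume_inside(struct model_lock *rtnl)
{
	int err = model_lock_acquire(rtnl);

	if (err)
		return err;
	err = model_runtime_resume(rtnl);	/* second acquire fails */
	model_lock_release(rtnl);
	return err;
}

/* Same operation, with the resume done before rtnl is taken. */
static int model_core_op_resume_outside(struct model_lock *rtnl)
{
	int err = model_runtime_resume(rtnl);

	if (err)
		return err;
	err = model_lock_acquire(rtnl);
	if (err)
		return err;
	/* ... the actual operation would go here ... */
	model_lock_release(rtnl);
	return 0;
}
```

In this model, model_core_op_resume_inside() reports -EDEADLK, while
model_core_op_resume_outside() succeeds. The second ordering only helps core
paths that can resume before locking; an ndo or ethtool callback that is
already called under rtnl_lock still cannot safely trigger such a resume.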
> Removing the
> rpm calls from the core is just going to lead to a whack-a-mole
> of bugs in the drivers themselves.
>
> IOW I look at the RPM calls in the core as a canary for people
> doing the wrong thing :(
Hmm, I don't understand this one. What other bugs could pop up
after reverting bd869245a3dcc and the other commits that added rpm calls
to the net core?
> > > So that resume call somehow must never happen or users would see
> > > -ENODEV? Sorry for the basic questions, the flow is confusing :S
> >
> > If we talk about the situation before rpm calls were added to the net core
> > (i.e. < 5.9), open/ethtool returned -ENODEV when igc/igb
> > was runtime suspended, because the netif_device_present() check failed.
> >
> > That was by design: why open the device and lose
> > energy if there is no cable and the device cannot be used anyway?
>
> I think "link" means actual link up here, no? As opposed to no cable
> plugged in. If I understand that right - the device would have to train
> the link in DOWN state in order for the device to be opened?
> That would be quite wasteful in terms of power.
I meant no cable plugged in. When the igc device was runtime suspended and the
user connected the cable, the user had to power the device up by writing "on" to
power/control and then run ip link set up.
> Regardless, returning -ENODEV is really not how netdevs should behave.
> That's what carrier reporting is for! :(
OK, I can agree with that. But I think this should be achieved by not using
netif_device_detach() in rpm suspend, not by
	if (!netif_device_present(dev)) {
		/* may be detached because parent is runtime-suspended */
		if (dev->dev.parent)
			pm_runtime_resume(dev->dev.parent);
		if (!netif_device_present(dev))
			return -ENODEV;
	}
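
A rough userspace model of the two suspend strategies being compared, under
stated assumptions: struct model_netdev and every model_* function are
hypothetical and only mimic the presence and carrier flags; this is not the
igc/igb code. It shows why dropping only the carrier on runtime suspend would
let open succeed without any resume call in the core.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the netdev state relevant here. */
struct model_netdev {
	bool present;	/* what netif_device_present() would report */
	bool carrier;	/* what carrier reporting would show */
	bool up;	/* set once open succeeded */
};

/* Strategy A: runtime suspend detaches the device (present goes false). */
static void model_suspend_with_detach(struct model_netdev *dev)
{
	dev->present = false;
	dev->carrier = false;
}

/* Strategy B: runtime suspend keeps the device present, carrier off only. */
static void model_suspend_keep_present(struct model_netdev *dev)
{
	dev->carrier = false;
}

/* Models the open path with no runtime PM call in the core. */
static int model_open(struct model_netdev *dev)
{
	if (!dev->present)
		return -ENODEV;
	dev->up = true;
	return 0;
}
```

With strategy A, model_open() fails with -ENODEV after suspend; with strategy
B it succeeds and the missing link is visible only as carrier off.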
Regards
Stanislaw