Message-ID: <ZZZrbUPUCTtDcUFU@linux.intel.com>
Date: Thu, 4 Jan 2024 09:25:17 +0100
From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org,
Heiner Kallweit <hkallweit1@...il.com>,
Johannes Berg <johannes.berg@...el.com>,
Marc MERLIN <marc@...lins.org>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL

On Wed, Jan 03, 2024 at 03:34:05PM -0800, Jakub Kicinski wrote:
> On Wed, 3 Jan 2024 11:30:17 +0100 Stanislaw Gruszka wrote:
> > > I was really, really hoping that this would serve as a motivation
> > > for Intel to sort out the igb/igc implementation. The flow AFAICT
> > > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC
> > > back down immediately (!?) then it schedules a link check from a work
> >
> > It's not like that. pm_runtime_put() in igc_open() does not disable device.
> > It calls runtime_idle callback which check if there is link and if is
> > not, schedule device suspend in 5 second, otherwise device stays running.
>
> Hm, I missed the 5 sec delay there. Next question for me is - how does
> it not deadlock in the open?
>
> igc_open()
>   __igc_open(resuming=false)
>     if (!resuming)
>       pm_runtime_get_sync(&pdev->dev);
>
> igc_resume()
>   rtnl_lock()

If the device was not suspended, pm_runtime_get_sync() will increment
the dev->power.usage_count counter and cancel the pending rpm suspend
request, if any. There is a race condition though; more about that
below.

If the device was suspended, we could not get to igc_open(), since it
was marked as detached and would fail the netif_device_present() check
in __dev_open(). That was the behaviour before bd869245a3dc.

There is a small race window between igc_open() and the scheduled
runtime suspend, if dev_open() is done at the same time as
dev->power.suspend_timer expires:

  open:                      pm_suspend_timer_fn:

  rtnl_lock()
                             rpm_suspend()
                               igc_runtime_suspend()
                                 __igc_shutdown()
                                   rtnl_lock()
  __igc_open()
    pm_runtime_get_sync():
      waits for rpm suspend callback done

This needs to be addressed, but it is not something that can happen
all the time. To trigger it, someone has to remove the cable and then
run ip link set up exactly 5 seconds later.

Regards
Stanislaw