Message-ID: <ZZZrbUPUCTtDcUFU@linux.intel.com>
Date: Thu, 4 Jan 2024 09:25:17 +0100
From: Stanislaw Gruszka <stanislaw.gruszka@...ux.intel.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Johannes Berg <johannes@...solutions.net>, netdev@...r.kernel.org,
	Heiner Kallweit <hkallweit1@...il.com>,
	Johannes Berg <johannes.berg@...el.com>,
	Marc MERLIN <marc@...lins.org>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL

On Wed, Jan 03, 2024 at 03:34:05PM -0800, Jakub Kicinski wrote:
> On Wed, 3 Jan 2024 11:30:17 +0100 Stanislaw Gruszka wrote:
> > > I was really, really hoping that this would serve as a motivation
> > > for Intel to sort out the igb/igc implementation. The flow AFAICT
> > > is ndo_open() starts the NIC, the calls pm_sus, which shuts the NIC
> > > back down immediately (!?) then it schedules a link check from a work  
> > 
> > It's not like that. pm_runtime_put() in igc_open() does not disable the device.
> > It calls the runtime_idle callback, which checks whether there is link and, if
> > there is not, schedules device suspend in 5 seconds; otherwise the device stays running.
> 
> Hm, I missed the 5 sec delay there. Next question for me is - how does
> it not deadlock in the open?
> 
> igc_open()
>   __igc_open(resuming=false)
>     if (!resuming)
>       pm_runtime_get_sync(&pdev->dev);
> 
> igc_resume()
>   rtnl_lock()

If the device was not suspended, pm_runtime_get_sync() will increment
the dev->power.usage_count counter and cancel the pending rpm suspend
request, if any. There is a race condition though; more about that
below.
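
To make the counter logic concrete, here is a minimal user-space sketch.
It is a toy model only, not driver or PM core code: every toy_* name is
made up for illustration, and the timer is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; none of these names exist in the kernel. */
struct toy_dev {
	int  usage_count;     /* stands in for dev->power.usage_count */
	bool suspend_pending; /* stands in for an armed suspend timer */
	bool suspended;
	bool link_up;
};

/* Models pm_runtime_get_sync() on a device that is not suspended. */
static void toy_get_sync(struct toy_dev *d)
{
	d->usage_count++;
	d->suspend_pending = false; /* cancel the pending suspend request */
}

/* Models pm_runtime_put() followed by the runtime_idle callback. */
static void toy_put(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	if (--d->usage_count == 0 && !d->link_up)
		d->suspend_pending = true; /* no link: schedule suspend (5 s in the driver) */
}

/* Models the suspend timer expiring; returns true if the device suspended. */
static bool toy_timer_fire(struct toy_dev *d)
{
	if (!d->suspend_pending || d->usage_count > 0)
		return false;
	d->suspend_pending = false;
	d->suspended = true;
	return true;
}
```

With no link, toy_get_sync() followed by toy_put() leaves a suspend
pending; a second toy_get_sync() before the timer fires cancels it, so
toy_timer_fire() then does nothing. The model is sequential, so it
cannot show the race itself.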

If the device was suspended, we could not get to igc_open(), since the
device was marked as detached and would fail the netif_device_present()
check in __dev_open(). That was the behaviour before bd869245a3dc.

There is a small race window between igc_open() and the scheduled
runtime suspend, if dev_open() is done at the same moment that
dev->power.suspend_timer expires:

open:					pm_suspend_timer_fh:

rtnl_lock()
					rpm_suspend()
					  igc_runtime_suspend()
					   __igc_shutdown()
					     rtnl_lock()

__igc_open()
  pm_runtime_get_sync():
    waits for rpm suspend callback done
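
The interleaving above can be checked with a small wait-for model.
Again, this is only a hedged user-space sketch: the toy_* names are
hypothetical, and "blocked" is just an entry in a table rather than a
real sleeping thread.

```c
#include <stdbool.h>

/* Toy model; none of these names exist in the kernel. */
enum toy_actor { TOY_NONE = -1, TOY_OPEN = 0, TOY_SUSPEND = 1, TOY_NR_ACTORS = 2 };

struct toy_state {
	enum toy_actor rtnl_owner;              /* who holds the modelled RTNL */
	bool suspend_in_progress;               /* suspend callback is running */
	enum toy_actor waits_on[TOY_NR_ACTORS]; /* who each actor is blocked on */
};

static void toy_init(struct toy_state *s)
{
	s->rtnl_owner = TOY_NONE;
	s->suspend_in_progress = false;
	s->waits_on[TOY_OPEN] = TOY_NONE;
	s->waits_on[TOY_SUSPEND] = TOY_NONE;
}

/* Models rtnl_lock(): take the lock if free, else block on its owner. */
static void toy_rtnl_lock(struct toy_state *s, enum toy_actor who)
{
	if (s->rtnl_owner == TOY_NONE)
		s->rtnl_owner = who;
	else
		s->waits_on[who] = s->rtnl_owner;
}

/* Models rpm_suspend() entering the driver's runtime suspend callback. */
static void toy_suspend_begin(struct toy_state *s)
{
	s->suspend_in_progress = true;
}

/* Models pm_runtime_get_sync() in the open path waiting for the callback. */
static void toy_open_get_sync(struct toy_state *s)
{
	if (s->suspend_in_progress)
		s->waits_on[TOY_OPEN] = TOY_SUSPEND;
}

/* True if following waits_on from some actor leads back to that actor. */
static bool toy_deadlocked(const struct toy_state *s)
{
	for (int start = 0; start < TOY_NR_ACTORS; start++) {
		enum toy_actor cur = s->waits_on[start];

		for (int hops = 0; cur != TOY_NONE && hops < TOY_NR_ACTORS; hops++) {
			if (cur == (enum toy_actor)start)
				return true;
			cur = s->waits_on[cur];
		}
	}
	return false;
}
```

Replaying the diagram (open takes the lock, suspend begins and blocks on
the lock, open then waits in get_sync) makes toy_deadlocked() return
true. If the suspend side takes the lock first, open only blocks on the
lock and there is no cycle.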

This needs to be addressed, but it's not something that can happen
all the time. To trigger it, someone has to remove the cable and
then do ip link set up exactly 5 seconds later.

Regards
Stanislaw
