netdev - Re: hitting lockdep warning as of too early VF probe with 3.9-rc1

Message-Id: <201303101728.50883.jackm@dev.mellanox.co.il>
Date:	Sun, 10 Mar 2013 17:28:50 +0200
From:	Jack Morgenstein <jackm@....mellanox.co.il>
To:	Ming Lei <ming.lei@...onical.com>
Cc:	Or Gerlitz <or.gerlitz@...il.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	David Miller <davem@...emloft.net>,
	Roland Dreier <roland@...nel.org>,
	netdev <netdev@...r.kernel.org>, Yan Burman <yanb@...lanox.com>,
	Liran Liss <liranl@...lanox.com>
Subject: Re: hitting lockdep warning as of too early VF probe with 3.9-rc1

Hello, Ming, Greg, Roland, Dave, all...

From a quick scan of ethernet drivers in Dave Miller's net-next git, I
notice that the following drivers (apart from the Mellanox mlx4 driver)
enable SRIOV during the PF probe:
  cisco enic (function "enic_probe")
  neterion vxge driver (function "vxge_probe")
  Solarflare efx driver (function "efx_pci_probe", which invokes "efx_sriov_init")
  emulex driver (function "be_probe" --> be_setup --> be_vf_setup)

It would seem that these drivers are susceptible to the nested probe/deadlock
race condition as well.

I believe it would be healthiest for everyone if the probe code in the kernel
itself avoided such nested probe calls (rather than forcing each vendor to deal
with this issue).  The kernel code is certainly aware
(or could easily track) that it is invoking a driver's probe function
while that same probe function has already been invoked and has not yet returned!
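
To make the tracking idea concrete, here is a minimal user-space sketch in C.
It is NOT kernel code and is not taken from the driver core: every name in it
(sim_driver, sim_device, sim_bus_probe, and so on) is hypothetical, and it only
models the bookkeeping. The "bus" records that a driver's probe is in progress
and, if a new device shows up for that same driver in the meantime, queues the
device and probes it only after the outer probe has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_DEFERRED 8

struct sim_driver;

/* Hypothetical stand-in for a PCI function (PF or VF). */
struct sim_device {
    const char *name;
    bool is_vf;
    bool bound;
    struct sim_device *child;   /* VF that appears when the PF enables SR-IOV */
};

/* Hypothetical stand-in for a driver, plus the tracking state. */
struct sim_driver {
    const char *name;
    int (*probe)(struct sim_driver *drv, struct sim_device *dev);
    bool probe_active;          /* true while probe() has not yet returned */
    struct sim_device *deferred[SIM_MAX_DEFERRED];
    size_t n_deferred;
};

static int sim_depth;           /* current probe nesting depth */
static int sim_max_depth;       /* deepest nesting observed */

/* Run one probe with the "in progress" flag set around it. */
static int sim_run_probe(struct sim_driver *drv, struct sim_device *dev)
{
    int ret;

    drv->probe_active = true;
    ret = drv->probe(drv, dev);
    drv->probe_active = false;
    dev->bound = (ret == 0);
    return ret;
}

/* Called whenever a device is added and matches drv. */
static int sim_bus_probe(struct sim_driver *drv, struct sim_device *dev)
{
    int ret;
    size_t i;

    if (drv->probe_active) {
        /* Nested request: remember the device instead of probing now. */
        if (drv->n_deferred == SIM_MAX_DEFERRED)
            return -1;
        drv->deferred[drv->n_deferred++] = dev;
        return 0;
    }

    ret = sim_run_probe(drv, dev);

    /* The outer probe has returned; now probe whatever was queued. */
    for (i = 0; i < drv->n_deferred; i++)
        sim_run_probe(drv, drv->deferred[i]);
    drv->n_deferred = 0;
    return ret;
}

/* One driver serving both PF and VF, as in the mlx4_core case. */
static int sim_probe(struct sim_driver *drv, struct sim_device *dev)
{
    assert(drv->probe_active);
    if (++sim_depth > sim_max_depth)
        sim_max_depth = sim_depth;
    if (!dev->is_vf && dev->child)
        sim_bus_probe(drv, dev->child);   /* "enable SR-IOV": the VF appears */
    sim_depth--;
    return 0;
}

/* Returns the deepest probe nesting seen, or -1 if a device was left unbound. */
int sim_demo_max_probe_depth(void)
{
    struct sim_device vf = { .name = "vf0", .is_vf = true };
    struct sim_device pf = { .name = "pf0", .child = &vf };
    struct sim_driver drv = { .name = "sim_core", .probe = sim_probe };

    sim_depth = 0;
    sim_max_depth = 0;
    sim_bus_probe(&drv, &pf);
    return (pf.bound && vf.bound) ? sim_max_depth : -1;
}
```

Calling sim_demo_max_probe_depth() probes the PF, queues the VF that appears
during that probe, and probes the VF afterwards, so the probe function is never
entered while an earlier invocation is still running. A real implementation
would of course need proper locking and would have to decide what a PF probe
may assume about its VFs before it returns; this sketch ignores both.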

-Jack

On Thursday 07 March 2013 04:03, Ming Lei wrote:
> On Thu, Mar 7, 2013 at 4:54 AM, Or Gerlitz <or.gerlitz@...il.com> wrote:
> > On Wed, Mar 6, 2013 at 4:43 AM, Ming Lei <ming.lei@...onical.com> wrote:
> >> You are adding one new PCI device inside another PCI device's probe(),
> >> so the new device will be probed. Since PCI probe() is scheduled by
> >> work_on_cpu, this causes flush_work() to be called inside a worker
> >> function, which might be a real deadlock.
> >
> > So if I understand correctly, you recommend somehow avoiding this nested probing?
> 
> Yes, you might need to avoid the nested probing in your driver.
> 
> >
> >> I am wondering why this commit can cause the problem, since the PCI
> >> device will be probed with its driver if there is a driver for it. There is
> >> no restriction on when the driver may be loaded into the system, either
> >> before the device is added or after.
> >
> > FWIW for understanding the issue - the same driver (mlx4_core) is used
> > by the PF and VF, so the VF driver is already loaded at the time the VF
> > is added as a new PCI device.
> >
> >> From the driver core's point of view, nothing looks wrong.
> >
> > So this got me confused: you pointed out a possible deadlock. Are you
> > saying the deadlock would be the result of neither how the driver code
> > works nor the commit we bisected?
> 
> My commit only affects the driver loading path, but your warning
> is hit in driver probe path triggered by device addition, so the lockdep
> warning should still be triggered without my commit since the two paths
> are totally independent, right?
> 
> Thanks,
> --
> Ming Lei
> 