Message-Id: <20170927214220.41216-1-gvaradar@cisco.com>
Date:   Wed, 27 Sep 2017 14:42:16 -0700
From:   Govindarajulu Varadarajan <gvaradar@...co.com>
To:     benve@...co.com, bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org, jlbec@...lplan.org, hch@....de,
        mingo@...hat.com, peterz@...radead.org
Cc:     Govindarajulu Varadarajan <gvaradar@...co.com>
Subject: [PATCH 0/4] pci aer: fix deadlock in do_recovery

I am seeing a deadlock while loading the enic driver with SR-IOV enabled.

CPU0					CPU1
---------------------------------------------------------------------
__driver_attach()
device_lock(&dev->mutex) <--- device mutex lock here
driver_probe_device()
pci_enable_sriov()
pci_iov_add_virtfn()
pci_device_add()
					aer_isr()		<--- pci aer error
					do_recovery()
					broadcast_error_message()
					pci_walk_bus()
					down_read(&pci_bus_sem) <--- rd sem
down_write(&pci_bus_sem) <-- stuck on wr sem
					report_error_detected()
					device_lock(&dev->mutex)<--- DEAD LOCK

This can also happen when an AER error occurs while
pci_dev->sriov_config() is being called.

The only fix I could think of is to take &pci_bus_sem and then try to
lock every device->mutex under that pci_bus. If any trylock fails, unlock
all the device->mutex locks and &pci_bus_sem, then try again. This
approach seems hackish, but I do not have a better solution. I would like
to open the discussion on this.

Patches 1 and 2 refactor the PCI locking API. Patch 3 fixes the
issue.

With the current fix, we hold the mutex of the parent device and of all
the devices under the bus. This can exceed the size of held_locks in
lockdep if the number of devices (VFs) exceeds 48. Patch 4 extends this
to 63, the maximum supported by lockdep.

Govindarajulu Varadarajan (4):
  pci: introduce __pci_walk_bus for caller with pci_bus_sem held
  pci: code refactor pci_bus_lock/unlock/trylock
  pci aer: fix deadlock in do_recovery
  lockdep: make MAX_LOCK_DEPTH configurable from Kconfig

 drivers/pci/bus.c                  | 13 ++++++++--
 drivers/pci/pci.c                  | 38 ++++++++++++++++++++---------
 drivers/pci/pcie/aer/aerdrv_core.c | 50 ++++++++++++++++++++++++++++++--------
 fs/configfs/inode.c                |  2 +-
 include/linux/pci.h                | 18 ++++++++++++++
 include/linux/sched.h              |  3 +--
 kernel/locking/lockdep.c           | 13 +++++-----
 lib/Kconfig.debug                  | 10 ++++++++
 8 files changed, 115 insertions(+), 32 deletions(-)

-- 
2.14.1
