[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53fa618d-376f-2200-c8ba-e22ba004cdc0@oracle.com>
Date: Tue, 18 Jun 2019 15:10:33 -0700
From: Jane Chu <jane.chu@...cle.com>
To: Dan Williams <dan.j.williams@...el.com>, linux-nvdimm@...ts.01.org
Cc: Ira Weiny <ira.weiny@...el.com>, Dave Jiang <dave.jiang@...el.com>,
Keith Busch <keith.busch@...el.com>, stable@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will.deacon@....com>,
Ingo Molnar <mingo@...hat.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Erwin Tsaur <erwin.tsaur@...cle.com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Vishal Verma <vishal.l.verma@...el.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/6] libnvdimm: Fix async operations and locking
On 6/11/2019 4:25 PM, Dan Williams wrote:
> The libnvdimm subsystem uses async operations to parallelize device
> probing operations and to allow sysfs to trigger device_unregister() on
> deleted namepsaces. A multithreaded stress test of the libnvdimm sysfs
> interface uncovered a case where device_unregister() is triggered
> multiple times, and the subsequent investigation uncovered a broken
> locking scenario.
>
> The lack of lockdep coverage for device_lock() stymied the debug. That
> is, until patch6 "driver-core, libnvdimm: Let device subsystems add
> local lockdep coverage" solved that with a shadow lock, with lockdep
> coverage, to mirror device_lock() operations. Given the time saved with
> shadow-lock debug-hack, patch6 attempts to generalize device_lock()
> debug facility that might be able to be carried upstream. Patch6 is
> staged at the end of this fix series in case it is contentious and needs
> to be dropped.
>
> Patch1 "drivers/base: Introduce kill_device()" could be achieved with
> local libnvdimm infrastructure. However, the existing 'dead' flag in
> 'struct device_private' aims to solve similar async register/unregister
> races so the fix in patch2 "libnvdimm/bus: Prevent duplicate
> device_unregister() calls" can be implemented with existing driver-core
> infrastructure.
>
> Patch3 is a rare lockdep warning that is intermittent based on
> namespaces racing ahead of the completion of probe of their parent
> region. It is not related to the other fixes, it just happened to
> trigger as a result of the async stress test.
>
> Patch4 and patch5 address an ABBA deadlock tripped by the stress test.
>
> These patches pass the failing stress test and the existing libnvdimm
> unit tests with CONFIG_PROVE_LOCKING=y and the new "dev->lockdep_mutex"
> shadow lock with no lockdep warnings.
>
> ---
>
> Dan Williams (6):
> drivers/base: Introduce kill_device()
> libnvdimm/bus: Prevent duplicate device_unregister() calls
> libnvdimm/region: Register badblocks before namespaces
> libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
> libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
> driver-core, libnvdimm: Let device subsystems add local lockdep coverage
>
>
> drivers/acpi/nfit/core.c | 28 ++++---
> drivers/acpi/nfit/nfit.h | 24 ++++++
> drivers/base/core.c | 30 ++++++--
> drivers/nvdimm/btt_devs.c | 16 ++--
> drivers/nvdimm/bus.c | 154 +++++++++++++++++++++++++++------------
> drivers/nvdimm/core.c | 10 +--
> drivers/nvdimm/dimm_devs.c | 4 +
> drivers/nvdimm/namespace_devs.c | 36 +++++----
> drivers/nvdimm/nd-core.h | 71 ++++++++++++++++++
> drivers/nvdimm/pfn_devs.c | 24 +++---
> drivers/nvdimm/pmem.c | 4 +
> drivers/nvdimm/region.c | 24 +++---
> drivers/nvdimm/region_devs.c | 12 ++-
> include/linux/device.h | 6 ++
> 14 files changed, 308 insertions(+), 135 deletions(-)
>
Tested-by: Jane Chu <jane.chu@...cle.com>
Specifically, running parallel ndctls creating/destroying namespaces in
multiple processes concurrently led to system panic, that has been
verified fixed by this patch series.
Thanks!
-jane
Powered by blists - more mailing lists