Message-ID: <53fa618d-376f-2200-c8ba-e22ba004cdc0@oracle.com>
Date:   Tue, 18 Jun 2019 15:10:33 -0700
From:   Jane Chu <jane.chu@...cle.com>
To:     Dan Williams <dan.j.williams@...el.com>, linux-nvdimm@...ts.01.org
Cc:     Ira Weiny <ira.weiny@...el.com>, Dave Jiang <dave.jiang@...el.com>,
        Keith Busch <keith.busch@...el.com>, stable@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will.deacon@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Erwin Tsaur <erwin.tsaur@...cle.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Vishal Verma <vishal.l.verma@...el.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/6] libnvdimm: Fix async operations and locking

On 6/11/2019 4:25 PM, Dan Williams wrote:
> The libnvdimm subsystem uses async operations to parallelize device
> probing operations and to allow sysfs to trigger device_unregister() on
> deleted namespaces. A multithreaded stress test of the libnvdimm sysfs
> interface uncovered a case where device_unregister() is triggered
> multiple times, and the subsequent investigation uncovered a broken
> locking scenario.
> 
> The lack of lockdep coverage for device_lock() stymied the debug. That
> is, until patch6 "driver-core, libnvdimm: Let device subsystems add
> local lockdep coverage" solved that with a shadow lock, with lockdep
> coverage, to mirror device_lock() operations. Given the time saved with
> the shadow-lock debug hack, patch6 attempts to generalize it into a
> device_lock() debug facility that might be able to be carried upstream. Patch6 is
> staged at the end of this fix series in case it is contentious and needs
> to be dropped.
> 
> Patch1 "drivers/base: Introduce kill_device()" could be achieved with
> local libnvdimm infrastructure. However, the existing 'dead' flag in
> 'struct device_private' aims to solve similar async register/unregister
> races so the fix in patch2 "libnvdimm/bus: Prevent duplicate
> device_unregister() calls" can be implemented with existing driver-core
> infrastructure.
> 
> Patch3 is a rare lockdep warning that is intermittent based on
> namespaces racing ahead of the completion of probe of their parent
> region. It is not related to the other fixes; it just happened to
> trigger as a result of the async stress test.
> 
> Patch4 and patch5 address an ABBA deadlock tripped by the stress test.
> 
> These patches pass the failing stress test and the existing libnvdimm
> unit tests with CONFIG_PROVE_LOCKING=y and the new "dev->lockdep_mutex"
> shadow lock with no lockdep warnings.
> 
> ---
> 
> Dan Williams (6):
>        drivers/base: Introduce kill_device()
>        libnvdimm/bus: Prevent duplicate device_unregister() calls
>        libnvdimm/region: Register badblocks before namespaces
>        libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl()
>        libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
>        driver-core, libnvdimm: Let device subsystems add local lockdep coverage
> 
> 
>   drivers/acpi/nfit/core.c        |   28 ++++---
>   drivers/acpi/nfit/nfit.h        |   24 ++++++
>   drivers/base/core.c             |   30 ++++++--
>   drivers/nvdimm/btt_devs.c       |   16 ++--
>   drivers/nvdimm/bus.c            |  154 +++++++++++++++++++++++++++------------
>   drivers/nvdimm/core.c           |   10 +--
>   drivers/nvdimm/dimm_devs.c      |    4 +
>   drivers/nvdimm/namespace_devs.c |   36 +++++----
>   drivers/nvdimm/nd-core.h        |   71 ++++++++++++++++++
>   drivers/nvdimm/pfn_devs.c       |   24 +++---
>   drivers/nvdimm/pmem.c           |    4 +
>   drivers/nvdimm/region.c         |   24 +++---
>   drivers/nvdimm/region_devs.c    |   12 ++-
>   include/linux/device.h          |    6 ++
>   14 files changed, 308 insertions(+), 135 deletions(-)
> 

Tested-by: Jane Chu <jane.chu@...cle.com>

Specifically, running parallel ndctl commands that create and destroy
namespaces in multiple processes concurrently led to a system panic; I have
verified that this patch series fixes it.
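
A minimal sketch of this kind of stress loop follows. It is not the
actual test that was run here. The region name, namespace mode, size,
worker count, and iteration count are all assumptions, as is the
expectation that "ndctl create-namespace" prints a JSON object with a
"dev" field on stdout. The helper names (run_ndctl, churn, stress) are
hypothetical.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ndctl(args):
    """Run one ndctl command as a child process and return its stdout."""
    done = subprocess.run(["ndctl", *args], check=True,
                          capture_output=True, text=True)
    return done.stdout


def churn(iterations, region, run):
    """Create then destroy a namespace, `iterations` times; return the count."""
    completed = 0
    for _ in range(iterations):
        # Assumed options: fsdax mode and a 1G size are placeholders.
        out = run(["create-namespace", "--region", region,
                   "--mode", "fsdax", "--size", "1G"])
        dev = json.loads(out)["dev"]  # assumed output field, e.g. "namespace0.1"
        run(["destroy-namespace", "--force", dev])
        completed += 1
    return completed


def stress(workers=8, iterations=100, region="region0", run=run_ndctl):
    """Run `workers` churn loops at once; return total completed cycles.

    Each worker thread spawns its own ndctl child processes, so several
    create/destroy operations are in flight concurrently. Any failing
    command raises and is re-raised here via Future.result().
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(churn, iterations, region, run)
                   for _ in range(workers)]
        return sum(f.result() for f in futures)
```

Calling stress() as root on a disposable test machine with a free
region would drive the concurrent create/destroy pattern described
above. The `run` parameter exists only so the loop can be exercised
without real hardware.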

Thanks!
-jane
