Date:   Tue, 10 Oct 2017 16:48:57 +0000
From:   Trond Myklebust <trondmy@...marydata.com>
To:     "tj@...nel.org" <tj@...nel.org>
CC:     "bfields@...ldses.org" <bfields@...ldses.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "lorenzo.pieralisi@....com" <lorenzo.pieralisi@....com>,
        "jlayton@...chiereds.net" <jlayton@...chiereds.net>,
        "linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
        "jiangshanlai@...il.com" <jiangshanlai@...il.com>,
        "anna.schumaker@...app.com" <anna.schumaker@...app.com>
Subject: Re: net/sunrpc: v4.14-rc4 lockdep warning

On Tue, 2017-10-10 at 07:03 -0700, tj@...nel.org wrote:
> Hello, Trond.
> 
> On Mon, Oct 09, 2017 at 06:32:13PM +0000, Trond Myklebust wrote:
> > On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> > > > I have run into the lockdep warning below while running v4.14-rc3/rc4
> > > > on an ARM64 defconfig Juno dev board - reporting it to check whether
> > > > it is a known/genuine issue.
> > > 
> > > Please let me know if you need further debug data or need some
> > > specific tests.
> > > 
> > > [    6.209384]
> > > ======================================================
> > > [    6.215569] WARNING: possible circular locking dependency
> > > detected
> > > [    6.221755] 4.14.0-rc4 #54 Not tainted
> > > [    6.225503] --------------------------------------------------
> > > ----
> > > [    6.231689] kworker/4:0H/32 is trying to acquire lock:
> > > [    6.236830]  ((&task->u.tk_work)){+.+.}, at:
> > > [<ffff0000080e64cc>]
> > > process_one_work+0x1cc/0x3f0
> > > [    6.245472] 
> > >                but task is already holding lock:
> > > [    6.251309]  ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>]
> > > process_one_work+0x1cc/0x3f0
> > > [    6.259158] 
> > >                which lock already depends on the new lock.
> > > 
> > > [    6.267345] 
> > >                the existing dependency chain (in reverse order)
> > > is:
> 
> ..
> > Adding Tejun and Lai, since this looks like a workqueue locking
> > issue.
> 
> It looks a bit cryptic but it's warning against the following case.
> 
> 1. Memory pressure is high and rescuer kicks in for the xprtiod
>    workqueue.  There are no other kworkers serving the workqueue.
> 
> 2. The rescuer runs the xprt_destroy path and ends up calling
>    cancel_work_sync() on a work item which is queued on xprtiod.
> 
> 3. The work item is pending on the same workqueue and assuming that
>    memory pressure doesn't let off (let's say reclaim is trying to
>    kick off nfs pages), the only way it can get executed is by the
>    rescuer which is waiting for the work item - an A-B-A deadlock.
> 
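
The three steps above can be sketched as a small user-space model. This is only an illustration of the scenario as described, not kernel code: the struct names, fields, and sync_wait_deadlocks() are hypothetical and do not correspond to any workqueue API. It assumes the rescuer is the only thread left serving the queue and that the item it runs blocks until the target item has executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct work_model {
	const char *name;
	bool pending;		/* queued but not yet started */
};

struct wq_model {
	int free_workers;	/* kworkers able to pick up pending items */
	const struct work_model *rescuer_runs;	/* item the rescuer executes, or NULL */
};

/*
 * Returns true when the item run by the rescuer waits synchronously
 * for `target` and, under the assumptions above, that wait can never
 * finish because nobody is left to execute `target`.
 */
static bool sync_wait_deadlocks(const struct wq_model *wq,
				const struct work_model *target)
{
	assert(wq != NULL && target != NULL);

	if (!target->pending)
		return false;	/* nothing is queued, so nothing to wait for */
	if (wq->free_workers > 0)
		return false;	/* another kworker can run the target */
	/* Only the rescuer is left, and it is the one blocked in the wait. */
	return wq->rescuer_runs != NULL;
}
```

With free_workers set to 0, rescuer_runs pointing at the destroy-path item, and a pending target, the function returns true. Clearing pending on the target makes it return false, which is the case the reply below argues should hold.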

Hi Tejun,

Thanks for the explanation. What I don't really understand here, though,
is how the work item could be queued at all. We have a
wait_on_bit_lock() in xprt_destroy() that should mean the
xprt->task_cleanup work item has completed running, and that it cannot
be requeued.

Is there a possibility that the flush_queue() might be triggered
despite the work item not being queued?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@...marydata.com
