linux-kernel - Re: [LKP] d50d82faa0 [ 33.671845] WARNING: possible circular locking dependency detected

Message-Id: <20181107190558.812375161de4b5df413ea31b@linux-foundation.org>
Date:   Wed, 7 Nov 2018 19:05:58 -0800
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     kernel test robot <rong.a.chen@...el.com>,
        Mikulas Patocka <mpatocka@...hat.com>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-kernel@...r.kernel.org, LKP <lkp@...org>,
        Tejun Heo <tj@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Christoph Lameter <cl@...ux.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Pekka Enberg <penberg@...nel.org>
Subject: Re: [LKP] d50d82faa0 [ 33.671845] WARNING: possible circular
 locking dependency detected

On Wed, 7 Nov 2018 15:43:36 -0800 Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Tue, 23 Oct 2018 08:30:04 +0800 kernel test robot <rong.a.chen@...el.com> wrote:
> 
> > Greetings,
> > 
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > 
> > commit d50d82faa0c964e31f7a946ba8aba7c715ca7ab0
> > Author:     Mikulas Patocka <mpatocka@...hat.com>
> > AuthorDate: Wed Jun 27 23:26:09 2018 -0700
> > Commit:     Linus Torvalds <torvalds@...ux-foundation.org>
> > CommitDate: Thu Jun 28 11:16:44 2018 -0700
> > 
> >     slub: fix failure when we delete and create a slab cache
> 
> This is ugly.  Is there an alternative way of fixing the race which
> Mikulas attempted to address?  Possibly cancel the work and reuse the
> existing sysfs file, or is that too stupid to live?
> 
> 3b7b314053d021 ("slub: make sysfs file removal asynchronous") was
> pretty lame, really.  As mentioned,
> 
> : It'd be the cleanest to deal with the issue by removing sysfs files
> : without holding slab_mutex before the rest of shutdown; however, given
> : the current code structure, it is pretty difficult to do so.
> 
> Would be a preferable approach.
> 
> >     
> >     This uncovered a bug in the slub subsystem - if we delete a cache and
> >     immediately create another cache with the same attributes, it fails
> >     because of duplicate filename in /sys/kernel/slab/.  The slub subsystem
> >     offloads freeing the cache to a workqueue - and if we create the new
> >     cache before the workqueue runs, it complains because of duplicate
> >     filename in sysfs.

Alternatively, could we flush the workqueue before attempting to
(re)create the sysfs file?  Extra points for only doing this if the
first (re)creation attempt returned -EEXIST?
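
To make the suggestion concrete, here is a minimal userspace sketch of the "create, and only on -EEXIST flush and retry once" pattern. It is not kernel code: the name table stands in for /sys/kernel/slab/, the pending list stands in for the workqueue, and every identifier in it (entry_create, entry_remove_deferred, flush_pending_removals, entry_create_retry) is hypothetical. In the kernel the flush step would presumably be something like flush_workqueue() or flush_work() on whatever queues the deferred removal, and whether that is safe with respect to slab_mutex is exactly the open question here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 8
#define NAME_LEN    32

/* Stand-in for /sys/kernel/slab/: a flat table of entry names. */
static char entries[MAX_ENTRIES][NAME_LEN];
static bool in_use[MAX_ENTRIES];

/* Stand-in for the workqueue: names whose removal has been deferred. */
static char pending[MAX_ENTRIES][NAME_LEN];
static int npending;

/* Return the slot holding @name, or -1 if there is no such entry. */
static int find_entry(const char *name)
{
	for (int i = 0; i < MAX_ENTRIES; i++)
		if (in_use[i] && strcmp(entries[i], name) == 0)
			return i;
	return -1;
}

/* Create an entry; fails with -EEXIST if the name is still present. */
static int entry_create(const char *name)
{
	if (find_entry(name) >= 0)
		return -EEXIST;
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (!in_use[i]) {
			snprintf(entries[i], NAME_LEN, "%s", name);
			in_use[i] = true;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Queue the removal instead of doing it now (the asynchronous part). */
static void entry_remove_deferred(const char *name)
{
	assert(npending < MAX_ENTRIES);
	snprintf(pending[npending++], NAME_LEN, "%s", name);
}

/* Run every queued removal; the analogue of flushing the workqueue. */
static void flush_pending_removals(void)
{
	for (int i = 0; i < npending; i++) {
		int slot = find_entry(pending[i]);

		if (slot >= 0)
			in_use[slot] = false;
	}
	npending = 0;
}

/*
 * Try to create the entry. Only if that fails with -EEXIST, flush the
 * deferred removals and retry once, so the common path never pays for
 * the flush.
 */
static int entry_create_retry(const char *name)
{
	int err = entry_create(name);

	if (err == -EEXIST) {
		flush_pending_removals();
		err = entry_create(name);
	}
	return err;
}
```

With this shape, a create that races with a queued removal of the same name succeeds after the flush, while a genuine duplicate (no removal pending) still gets -EEXIST from the second attempt.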