Message-ID: <alpine.LRH.2.02.1811120926240.3272@file01.intranet.prod.int.rdu2.redhat.com>
Date: Mon, 12 Nov 2018 09:29:20 -0500 (EST)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: kernel test robot <rong.a.chen@...el.com>,
Linux Memory Management List <linux-mm@...ck.org>,
linux-kernel@...r.kernel.org, LKP <lkp@...org>,
Tejun Heo <tj@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Christoph Lameter <cl@...ux.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Pekka Enberg <penberg@...nel.org>
Subject: Re: [LKP] d50d82faa0 [ 33.671845] WARNING: possible circular locking dependency detected

On Wed, 7 Nov 2018, Andrew Morton wrote:
> On Wed, 7 Nov 2018 15:43:36 -0800 Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> > On Tue, 23 Oct 2018 08:30:04 +0800 kernel test robot <rong.a.chen@...el.com> wrote:
> >
> > > Greetings,
> > >
> > > 0day kernel testing robot got the below dmesg and the first bad commit is
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >
> > > commit d50d82faa0c964e31f7a946ba8aba7c715ca7ab0
> > > Author: Mikulas Patocka <mpatocka@...hat.com>
> > > AuthorDate: Wed Jun 27 23:26:09 2018 -0700
> > > Commit: Linus Torvalds <torvalds@...ux-foundation.org>
> > > CommitDate: Thu Jun 28 11:16:44 2018 -0700
> > >
> > > slub: fix failure when we delete and create a slab cache
> >
> > This is ugly. Is there an alternative way of fixing the race which
> > Mikulas attempted to address? Possibly cancel the work and reuse the
> > existing sysfs file, or is that too stupid to live?
> >
> > 3b7b314053d021 ("slub: make sysfs file removal asynchronous") was
> > pretty lame, really. As mentioned,
> >
> > : It'd be the cleanest to deal with the issue by removing sysfs files
> > : without holding slab_mutex before the rest of shutdown; however, given
> > : the current code structure, it is pretty difficult to do so.
> >
> > Would be a preferable approach.
> >
> > >
> > > This uncovered a bug in the slub subsystem - if we delete a cache and
> > > immediately create another cache with the same attributes, it fails
> > > because of duplicate filename in /sys/kernel/slab/. The slub subsystem
> > > offloads freeing the cache to a workqueue - and if we create the new
> > > cache before the workqueue runs, it complains because of duplicate
> > > filename in sysfs.
>
> Alternatively, could we flush the workqueue before attempting to
> (re)create the sysfs file?

What if someone creates the slab cache from the workqueue?

> Extra points for only doing this if the
> first (re)creation attempt returned -EEXIST?

If it returns -EEXIST, it has already written the warning to the log.

Mikulas
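
Editorial sketch: the failure quoted above (delete a cache, immediately create one with the same name, get a duplicate-filename error) can be illustrated with a small user-space toy. This is not kernel code and not the slub implementation. The names toy_create, toy_destroy_async and toy_flush are hypothetical, the "sysfs directory" is just an array of names, and the "workqueue" is a list of pending removals that only runs when toy_flush is called.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_NAMES 8
#define MAX_WORK  8

/* Toy stand-in for a sysfs directory: a small set of registered names. */
static const char *names[MAX_NAMES];
/* Toy stand-in for a workqueue: names whose removal is still pending. */
static const char *pending[MAX_WORK];
static int n_pending;

/* Return the slot holding `name`, or -1 if it is not registered. */
static int find_name(const char *name)
{
    for (int i = 0; i < MAX_NAMES; i++)
        if (names[i] && strcmp(names[i], name) == 0)
            return i;
    return -1;
}

/* Register a name; fails with -EEXIST while an old entry is still present. */
int toy_create(const char *name)
{
    if (find_name(name) >= 0)
        return -EEXIST;
    for (int i = 0; i < MAX_NAMES; i++) {
        if (!names[i]) {
            names[i] = name;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Destroy: the name is NOT removed here, its removal is only queued. */
void toy_destroy_async(const char *name)
{
    assert(n_pending < MAX_WORK);
    pending[n_pending++] = name;
}

/* Run every queued removal, as flushing the deferred work would. */
void toy_flush(void)
{
    for (int i = 0; i < n_pending; i++) {
        int idx = find_name(pending[i]);
        if (idx >= 0)
            names[idx] = NULL;
    }
    n_pending = 0;
}
```

In this toy, calling toy_create("cache_a"), then toy_destroy_async("cache_a"), then toy_create("cache_a") again yields -EEXIST on the second create, and the create succeeds once toy_flush() has run in between. The sketch does not model either objection raised in the reply: a creator that is itself running from the workqueue, or the warning that has already been logged by the time -EEXIST is returned.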