Message-ID: <20060728171155.GA3739@localhost.localdomain>
Date:	Fri, 28 Jul 2006 10:11:55 -0700
From:	Ravikiran G Thirumalai <kiran@...lex86.org>
To:	Christoph Lameter <clameter@....com>
Cc:	Pekka Enberg <penberg@...helsinki.fi>, alokk@...softinc.com,
	tglx@...utronix.de, LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: [BUG] Lockdep recursive locking in kmem_cache_free

On Fri, Jul 28, 2006 at 07:53:56AM -0700, Christoph Lameter wrote:
> On Fri, 28 Jul 2006, Pekka Enberg wrote:
> 
> > > [   57.976447]  [<ffffffff802542fc>] __lock_acquire+0x8cc/0xcb0
> > > [   57.976562]  [<ffffffff80254a02>] lock_acquire+0x52/0x70
> > > [   57.976675]  [<ffffffff8028f201>] kmem_cache_free+0x141/0x210
> > > [   57.976790]  [<ffffffff804a6b74>] _spin_lock+0x34/0x50
> > > [   57.976903]  [<ffffffff8028f201>] kmem_cache_free+0x141/0x210
> > > [   57.977018]  [<ffffffff8028f388>] slab_destroy+0xb8/0xf0
> 
> Huh? _spin_lock calls kmem_cache_free?
> 
> >  cache_reap
> >  reap_alien	(grabs l3->alien[node]->lock)
> >  __drain_alien_cache
> >  free_block
> >  slab_destroy	(slab management off slab)
> >  kmem_cache_free
> >  __cache_free
> >  cache_free_alien (recursive attempt on l3->alien[node] lock)
> > 
> > Christoph?
> 
> This should not happen. __drain_alien_cache frees node local elements
> thus cache_free_alien should not be called. However, if the slab 
> management was allocated on a different node from the slab data then we 
> may have an issue. However, both slab management and the slab data are 
> allocated on the same node (with alloc_pages_node() and kmalloc_node).

cache_free_alien could get called, but there is no recursion here (a rough
sketch of the path follows the three steps below):

1. reap_alien on the local node (A) tries to drop remote objects, freed
earlier on A, into the shared array cache of the remote node (B) they belong
to (B is picked by the node rotor).  To do this it takes the local (A) alien
cache lock for B and calls __drain_alien_cache.  Say these remote objects
come from a slab cache X.

2. __drain_alien_cache takes the remote node's l3 list lock (B), transfers
as many objects as the shared array cache of the remote node can hold, and
calls free_block to free the remaining objects that could not be dropped
into the shared array cache of the remote node (B).  Note that free_block
is now being called from (A) to free objects belonging to (B).

3. free_block may end up calling slab_destroy for a slab belonging to B.
With off-slab management, slab_destroy calls kmem_cache_free for the slab
management object, which calls __cache_free and hence cache_free_alien().
Since this runs on A but frees an object that is local to B, the same-node
check in cache_free_alien does not take the early return, so the alien path
*does* get executed: A writes the object into its local alien cache for B.
That slab management object, however, comes from a different slab cache, Y.
There would be real recursion only if X and Y were the same cache, which is
not a possibility at all: the off-slab management for a slab cache cannot
come from that same cache.  So this looks like a false positive from
lockdep.
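
To make the locking concrete, here is a rough, hand-written sketch of the
path described in steps 1-3.  This is not the actual mm/slab.c code: helper
names like pick_reap_node() and object_node() are placeholders, prototypes
are simplified, and trylocks, irq disabling and the shared array cache
transfer are glossed over.  The only point is which alien cache lock gets
taken where:

/*
 * Sketch only.  Node A is the local node, B the remote node picked by the
 * reap node rotor, X the cache being reaped, Y = X's off-slab management
 * cache.
 */
static void drain_alien_sketch(struct kmem_cache *X,
			       struct array_cache *ac, int B)
{
	struct kmem_list3 *B_l3 = X->nodelists[B];

	spin_lock(&B_l3->list_lock);		/* B's l3 list lock */
	/* ...move what fits into B's shared array cache... */
	free_block(X, ac->entry, ac->avail, B);	/* may empty a slab of X and
						   call slab_destroy(), which
						   frees the off-slab management
						   via kmem_cache_free(Y, slabp) */
	ac->avail = 0;
	spin_unlock(&B_l3->list_lock);
}

static void reap_alien_sketch(struct kmem_cache *X, struct kmem_list3 *A_l3)
{
	int B = pick_reap_node();		/* placeholder for the node rotor */
	struct array_cache *ac = A_l3->alien[B];/* X's alien cache on A, for B */

	spin_lock(&ac->lock);			/* alien lock #1: cache X, A -> B */
	drain_alien_sketch(X, ac, B);
	spin_unlock(&ac->lock);
}

/* Reached via slab_destroy() -> kmem_cache_free(Y, slabp) -> __cache_free(). */
static int cache_free_alien_sketch(struct kmem_cache *Y, void *slabp)
{
	int node = object_node(slabp);		/* == B: slab management is B-local */

	if (node == numa_node_id())		/* we are running on A, so no early return */
		return 0;

	/*
	 * Y's alien cache on A, for B: a different array_cache (and a
	 * different spinlock instance) from alien lock #1 above, because the
	 * off-slab management cache Y can never be X itself.  Lockdep only
	 * sees that both locks belong to the same class, hence the
	 * "recursive locking" report.
	 */
	spin_lock(&Y->nodelists[numa_node_id()]->alien[node]->lock);	/* alien lock #2 */
	/* ...stash slabp in the alien cache for B... */
	spin_unlock(&Y->nodelists[numa_node_id()]->alien[node]->lock);
	return 1;
}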

tglx, does the machine boot without lockdep?  If yes, then this is a false 
positive IMO.

Thanks,
Kiran
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
