Date:	Thu, 04 Dec 2008 12:14:26 +0000
From:	Catalin Marinas <catalin.marinas@....com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support

Hi Paul,

On Wed, 2008-12-03 at 10:12 -0800, Paul E. McKenney wrote:
> On Thu, Nov 20, 2008 at 11:30:34AM +0000, Catalin Marinas wrote:
> > This patch adds the base support for the kernel memory leak
> > detector. It traces the memory allocation/freeing in a way similar to
> > the Boehm's conservative garbage collector, the difference being that
> > the unreferenced objects are not freed but only shown in
> > /sys/kernel/debug/memleak. Enabling this feature introduces an
> > overhead to memory allocations.
> 
> I have some concerns about your locking/RCU design, please see
> interspersed questions and comments.

Thanks for reviewing the locking. This was probably the hardest part of
kmemleak. It went through several incarnations and gradually evolved
towards finer-grained locking.

I'll comment below as well, but first let me show my reasoning, which I
added as a comment at the top of the memleak.c file:

 * The following locks are used by kmemleak:
 * 
 * - object_list_lock (spinlock): protects the object_list. This is the main
 *   list holding the metadata (struct memleak_object) for the allocated
 *   objects. The memleak_object structures are added to the list in the
 *   create_object() function called from the memleak_alloc() callback. They
 *   are removed from the list in put_object() if the object->use_count is 0
 * - object_tree_lock (rwlock): protects the object_tree_root. When the
 *   metadata is created in create_object(), it is added to the object prio
 *   search tree and removed in delete_object() with this lock held
 *   (write_lock). This lock is also acquired (read_lock) in
 *   find_and_get_object() when an object needs to be looked up by a pointer
 *   value (either during scanning or when changing its properties like
 *   marking as false positive)
 * - memleak_object.lock (spinlock): protects a memleak_object. Modifications
 *   of the metadata (e.g. count) are protected by this lock. Note that some
 *   members of this structure may be protected by other means (atomic or
 *   object_list_lock). This lock is also held when scanning the corresponding
 *   object to avoid the kernel freeing it via the memleak_free() callback.
 *   This is less heavyweight than holding a global lock like object_list_lock
 *   during scanning
 *
 * The only mutex used is scan_mutex. This ensures that only one thread may
 * scan the memory for unreferenced objects at a time. The gray_list contains
 * the objects which are already referenced or marked as false positives and
 * need to be scanned. This list is only modified during a scanning episode
 * when the scan_mutex is held. At the end of a scan, the gray_list is always
 * empty. Note that the memleak_object.use_count is incremented when an object
 * is added to the gray_list and therefore cannot be freed.
 *
 * Freeing a memleak_object is done via an RCU callback invoked from
 * put_object() when its use_count is 0 and after removing it from the
 * object_list. One of the reasons for the RCU is to delay the freeing and
 * avoid a recursive call into the allocator via kmem_cache_free(). Another
 * reason is to allow lock-less object_list traversal during memleak_scan().
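
For reference, the metadata structure that ties these locks together is
roughly the following (a simplified sketch based on the description above
and on the hunks quoted below; the pointer/size members are shorthand and
the actual layout in the patch may differ):

struct memleak_object {
	spinlock_t lock;		/* protects count, flags, ... */
	unsigned long flags;		/* e.g. OBJECT_ALLOCATED */
	atomic_t use_count;		/* references to the metadata itself */
	unsigned long pointer;		/* start of the tracked memory block */
	size_t size;
	int count;			/* pointers to this block found while scanning */
	struct list_head object_list;	/* entry in the global object_list */
	struct list_head gray_list;	/* entry in the per-scan gray_list */
	struct prio_tree_node tree_node;/* entry in object_tree_root */
	struct rcu_head rcu;		/* deferred freeing via free_object_rcu() */
};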

> > +static void free_object_rcu(struct rcu_head *rcu)
> > +{
> > +	struct hlist_node *elem, *tmp;
> > +	struct memleak_scan_area *area;
> > +	struct memleak_object *object =
> > +		container_of(rcu, struct memleak_object, rcu);
> > +
> > +	/* once use_count is 0, there is no code accessing the object */
> 
> OK, so we won't pass free_object_rcu() to call_rcu() until use_count
> is equal to zero.  And once use_count is zero, it is never incremented.
> So the point of the RCU grace period is to ensure that all tasks
> who didn't quite call get_object() soon enough get done failing
> before we free up the object, correct?
> 
> Which means that get_object() needs to be under rcu_read_lock()...

My view here is that once use_count is 0, no other thread can use this
object. It has also been removed from the object_list, so there is no
other way to reach it. I think it would have worked pretty much OK
without the RCU, *but* the main reason for the grace period is to avoid
a recursive call into kmem_cache_free(), since put_object() is quite
likely called via kmem_cache_free() -> memleak_free() ->
delete_object(). The alternative would have been a work queue or
something similar, but RCU already had the infrastructure and I also
gained a lock-less object_list traversal in memleak_scan().
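
That lock-less traversal is basically the standard RCU list walk (a
minimal sketch; scan_one_object() is just a placeholder for the actual
scanning of the object's memory):

	struct memleak_object *object;

	rcu_read_lock();
	list_for_each_entry_rcu(object, &object_list, object_list) {
		/*
		 * Objects unlinked concurrently via list_del_rcu() in
		 * put_object() stay valid until the grace period ends,
		 * so no object_list_lock is needed for this walk.
		 */
		scan_one_object(object);	/* placeholder */
	}
	rcu_read_unlock();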

> > +static void put_object(struct memleak_object *object)
> > +{
> > +	unsigned long flags;
> > +
> > +	if (!atomic_dec_and_test(&object->use_count))
> > +		return;
> > +
> > +	/* should only get here after delete_object was called */
> > +	BUG_ON(object->flags & OBJECT_ALLOCATED);
> > +
> > +	spin_lock_irqsave(&object_list_lock, flags);
> 
> We also need to write-hold the object_tree_lock, not?

Not here; the memleak_object is removed from the object_tree in the
delete_object() function (called from the memleak_free() callback). If
it is still in the object_tree, it should have a use_count >= 1.
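
To make that ordering explicit, delete_object() does roughly the
following (my simplified sketch, not the exact hunk from the patch):

	static void delete_object(unsigned long ptr)
	{
		unsigned long flags;
		struct memleak_object *object;

		/* the tree removal is done with the write lock held */
		write_lock_irqsave(&object_tree_lock, flags);
		object = lookup_object(ptr, 0);
		if (object)
			prio_tree_remove(&object_tree_root, &object->tree_node);
		write_unlock_irqrestore(&object_tree_lock, flags);

		if (!object)
			return;

		spin_lock_irqsave(&object->lock, flags);
		object->flags &= ~OBJECT_ALLOCATED;
		spin_unlock_irqrestore(&object->lock, flags);

		/* drop the initial reference taken in create_object() */
		put_object(object);
	}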

> > +	/* the last reference to this object */
> > +	list_del_rcu(&object->object_list);
> > +	call_rcu(&object->rcu, free_object_rcu);
> > +	spin_unlock_irqrestore(&object_list_lock, flags);
> > +}
> > +
> > +static struct memleak_object *find_and_get_object(unsigned long ptr, int alias)
> > +{
> > +	unsigned long flags;
> > +	struct memleak_object *object;
> > +
> > +	read_lock_irqsave(&object_tree_lock, flags);
> 
> Use of read_lock_irqsave() will work, but only if -all- deletions are
> protected by write-acquiring object_tree_lock.

Yes, deletions (in delete_object) are protected by
write_lock(object_tree_lock).

> Or unless there is an enclosing rcu_read_lock() around all calls to
> find_and_get_object().

Not needed in my opinion since if it is in the object_tree, it has
use_count >= 1 anyway and the RCU callback for freeing won't be invoked.
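
Put differently, the lookup and the reference increment both happen with
the read lock held, along these lines (again my simplified sketch of how
the locking lines up, not necessarily the literal code in the patch):

	static struct memleak_object *find_and_get_object(unsigned long ptr, int alias)
	{
		unsigned long flags;
		struct memleak_object *object;

		read_lock_irqsave(&object_tree_lock, flags);
		object = lookup_object(ptr, alias);
		/*
		 * While the object is still in the tree, delete_object()
		 * has not dropped the initial reference, so use_count >= 1
		 * here and get_object() cannot race with the final
		 * put_object().
		 */
		if (object)
			get_object(object);
		read_unlock_irqrestore(&object_tree_lock, flags);

		return object;
	}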

> > +static inline void create_object(unsigned long ptr, size_t size, int ref_count)
[...]
> > +	write_lock_irqsave(&object_tree_lock, flags);
> > +	node = prio_tree_insert(&object_tree_root, &object->tree_node);
> > +	if (node != &object->tree_node) {
> > +		unsigned long flags;
> > +
> > +		pr_warning("kmemleak: existing pointer\n");
> > +		dump_stack();
> > +
> > +		object = lookup_object(ptr, 1);
> > +		spin_lock_irqsave(&object->lock, flags);
> > +		dump_object_info(object);
> > +		spin_unlock_irqrestore(&object->lock, flags);
> > +
> > +		panic("kmemleak: cannot insert 0x%lx into the object search tree\n",
> > +		      ptr);
> > +	}
> > +	write_unlock_irqrestore(&object_tree_lock, flags);
> 
> In theory, you are OK here, but only because the below is adding
> an element (not removing it).  But this code is looking pretty dodgy.
> 
> What exactly does object_tree_lock protect?  What exactly does
> object_list protect?

See my explanation at the beginning. I separated the locking for the
object_list and the object_tree_root (prio search tree). I think I could
have held the object_tree_lock only for the prio_tree_insert() call.

> > +static void memleak_scan(void)
[...]
> > +		/* reset the reference count (whiten the object) */
> > +		object->count = 0;
> > +		if (color_gray(object) && get_object(object))
> > +			list_add_tail(&object->gray_list, &gray_list);
> 
> What prevents other tasks from concurrently adding other objects to
> gray_list?  Preventing such concurrent adds is absolutely required,
> otherwise gray_list will be corrupted.

There is scan_mutex for this.
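
The whole scan, including building and draining the gray_list, runs with
that mutex held, roughly like this (simplified; the real function also
scans the data sections, stacks etc., scan_object() is a placeholder, and
I've taken object->lock around the count reset per the rule above):

	struct memleak_object *object;
	unsigned long flags;

	mutex_lock(&scan_mutex);

	/* whiten all objects and seed the gray_list */
	rcu_read_lock();
	list_for_each_entry_rcu(object, &object_list, object_list) {
		spin_lock_irqsave(&object->lock, flags);
		object->count = 0;
		if (color_gray(object) && get_object(object))
			list_add_tail(&object->gray_list, &gray_list);
		spin_unlock_irqrestore(&object->lock, flags);
	}
	rcu_read_unlock();

	/* drain the gray_list; only the scan_mutex holder touches it */
	while (!list_empty(&gray_list)) {
		object = list_entry(gray_list.next, struct memleak_object,
				    gray_list);
		scan_object(object);		/* placeholder */
		list_del(&object->gray_list);
		put_object(object);
	}

	mutex_unlock(&scan_mutex);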

> > +static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > +{
> > +	struct list_head *n;
> > +	struct memleak_object *next = NULL;
> > +	unsigned long flags;
> > +
> > +	++(*pos);
> > +	if (reported_leaks >= REPORTS_NR)
> > +		goto out;
> > +
> > +	spin_lock_irqsave(&object_list_lock, flags);
> 
> Using a spinlock instead of rcu_read_lock() is OK, but only if all
> updates are also protected by this same spinlock.  Which means that,
> given that find_and_get_object read-acquires object_tree_lock, deletions
> must be protected both by object_list_lock and by write-acquiring
> object_tree_lock.  Or all calls to memleak_seq_next need to be covered
> by rcu_read_lock().

The spin_lock here is only to retrieve the next object in the list, but
I agree there is a problem: even though the object_list modifications
are protected by object_list_lock, put_object() could meanwhile drop
the use_count to 0, and the return value of get_object() isn't checked
in this function. If get_object() does succeed, though, I don't think
an rcu_read_lock() is needed, since put_object() can no longer invoke
free_object_rcu().
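
One way to handle it would be to check that return value and only hand
the object to the seq_file code when the reference was actually obtained
(a possible fix, not what the posted patch does):

	/* inside memleak_seq_next(), after picking the candidate entry 'n' */
	next = list_entry(n, struct memleak_object, object_list);
	if (!get_object(next))
		next = NULL;	/* lost the race with put_object(); report nothing */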

Thanks for your time.

-- 
Catalin

