[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1228604850.8621.42.camel@localhost.localdomain>
Date: Sat, 06 Dec 2008 23:07:30 +0000
From: Catalin Marinas <catalin.marinas@....com>
To: paulmck@...ux.vnet.ibm.com
Cc: linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2.6.28-rc5 01/11] kmemleak: Add the base support
On Thu, 2008-12-04 at 08:55 -0800, Paul E. McKenney wrote:
> On Thu, Dec 04, 2008 at 12:14:26PM +0000, Catalin Marinas wrote:
> > On Wed, 2008-12-03 at 10:12 -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 20, 2008 at 11:30:34AM +0000, Catalin Marinas wrote:
> > > > This patch adds the base support for the kernel memory leak
> > > > detector. It traces the memory allocation/freeing in a way similar to
> > > > the Boehm's conservative garbage collector, the difference being that
> > > > the unreferenced objects are not freed but only shown in
> > > > /sys/kernel/debug/memleak. Enabling this feature introduces an
> > > > overhead to memory allocations.
> > >
> > > I have some concerns about your locking/RCU design, please see
> > > interspersed questions and comments.
[...]
> > I'll comment below as well but I'll first show my logic which I added as
> > comment at the top of the memleak.c file:
>
> This is very helpful, thank you! (And please accept my apologies if I
> missed seeing it in my earlier review.)
You haven't missed it, that's the first time I posted this text.
> > * The following locks are used by kmemleak:
> > *
> > * - object_list_lock (spinlock): protects the object_list. This is the main
> > * list holding the metadata (struct memleak_object) for the allocated
> > * objects. The memleak_object structures are added to the list in the
> > * create_object() function called from the memleak_alloc() callback. They
> > * are removed from the list in put_object() if the object->use_count is 0
>
> This part sounds good. I would also add that once object->use_count is
> zero, no one is allowed to increment it. Also, attempts to increment
> object->use_count must be under the protection of rcu_read_lock(),
> correct?
So, to make sure I understand it correctly, the rcu_read_lock() is
needed to protect between the point where the object pointer was
obtained to the get_object() point. Would it also work if
spin_lock_irqsave(object_list_lock) is used instead of rcu_read_lock()?
The call_rcu() in put_object is bracketed with object_list_lock.
BTW, I'll have a look if I could remove an object from the object_list
in delete_object() rather than waiting until put_object().
> > * - object_tree_lock (rwlock): protects the object_tree_root. When the
> > * metadata is created in create_object(), it is added to the object prio
> > * search tree and removed in delete_object() with this lock held
> > * (write_lock). This lock is also acquired (read_lock) in
> > * find_and_get_object() when an object needs to be looked up by a pointer
> > * value (either during scanning or when changing its properties like
> > * marking as false positive)
>
> Looks OK. I must confess that I am a bit fuzzy on the purpose of
> object_tree_root vs. object_list.
object_list holds all the memleak_objects in the system and it is
traversed when preparing the scanning and also when reporting the leaks.
object_tree_root is used to look-up memleak_objects by the allocated
memory block. In the past, this used to be a radix tree (with some
lockdep problems) and later a hash. I now use a prio tree because it
allows pointer ranges.
Kmemleak could probably iterate over the object_tree_root when reporting
but it is more convenient to report the leaks in the order they were
allocated (preserved by object_list) since one leak may trigger many
subsequent reports but they disappear once the first one is solved.
> > * - memleak_object.lock (spinlock): protects a memleak_object. Modifications
> > * of the metadata (e.g. count) are protected by this lock. Note that some
> > * members of this structure may be protected by other means (atomic or
> > * object_list lock). This lock is also held when scanning the corresponding
> > * object to avoid the kernel freeing it via the memleak_free() callback.
> > * This is less heavyweight than holding a global lock like object_list_lock
> > * during scanning
>
> OK, holding an object's lock can protect that object from deletion,
> but only after you actually acquire the lock. There must be some other
> mechanism preventing the object from being freed during the actual
> acquisition of the lock.
>
> Now this might be the object_list_lock, object_tree_lock, RCU, or some
> combination of the three, for example it might depend on how that object
> is looked up.
Correct. I'll have to review this again.
> > * Freeing a memleak_object is done via an RCU callback invoked from
> > * put_object() when its use_count is 0 and after removing it from the
> > * object_list. One of the reasons for the RCU is to delay the freeing and
> > * avoid a recursive call into the allocator via kmem_cache_free(). Another
> > * reason is to allow lock-less object_list traversal during memleak_scan().
>
> I did figure out the lock-less object_list traversal, but totally missed
> the fact that you were using RCU to prevent infinite recursion. Cute!
It wasn't documented, so pretty hard to guess.
> Also, the memleak_object must have been removed from object_tree before
> its use_count can possibly go to 0, correct?
Yes.
> > > > +static void free_object_rcu(struct rcu_head *rcu)
> > > > +{
> > > > + struct hlist_node *elem, *tmp;
> > > > + struct memleak_scan_area *area;
> > > > + struct memleak_object *object =
> > > > + container_of(rcu, struct memleak_object, rcu);
> > > > +
> > > > + /* once use_count is 0, there is no code accessing the object */
> > >
> > > OK, so we won't pass free_object_rcu() to call_rcu() until use_count
> > > is equal to zero. And once use_count is zero, it is never incremented.
> > > So the point of the RCU grace period is to ensure that all tasks
> > > who didn't quite call get_object() soon enough get done failing
> > > before we free up the object, correct?
> > >
> > > Which means that get_object() needs to be under rcu_read_lock()...
> >
> > My view here is that if use_count is 0, no other thread would be able to
> > use this object. It will also be removed from the object_list and hence
> > no other way to get this this object.
>
> What if some other CPU picked up a pointer to the object just before it
> was removed from the list? If that CPU was not under the protection of
> rcu_read_lock(), and if that CPU was delayed, then the object could be
> freed (and possibly re-allocated as something else) before the CPU got
> around to doing the atomic_inc_not_zero().
OK, I got it now.
> It looks to me that the code currently does the right thing here, just
> want to make sure I understand the locking and that we don't end up
> tempting someone later to break it. ;-)
I'll document it better and make sure it's clear for me as well.
> > > > +static void put_object(struct memleak_object *object)
> > > > +{
> > > > + unsigned long flags;
> > > > +
> > > > + if (!atomic_dec_and_test(&object->use_count))
> > > > + return;
> > > > +
> > > > + /* should only get here after delete_object was called */
> > > > + BUG_ON(object->flags & OBJECT_ALLOCATED);
> > > > +
> > > > + spin_lock_irqsave(&object_list_lock, flags);
> > >
> > > We also need to write-hold the object_tree_lock, not?
> >
> > Not here, the memleak_object is removed from the object_tree in the
> > delete_object() function (via from the memleak_free callback). If it is
> > in the object_tree, it should have a use_count >= 1.
>
> So the code never calls the last put_object() without first having
> called delete_object() to remove it from the object_tree? The "last"
> put_object() being the one that decrements object->use_count to zero.
Yes.
> > > > +static void *memleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > > +{
> > > > + struct list_head *n;
> > > > + struct memleak_object *next = NULL;
> > > > + unsigned long flags;
> > > > +
> > > > + ++(*pos);
> > > > + if (reported_leaks >= REPORTS_NR)
> > > > + goto out;
> > > > +
> > > > + spin_lock_irqsave(&object_list_lock, flags);
> > >
> > > Using a spinlock instead of rcu_read_lock() is OK, but only if all
> > > updates are also protected by this same spinlock. Which means that,
> > > given that find_and_get_object read-acquires object_tree_lock, deletions
> > > must be projected both by object_list_lock and by write-acquiring
> > > object_tree_lock. Or all calls to memleak_seq_next need to be covered
> > > by rcu_read_lock().
> >
> > The spin_lock here is only to retrieve the next object in the list but I
> > agree that even if the object_list modifications are protected by
> > object_list_lock, put_object could actually set the use_count to 0 and
> > get_object return in this function isn't checked. If get_object returns
> > successfully, I don't think an rcu_read_lock() is needed since
> > put_object() can no longer invoke free_object_rcu().
>
> OK, so let me see if I understand:
>
> The memleak_object passed in via the "v" argument to
> memleak_seq_next() was get_object()-ed by some prior
> call, either an earlier memleak_seq_next() or presumably
> by memleak_seq_start().
Yes.
> memleak_seq_start() -does- do its scan under RCU protection,
> so looks OK.
>
> I believe you also need RCU protection in memleak_seq_next()
> to prevent the next memleak_object from disappearing
> during the traversal. Yes, you do greatly decrease the
> odds of this happening by having irqs disabled, but the
> fact is that RCU is within its rights to end a grace
> period during this time.
I'll try to make the memleak_seq_next() function use rcu_dereference()
and rcu_read_lock(). ATM, the "v" object has use_count >= 1 (from a
previous get_object) and the next pointer is accessed under
object_list_lock, so no modifications to the list (even put_object
acquires this lock when invoking call_rcu). There is still the bug with
not checking the get_object() return.
> Assuming that I do understand, as you say, if the get_object() in
> memleak_seq_next() fails, we could end up accessing freed-up memory on
> the next call to memleak_seq_next(), or even during the current one,
> assuming an aggressive RCU or an extended NMI, SMI, burst of ECC errors
> or some other delay. So I agree that it is necessary to check the return
> value of get_object().
Yes
> Also, I do see the put_object() call in memleak_seq_stop(), but it looks
> to me that this only does a put_object() on the last memleak_object.
> What does the put_object() on the earlier memleak_object structures that
> were scanned by memleak_seq_next? Or is there never more than one
> such object in a given list?
The previous objects' use_count are decremented in memleak_seq_next()
just before returning "next", so between seq_start-seq_next and
seq_next-seq_stop, there is only one object with an incremented
use_count. The memleak_seq_next() function may have two such objects for
a small period of time.
I actually added a test for this in memleak_scan() (if DEBUG is defined)
and I've never got any reports. There may be some situations when for
very short periods of time the use_count is > 1 at the beginning of a
scan, usually when one of the memleak_scan_area or memleak_ignore
callbacks are invoked.
I'll revise the locking a bit and re-post the patches this week.
Thanks.
--
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists