[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4F9AFD28.2030801@hp.com>
Date: Fri, 27 Apr 2012 16:10:16 -0400
From: Waiman Long <waiman.long@...com>
To: Christoph Lameter <cl@...ux.com>
CC: "penberg@...nel.org" <penberg@...nel.org>,
"mpm@...enic.com" <mpm@...enic.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Morris, Donald George (HP-UX Cupertino)" <don.morris@...com>
Subject: Re: [PATCH] slub: prevent validate_slab() error due to race condition
On 4/27/2012 10:59 AM, Christoph Lameter wrote:
> On Thu, 26 Apr 2012, Waiman Long wrote:
>
>> The SLUB memory allocator was changed substantially from 3.0 to 3.1 by
>> replacing some of page locking codes for updating the free object list
>> of the slab with double-quadword atomic exchange (cmpxchg_double_slab)
>> or a pseudo one using a page lock when debugging is turned on. In the
>> normal case, that should be enough to make sure that the slab is in a
>> consistent state. However, when CONFIG_SLUB_DEBUG is turned on and the
>> Redzone debugging flag is set, the Redzone bytes are also used to mark
>> if an object is free or allocated. The extra state information in those
>> Redzone bytes is not protected by the cmpxchg_double_slab(). As a
>> result,
>> validate_slab() may report a Redzone error if the validation is
>> performed
>> while racing with a free to a debugged slab.
> Right. The problem is unique to validate_slab because no one else outside
> the slab has access to that object that is to be freed.
>
>> This patch fixes the BUG message by acquiring the node-level lock for
>> slabs flagged for debugging to avoid this possible racing condition.
>> The locking is done on the node-level lock instead of the more granular
>> page lock because the new code may speculatively acquire the node-level
>> lock later on. Acquiring the page lock and then the node lock may lead
>> to potential deadlock.
> Correct that would address the issue.
>
>> As the increment of slab node count and insertion of the new slab into
>> the partial or full slab list is not an atomic operation, there is a
>> small time window where the two may not match. This patch temporarily
>> works around this problem by allowing the node count to be one larger
>> than the number of slab presents in the lists. This workaround may not
>> work if more than one CPU is actively adding slab to the same node,
>> but it should be good enough to workaround the problem in most cases.
> Well yeah that is not a real fix. Its been racy for a long time.
>
>> To really fix the issue, the overall synchronization between debug slub
>> operations and slub validation needs a revisit.
> True. The sync between validation and concurrent operations was not in
> focus so far. Validation was used (at least by me) so far only to validate
> that the slab structures are still okay after running some tests.
>
> Lets just do this in steps. First patch here is a simple taking of the
> node lock in free_debug_processing. This is similar to what you have done
> but the changes are made to the debugging function instead. Then we can
> look at how to address the slab counting issue in a separate patch.
>
Thank for the quick response. I have no problem for moving the node-lock
taking into free_debug_processing. Of the 2 problems that are reported,
this is a more serious one and so need to be fixed sooner rather than
later. For the other one, we can take more time to find a better solution.
So are you going to integrate your change to the mainline?
> Subject: slub: Take node lock during object free checks
>
> This is needed for proper synchronization with validate_slab()
> as pointed out by Waiman Long<Waiman.Long@...com>
>
> Reported-by: Waiman Long<Waiman.Long@...com>
> Signed-off-by: Christoph Lameter<cl@...ux.com>
>
>
> ---
> mm/slub.c | 30 ++++++++++++++++++------------
> 1 file changed, 18 insertions(+), 12 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2012-04-27 09:40:00.000000000 -0500
> +++ linux-2.6/mm/slub.c 2012-04-27 09:50:15.000000000 -0500
> @@ -1082,13 +1082,13 @@ bad:
> return 0;
> }
>
> -static noinline int free_debug_processing(struct kmem_cache *s,
> - struct page *page, void *object, unsigned long addr)
> +static noinline struct kmem_cache_node *free_debug_processing(
> + struct kmem_cache *s, struct page *page, void *object,
> + unsigned long addr, unsigned long *flags)
> {
> - unsigned long flags;
> - int rc = 0;
> + struct kmem_cache_node *n = get_node(s, page_to_nid(page));
>
> - local_irq_save(flags);
> + spin_lock_irqsave(&n->list_lock, *flags);
> slab_lock(page);
>
> if (!check_slab(s, page))
> @@ -1126,15 +1126,19 @@ static noinline int free_debug_processin
> set_track(s, object, TRACK_FREE, addr);
> trace(s, page, object, 0);
> init_object(s, object, SLUB_RED_INACTIVE);
> - rc = 1;
> out:
> slab_unlock(page);
> - local_irq_restore(flags);
> - return rc;
> + /*
> + * Keep node_lock to preserve integrity
> + * until the object is actually freed
> + */
> + return n;
>
> fail:
> + slab_unlock(page);
> + spin_unlock_irqrestore(&n->list_lock, *flags);
> slab_fix(s, "Object at 0x%p not freed", object);
> - goto out;
> + return NULL;
> }
>
> static int __init setup_slub_debug(char *str)
> @@ -1227,8 +1231,9 @@ static inline void setup_object_debug(st
> static inline int alloc_debug_processing(struct kmem_cache *s,
> struct page *page, void *object, unsigned long addr) { return 0; }
>
> -static inline int free_debug_processing(struct kmem_cache *s,
> - struct page *page, void *object, unsigned long addr) { return 0; }
> +static inline struct kmem_cache_node *free_debug_processing(
> + struct kmem_cache *s, struct page *page, void *object,
> + unsigned long addr, unsigned long *flags) { return NULL; }
>
> static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
> { return 1; }
> @@ -2445,7 +2450,8 @@ static void __slab_free(struct kmem_cach
>
> stat(s, FREE_SLOWPATH);
>
> - if (kmem_cache_debug(s)&& !free_debug_processing(s, page, x, addr))
> + if (kmem_cache_debug(s)&&
> + !(n = free_debug_processing(s, page, x, addr,&flags)))
> return;
>
> do {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists