Date:	Fri, 27 Apr 2012 16:10:16 -0400
From:	Waiman Long <waiman.long@...com>
To:	Christoph Lameter <cl@...ux.com>
CC:	"penberg@...nel.org" <penberg@...nel.org>,
	"mpm@...enic.com" <mpm@...enic.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Morris, Donald George (HP-UX Cupertino)" <don.morris@...com>
Subject: Re: [PATCH] slub: prevent validate_slab() error due to race condition

On 4/27/2012 10:59 AM, Christoph Lameter wrote:
> On Thu, 26 Apr 2012, Waiman Long wrote:
>
>> The SLUB memory allocator was changed substantially from 3.0 to 3.1 by
>> replacing some of the page locking code for updating the free object list
>> of the slab with a double-quadword atomic exchange (cmpxchg_double_slab),
>> or a pseudo one using a page lock when debugging is turned on.  In the
>> normal case, that should be enough to make sure that the slab is in a
>> consistent state. However, when CONFIG_SLUB_DEBUG is turned on and the
>> Redzone debugging flag is set, the Redzone bytes are also used to mark
>> whether an object is free or allocated. The extra state information in
>> those Redzone bytes is not protected by cmpxchg_double_slab(). As a
>> result, validate_slab() may report a Redzone error if the validation is
>> performed while racing with a free to a debugged slab.
> Right. The problem is unique to validate_slab because no one else outside
> the slab has access to that object that is to be freed.
>
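For illustration, a rough sketch of the window being described (simplified and
not literal kernel code; the function names are the ones used in mm/slub.c):

	/*
	 * CPU 0: __slab_free(), debug case      CPU 1: validate_slab()
	 * ---------------------------------     ----------------------
	 * free_debug_processing()
	 *   init_object(s, p, SLUB_RED_INACTIVE)
	 *     Redzone of p now says "free"
	 *   (returns, page lock dropped)         builds the free-object map from
	 *                                        page->freelist: p is not on it,
	 * <window: p is not yet on the           so p looks allocated and is
	 *  freelist>                             checked against SLUB_RED_ACTIVE
	 *                                          -> spurious "Redzone
	 * cmpxchg_double_slab() finally               overwritten" report
	 * links p into the freelist
	 */
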
>> This patch fixes the BUG message by acquiring the node-level lock for
>> slabs flagged for debugging to avoid this possible race condition.
>> The locking is done on the node-level lock instead of the more granular
>> page lock because the new code may speculatively acquire the node-level
>> lock later on. Acquiring the page lock and then the node lock may lead
>> to a potential deadlock.
> Correct that would address the issue.
>
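To illustrate the ordering concern (a hypothetical interleaving, assuming the
existing node-lock-then-page-lock order used by validate_slab_node()):

	/*
	 * CPU 0 (free path, page lock first)    CPU 1 (validation)
	 * slab_lock(page);                      spin_lock_irqsave(&n->list_lock, f);
	 * spin_lock_irqsave(&n->list_lock, f);  slab_lock(page);
	 *   ^ waits for CPU 1                     ^ waits for CPU 0
	 *
	 * Classic AB-BA deadlock. Taking n->list_lock before the page lock, as
	 * the patch below does, keeps a single lock order everywhere.
	 */
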
>> As the increment of the slab node count and the insertion of the new slab
>> into the partial or full slab list is not an atomic operation, there is a
>> small time window where the two may not match. This patch temporarily
>> works around this problem by allowing the node count to be one larger
>> than the number of slabs present in the lists. This workaround may not
>> work if more than one CPU is actively adding slabs to the same node,
>> but it should be good enough to work around the problem in most cases.
> Well yeah, that is not a real fix. It's been racy for a long time.
>
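A rough sketch of the counting window (simplified from the allocation slow
path; not part of this patch, and the exact call sites may differ):

	/*
	 * page = new_slab(s, gfpflags, node);
	 *   inc_slabs_node(s, node, page->objects);   n->nr_slabs is now N+1
	 * ...
	 * <window: the counter already says N+1, but only N slabs can be
	 *  found on the node's partial/full lists>
	 * ...
	 * add_partial(n, page, ...);                   or add_full()
	 *
	 * validate_slab_node() compares what it finds on the lists against
	 * n->nr_slabs, so it can observe this transient mismatch.
	 */
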
>> To really fix the issue, the overall synchronization between debug slub
>> operations and slub validation needs a revisit.
> True. The sync between validation and concurrent operations has not been in
> focus so far. Validation was used (at least by me) only to validate that
> the slab structures are still okay after running some tests.
>
> Let's just do this in steps. The first patch here simply takes the node
> lock in free_debug_processing. This is similar to what you have done,
> but the changes are made to the debugging function instead. Then we can
> look at how to address the slab counting issue in a separate patch.
>

Thanks for the quick response. I have no problem with moving the node-lock 
acquisition into free_debug_processing. Of the two problems that were reported, 
this is the more serious one and so needs to be fixed sooner rather than 
later. For the other one, we can take more time to find a better solution.

So are you going to integrate your change into the mainline?

> Subject: slub: Take node lock during object free checks
>
> This is needed for proper synchronization with validate_slab()
> as pointed out by Waiman Long <Waiman.Long@...com>
>
> Reported-by: Waiman Long <Waiman.Long@...com>
> Signed-off-by: Christoph Lameter <cl@...ux.com>
>
>
> ---
>   mm/slub.c |   30 ++++++++++++++++++------------
>   1 file changed, 18 insertions(+), 12 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2012-04-27 09:40:00.000000000 -0500
> +++ linux-2.6/mm/slub.c	2012-04-27 09:50:15.000000000 -0500
> @@ -1082,13 +1082,13 @@ bad:
>   	return 0;
>   }
>
> -static noinline int free_debug_processing(struct kmem_cache *s,
> -		 struct page *page, void *object, unsigned long addr)
> +static noinline struct kmem_cache_node *free_debug_processing(
> +	struct kmem_cache *s, struct page *page, void *object,
> +	unsigned long addr, unsigned long *flags)
>   {
> -	unsigned long flags;
> -	int rc = 0;
> +	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
>
> -	local_irq_save(flags);
> +	spin_lock_irqsave(&n->list_lock, *flags);
>   	slab_lock(page);
>
>   	if (!check_slab(s, page))
> @@ -1126,15 +1126,19 @@ static noinline int free_debug_processin
>   		set_track(s, object, TRACK_FREE, addr);
>   	trace(s, page, object, 0);
>   	init_object(s, object, SLUB_RED_INACTIVE);
> -	rc = 1;
>   out:
>   	slab_unlock(page);
> -	local_irq_restore(flags);
> -	return rc;
> +	/*
> +	 * Keep node_lock to preserve integrity
> +	 * until the object is actually freed
> +	 */
> +	return n;
>
>   fail:
> +	slab_unlock(page);
> +	spin_unlock_irqrestore(&n->list_lock, *flags);
>   	slab_fix(s, "Object at 0x%p not freed", object);
> -	goto out;
> +	return NULL;
>   }
>
>   static int __init setup_slub_debug(char *str)
> @@ -1227,8 +1231,9 @@ static inline void setup_object_debug(st
>   static inline int alloc_debug_processing(struct kmem_cache *s,
>   	struct page *page, void *object, unsigned long addr) { return 0; }
>
> -static inline int free_debug_processing(struct kmem_cache *s,
> -	struct page *page, void *object, unsigned long addr) { return 0; }
> +static inline struct kmem_cache_node *free_debug_processing(
> +	struct kmem_cache *s, struct page *page, void *object,
> +	unsigned long addr, unsigned long *flags) { return NULL; }
>
>   static inline int slab_pad_check(struct kmem_cache *s, struct page *page)
>   			{ return 1; }
> @@ -2445,7 +2450,8 @@ static void __slab_free(struct kmem_cach
>
>   	stat(s, FREE_SLOWPATH);
>
> -	if (kmem_cache_debug(s) && !free_debug_processing(s, page, x, addr))
> +	if (kmem_cache_debug(s) &&
> +		!(n = free_debug_processing(s, page, x, addr, &flags)))
>   		return;
>
>   	do {
