[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a9efec2b-a6cc-89b6-e295-995e9afdd373@google.com>
Date: Sun, 12 Jun 2022 15:45:01 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Jann Horn <jannh@...gle.com>
cc: Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Hyeonggon Yoo <42.hyeyoo@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/slub: add missing TID updates on slab deactivation
On Wed, 8 Jun 2022, Jann Horn wrote:
> The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
> the TID stays the same. However, two places in __slab_alloc() currently
> don't update the TID when deactivating the CPU slab.
>
> If multiple operations race the right way, this could lead to an object
> getting lost; or, in an even more unlikely situation, it could even lead to
> an object being freed onto the wrong slab's freelist, messing up the
> `inuse` counter and eventually causing a page to be freed to the page
> allocator while it still contains slab objects.
>
> (I haven't actually tested these cases though, this is just based on
> looking at the code. Writing testcases for this stuff seems like it'd be
> a pain...)
>
> The race leading to state inconsistency is (all operations on the same CPU
> and kmem_cache):
>
> - task A: begin do_slab_free():
> - read TID
> - read pcpu freelist (==NULL)
> - check `slab == c->slab` (true)
> - [PREEMPT A->B]
> - task B: begin slab_alloc_node():
> - fastpath fails (`c->freelist` is NULL)
> - enter __slab_alloc()
> - slub_get_cpu_ptr() (disables preemption)
> - enter ___slab_alloc()
> - take local_lock_irqsave()
> - read c->freelist as NULL
> - get_freelist() returns NULL
> - write `c->slab = NULL`
> - drop local_unlock_irqrestore()
> - goto new_slab
> - slub_percpu_partial() is NULL
> - get_partial() returns NULL
> - slub_put_cpu_ptr() (enables preemption)
> - [PREEMPT B->A]
> - task A: finish do_slab_free():
> - this_cpu_cmpxchg_double() succeeds()
> - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]
>
>
> From there, the object on c->freelist will get lost if task B is allowed to
> continue from here: It will proceed to the retry_load_slab label,
> set c->slab, then jump to load_freelist, which clobbers c->freelist.
>
>
> But if we instead continue as follows, we get worse corruption:
>
> - task A: run __slab_free() on object from other struct slab:
> - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
> - task A: run slab_alloc_node() with NUMA node constraint:
> - fastpath fails (c->slab is NULL)
> - call __slab_alloc()
> - slub_get_cpu_ptr() (disables preemption)
> - enter ___slab_alloc()
> - c->slab is NULL: goto new_slab
> - slub_percpu_partial() is non-NULL
> - set c->slab to slub_percpu_partial(c)
> - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
> from slab-2]
> - goto redo
> - node_match() fails
> - goto deactivate_slab
> - existing c->freelist is passed into deactivate_slab()
> - inuse count of slab-1 is decremented to account for object from
> slab-2
>
> At this point, the inuse count of slab-1 is 1 lower than it should be.
> This means that if we free all allocated objects in slab-1 except for one,
> SLUB will think that slab-1 is completely unused, and may free its page,
> leading to use-after-free.
>
> Fixes: c17dda40a6a4e ("slub: Separate out kmem_cache_cpu processing from deactivate_slab")
> Fixes: 03e404af26dc2 ("slub: fast release on full slab")
> Cc: stable@...r.kernel.org
> Signed-off-by: Jann Horn <jannh@...gle.com>
Acked-by: David Rientjes <rientjes@...gle.com>
Powered by blists - more mailing lists