Message-ID: <20150317010912.GA19483@js1304-P5Q-DELUXE>
Date:	Tue, 17 Mar 2015 10:09:12 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Mark Rutland <mark.rutland@....com>
Cc:	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Christoph Lameter <cl@...ux.com>,
	David Rientjes <rientjes@...gle.com>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Pekka Enberg <penberg@...nel.org>,
	Steve Capper <steve.capper@...aro.org>
Subject: Re: [PATCH] mm/slub: fix lockups on PREEMPT && !SMP kernels

Hello,

On Fri, Mar 13, 2015 at 03:47:12PM +0000, Mark Rutland wrote:
> Commit 9aabf810a67cd97e ("mm/slub: optimize alloc/free fastpath by
> removing preemption on/off") introduced an occasional hang for kernels
> built with CONFIG_PREEMPT && !CONFIG_SMP.
> 
> The problem is the following loop the patch introduced to
> slab_alloc_node and slab_free:
> 
> do {
>         tid = this_cpu_read(s->cpu_slab->tid);
>         c = raw_cpu_ptr(s->cpu_slab);
> } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
> 
> GCC 4.9 has been observed to hoist the load of c and c->tid above the
> loop for !SMP kernels (as in this case raw_cpu_ptr(x) is compile-time
> constant and does not force a reload). On arm64 the generated assembly
> looks like:
> 
> ffffffc00016d3c4:       f9400404        ldr     x4, [x0,#8]
> ffffffc00016d3c8:       f9400401        ldr     x1, [x0,#8]
> ffffffc00016d3cc:       eb04003f        cmp     x1, x4
> ffffffc00016d3d0:       54ffffc1        b.ne    ffffffc00016d3c8 <slab_alloc_node.constprop.82+0x30>
> 
> If the thread is preempted between the load of c->tid (into x1) and tid
> (into x4), and an allocation or free occurs in another thread (bumping
> the cpu_slab's tid), the thread will be stuck in the loop until
> s->cpu_slab->tid wraps, which may be forever in the absence of
> allocations on the same CPU.

Is there any way to guarantee that these are refetched on every iteration?

> 
> The loop itself is somewhat pointless as the thread can be preempted at
> any point after the loop before the this_cpu_cmpxchg_double, and the
> window for preemption within the loop is far smaller. Given that we
> assume accessing c->tid is safe for the loop condition, and we retry
> when the cmpxchg fails, we can get rid of the loop entirely and just
> access c->tid via the raw_cpu_ptr for s->cpu_slab.

Hmm... IIUC, the loop itself is not pointless. It guarantees that tid and
c (s->cpu_slab) are fetched on the same, correct processor, and this is
required for the algorithm's correctness.

Consider your code:

c = raw_cpu_ptr(s->cpu_slab);
tid = READ_ONCE(c->tid);

This doesn't guarantee that tid is fetched on the CPU where c was
fetched if preemption/migration happens between these operations.

If c->tid, c->freelist and c->page are fetched on another CPU, there
is no ordering guarantee, so c->freelist and c->page could be stale
even if c->tid is the most recent value.

Consider the following free case with your patch.

Assume CPU 0's initial state is as follows:
c->tid: 1, c->freelist: NULL, c->page: A

User X: try to free object X for page A
User X: fetch c (s->cpu_slab)

Preemption and migration happen...
Other allocations/frees happen on CPU 0, so its state becomes:
c->tid: 3, c->freelist: NULL, c->page: B

User X: read c->tid: 3, c->freelist: NULL, c->page: A (stale value)

Because tid and freelist match the current values, the free would
succeed; but the current c->page is B and the object belongs to A, so
this success is wrong.

The loop prevents this possibility.

Thanks.
