lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230309083523.66592-1-qiuxu.zhuo@intel.com>
Date:   Thu,  9 Mar 2023 16:35:23 +0800
From:   Qiuxu Zhuo <qiuxu.zhuo@...el.com>
To:     tglx@...utronix.de
Cc:     arjan.van.de.ven@...el.com, arjan@...ux.intel.com,
        boqun.feng@...il.com, davem@...emloft.net, edumazet@...gle.com,
        kuba@...nel.org, linux-kernel@...r.kernel.org,
        mark.rutland@....com, maz@...nel.org, netdev@...r.kernel.org,
        pabeni@...hat.com, peterz@...radead.org,
        torvalds@...uxfoundation.org, wangyang.guo@...el.com,
        will@...nel.org, x86@...nel.org
Subject: Re: [patch V2 3/4] atomics: Provide rcuref - scalable reference counting

Hi Thomas,

Some comments on the comments.
If I'm wrong, please correct me ;-).

> From: Thomas Gleixner <tglx@...utronix.de>
> To: LKML <linux-kernel@...r.kernel.org>
> Cc: Linus Torvalds <torvalds@...uxfoundation.org>,
> 	x86@...nel.org, Wangyang Guo <wangyang.guo@...el.com>,
> 	Arjan van De Ven <arjan@...ux.intel.com>,
> 	"David S. Miller" <davem@...emloft.net>,
> 	Eric Dumazet <edumazet@...gle.com>,
> 	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
> 	netdev@...r.kernel.org, Will Deacon <will@...nel.org>,
> 	Peter Zijlstra <peterz@...radead.org>,
> 	Boqun Feng <boqun.feng@...il.com>,
> 	Mark Rutland <mark.rutland@....com>,
> 	Marc Zyngier <maz@...nel.org>,
> 	Arjan Van De Ven <arjan.van.de.ven@...el.com>
> Subject: [patch V2 3/4] atomics: Provide rcuref - scalable reference counting
> 
> atomic_t based reference counting, including refcount_t, uses
> atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is
> implemented with a atomic_try_cmpxchg() loop. High contention of the
> reference count leads to retry loops and scales badly. There is nothing to
> improve on this implementation as the semantics have to be preserved.
> 
> Provide rcuref as a scalable alternative solution which is suitable for RCU
> managed objects. Similar to refcount_t it comes with overflow and underflow
> detection and mitigation.
> 
> rcuref treats the underlying atomic_t as an unsigned integer and partitions
> this space into zones:
> 
>   0x00000000 - 0x7FFFFFFF	valid zone (1 .. INT_MAX references)

>From the point of rcuref_read()'s view:
0x00000000 encodes 1, ...,  then 0x7FFFFFFF should encode INT_MAX + 1 references.

>   0x80000000 - 0xBFFFFFFF	saturation zone
>   0xC0000000 - 0xFFFFFFFE	dead zone
>   0xFFFFFFFF   			no reference
> 
> rcuref_get() unconditionally increments the reference count with
> atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the
> reference count with atomic_add_negative_release().
> 
> This unconditional increment avoids the inc_not_zero() problem, but
> requires a more complex implementation on the put() side when the count
> drops from 0 to -1.
> 
> When this transition is detected then it is attempted to mark the reference
> count dead, by setting it to the midpoint of the dead zone with a single
> atomic_cmpxchg_release() operation. This operation can fail due to a
> concurrent rcuref_get() elevating the reference count from -1 to 0 again.
> 
> If the unconditional increment in rcuref_get() hits a reference count which
> is marked dead (or saturated) it will detect it after the fact and bring
> back the reference count to the midpoint of the respective zone. The zones
> provide enough tolerance which makes it practically impossible to escape
> from a zone.

[...]

> + * Why not refcount?
> + * =================
> + *
> + * In principle it should be possible to make refcount use the rcuref
> + * scheme, but the destruction race described below cannot be prevented
> + * unless the protected object is RCU managed.
> + *
> + * Theory of operation
> + * ===================
> + *
> + * rcuref uses an unsigned integer reference counter. As long as the
> + * counter value is greater than or equal to RCUREF_ONEREF and not larger
> + * than RCUREF_MAXREF the reference is alive:
> + *
> + * ONEREF   MAXREF               SATURATED             RELEASED      DEAD    NOREF
> + * 0        0x7FFFFFFF 0x8000000 0xA0000000 0xBFFFFFFF 0xC0000000 0xE0000000 0xFFFFFFFF
> + * <---valid --------> <-------saturation zone-------> <-----dead zone----->
> + *
> + * The get() and put() operations do unconditional increments and
> + * decrements. The result is checked after the operation. This optimizes
> + * for the fast path.
> + *
> + * If the reference count is saturated or dead, then the increments and
> + * decrements are not harmful as the reference count still stays in the
> + * respective zones and is always set back to STATURATED resp. DEAD. The
> + * zones have room for 2^28 racing operations in each direction, which
> + * makes it practically impossible to escape the zones.
> + *
> + * Once the last reference is dropped the reference count becomes
> + * RCUREF_NOREF which forces rcuref_put() into the slowpath operation. The
> + * slowpath then tries to set the reference count from RCUREF_NOREF to
> + * RCUREF_DEAD via a cmpxchg(). This opens a small window where a
> + * concurrent rcuref_get() can acquire the reference count and bring it
> + * back to RCUREF_ONEREF or even drop the reference again and mark it DEAD.
> + *
> + * If the cmpxchg() succeeds then a concurrent rcuref_get() will result in
> + * DEAD + 1, which is inside the dead zone. If that happens the reference
> + * count is put back to DEAD.
> + *
> + * The actual race is possible due to the unconditional increment and
> + * decrements in rcuref_get() and rcuref_put():
> + *
> + *	T1				T2
> + *	get()				put()
> + *					if (atomic_add_negative(1, &ref->refcnt))

For T2 put() here:
"if (atomic_add_negative(1, &ref->refcnt))" ->
"if (atomic_add_negative(-1, &ref->refcnt))"

> + *		succeeds->			atomic_cmpxchg(&ref->refcnt, -1, DEAD);

Is it more readable if 's/-1/NODEF/g' ?

> + *
> + *	atomic_add_negative(1, &ref->refcnt);	<- Elevates refcount to DEAD + 1
> + *
> + * As the result of T1's add is negative, the get() goes into the slow path
> + * and observes refcnt being in the dead zone which makes the operation fail.
> + *
> + * Possible critical states:
> + *
> + *	Context Counter	References	Operation
> + *	T1	0	1		init()
> + *	T2	1	2		get()
> + *	T1	0	1		put()
> + *	T2     -1	0		put() tries to mark dead
> + *	T1	0	1		get()
> + *	T2	0	1		put() mark dead fails
> + *	T1     -1	0		put() tries to mark dead
> + *	T1    DEAD	0		put() mark dead succeeds
> + *	T2    DEAD+1	0		get() fails and puts it back to DEAD
> + *
> + * Of course there are more complex scenarios, but the above illustrates
> + * the working principle. The rest is left to the imagination of the
> + * reader.
> + *
> + * Deconstruction race
> + * ===================
> + *
> + * The release operation must be protected by prohibiting a grace period in
> + * order to prevent a possible use after free:
> + *
> + *	T1				T2
> + *	put()				get()
> + *	// ref->refcnt = ONEREF
> + *	if (atomic_add_negative(-1, &ref->cnt))

For T1 put() here:
"if (atomic_add_negative(-1, &ref->cnt))" ->
"if (!atomic_add_negative(-1, &ref->cnt))"

> + *		return false;				<- Not taken
> + *
> + *	// ref->refcnt == NOREF
> + *	--> preemption
> + *					// Elevates ref->c to ONEREF

s/ref->c/ref->refcnt/g

> + *					if (!atomic_add_negative(1, &ref->refcnt))
> + *						return true;			<- taken
> + *
> + *					if (put(&p->ref)) { <-- Succeeds
> + *						remove_pointer(p);
> + *						kfree_rcu(p, rcu);
> + *					}
> + *
> + *		RCU grace period ends, object is freed
> + *
> + *	atomic_cmpxchg(&ref->refcnt, NONE, DEAD);	<- UAF

s/NONE/NOREF/g

[...]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ