linux-kernel - Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and energy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091004203639.GH6764@linux.vnet.ibm.com>
Date:	Sun, 4 Oct 2009 13:36:39 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [RFC] Userspace RCU: (ab)using futexes to save cpu cycles and
	energy

On Sun, Oct 04, 2009 at 10:37:45AM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck@...ux.vnet.ibm.com) wrote:
> > On Wed, Sep 23, 2009 at 01:48:20PM -0400, Mathieu Desnoyers wrote:
> > > Hi,
> > > 
> > > When implementing the call_rcu() "worker thread" in userspace, I ran
> > > into the problem that it had to be woken up periodically to check if
> > > there are any callbacks to execute. However, I easily imagine that this
> > > does not fit well with the "green computing" definition.
> > > 
> > > Therefore, I've looked at ways to have the call_rcu() callers waking up
> > > this worker thread when callbacks are enqueued. However, I don't want to
> > > take any lock and the fast path (when no wake up is required) should not
> > > cause any cache-line exchange.
> > > 
> > > Here are the primitives I've created. I'd like to have feedback on my
> > > futex use, just to make sure I did not do any incorrect assumptions.
> > > 
> > > This could also be eventually used in the QSBR Userspace RCU quiescent
> > > state and in mb/signal userspace RCU when exiting RCU read-side C.S. to
> > > ensure synchronize_rcu() does not busy-wait for too long.
> > > 
> > > /*
> > >  * Wake-up any waiting defer thread. Called from many concurrent threads.
> > >  */
> > > static void wake_up_defer(void)
> > > {
> > >         if (unlikely(atomic_read(&defer_thread_futex) == -1))
> > >                 atomic_set(&defer_thread_futex, 0);
> > >                 futex(&defer_thread_futex, FUTEX_WAKE,
> > >                       0, NULL, NULL, 0);
> > > }
> > > 
> > > /*
> > >  * Defer thread waiting. Single thread.
> > >  */
> > > static void wait_defer(void)
> > > {
> > >         atomic_dec(&defer_thread_futex);
> > >         if (atomic_read(&defer_thread_futex) == -1)
> > >                 futex(&defer_thread_futex, FUTEX_WAIT, -1,
> > >                       NULL, NULL, 0);
> > > }
> > 
> > The standard approach would be to use pthread_cond_wait() and
> > pthread_cond_broadcast().  Unfortunately, this would require holding a
> > pthread_mutex_lock across both operations, which would not necessarily
> > be so good for wake-up-side scalability.
> 
> The pthread_cond_broadcast() mutex is really a bugger when it comes to
> execute it at each rcu_read_unlock(). We could as well use a mutex to
> protect the whole read-side.. :-(
> 
> > That said, without this sort of heavy-locking approach, wakeup races
> > are quite difficult to avoid.
> 
> I did a formal model of my futex-based wait/wakeup. The main idea is
> that the waiter:
> 
> - Set itself to "waiting"
> - Checks the "real condition" for which it will wait (e.g. queues empty
>   when used for rcu callbacks, no more ongoing old reader thread C.S.
>   when used in synchronize_rcu())
> - Calls sys_futex if the variable have not changed.
> 
> And the waker:
> - sets the "real condition" waking up the waiter (enqueuing, or
>   rcu_read_unlock())
> - check if the waiter must be woken up, if so, wake it up by setting the
>   state to "running" and calling sys_futex.
> 
> But as you say, wakeup races are difficult (but not impossible!) to
> avoid. This is why I resorted to a formal model of the wait/wakeup
> scheme to ensure that we cannot end up in a situation where a waker
> races with the waiter and does not wake it up when it should. This is
> nothing fancy (does not model memory and instruction reordering
> automatically), but I figure that memory barriers are required between
> almost every steps of this algorithm, so by adding smp_mb() I end up
> ensure sequential behavior. I added test cases in the model to ensure
> that incorrect memory reordering _would_ cause errors by doing the
> reordering by hand in error-injection runs.

My question is whether pthread_cond_wait() and pthread_cond_broadcast()
can substitute for the raw call to futex.  Unless I am missing something
(which I quite possibly am), the kernel will serialize on the futex
anyway, so serialization in user-mode code does not add much additional
pain.

> The model is available at:
> http://www.lttng.org/cgi-bin/gitweb.cgi?p=userspace-rcu.git;a=tree;f=futex-wakeup;h=4ddeaeb2784165cb0465d4ca9f7d27acb562eae3;hb=refs/heads/formal-model
> 
> (this is in the formal-model branch of the urcu tree, futex-wakeup
> subdir)
> 
> This is modeling this snippet of code :
> 
> static int defer_thread_futex;
> 
> /*
>  * Wake-up any waiting defer thread. Called from many concurrent threads.
>  */
> static void wake_up_defer(void)
> {
>         if (unlikely(uatomic_read(&defer_thread_futex) == -1)) {
>                 uatomic_set(&defer_thread_futex, 0);
>                 futex(&defer_thread_futex, FUTEX_WAKE, 1,
>                       NULL, NULL, 0);
>         }
> }
> 
> static void enqueue(void *callback)	/* not the actual types */
> {
> 	add_to_queue(callback);
> 	smp_mb();
> 	wake_up_defer();
> }
> 
> /*
>  * rcu_defer_num_callbacks() returns the total number of callbacks
>  * enqueued.
>  */
> 
> /*
>  * Defer thread waiting. Single thread.
>  */
> static void wait_defer(void)
> {
>         uatomic_dec(&defer_thread_futex);
>         smp_mb();       /* Write futex before read queue */
>         if (rcu_defer_num_callbacks()) {
>                 smp_mb();       /* Read queue before write futex */
>                 /* Callbacks are queued, don't wait. */
>                 uatomic_set(&defer_thread_futex, 0);
>         } else {
>                 smp_rmb();      /* Read queue before read futex */
>                 if (uatomic_read(&defer_thread_futex) == -1)
>                         futex(&defer_thread_futex, FUTEX_WAIT, -1,
>                               NULL, NULL, 0);
>         }
> }
> 
> 
> Comments are welcome,

I will take a look after further recovery from jetlag.  Not yet competent
to review this kind of stuff.  Give me a few days.  ;-)

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/