[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Tue, 11 Nov 2008 21:33:24 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
David Howells <dhowells@...hat.com>,
Dmitry Torokhov <dmitry.torokhov@...il.com>,
Jiri Pirko <jpirko@...hat.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org
Subject: [RFC,PATCH] workqueues: turn queue_work() into the "barrier" for
work->func()
To clarify, I will be happy with the "no, we don't need this" comment.
But let's suppose we have
int VAR;
void work_func(struct work_struct *work)
{
if (VAR)
do_something();
}
and we are doing
VAR = 1;
queue_work(work);
I think the caller of queue_work() has all rights to expect that
the next invocation of work_func() must see "VAR == 1", but this
is not true if the work is already pending.
run_workqueue:
work_clear_pending(work)
clear_bit(WORK_STRUCT_PENDING) // no mb()
call work_func()
if (VAR)
it is possible that CPU reads VAR before before it clears _PENDING,
and queue_work() "infiltrates" in between and fails. So we can miss
an event.
I don't know if we really have such a code in kernel, and even if
we have perhaps we should fix it and do not touch workqueues. But
perhaps the current behaviour is a bit too subtle in this respect.
For example, atkbd_event_work() happens to work correctly, but only
because it does mb() implicitly.
The patch merely adds mb() after work_clear_pending(work), another
side already has the mb semantics implied by test_and_set_bit().
>From now queue_work() always acts as a barrier for work->func().
Signed-off-by: Oleg Nesterov <oleg@...hat.com>
--- K-28/kernel/workqueue.c~WQ_MB 2008-11-06 19:11:02.000000000 +0100
+++ K-28/kernel/workqueue.c 2008-11-11 21:06:20.000000000 +0100
@@ -291,6 +291,12 @@ static void run_workqueue(struct cpu_wor
BUG_ON(get_wq_data(work) != cwq);
work_clear_pending(work);
+ /*
+ * Ensure that either the concurrent queue_work() succeeds,
+ * or work->func() sees all the preceding memory changes.
+ */
+ smp_mb__after_clear_bit();
+
lock_map_acquire(&cwq->wq->lockdep_map);
lock_map_acquire(&lockdep_map);
f(work);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists