Message-ID: <20061222162044.GA279@tv-sign.ru>
Date: Fri, 22 Dec 2006 19:20:44 +0300
From: Oleg Nesterov <oleg@...sign.ru>
To: Andrew Morton <akpm@...l.org>
Cc: ego@...ibm.com, Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
linux-kernel@...r.kernel.org, David Wilder <dwilder@...ibm.com>,
Tom Zanussi <zanussi@...ibm.com>,
Ingo Molnar <mingo@...hat.com>,
Greg Kroah-Hartman <gregkh@...e.de>,
Christoph Hellwig <hch@...radead.org>, ltt-dev@...fik.org,
systemtap@...rces.redhat.com,
Douglas Niehaus <niehaus@...s.ku.edu>,
"Martin J. Bligh" <mbligh@...igh.org>,
Thomas Gleixner <tglx@...utronix.de>, kiran@...lex86.org,
venkatesh.pallipadi@...el.com, dipankar@...ibm.com,
vatsa@...ibm.com, torvalds@...l.org, davej@...hat.com
Subject: Re: [PATCH] Relay CPU Hotplug support
On 12/22, Andrew Morton wrote:
>
> On Fri, 22 Dec 2006 16:07:24 +0530
> Gautham R Shenoy <ego@...ibm.com> wrote:
>
> > While we are at this per-subsystem cpuhotplug "locking", here's a
> > proposal that might put an end to the workqueue deadlock woes.
>
> Oleg is working on some patches which will permit us to cancel or wait upon
> a particular work_struct, rather than upon all pending work_structs.
I hope they are completed. I am waiting for the next -mm release to
send a "final" patch; I need to look at set_wq_data() again once
workqueue.c is in sync with Linus's changes.
> This will fix the problem where we accidentally wait upon some unrelated
> work_struct which takes a lock which is related to one which we already
> hold.
>
> I hope. It'll be a bit tricky to implement: if some foreign work_struct is
> running right now, we cannot wait upon it - we must non-blockingly dequeue
> the work_struct which we want to kill before it gets to run.
The previous patch I sent
[PATCH, RFC rc1-mm1] implement flush_work()
http://marc.theaimsgroup.com/?l=linux-kernel&m=116647310413104
has a race.
+static void wait_on_work(struct cpu_workqueue_struct *cwq,
+			 struct work_struct *work)
+{
+	struct wq_barrier barr;
+	int running = 0;
+
+	spin_lock_irq(&cwq->lock);
+	if (get_wq_data(work) == cwq) {
+		list_del_init(&work->entry);
+		work_release(work);
+	}
If that work is pending on CPU 1 and that CPU goes down, it may be
moved to CPU 0 after flush_work() has already checked CPU 0.
I think we can do this:
static void wait_on_work(struct cpu_workqueue_struct *cwq,
			 struct work_struct *work)
{
	struct wq_barrier barr;
	int running = 0;

	spin_lock_irq(&cwq->lock);
	if (unlikely(cwq->current_work == work)) {
		/* the work is running right now, queue a barrier behind it */
		init_wq_barrier(&barr);
		insert_work(cwq, &barr.work, 0);
		running = 1;
	}
	spin_unlock_irq(&cwq->lock);

	if (unlikely(running)) {
		/* drop the mutex while sleeping; CPU hotplug may happen meanwhile */
		mutex_unlock(&workqueue_mutex);
		wait_for_completion(&barr.done);
		mutex_lock(&workqueue_mutex);
	}
}

void flush_work(struct workqueue_struct *wq, struct work_struct *work)
{
	struct cpu_workqueue_struct *cwq;

	cwq = get_wq_data(work);
	if (!cwq)
		return;

	/* steal the work if it is still pending */
	spin_lock_irq(&cwq->lock);
	list_del_init(&work->entry);
	work_release(work);
	spin_unlock_irq(&cwq->lock);

	/* and wait for it in case it is already running */
	mutex_lock(&workqueue_mutex);
	if (is_single_threaded(wq)) {
		/* Always use first cpu's area. */
		wait_on_work(per_cpu_ptr(wq->cpu_wq, singlethread_cpu), work);
	} else {
		int cpu;

		for_each_online_cpu(cpu)
			wait_on_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
	}
	mutex_unlock(&workqueue_mutex);
}
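(For reference, the wq_barrier used by wait_on_work() is not shown here; it is
the usual work_struct + completion pair from the previous RFC, roughly like the
sketch below -- names and details may differ from the actual patch.)

/*
 * Sketch only: the barrier is an ordinary work item whose handler
 * completes a completion, so whoever queued it can sleep on ->done.
 */
struct wq_barrier {
	struct work_struct	work;
	struct completion	done;
};

static void wq_barrier_func(struct work_struct *work)
{
	struct wq_barrier *barr = container_of(work, struct wq_barrier, work);

	complete(&barr->done);
}

static void init_wq_barrier(struct wq_barrier *barr)
{
	INIT_WORK(&barr->work, wq_barrier_func);
	init_completion(&barr->done);
}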
Do you see any problems? When wait_on_work() unlocks workqueue_mutex (or
whatever we choose to protect against CPU hotplug), the CPU may go away. But
in that case take_over_work() will move the barrier we queued to another
CPU; it will fire eventually, and wait_on_work() will be woken.
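(To illustrate, take_over_work() essentially splices the dead CPU's ->worklist
onto the CPU running the hotplug callback, so a queued barrier is not lost; a
simplified sketch only, not the exact code from kernel/workqueue.c:)

static void take_over_work(struct workqueue_struct *wq, unsigned int cpu)
{
	struct cpu_workqueue_struct *cwq = per_cpu_ptr(wq->cpu_wq, cpu);
	struct list_head list;
	struct work_struct *work;

	spin_lock_irq(&cwq->lock);
	list_replace_init(&cwq->worklist, &list);

	while (!list_empty(&list)) {
		work = list_entry(list.next, struct work_struct, entry);
		list_del(&work->entry);
		/* requeue on the CPU which is handling CPU_DEAD */
		__queue_work(per_cpu_ptr(wq->cpu_wq, smp_processor_id()), work);
	}
	spin_unlock_irq(&cwq->lock);
}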
Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
take_over_work(), so cwq->thread should complete its ->worklist (and thus
the barrier), because currently we don't check kthread_should_stop() in
run_workqueue(). But even if we did, everything looks safe to me.
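(In other words, the worker exit path looks roughly like the simplified sketch
below: kthread_should_stop() is only checked in the outer loop, so anything
already on ->worklist -- including our barrier -- runs before the thread exits
on kthread_stop(). Simplified from memory, details differ from the real code.)

static int worker_thread(void *__cwq)
{
	struct cpu_workqueue_struct *cwq = __cwq;
	DECLARE_WAITQUEUE(wait, current);

	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		add_wait_queue(&cwq->more_work, &wait);
		if (list_empty(&cwq->worklist))
			schedule();
		else
			__set_current_state(TASK_RUNNING);
		remove_wait_queue(&cwq->more_work, &wait);

		/* drain everything queued so far, then re-check for stop */
		if (!list_empty(&cwq->worklist))
			run_workqueue(cwq);
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}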
Oleg.