Message-ID: <20111008145147.GA25607@redhat.com>
Date: Sat, 8 Oct 2011 16:51:48 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Bhanu Prakash Gollapudi <bprakash@...adcom.com>
Cc: Tejun Heo <tj@...nel.org>, Mike Christie <michaelc@...wisc.edu>,
Michael Chan <mchan@...adcom.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 00/11] Modified workqueue patches for your review
On 10/07, Bhanu Prakash Gollapudi wrote:
>
> Ok. I guess I plan to do something like this. This should avoid the race
> condition. I have not compiled or tested it yet, but will update you
> the progress.
Confused. I was CC'ed in the middle of the discussion, and I simply do
not understand what you are talking about. And since we discussed this
off-list, I can't find the previous messages. I added lkml.
So, what does this patch do? It looks like it is on top of another patch
which changes the old workqueue code to take get_online_cpus() instead
of cpu_maps_update_begin() in create/destroy.
That previous change was wrong. And how can this one help?
And could you please explain which problem (or problems) you are trying
to solve? I thought that the problem is that work->func() can't use
cpu_hotplug_begin(), in particular this means it can not call
destroy_workqueue().
> @@ -209,6 +220,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
> if (!cpu_online(cpu))
> return -EINVAL;
>
> + cpu_sync_hotplug_begin();
> cpu_hotplug_begin();
> set_cpu_active(cpu, false);
> err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
> @@ -258,6 +270,7 @@ out_release:
> hcpu) == NOTIFY_BAD)
> BUG();
> }
> + cpu_sync_hotplug_done();
> return err;
> }
So we add another global lock, and it covers CPU_POST_DEAD.
> @@ -930,7 +932,9 @@ void destroy_workqueue(struct workqueue_struct *wq)
> const struct cpumask *cpu_map = wq_cpu_map(wq);
> int cpu;
>
> + cpu_sync_hotplug_begin();
> get_online_cpus();
> + cpu_sync_hotplug_done();
OK, we are going to flush the pending works. Since we drop the _sync_
lock, a work->func() can take it again.
It seems to work, but it doesn't. Suppose _cpu_down() is called, and
suppose that it takes cpu_sync_hotplug_begin() before that work does.
Deadlock.
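To spell out the cycle, here is a minimal user-space sketch (not kernel
code). It models the three parties as a wait-for graph and checks it for a
loop. The actor names, deadlock_scenario() and has_wait_cycle() are
hypothetical and exist only for illustration. The sketch assumes that
cpu_hotplug_begin() waits until every get_online_cpus() holder has called
put_online_cpus(), and that the pending work->func() itself calls
destroy_workqueue().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in the scenario; the names are illustrative only. */
enum actor { DESTROY_WQ, WORK_FUNC, CPU_DOWN, NR_ACTORS };

/*
 * waits_for[a] is the actor that 'a' is blocked on, or -1 if 'a' can run.
 * Returns true if following the edges from some actor leads back to it.
 */
bool has_wait_cycle(const int waits_for[NR_ACTORS])
{
	for (int start = 0; start < NR_ACTORS; start++) {
		int cur = start;

		/* A cycle through 'start' has at most NR_ACTORS edges. */
		for (int steps = 0; steps < NR_ACTORS; steps++) {
			cur = waits_for[cur];
			if (cur < 0)
				break;		/* chain ends, no cycle here */
			assert(cur < NR_ACTORS);
			if (cur == start)
				return true;	/* came back: deadlock */
		}
	}
	return false;
}

/* Fill in the wait-for edges for the interleaving described above. */
void deadlock_scenario(int waits_for[NR_ACTORS])
{
	/* Holds get_online_cpus(), dropped _sync_, flushes pending works. */
	waits_for[DESTROY_WQ] = WORK_FUNC;
	/* Wants cpu_sync_hotplug_begin(), which _cpu_down() already holds. */
	waits_for[WORK_FUNC] = CPU_DOWN;
	/* Sits in cpu_hotplug_begin() until put_online_cpus() is called. */
	waits_for[CPU_DOWN] = DESTROY_WQ;
}
```

Calling deadlock_scenario() on an array and passing it to has_wait_cycle()
reports a cycle. Removing any single edge, for example by letting
_cpu_down() proceed, breaks it.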
Once again. Maybe I missed something (or even everything ;) but you
should not blame commit 3da1c84c00c, it was always wrong to destroy_
from work->func(). Note that there is another problem: CPU_POST_DEAD
needs to flush the pending works too, so we have another obvious source
of deadlock.
Oleg.