lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <20151007132007.GA24540@redhat.com> Date: Wed, 7 Oct 2015 15:20:07 +0200 From: Oleg Nesterov <oleg@...hat.com> To: Peter Zijlstra <peterz@...radead.org> Cc: heiko.carstens@...ibm.com, linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com> Subject: Re: [RFC][PATCH] sched: Start stopper early On 10/07, Peter Zijlstra wrote: > > On Wed, Oct 07, 2015 at 02:30:46PM +0200, Oleg Nesterov wrote: > > On 10/07, Peter Zijlstra wrote: > > > > > > So Heiko reported some 'interesting' fail where stop_two_cpus() got > > > stuck in multi_cpu_stop() with one cpu waiting for another that never > > > happens. > > > > > > It _looks_ like the 'other' cpu isn't running and the current best > > > theory is that we race on cpu-up and get the stop_two_cpus() call in > > > before the stopper task is running. > > > > > > This _is_ possible because we set 'online && active' > > > > Argh. Can't really comment this change right now, but this reminds me > > that stop_two_cpus() path should not rely on cpu_active() at all. I mean > > we should not use this check to avoid the deadlock, migrate_swap_stop() > > can check it itself. And cpu_stop_park()->cpu_stop_signal_done() should > > be replaced by BUG_ON(). > > > > Probably slightly off-topic, but what do you finally think about the old > > "[PATCH v2 6/6] stop_machine: kill stop_cpus_lock and lg_double_lock/unlock()" > > we discussed in http://marc.info/?t=143750670300014 ? > > > > I won't really insist if you still dislike it, but it seems we both > > agree that "lg_lock stop_cpus_lock" must die in any case, and after that > > we can the cleanups mentioned above. > > Yes, I was looking at that, this issue reminded me we still had that > issue open. Great, thanks! But let me add that I tried to confuse you because I forgot what actually I was going to do... I meant something like the (incomplete) patch below, and after that we can change stop_two_cpus() to rely on ->enabled and remove the cpu_active() checks (again, ignoring the fact we do not want to migrate to inactive CPU). Although I need to recall/recheck this all, perhaps I missed something... So while I think we should kill lg_lock in any case, this and the patch above is absolutely off-topic, we can do this with or without lg_lock removal. Oleg. --- x/kernel/cpu.c +++ x/kernel/cpu.c @@ -344,7 +344,7 @@ static int take_cpu_down(void *_param) /* Give up timekeeping duties */ tick_handover_do_timer(); /* Park the stopper thread */ - kthread_park(current); + stop_machine_park(param->hcpu); return 0; } --- x/kernel/stop_machine.c +++ x/kernel/stop_machine.c @@ -452,6 +452,15 @@ repeat: } } +void stop_machine_park(int cpu) +{ + struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu); + + spin_lock(&stopper->lock); + stopper->enabled = false; + spin_unlock(&stopper->lock); +} + extern void sched_set_stop_task(int cpu, struct task_struct *stop); static void cpu_stop_create(unsigned int cpu) @@ -468,10 +477,10 @@ static void cpu_stop_park(unsigned int c /* drain remaining works */ spin_lock_irqsave(&stopper->lock, flags); list_for_each_entry_safe(work, tmp, &stopper->works, list) { + WARN_ON(1); list_del_init(&work->list); cpu_stop_signal_done(work->done, false); } - stopper->enabled = false; spin_unlock_irqrestore(&stopper->lock, flags); } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists