lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 31 Mar 2009 21:56:16 -0700 From: Randy Dunlap <randy.dunlap@...cle.com> To: Rusty Russell <rusty@...tcorp.com.au> CC: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, "Rafael J. Wysocki" <rjw@...k.pl>, Bjorn Helgaas <bjorn.helgaas@...com>, andreas.herrmann3@....com, Andrew Morton <akpm@...ux-foundation.org> Subject: Re: 2.6.29 boot hang Randy Dunlap wrote: > Rusty Russell wrote: >> On Wednesday 01 April 2009 07:15:35 Randy Dunlap wrote: >>> On a 4-proc x86_64 (HP BladeCenter, AMD CPUs) system, booting 2.6.29 >>> (or earlier, back to 2.6.28-6921-g873392c) hangs during boot. >>> >>> git bisect says: >>> 873392ca514f87eae39f53b6944caf85b1a047cb is first bad commit >>> commit 873392ca514f87eae39f53b6944caf85b1a047cb >>> Author: Rusty Russell <rusty@...tcorp.com.au> >>> Date: Wed Dec 31 23:54:56 2008 +1030 >>> >>> PCI: work_on_cpu: use in drivers/pci/pci-driver.c >> ... >> >>> If I change CONFIG_MICROCODE_AMD=y to CONFIG_MICROCODE_AMD=n & rebuild, >>> the kernel boots successfully. >> How very very odd. My first thought was a deadlock with keventd used >> by work_on_cpu (changed in latest Linus tree), but the microcode code at >> that version doesn't use work_on_cpu. > > Yep, I thought it a bit odd also. > >> So I don't think that's it, but this patch should canonically eliminate it: >> >> Subject: work_on_cpu(): rewrite it to create a kernel thread on demand >> From: Andrew Morton <akpm@...ux-foundation.org> > > This patch doesn't apply to 2.6.29-final, but it does apply to 2.6.29-git8, > so I applied/tested it there. with surprising results (at least to me). > > 2.6.29-git8 works for me without any patches applied. After applying > this patch, I get the same boot hang that I was seeing with 2.6.29-final. That's incorrect. Sorry about that. 2.6.29-git8 with or without this patch applied work for me. > Make sense to you?? > > Thanks for your help. > >> The various implemetnations and proposed implemetnations of work_on_cpu() >> are vulnerable to various deadlocks because they all used queues of some >> form. >> >> Unrelated pieces of kernel code thus gained dependencies wherein if one >> work_on_cpu() caller holds a lock which some other work_on_cpu() callback >> also takes, the kernel could rarely deadlock. >> >> Fix this by creating a short-lived kernel thread for each work_on_cpu() >> invokation. >> >> This is not terribly fast, but the only current caller of work_on_cpu() is >> pci_call_probe(). >> >> It would be nice to find some other way of doing the node-local >> allocations in the PCI probe code so that we can zap work_on_cpu() >> altogether. The code there is rather nasty. I can't think of anything >> simple at this time... >> >> Cc: Rusty Russell <rusty@...tcorp.com.au> >> Cc: Ingo Molnar <mingo@...e.hu> >> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org> >> Signed-off-by: Rusty Russell <rusty@...tcorp.com.au> >> --- >> kernel/workqueue.c | 36 +++++++++++++++++++----------------- >> 1 file changed, 19 insertions(+), 17 deletions(-) >> >> diff -puN kernel/workqueue.c~work_on_cpu-rewrite-it-to-create-a-kernel-thread-on-demand kernel/workqueue.c >> --- a/kernel/workqueue.c~work_on_cpu-rewrite-it-to-create-a-kernel-thread-on-demand >> +++ a/kernel/workqueue.c >> @@ -985,20 +985,20 @@ undo: >> } >> >> #ifdef CONFIG_SMP >> -static struct workqueue_struct *work_on_cpu_wq __read_mostly; >> >> struct work_for_cpu { >> - struct work_struct work; >> + struct completion completion; >> long (*fn)(void *); >> void *arg; >> long ret; >> }; >> >> -static void do_work_for_cpu(struct work_struct *w) >> +static int do_work_for_cpu(void *_wfc) >> { >> - struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work); >> - >> + struct work_for_cpu *wfc = _wfc; >> wfc->ret = wfc->fn(wfc->arg); >> + complete(&wfc->completion); >> + return 0; >> } >> >> /** >> @@ -1009,17 +1009,23 @@ static void do_work_for_cpu(struct work_ >> * >> * This will return the value @fn returns. >> * It is up to the caller to ensure that the cpu doesn't go offline. >> + * The caller must not hold any locks which would prevent @fn from completing. >> */ >> long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg) >> { >> - struct work_for_cpu wfc; >> - >> - INIT_WORK(&wfc.work, do_work_for_cpu); >> - wfc.fn = fn; >> - wfc.arg = arg; >> - queue_work_on(cpu, work_on_cpu_wq, &wfc.work); >> - flush_work(&wfc.work); >> - >> + struct task_struct *sub_thread; >> + struct work_for_cpu wfc = { >> + .completion = COMPLETION_INITIALIZER_ONSTACK(wfc.completion), >> + .fn = fn, >> + .arg = arg, >> + }; >> + >> + sub_thread = kthread_create(do_work_for_cpu, &wfc, "work_for_cpu"); >> + if (IS_ERR(sub_thread)) >> + return PTR_ERR(sub_thread); >> + kthread_bind(sub_thread, cpu); >> + wake_up_process(sub_thread); >> + wait_for_completion(&wfc.completion); >> return wfc.ret; >> } >> EXPORT_SYMBOL_GPL(work_on_cpu); >> @@ -1035,8 +1041,4 @@ void __init init_workqueues(void) >> hotcpu_notifier(workqueue_cpu_callback, 0); >> keventd_wq = create_workqueue("events"); >> BUG_ON(!keventd_wq); >> -#ifdef CONFIG_SMP >> - work_on_cpu_wq = create_workqueue("work_on_cpu"); >> - BUG_ON(!work_on_cpu_wq); >> -#endif >> } >> _ -- ~Randy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists