[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080226225242.GA15926@redhat.com>
Date: Tue, 26 Feb 2008 17:52:43 -0500
From: Jason Baron <jbaron@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc: akpm@...ux-foundation.org, Ingo Molnar <mingo@...e.hu>,
linux-kernel@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: [patch 1/7] Immediate Values - Architecture Independent Code
On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> Changelog:
>
> - section __imv is already SHF_ALLOC
> - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0. So
> the if (immediateindex) is unnecessary here.
> - Remove module_mutex usage: depend on functions implemented in module.c for
> that.
hi,
In testing this patch, i've run across a deadlock...apply_imv_update() can get
called again before, ipi_busy_loop() has had a chance to finsh, and set
wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
indefinitely and the subsequent apply_imv_update() hangs. I've shown this
deadlock below using nmi_watchdog=1 in item 1).
After hitting this deadlock i modified apply_imv_update() to wait for
ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
A held a read lock on tasklist_lock, then process B called apply_imv_update().
Process A received the IPI and begins executing ipi_busy_loop(). Then process
C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
process A holds up process C, and C can't get an IPI b/c interrupts are
disabled. i believe this is an inherent problem in using smp_call_function, in
that one can't synchronize the processes on each other...you can reproduce
these issues using the test module below item 2)
In order to address these issues, i've modified stop_machine_run() to take an
new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
on all cpus, item 3). I then modified kernel/immediate.c to use this new
infrastructure item 4). the code in immediate.c simplifies quite a bit. This
new code has been robust through all testing thus far.
thanks,
-Jason
1)
NMI Watchdog detected LOCKUP on CPU 3
CPU 3
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7cc>] [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
RSP: 0018:ffff81007fbdff70 EFLAGS: 00000002
RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fbd2930
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fbdff78 R08: ffff81007fbdff78 R09: ffff81007dd91e80
R10: ffff810001030b80 R11: 0000000000000246 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001030a80
FS: 0000000000000000(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000025e8d68 CR3: 000000007549c000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fbd6000, task ffff81007fbd2930)
Stack: 0000000000000000 ffff81007fbdffa8 ffffffff8101ca4e ffff81007fbdffa8
ffffffff8100b0e2 0000000000000003 0000000000000040 ffff81007fbd7e60
ffffffff8100cac6 ffff81007fbd7e60 <EOI> ffff81007fbd7ee8 0000000000000246
Call Trace:
<IRQ> [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
[<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
[<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
<EOI> [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
[<ffffffff8100a723>] ? enter_idle+0x22/0x24
[<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
[<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 9704, comm: toggle-writer Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8101c717>] [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
RSP: 0018:ffff81003ac35da8 EFLAGS: 00000297
RAX: 00000000000008fc RBX: 000000000000000b RCX: 00000000000008fc
RDX: 00000000000008fc RSI: 00000000000000fc RDI: 000000000000000b
RBP: ffff81003ac35e08 R08: ffffffff8840504f R09: 00007f01f07696f0
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7b3
FS: 00007f01f07696f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f01f0796000 CR3: 000000006bcdf000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process toggle-writer (pid: 9704, threadinfo ffff81003ac34000, task ffff81003ad14930)
Stack: ffffffff8106e7b3 0000000000000000 0000000000000002 00003fff00000000
000000000000000b ffffffff81075c03 00000010001280d2 0000000000000000
0000000000000000 ffffffff8106e7b3 000000000000000f 0000000000000005
Call Trace:
[<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
[<ffffffff81075c03>] ? __alloc_pages+0x8e/0x337
[<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
[<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
[<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
[<ffffffff8106e703>] imv_update_range+0xd7/0x16e
[<ffffffff8105494e>] _module_imv_update+0x35/0x54
[<ffffffff81054982>] module_imv_update+0x15/0x23
[<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
[<ffffffff810d64a4>] proc_reg_write+0x7b/0x96
[<ffffffff81099f72>] vfs_write+0xae/0x157
[<ffffffff8109a53f>] sys_write+0x47/0x70
[<ffffffff8100c0e9>] tracesys+0xdc/0xe1
Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7cc>] [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
RSP: 0018:ffff81007fb6ff70 EFLAGS: 00000002
RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fb5f260
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: ffff810001016700
R10: ffff810001018b80 R11: 0000000000000001 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
FS: 0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac9ac20 CR3: 00000000778e6000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
Stack: 0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
ffffffff8100cac6 ffff81007fb6be60 <EOI> ffff81007fb6bee8 0000000000000001
Call Trace:
<IRQ> [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
[<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
[<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
<EOI> [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
[<ffffffff8100a723>] ? enter_idle+0x22/0x24
[<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
[<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6
Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 7b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 82
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 0
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU 0
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7e6>] [<ffffffff8106e7e6>] ipi_busy_loop+0x33/0x41
RSP: 0018:ffffffff81498f70 EFLAGS: 00000006
RAX: 0000000000000004 RBX: 0000000000000000 RCX: ffffffff81394620
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff810062cc1e98
R10: ffff81000100cb80 R11: ffff810062cc1d58 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81466640 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac6f060 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81434000, task ffffffff81394620)
Stack: 0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff8100b0e2
ffffffff8100b0e2 0000000000000000 ffffffffffffffff ffffffff81435ec0
ffffffff8100cac6 ffffffff81435ec0 <EOI> ffffffff81435f48 ffff810062cc1d58
Call Trace:
<IRQ> [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
[<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
[<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
[<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
<EOI> [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
[<ffffffff8100a723>] ? enter_idle+0x22/0x24
[<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
[<ffffffff81266ee6>] ? rest_init+0x5a/0x5c
Code: f0 ff 0d 82 6b 49 00 0f ae f0 48 63 05 78 6b 49 00 48 3b 05 5d 6b 49 00 7f ed f0 ff 0d 68
---[ end trace 738433437d960ebc ]---
2)
--- /dev/null
+++ b/samples/markers/toggle-tester.c
@@ -0,0 +1,90 @@
+/* probe-example.c
+ *
+ * toggle module to test immedidate infrastructure.
+ *
+ * (C) Copyright 2007 Jason Baron <jbaron@...hat.com>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <asm/atomic.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+
+static DECLARE_MUTEX(toggle_mutex);
+DEFINE_IMV(char, toggle_value) = 0;
+
+static ssize_t proc_toggle_write(struct file *file, const char __user *buf,
+ size_t length, loff_t *ppos)
+{
+ int i;
+
+ down(&toggle_mutex);
+ i = imv_read(toggle_value);
+ imv_set(toggle_value, !i);
+ up(&toggle_mutex);
+
+ return 0;
+}
+
+static int proc_toggle_show(struct seq_file *s, void *p)
+{
+ int i;
+
+ down(&toggle_mutex);
+ i = imv_read(toggle_value);
+ up(&toggle_mutex);
+
+ seq_printf(s, "%u\n", i);
+
+ return 0;
+}
+
+
+static int proc_toggle_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, proc_toggle_show, NULL);
+}
+
+
+static const struct file_operations toggle_operations = {
+ .open = proc_toggle_open,
+ .read = seq_read,
+ .write = proc_toggle_write,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+
+struct proc_dir_entry *pde;
+
+static int __init toggle_init(void)
+{
+
+ pde = create_proc_entry("toggle", 0, NULL);
+ if (!pde)
+ return -ENOMEM;
+ pde->proc_fops = &toggle_operations;
+
+
+ return 0;
+}
+
+static void __exit toggle_fini(void)
+{
+ remove_proc_entry("toggle", &proc_root);
+
+}
+
+module_init(toggle_init);
+module_exit(toggle_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jason Baron");
+MODULE_DESCRIPTION("testing immediate implementation");
3)
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 5bfc553..1ab1c5b 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -8,6 +8,9 @@
#include <asm/system.h>
#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+
+#define RUN_ALL ~0U
+
/**
* stop_machine_run: freeze the machine on all CPUs and run this function
* @fn: the function to run
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 51b5ee5..e6ee46f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -23,9 +23,17 @@ enum stopmachine_state {
STOPMACHINE_WAIT,
STOPMACHINE_PREPARE,
STOPMACHINE_DISABLE_IRQ,
+ STOPMACHINE_RUN,
STOPMACHINE_EXIT,
};
+struct stop_machine_data {
+ int (*fn)(void *);
+ void *data;
+ struct completion done;
+ int run_all;
+} smdata;
+
static enum stopmachine_state stopmachine_state;
static unsigned int stopmachine_num_threads;
static atomic_t stopmachine_thread_ack;
@@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
{
int irqs_disabled = 0;
int prepared = 0;
+ int ran = 0;
set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
@@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
prepared = 1;
smp_mb(); /* Must read state first. */
atomic_inc(&stopmachine_thread_ack);
+ } else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
+ smdata.fn(smdata.data);
+ ran = 1;
+ smp_mb(); /* Must read state first. */
+ atomic_inc(&stopmachine_thread_ack);
}
/* Yield in first stage: migration threads need to
* help our sisters onto their CPUs. */
@@ -136,12 +150,10 @@ static void restart_machine(void)
preempt_enable_no_resched();
}
-struct stop_machine_data
+static void run_other_cpus(void)
{
- int (*fn)(void *);
- void *data;
- struct completion done;
-};
+ stopmachine_set_state(STOPMACHINE_RUN);
+}
static int do_stop(void *_smdata)
{
@@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
ret = stop_machine();
if (ret == 0) {
ret = smdata->fn(smdata->data);
+ if (smdata->run_all)
+ run_other_cpus();
restart_machine();
}
@@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
unsigned int cpu)
{
- struct stop_machine_data smdata;
struct task_struct *p;
+ down(&stopmachine_mutex);
smdata.fn = fn;
smdata.data = data;
+ smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
init_completion(&smdata.done);
-
- down(&stopmachine_mutex);
-
+ smp_wmb();
/* If they don't care which CPU fn runs on, bind to any online one. */
- if (cpu == NR_CPUS)
+ if (cpu == NR_CPUS || cpu == RUN_ALL)
cpu = raw_smp_processor_id();
p = kthread_create(do_stop, &smdata, "kstopmachine");
4)
diff --git a/kernel/immediate.c b/kernel/immediate.c
index 4c36a89..39ec13e 100644
--- a/kernel/immediate.c
+++ b/kernel/immediate.c
@@ -20,6 +20,7 @@
#include <linux/immediate.h>
#include <linux/memory.h>
#include <linux/cpu.h>
+#include <linux/stop_machine.h>
#include <asm/cacheflush.h>
@@ -30,6 +31,23 @@ static int imv_early_boot_complete;
extern const struct __imv __start___imv[];
extern const struct __imv __stop___imv[];
+int started;
+
+int stop_machine_imv_update(void *imv_ptr)
+{
+ struct __imv *imv = imv_ptr;
+
+ if (!started) {
+ text_poke((void *)imv->imv, (void *)imv->var, imv->size);
+ started = 1;
+ smp_wmb();
+ } else
+ sync_core();
+
+ flush_icache_range(imv->imv, imv->imv + imv->size);
+
+ return 0;
+}
/*
* imv_mutex nests inside module_mutex. imv_mutex protects builtin
@@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
*/
static DEFINE_MUTEX(imv_mutex);
-static atomic_t wait_sync;
-
-struct ipi_loop_data {
- long value;
- const struct __imv *imv;
-} loop_data;
-
-static void ipi_busy_loop(void *arg)
-{
- unsigned long flags;
-
- local_irq_save(flags);
- atomic_dec(&wait_sync);
- do {
- /* Make sure the wait_sync gets re-read */
- smp_mb();
- } while (atomic_read(&wait_sync) > loop_data.value);
- atomic_dec(&wait_sync);
- do {
- /* Make sure the wait_sync gets re-read */
- smp_mb();
- } while (atomic_read(&wait_sync) > 0);
- /*
- * Issuing a synchronizing instruction must be done on each CPU before
- * reenabling interrupts after modifying an instruction. Required by
- * Intel's errata.
- */
- sync_core();
- flush_icache_range(loop_data.imv->imv,
- loop_data.imv->imv + loop_data.imv->size);
- local_irq_restore(flags);
-}
/**
* apply_imv_update - update one immediate value
@@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
*/
static int apply_imv_update(const struct __imv *imv)
{
- unsigned long flags;
- long online_cpus;
-
/*
* If the variable and the instruction have the same value, there is
* nothing to do.
@@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
if (imv_early_boot_complete) {
kernel_text_lock();
- get_online_cpus();
- online_cpus = num_online_cpus();
- atomic_set(&wait_sync, 2 * online_cpus);
- loop_data.value = online_cpus;
- loop_data.imv = imv;
- smp_call_function(ipi_busy_loop, NULL, 1, 0);
- local_irq_save(flags);
- atomic_dec(&wait_sync);
- do {
- /* Make sure the wait_sync gets re-read */
- smp_mb();
- } while (atomic_read(&wait_sync) > online_cpus);
- text_poke((void *)imv->imv, (void *)imv->var,
- imv->size);
- /*
- * Make sure the modified instruction is seen by all CPUs before
- * we continue (visible to other CPUs and local interrupts).
- */
- wmb();
- atomic_dec(&wait_sync);
- flush_icache_range(imv->imv,
- imv->imv + imv->size);
- local_irq_restore(flags);
- put_online_cpus();
+ started = 0;
+ stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
kernel_text_unlock();
} else
text_poke_early((void *)imv->imv, (void *)imv->var,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists