Message-ID: <20090223090735.GH9582@elte.hu>
Date: Mon, 23 Feb 2009 10:07:35 +0100
From: Ingo Molnar <mingo@...e.hu>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Vegard Nossum <vegard.nossum@...il.com>, stable@...nel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Nick Piggin <npiggin@...e.de>,
Pekka Enberg <penberg@...helsinki.fi>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: fix lazy vmap purging (use-after-free error)

* Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
> On Sat, Feb 21, 2009 at 07:00:30PM -0800, Paul E. McKenney wrote:
> > On Sat, Feb 21, 2009 at 07:37:20PM +0100, Vegard Nossum wrote:
> > > 2009/2/21 Vegard Nossum <vegard.nossum@...il.com>:
>
> [ . . . ]
>
> > > Okay, I don't really think it's an error. The if (user) test happens
> > > at the very beginning and gcc decides to reuse %edx. GDB doesn't know
> > > this, so it thinks the parameter changed, but at this point the
> > > parameter simply won't be used anymore.
> > >
> > > So you're right: The value can't be trusted (after entry, anyway).
> >
> > OK. So at least the compiler is sane. ;-)
> >
> > And the fact that RCU Classic behaves the same as hierarchical RCU
> > pretty clearly points at some issue with the quiescent-state check code:
> >
> > void rcu_check_callbacks(int cpu, int user)
> > {
> > 	if (user ||
> > 	    (idle_cpu(cpu) && !in_softirq() &&
> > 	     hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
> > 		rcu_qsctr_inc(cpu);
> > 		rcu_bh_qsctr_inc(cpu);
> > 	} else if (!in_softirq()) {
> > 		rcu_bh_qsctr_inc(cpu);
> > 	}
> > 	raise_softirq(RCU_SOFTIRQ);
> > }
> >
> > In the case you traced earlier, we interrupted out of kernel code, yet
> > somehow arrived at rcu_qsctr_inc(). We know that "user" really was 0,
> > thanks to your careful analysis, so the issue must be in the other
> > clause. Since we interrupted out of mainline kernel code, in_softirq()
> > should have returned 0, and hardirq_count() should also have met the
> > above condition.
> >
> > You mentioned some concern about idle_cpu() separately, and if idle_cpu()
> > was returning 1, then RCU would most certainly decide that it was in a
> > quiescent state and that it could end the current grace period.
>
> Hello, Vegard,
>
> Could you please try out the following patch? I am not 100%
> confident of it on non-x86 architectures, nor during the time
> that non-boot CPUs start up (though this patch should not
> break non-boot CPUs any more than they might already be
> broken).
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> The boot CPU runs in the context of its idle thread during
> boot-up. During this time, idle_cpu(0) will always return
> nonzero, which will fool Classic and Hierarchical RCU into
> deciding that a large chunk of the boot-up sequence is a big
> long quiescent state. This in turn causes RCU to prematurely
> end grace periods during this time.

Ah, that makes a lot of sense and explains it all! What a nasty
little bug we had all along ...

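To make the failure mode concrete, here is a minimal userspace sketch of the first branch of the rcu_check_callbacks() test quoted above. It is not kernel code: struct tick_ctx and looks_quiescent() are hypothetical names invented for illustration, and hardirq_depth <= 1 is assumed to stand in for hardirq_count() <= (1 << HARDIRQ_SHIFT).

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model (not kernel code) of the state that the
 * quoted rcu_check_callbacks() consults on a scheduler tick.
 */
struct tick_ctx {
	bool user;          /* the "user" argument: tick interrupted user mode */
	bool idle_cpu;      /* what idle_cpu(cpu) would report */
	bool in_softirq;    /* tick interrupted softirq processing */
	int  hardirq_depth; /* hardirq nesting depth, counting the tick itself */
};

/*
 * Mirrors the first branch of the quoted check: the CPU is treated as
 * being in a quiescent state if the tick came from user mode, or from an
 * idle CPU that is in neither softirq nor nested hardirq context.
 */
bool looks_quiescent(const struct tick_ctx *c)
{
	return c->user ||
	       (c->idle_cpu && !c->in_softirq && c->hardirq_depth <= 1);
}
```

With user == false but idle_cpu == true, which is what the boot CPU reports while it still runs in the context of its idle thread, the model reports a quiescent state even though mainline kernel code was interrupted.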
> This patch creates a new global variable that is set to 1 just
> before the boot CPU first enters the scheduler, after which
> the idle task really is idle.
>
> Located-by: Vegard Nossum <vegard.nossum@...il.com>
> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

Please also add kmemcheck to the changelog while at it ;-)

> ---
>
> init/main.c | 3 +++
> kernel/rcuclassic.c | 4 +++-
> kernel/rcutree.c | 4 +++-
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/init/main.c b/init/main.c
> index 8442094..51f4b71 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -121,6 +121,8 @@ static char *static_command_line;
> static char *execute_command;
> static char *ramdisk_execute_command;
>
> +int idle_task_is_really_idle; /* set to 1 late in boot. */
> +
> #ifdef CONFIG_SMP
> /* Setup configured maximum number of CPUs to activate */
> unsigned int __initdata setup_max_cpus = NR_CPUS;
> @@ -463,6 +465,7 @@ static noinline void __init_refok rest_init(void)
>  	 * at least once to get things moving:
>  	 */
>  	init_idle_bootup_task(current);
> +	idle_task_is_really_idle = 1;
>  	preempt_enable_no_resched();
>  	schedule();
>  	preempt_disable();

Could you please use system_state instead? We could insert a new
stage - or just use SYSTEM_RUNNING as the trigger.

	Ingo
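
For illustration only, a rough userspace sketch of what gating the idle test on system_state rather than on a new global might look like. Everything here is a model: enum model_system_state, model_state and idle_counts_as_quiescent() are hypothetical names, only two states are modelled, and the point at which the kernel sets SYSTEM_RUNNING relative to rest_init() is not shown in this thread, so the window covered may differ from the flag in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical two-state model of the kernel's system_state; the real
 * enum has more states than are modelled here.
 */
enum model_system_state {
	MODEL_SYSTEM_BOOTING,
	MODEL_SYSTEM_RUNNING,
};

/* Starts out in the booting state, as the kernel does. */
enum model_system_state model_state = MODEL_SYSTEM_BOOTING;

/*
 * Trust an "idle" report only once the system is running, so that the
 * boot CPU's idle-thread context during boot-up is not mistaken for a
 * quiescent state. idle_cpu_result stands in for idle_cpu(cpu).
 */
bool idle_counts_as_quiescent(bool idle_cpu_result)
{
	return model_state == MODEL_SYSTEM_RUNNING && idle_cpu_result;
}
```

In this model the quoted condition would read idle_counts_as_quiescent(idle_cpu(cpu)) in place of the bare idle_cpu(cpu) call, leaving the softirq and hardirq tests unchanged.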