[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFTL4hxD8kjHDL04RACCqEZ829-O4VutSgbGsNXiHT=-NAHpWQ@mail.gmail.com>
Date: Thu, 15 Nov 2012 19:08:16 +0100
From: Frederic Weisbecker <fweisbec@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Paul Gortmaker <paul.gortmaker@...driver.com>
Subject: Re: [PATCH RFC] irq_work: Flush work on CPU_DYING (was: Re: [PATCH
7/7] printk: Wake up klogd using irq_work)
2012/11/15 Steven Rostedt <rostedt@...dmis.org>:
> On Thu, 2012-11-15 at 16:25 +0100, Frederic Weisbecker wrote:
>> 2012/11/15 Steven Rostedt <rostedt@...dmis.org>:
>> > On Wed, 2012-11-14 at 21:37 +0100, Frederic Weisbecker wrote:
>> >> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> >> index f249e8c..822d757 100644
>> >> --- a/kernel/time/tick-sched.c
>> >> +++ b/kernel/time/tick-sched.c
>> >> @@ -289,7 +289,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>> >> time_delta = timekeeping_max_deferment();
>> >> } while (read_seqretry(&xtime_lock, seq));
>> >>
>> >> - if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) || printk_needs_cpu(cpu) ||
>> >> + if (rcu_needs_cpu(cpu, &rcu_delta_jiffies) ||
>> >
>> > If the CPU is going offline, the printk_tick() would be executed here.
>> > But now that printk_tick() is done with the irq_work code, it wont be
>> > executed till the next tick. Could this cause a missed printk because
>> > of this, if the cpu is going offline?
>> >
>> > Actually, how does irq_work in general handle cpu offline work?
>>
>> Good point, and that's not trivial to solve.
>>
>> The hotplug down sequence does:
>>
>> ----->
>> CPU that offilines CPU offlining
>> -----------------
>> ---------------------
>> cpu_down() {
>> __stop_machine(take_cpu_down)
>>
>> take_cpu_down() {
>>
>> __cpu_disable() {
>>
>> * disable irqs in hw
>>
>> * clear from online mask
>> }
>>
>> move all tasks somewhere
>> }
>> while (!idle_cpu(offlining))
>> cpu_relax()
>>
>> cpu_die();
>> <---------
>>
>> So the offlining CPU goes to idle in the end once irqs are disabled in
>> the apic level. Does that include the timer tick? If so then the last
>> resort to offline without irq works in the queue is to make
>> take_cpu_down() ask for a retry if there are pending irq works during
>> its execution.
>>
>> Now if we have printk() calls between __cpu_disable() and the idle
>> loop, they will be lost until the next onlining. Unless we do an
>> explicit call to printk_tick() from the idle loop if the CPU is
>> offline.
>>
>> Note that !CONFIG_NO_HZ doesn't seem to handle that. Which makes me
>> wonder if the tick is really part of the whole IRQ disablement done in
>> __cpu_disable().
>
>
> How about flushing all irq_work from CPU_DYING. The notifier is called
> by stop_machine on the CPU that is going down. Grant you, the code will
> not be called from irq context (so things like get_irq_regs() wont work)
> but I'm not sure what the requirements are for irq_work in that regard
> (Peter?). But irqs are disabled and the CPU is about to go offline.
> Might as well flush the work.
>
> I ran this against my stress_cpu_hotplug script (attached) and it seemed
> to work fine. I even did a:
>
> perf record ./stress-cpu-hotplug
>
> Signed-off-by: Steven Rostedt <rostedt@...dmis.org>
>
> Index: linux-rt.git/kernel/irq_work.c
> ===================================================================
> --- linux-rt.git.orig/kernel/irq_work.c
> +++ linux-rt.git/kernel/irq_work.c
> @@ -14,6 +14,7 @@
> #include <linux/irqflags.h>
> #include <linux/sched.h>
> #include <linux/tick.h>
> +#include <linux/cpu.h>
> #include <asm/processor.h>
>
>
> @@ -105,11 +106,7 @@ bool irq_work_needs_cpu(void)
> return true;
> }
>
> -/*
> - * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> - * context with local IRQs disabled.
> - */
> -void irq_work_run(void)
> +static void __irq_work_run(void)
> {
> unsigned long flags;
> struct irq_work *work;
> @@ -128,7 +125,6 @@ void irq_work_run(void)
> if (llist_empty(this_list))
> return;
>
> - BUG_ON(!in_irq());
> BUG_ON(!irqs_disabled());
>
> llnode = llist_del_all(this_list);
> @@ -155,8 +151,23 @@ void irq_work_run(void)
> (void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
> }
> }
> +
> +/*
> + * Run the irq_work entries on this cpu. Requires to be ran from hardirq
> + * context with local IRQs disabled.
> + */
> +void irq_work_run(void)
> +{
> + BUG_ON(!in_irq());
> + __irq_work_run();
> +}
> EXPORT_SYMBOL_GPL(irq_work_run);
>
> +static void irq_work_run_cpu_down(void)
> +{
> + __irq_work_run();
> +}
> +
> /*
> * Synchronize against the irq_work @entry, ensures the entry is not
> * currently in use.
> @@ -169,3 +180,35 @@ void irq_work_sync(struct irq_work *work
> cpu_relax();
> }
> EXPORT_SYMBOL_GPL(irq_work_sync);
> +
> +#ifdef CONFIG_HOTPLUG_CPU
> +static int irq_work_cpu_notify(struct notifier_block *self,
> + unsigned long action, void *hcpu)
> +{
> + long cpu = (long)hcpu;
> +
> + switch (action) {
> + case CPU_DYING:
Looks good. Perf has already deactivated the cpu wide events on
CPU_DOWN_PREPARE. I suspect it's the only irq work enqueuer from NMI.
At this stage of cpu down hotplug, irqs are deactivated so the last
possible enqueuers before the CPU goes idle/down are from subsequent
CPU_DYING notifiers or the hotplug code.
There may be some printk() there but we never provided the guarantee
that klogd would be woken up on that thin window. !CONFIG_NO_HZ does
not call printk_needs_cpu() that takes cares of offlining. Its purpose
was actually to avoid a hang in old code anyway.
I'm adding this patch to my queue.
Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists