[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yid/hLD0hMl7kpan@alley>
Date: Tue, 8 Mar 2022 17:08:36 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: two locations: was: Re: [PATCH printk v1 03/13] printk: use
percpu flag instead of cpu_online()
On Wed 2022-03-02 15:55:23, John Ogness wrote:
> On 2022-02-16, Petr Mladek <pmladek@...e.com> wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index d1b773823d63..b346e60e9e51 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2577,11 +2577,11 @@ static int have_callable_console(void)
> >> *
> >> * Console drivers may assume that per-cpu resources have been allocated. So
> >> * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't
> >> - * call them until this CPU is officially up.
> >> + * call them until per-cpu resources have been allocated.
> >> */
> >> static inline int can_use_console(void)
> >> {
> >> - return cpu_online(raw_smp_processor_id()) || have_callable_console();
> >> + return (printk_percpu_data_ready() || have_callable_console());
> >> }
> >
> > cpu_online(raw_smp_processor_id()) check is used also in
> > call_console_drivers(). The same logic should be used in both
> > locations.
> >
> > I found this when reviewing 6th patch that replaced both checks
> > with a single one.
> >
> > Note that I am still not sure if this change is correct at all.
> > It will allow to always call the console during CPU hotplug
> > and I am not sure if it is safe. IMHO, it might cause problems when
> > a console driver uses, for example, CPU-bound workqueues.
>
> You are correct. We must take hotplug into account for !CON_ANYTIME
> consoles. There may be some hotplug callbacks that make memory
> unavailable for the console.
>
> However, I will add the use of printk_percpu_data_ready() in the
> check. !CON_ANYTIME consoles also should not be called until the per-cpu
> areas are ready. For example, it would be bad if a console queued
> irq_work before per-cpu areas are setup (cpu_online() is true during
> this time).
>
> One of my main concerns was that raw_smp_processor_id() was used for the
> check. It is conceptually wrong to exclude certain consoles based on a
> current CPU when migration is still enabled. I understand that the use
> of can_use_console() is an optimization to avoid doing extra work where
> there are no consoles available. But the task could be preemptible there
> and _conceptually_, could get moved to another CPU before its write()
> callback is called. The cpu_online() check belongs in code where
> preemption is disabled.
>
> If the context is preemptible, I do not think it will ever see
> !cpu_online(). So I think if the cpu_online() check is limited to
> unlocking when console_trylock() was used, it will be correct.
I investigated the cpu hotplug code and found the following:
1. In the cpu_down() code path, @cpu_online mask is cleared
by this call chain:
+ take_cpu_down()
+ __cpu_disable()
+ smp_ops.cpu_disable()
+ native_cpu_disable() # x86
+ cpu_disable_common()
+ remove_cpu_from_maps()
+ set_cpu_online(cpu, false)
, where take_cpu_down() is called via:
+ .teardown.single() calback for CPUHP_TEARDOWN_CPU state
+ takedown_cpu()
+ stop_machine_cpuslocked()
+ stop_cpus()
+ __stop_cpus()
+ queue_stop_cpus_work()
+ cpu_stop_queue_work()
, which queues the work in cpu_stopper thread that is bound
to the CPU:
+ cpu_stop_init()
+ smpboot_register_percpu_thread()
+ __smpboot_create_thread()
+ kthread_create_on_cpu()
Summary: @cpu_online mask is cleared on the affected CPU in
cpu_stopper thread that is bound to the same CPU.
It happens when handling CPUHP_TEARDOWN_CPU.
2. The CPU hotplug states are split into three groups:
+ code running on control CPU (another CPU)
+ low level code running on the hotplugged CPU
+ code running in the hotplug thread on the hotplugged CPU
It is described in include/linux/cpuhotplug.h:
/* PREPARE section invoked on a control CPU */
CPUHP_OFFLINE = 0,
[...]
/*
* STARTING section invoked on the hotplugged CPU in low level
* bringup and teardown code.
*/
CPUHP_AP_IDLE_DEAD,
[...]
CPUHP_AP_ONLINE,
CPUHP_TEARDOWN_CPU,
/* Online section invoked on the hotplugged CPU from the hotplug thread */
CPUHP_AP_ONLINE_IDLE,
CPUHP_AP_SCHED_WAIT_EMPTY,
[...]
CPUHP_ONLINE,
, where sched_cpu_wait_empty() is the .teardown.single callback for
CPUHP_AP_SCHED_WAIT_EMPTY. After this callback, another tasks
should not be scheduled on this CPU. Any attempt should be
catched and handled by sched_cpu_dying().
Note that CPUHP_AP_SCHED_WAIT_EMPTY is called before
CPUHP_TEARDOWN_CPU when the CPU goes down.
Summary: @cpu_only mask is cleared for the CPU when other tasks
could not longer be sheduled there.
Result: cpu_online(raw_smp_processor_id()) should be safe for our
purpose. It will return false only the task could not longer
migrate from the CPU.
I have to admit that it is far from obvious and tricky like
hell.
OK, cpu_online(raw_smp_processor_id()) check is not racy. Another
question is whether it is a good check whether the consoles are usable
or not.
I found the following:
1. My understanding is that affinity of IRQs is disabled right after
clearing @cpu_online mask:
void cpu_disable_common(void)
{
[...]
remove_cpu_from_maps(cpu);
[...]
fixup_irqs();
[...]
}
2. Timers must not be used close after clearing @cpu_online mask, see
see include/linux/cpuhotplug.h:
/* Must be the last timer callback */
CPUHP_AP_DUMMY_TIMER_STARTING,
[...]
CPUHP_AP_ONLINE,
CPUHP_TEARDOWN_CPU,
Result: From my POV, cpu_online(raw_smp_processor_id()) marks
reasonable place in the CPU hotplug code when the conosles
start/stop being usable.
But again, it is far from obvious and tricky like hell.
Summary: We need to keep cpu_online(raw_smp_processor_id()) check
to make the consoles safe during CPU hotplug.
IMHO, it is not about per-CPU variables. It is more
about timers, interrupts. The hotplugged CPU is not
ready call console code at these early hotplug stages.
> Regardless, my v2 will keep cpu_online() checks since they are necessary
> for hotplug support.
Yes, please. We should also somehow document this. But it can be done
separately. It is not necessarily in the scope of your patchset.
Best Regards,
Petr
Powered by blists - more mailing lists