lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 8 Mar 2022 17:08:36 +0100 From: Petr Mladek <pmladek@...e.com> To: John Ogness <john.ogness@...utronix.de> Cc: Sergey Senozhatsky <senozhatsky@...omium.org>, Steven Rostedt <rostedt@...dmis.org>, Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org, Greg Kroah-Hartman <gregkh@...uxfoundation.org> Subject: Re: two locations: was: Re: [PATCH printk v1 03/13] printk: use percpu flag instead of cpu_online() On Wed 2022-03-02 15:55:23, John Ogness wrote: > On 2022-02-16, Petr Mladek <pmladek@...e.com> wrote: > >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > >> index d1b773823d63..b346e60e9e51 100644 > >> --- a/kernel/printk/printk.c > >> +++ b/kernel/printk/printk.c > >> @@ -2577,11 +2577,11 @@ static int have_callable_console(void) > >> * > >> * Console drivers may assume that per-cpu resources have been allocated. So > >> * unless they're explicitly marked as being able to cope (CON_ANYTIME) don't > >> - * call them until this CPU is officially up. > >> + * call them until per-cpu resources have been allocated. > >> */ > >> static inline int can_use_console(void) > >> { > >> - return cpu_online(raw_smp_processor_id()) || have_callable_console(); > >> + return (printk_percpu_data_ready() || have_callable_console()); > >> } > > > > cpu_online(raw_smp_processor_id()) check is used also in > > call_console_drivers(). The same logic should be used in both > > locations. > > > > I found this when reviewing 6th patch that replaced both checks > > with a single one. > > > > Note that I am still not sure if this change is correct at all. > > It will allow to always call the console during CPU hotplug > > and I am not sure if it is safe. IMHO, it might cause problems when > > a console driver uses, for example, CPU-bound workqueues. > > You are correct. We must take hotplug into account for !CON_ANYTIME > consoles. There may be some hotplug callbacks that make memory > unavailable for the console. > > However, I will add the use of printk_percpu_data_ready() in the > check. !CON_ANYTIME consoles also should not be called until the per-cpu > areas are ready. For example, it would be bad if a console queued > irq_work before per-cpu areas are setup (cpu_online() is true during > this time). > > One of my main concerns was that raw_smp_processor_id() was used for the > check. It is conceptually wrong to exclude certain consoles based on a > current CPU when migration is still enabled. I understand that the use > of can_use_console() is an optimization to avoid doing extra work where > there are no consoles available. But the task could be preemptible there > and _conceptually_, could get moved to another CPU before its write() > callback is called. The cpu_online() check belongs in code where > preemption is disabled. > > If the context is preemptible, I do not think it will ever see > !cpu_online(). So I think if the cpu_online() check is limited to > unlocking when console_trylock() was used, it will be correct. I investigated the cpu hotplug code and found the following: 1. In the cpu_down() code path, @cpu_online mask is cleared by this call chain: + take_cpu_down() + __cpu_disable() + smp_ops.cpu_disable() + native_cpu_disable() # x86 + cpu_disable_common() + remove_cpu_from_maps() + set_cpu_online(cpu, false) , where take_cpu_down() is called via: + .teardown.single() calback for CPUHP_TEARDOWN_CPU state + takedown_cpu() + stop_machine_cpuslocked() + stop_cpus() + __stop_cpus() + queue_stop_cpus_work() + cpu_stop_queue_work() , which queues the work in cpu_stopper thread that is bound to the CPU: + cpu_stop_init() + smpboot_register_percpu_thread() + __smpboot_create_thread() + kthread_create_on_cpu() Summary: @cpu_online mask is cleared on the affected CPU in cpu_stopper thread that is bound to the same CPU. It happens when handling CPUHP_TEARDOWN_CPU. 2. The CPU hotplug states are split into three groups: + code running on control CPU (another CPU) + low level code running on the hotplugged CPU + code running in the hotplug thread on the hotplugged CPU It is described in include/linux/cpuhotplug.h: /* PREPARE section invoked on a control CPU */ CPUHP_OFFLINE = 0, [...] /* * STARTING section invoked on the hotplugged CPU in low level * bringup and teardown code. */ CPUHP_AP_IDLE_DEAD, [...] CPUHP_AP_ONLINE, CPUHP_TEARDOWN_CPU, /* Online section invoked on the hotplugged CPU from the hotplug thread */ CPUHP_AP_ONLINE_IDLE, CPUHP_AP_SCHED_WAIT_EMPTY, [...] CPUHP_ONLINE, , where sched_cpu_wait_empty() is the .teardown.single callback for CPUHP_AP_SCHED_WAIT_EMPTY. After this callback, another tasks should not be scheduled on this CPU. Any attempt should be catched and handled by sched_cpu_dying(). Note that CPUHP_AP_SCHED_WAIT_EMPTY is called before CPUHP_TEARDOWN_CPU when the CPU goes down. Summary: @cpu_only mask is cleared for the CPU when other tasks could not longer be sheduled there. Result: cpu_online(raw_smp_processor_id()) should be safe for our purpose. It will return false only the task could not longer migrate from the CPU. I have to admit that it is far from obvious and tricky like hell. OK, cpu_online(raw_smp_processor_id()) check is not racy. Another question is whether it is a good check whether the consoles are usable or not. I found the following: 1. My understanding is that affinity of IRQs is disabled right after clearing @cpu_online mask: void cpu_disable_common(void) { [...] remove_cpu_from_maps(cpu); [...] fixup_irqs(); [...] } 2. Timers must not be used close after clearing @cpu_online mask, see see include/linux/cpuhotplug.h: /* Must be the last timer callback */ CPUHP_AP_DUMMY_TIMER_STARTING, [...] CPUHP_AP_ONLINE, CPUHP_TEARDOWN_CPU, Result: From my POV, cpu_online(raw_smp_processor_id()) marks reasonable place in the CPU hotplug code when the conosles start/stop being usable. But again, it is far from obvious and tricky like hell. Summary: We need to keep cpu_online(raw_smp_processor_id()) check to make the consoles safe during CPU hotplug. IMHO, it is not about per-CPU variables. It is more about timers, interrupts. The hotplugged CPU is not ready call console code at these early hotplug stages. > Regardless, my v2 will keep cpu_online() checks since they are necessary > for hotplug support. Yes, please. We should also somehow document this. But it can be done separately. It is not necessarily in the scope of your patchset. Best Regards, Petr
Powered by blists - more mailing lists