Message-ID: <87tudqwegy.fsf@stepbren-lnx.us.oracle.com>
Date: Wed, 26 Jan 2022 10:15:25 -0800
From: Stephen Brennan <stephen.s.brennan@...cle.com>
To: Sergey Senozhatsky <senozhatsky@...omium.org>,
John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Petr Mladek <pmladek@...e.com>,
Steven Rostedt <rostedt@...dmis.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] printk: disable optimistic spin during panic

Sergey Senozhatsky <senozhatsky@...omium.org> writes:
> On (22/01/26 10:51), John Ogness wrote:
>> > Is there something that prevents panic CPU from NMI hlt CPU which is
>> > in console_trylock() under raw_spin_lock_irqsave()?
>> >
>> > CPU0                            CPU1
>> >                                 console_trylock_spinning()
>> >                                  console_trylock()
>> >                                   down_trylock()
>> >                                    raw_spin_lock_irqsave(&sem->lock)
>> >
>> > panic()
>> >  crash_smp_send_stop()
>> >   NMI                   ->        HALT
>>
>> This is a good point. I wonder if console_flush_on_panic() should
>> perform a sema_init() before it does console_trylock().
>
> A long time ago there was zap_locks() function in printk, that used
> to re-init console semaphore and logbuf spin_lock, but _only_ in case
> of printk recursion (which was never reliable)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/printk/printk.c?h=v4.9.297#n1557
>
> This has been superseded by printk_safe per-CPU buffers so we removed
> that function.
>
> So it could be that we want to introduce something similar to
> zap_locks() again.
>
> All reasonable serial console drivers should take oops_in_progress into
> consideration in ->write(), so we probably don't care about console_drivers
> spinlocks, etc., but we can potentially do a bit better on the printk side.

I see the concern here. If a CPU is halted while holding the
console_sem.lock spinlock, then the very next printk would hang, since
each vprintk_emit() does a console_trylock(), and down_trylock() takes
that spinlock unconditionally before it looks at the count.
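
To make the hang concrete, here is a userspace sketch (not kernel code)
of why a "trylock" on the semaphore can still block. It assumes that
down_trylock() takes sem->lock unconditionally and only "tries" on the
count, as the call chain quoted above suggests. Everything prefixed
model_ is made up for illustration, and the spin limit exists only so
that the demo terminates; the kernel would spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model only: mirrors the shape of a semaphore, not the real struct. */
struct model_sem {
	atomic_flag lock;	/* stands in for sem->lock (the raw spinlock) */
	int count;		/* stands in for sem->count */
};

/* Hypothetical bound so the demo ends; the kernel has no such limit. */
#define MODEL_SPIN_LIMIT 1000000L

/*
 * Modeled on down_trylock(): only the count is "tried", the inner lock
 * is always taken. Returns 0 on success, 1 if the semaphore is busy,
 * and -1 if the inner lock never became free (the hang case).
 */
static int model_down_trylock(struct model_sem *sem)
{
	long spins = 0;
	int ret;

	while (atomic_flag_test_and_set_explicit(&sem->lock,
						 memory_order_acquire)) {
		if (++spins >= MODEL_SPIN_LIMIT)
			return -1;
	}

	if (sem->count > 0) {
		sem->count--;
		ret = 0;
	} else {
		ret = 1;
	}

	atomic_flag_clear_explicit(&sem->lock, memory_order_release);
	return ret;
}

/* Models a CPU that took sem->lock and was then halted by the NMI. */
static void model_halt_holding_lock(struct model_sem *sem)
{
	bool was_locked = atomic_flag_test_and_set_explicit(&sem->lock,
							    memory_order_acquire);

	assert(!was_locked);
	(void)was_locked;
}

/* Modeled on sema_init(): forgets the stale lock owner and resets the count. */
static void model_sema_init(struct model_sem *sem, int val)
{
	atomic_flag_clear(&sem->lock);
	sem->count = val;
}
```

Calling model_halt_holding_lock() and then model_down_trylock() returns
-1, which is the hang. After model_sema_init() the trylock succeeds
again, which is the zap_locks()-style recovery discussed above.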

Now, in my thousands of test iterations, I haven't been lucky enough to
interrupt a CPU in the middle of this critical section. The critical
section itself is very short, so it's hard to hit. Not impossible, I'd
imagine.

We can't fix it in console_flush_on_panic(), because that is called much
later, after we've called the panic notifiers, which definitely call
printk(). If we wanted to re-initialize the console_sem, we'd want it
done earlier in panic(), directly after the NMI is sent.

My understanding was that we can't be too cautious regarding the console
drivers. Sure, they _shouldn't_ have any race conditions, but once we're
in panic we're better off avoiding the console drivers until they are our
last resort. So, is it worth re-initializing the console_sem early in
panic, which forces every subsequent printk() to go out to the consoles?
I don't know.

One alternative is to call __printk_safe_enter() at the beginning of
panic. This effectively guarantees that no printk will hit the console
drivers or even attempt to grab the console_sem. Then we can do the
kmsg_dump, do a crash_kexec if configured, and only when all options
have been exhausted would we re-initialize the console_sem and flush to
the console. Maybe this is too cautious, but it is an alternative.
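
Roughly, the ordering I have in mind would look like the sketch below.
This is a userspace model of the sequence described above, not a patch
against panic(): the model_ names are hypothetical, the step names only
label the real functions they stand for, and the exact placement within
panic() is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bookkeeping for the model; nothing here exists in the kernel. */
struct model_panic_state {
	bool safe;		/* models the effect of __printk_safe_enter() */
	int console_attempts;	/* printk calls that would touch console_sem */
	int buffered;		/* printk calls that only stored their message */
	int flushed;		/* messages written out by the final flush */
	char order[128];	/* space-separated record of the steps taken */
};

/* Append one step name to the recorded order. */
static void model_step(struct model_panic_state *st, const char *name)
{
	size_t used = strlen(st->order);

	assert(used + strlen(name) + 2 <= sizeof(st->order));
	snprintf(st->order + used, sizeof(st->order) - used, "%s ", name);
}

/* Models printk(): while "safe", messages are stored and consoles are skipped. */
static void model_printk(struct model_panic_state *st)
{
	if (st->safe)
		st->buffered++;
	else
		st->console_attempts++;
}

/* Models the proposed panic ordering; a real crash_kexec would not return. */
static void model_panic(struct model_panic_state *st, bool kexec_configured)
{
	memset(st, 0, sizeof(*st));

	st->safe = true;		/* stands for __printk_safe_enter() */
	model_step(st, "safe_enter");
	model_printk(st);		/* the panic banner */

	model_step(st, "send_stop");	/* other CPUs get the NMI here */

	model_step(st, "notifiers");
	model_printk(st);		/* a panic notifier that prints */

	model_step(st, "kmsg_dump");

	if (kexec_configured) {
		model_step(st, "crash_kexec");
		return;
	}

	/* All other options exhausted: only now touch console_sem. */
	model_step(st, "sema_init");
	st->safe = false;
	model_step(st, "flush");
	st->flushed = st->buffered;
}
```

In this model no printk touches the console_sem before the final flush,
so a CPU halted while holding console_sem.lock cannot hang the panic
path before kmsg_dump or crash_kexec have had their chance.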
Stephen